New datasets aim to teach AI models cross-disciplinary scientific thinking
by Clarence Oxford
Los Angeles CA (SPX) Dec 03, 2024

What can exploding stars reveal about blood flow in arteries? How might swimming bacteria inform our understanding of ocean dynamics? Researchers from leading institutions have taken a major step toward training artificial intelligence (AI) models that draw insights across disciplines to unlock scientific discoveries.

The initiative, known as Polymathic AI, uses technology similar to that behind large language models such as ChatGPT. Instead of processing text, however, its models learn from datasets drawn from fields such as astrophysics, biology, chemistry, and fluid dynamics. The aim is to give the models cross-disciplinary scientific capabilities.

"These groundbreaking datasets are by far the most diverse large-scale collections of high-quality data for machine learning training ever assembled for these fields," said Michael McCabe, a research engineer at the Flatiron Institute in New York City and a member of Polymathic AI. "Curating these datasets is a critical step in creating multidisciplinary AI models that will enable new discoveries about our universe."

The Polymathic AI team has released two open-source datasets that together comprise 115 terabytes of data from dozens of contributors. The data are freely available to the public and are expected to accelerate the development of AI models capable of solving complex scientific problems. For comparison, GPT-3 was trained on a corpus drawn from 45 terabytes of unfiltered data.

"The freely available datasets are an unprecedented resource for developing sophisticated machine learning models that can then tackle a wide range of scientific problems," added Ruben Ohana, a research fellow at the Flatiron Institute's Center for Computational Mathematics. "Open-sourcing this data benefits both the machine learning and scientific communities, creating a win-win situation."

The datasets are hosted on Hugging Face, a popular platform for sharing AI models and data, and are described in papers accepted for presentation at the NeurIPS conference in Vancouver, Canada.

"We've seen again and again that the most effective way to advance machine learning is to take difficult challenges and make them accessible to the wider research community," said McCabe. "When a new benchmark is released, it initially seems insurmountable. But opening access accelerates progress far beyond what any individual group could achieve."

Polymathic AI is a collaborative effort involving researchers from institutions such as the Simons Foundation, Flatiron Institute, New York University, and the Lawrence Berkeley National Laboratory.

The first dataset, named the Multimodal Universe, focuses on astrophysics and includes hundreds of millions of observations, such as images from NASA's James Webb Space Telescope and stellar data from ESA's Gaia spacecraft. "Machine learning has been happening for around 10 years in astrophysics, but it's still very hard to use across instruments, missions, and disciplines," said Polymathic AI researcher Francois Lanusse. "Datasets like the Multimodal Universe allow us to create models that natively understand this data and act as a Swiss Army knife for astrophysics."

The second dataset, dubbed the Well, spans 15 terabytes of data across 16 diverse datasets. It features simulations of biological systems, fluid dynamics, supernovae, and more, all rooted in mathematical equations called partial differential equations. These equations appear in a wide array of scientific problems but are notoriously difficult to solve. "This dataset encompasses a diverse range of physics simulations designed to address key limitations of current machine learning models," said Polymathic AI member Rudy Morel.

Building these datasets required extensive collaboration. "The creators of numerical simulations are sometimes skeptical of machine learning because of the hype, but they're curious about how it can benefit their research," Ohana explained.

The team is now using the datasets to train AI models, and early results are promising. "Understanding how machine learning models generalize and interpolate across datasets from different physical systems is an exciting research challenge," said Polymathic AI member Bruno Régaldo-Saint Blancard.

Shirley Ho, project lead and group leader at the Flatiron Institute, noted, "Just like the Protein Data Bank spawned AlphaFold, I'm excited to see what the Well and the Multimodal Universe will help create." Ho will present Polymathic AI's findings at NeurIPS.
