Navigating Chemical Space with Simulations and Machine Learning: How Scientists Discover New Materials


Have you ever wondered how scientists discover new materials? You may picture researchers toiling in their lab day and night until one of them has a eureka moment that changes everything. Everyday reality looks a bit different, though.

In our high-tech era, more and more of the experimental work chemists and materials scientists do happens inside computers. Chemical simulations have earned their place as the third pillar of R&D, together with theory and experiments.  

Research into new materials has brought us incredible developments — some seem taken straight out of a sci-fi movie. Think of materials that self-heal after being damaged, or that can be programmed to change their properties in response to their environment.  

However, discovering a brand-new material with unique properties is (still) not as easy as just asking a computer and waiting for the results. To make the materials of the future, researchers need to deal with a key challenge: navigating chemical space. 

Modeling carbon capture | QuantistryLab

So, what is chemical space? 
When scientists talk about chemical space, they mean all the possible ways atoms can be combined and arranged to form molecules. As it turns out, chemical space is as vast as outer space, if not vaster.

Don’t believe us? Let’s talk numbers. There are more possible small organic molecules (estimates range between 10²² and 10⁶⁰) than there are stars in the observable universe (between 10²² and 10²⁴).

The numbers get even bigger when it comes to larger and more complex chemical structures. Think of the chemicals used in batteries, polymers, alloys or pharmaceuticals.  

How can we possibly navigate the chemical space if it is that big? 

Until now, not so effectively.  

Scientists have traditionally looked for new materials using what’s called the Edisonian approach. As you may have guessed, it is named after Thomas Edison, who is known for painstakingly going through thousands of tests until he made the first commercially viable light bulb.  

This trial-and-error approach can work well when the goal is to make slight changes that improve a material’s performance. But its main limitation is that it focuses on what’s familiar rather than on forging a new path. As a result, game-changing discoveries often happen by chance rather than by design.

Take graphene, a revolutionary material now being explored for applications ranging from electronics to medicine. Andre Geim and Kostya Novoselov discovered graphene in 2004 while playing around with graphite and Scotch tape. Six years later, they were picking up a Nobel Prize.

We can’t deny that, with a sprinkle of serendipity, the Edisonian approach has paid off in the past. But looking for brand-new molecules in a chemical space larger than the universe is way more challenging than looking for the proverbial needle in the haystack. 

If you were to search for the molecule you need by testing every possibility one by one, it would take longer than the age of the universe.
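A quick back-of-the-envelope calculation shows why. The sketch below assumes the upper estimate of 10⁶⁰ candidate molecules and a purely illustrative, wildly optimistic throughput of one billion evaluations per second; the age of the universe is taken as roughly 13.8 billion years.

```python
# Back-of-the-envelope estimate for brute-force screening of chemical space.
# Assumptions (illustrative only): 1e60 candidates, 1e9 evaluations per second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 3.16e7 seconds
AGE_OF_UNIVERSE_YEARS = 13.8e9          # approximate current estimate

def brute_force_years(n_candidates: float, evals_per_second: float) -> float:
    """Years needed to evaluate every candidate one after another."""
    return n_candidates / evals_per_second / SECONDS_PER_YEAR

years = brute_force_years(1e60, 1e9)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.1e} years, about {ratio:.1e} times the age of the universe")
# prints: 3.2e+43 years, about 2.3e+33 times the age of the universe
```

Even if the throughput were a trillion times higher, the search would still take vastly longer than the universe has existed.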

We need results a bit sooner than that.  

Lubricant system | Molecular Dynamics simulations in QuantistryLab 

Enter machine learning 
Computer simulations are the first step to finding our way within the vastness of chemical space. By simulating chemical structures and reactions, scientists can save huge amounts of time in the lab and test only the candidates that their computer calculations deem promising.  

Computational chemists are constantly developing better and faster computer models to scan chemical space efficiently. A fresh addition to their toolbox is machine learning. When properly implemented, machine learning algorithms can significantly improve and speed up our ability to navigate the deep waters of chemical space in search of the next superstar material.

There are three key steps in materials discovery where tools like machine learning can make a difference. Let’s take a closer look.

The first step is identifying new material candidates. By combining chemical simulations with machine learning algorithms, we can screen enormous numbers of atomic combinations and select a few promising material candidates. This turns an overwhelming number of possibilities into a much more manageable dataset, ready to be studied further in simulations or in real-world experiments.
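As a rough illustration of this screening idea, here is a minimal sketch in Python. It assumes a hypothetical `expensive_simulation` function (a toy formula standing in for a real chemical simulation) and uses a simple k-nearest-neighbour regressor as the machine learning surrogate. Real screening workflows use richer descriptors and models, and none of these names come from QuantistryLab.

```python
import random

def expensive_simulation(x):
    """Hypothetical stand-in for a costly simulation: maps a candidate's
    descriptor vector to a property score (higher is better). Toy formula."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, (0.7, 0.3)))

def knn_predict(train, x, k=5):
    """Surrogate model: average the scores of the k nearest already-simulated
    candidates, using squared Euclidean distance between descriptors."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)

def screen(candidates, n_simulated=50, top_k=10, seed=0):
    """Simulate a small random subset, then rank the whole pool with the surrogate."""
    rng = random.Random(seed)
    # 1. Run the expensive simulation on only a handful of candidates.
    sample = rng.sample(candidates, n_simulated)
    train = [(x, expensive_simulation(x)) for x in sample]
    # 2. Score every candidate cheaply with the surrogate model.
    ranked = sorted(candidates, key=lambda x: knn_predict(train, x), reverse=True)
    # 3. Return a short list for follow-up simulations or lab experiments.
    return ranked[:top_k]

# Usage: a pool of 10,000 hypothetical candidates, each described by two numbers.
rng = random.Random(42)
pool = [(rng.random(), rng.random()) for _ in range(10_000)]
shortlist = screen(pool)
print(f"Simulated 50 of {len(pool)} candidates; shortlisted {len(shortlist)}")
```

The design choice to illustrate is the split in cost: only 50 expensive calls are made, while the cheap surrogate ranks all 10,000 candidates.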

The second is rationalizing the chemical properties of a molecule. In the past, chemical simulations were mostly limited to describing how a molecule behaves. Thanks to advances in both hardware and software, they can now also predict that behavior and give scientists invaluable insights into why some of the material candidates identified during the screening step may perform better than others.

These insights can then be used to understand the laws of chemistry and physics that the materials follow and build datasets to refine the accuracy of machine learning models.  
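To make the "build datasets to refine the accuracy of machine learning models" idea concrete, here is a minimal sketch of a refinement loop in Python. It is an assumption-laden toy: `simulate` is a hypothetical 1-D property curve standing in for a real simulation, the model is a 1-nearest-neighbour lookup, and "simulate the point farthest from existing data" is used as a crude stand-in for proper uncertainty estimates in active learning.

```python
import math

def simulate(x):
    """Hypothetical expensive calculation: a toy 1-D structure-property curve."""
    return math.sin(3 * x) + 0.5 * x

def predict(data, x):
    """Minimal surrogate: 1-nearest-neighbour lookup in the simulated dataset."""
    return min(data, key=lambda point: abs(point[0] - x))[1]

def mean_abs_error(data, test_xs):
    """Average gap between surrogate predictions and the 'true' simulation."""
    return sum(abs(predict(data, x) - simulate(x)) for x in test_xs) / len(test_xs)

def refine(data, pool, test_xs, n_rounds):
    """Each round, simulate the pool point farthest from existing data (a crude
    proxy for where the model is least certain) and add it to the dataset."""
    errors = [mean_abs_error(data, test_xs)]
    for _ in range(n_rounds):
        x_new = max(pool, key=lambda x: min(abs(x - p[0]) for p in data))
        data.append((x_new, simulate(x_new)))
        errors.append(mean_abs_error(data, test_xs))
    return errors

pool = [i / 100 for i in range(201)]        # candidate descriptor values in [0, 2]
test_xs = [i / 97 * 2 for i in range(98)]   # held-out points for checking accuracy
data = [(0.0, simulate(0.0)), (2.0, simulate(2.0))]
errors = refine(data, pool, test_xs, n_rounds=20)
print(f"error before: {errors[0]:.3f}, after 20 rounds: {errors[-1]:.3f}")
```

Each new simulation result becomes a training point, so the model's error on held-out points shrinks as the dataset grows.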

The third step — and the ‘holy grail’ we’re moving toward — is called inverse design. It involves a complete shift in the way new materials are discovered. Instead of finding a new material and studying its properties to see what it could be used for, inverse design allows us to first choose the properties we would like our material to have and then come up with the molecular system that best matches the desired outcome.  

At the heart of inverse design lies machine learning. With a well-trained algorithm that can capture the relationship between the structure of a material and its performance, researchers can efficiently fish for promising materials out of the vastness of the chemical space. 
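In drastically simplified form, inverse design flips the usual question: start from the property you want and search for the structure that delivers it. The sketch below assumes a hypothetical, already-trained model (`predicted_property`, with made-up coefficients) for a two-component material A_x B_(1-x) and uses a brute-force grid search over the composition x. Real inverse-design workflows search far larger spaces with optimization algorithms or generative models.

```python
def predicted_property(x, prop_a=1.0, prop_b=3.0, bowing=0.5):
    """Hypothetical trained structure-property model for a two-component
    material A_x B_(1-x): linear mixing plus a quadratic 'bowing' term.
    All coefficients are invented for illustration."""
    return x * prop_a + (1 - x) * prop_b - bowing * x * (1 - x)

def inverse_design(target, model, steps=1000):
    """Start from the desired property and search for the composition x in
    [0, 1] whose predicted property lies closest to it (exhaustive grid search)."""
    grid = [i / steps for i in range(steps + 1)]
    best_x = min(grid, key=lambda x: (model(x) - target) ** 2)
    return best_x, model(best_x)

# Usage: ask for a material whose (hypothetical) property equals 2.0.
x_best, value = inverse_design(2.0, predicted_property)
print(f"composition x = {x_best:.3f}, predicted property = {value:.3f}")
```

For this toy model the exact answer is x = (5 − √17)/2 ≈ 0.438, so the grid search lands within one grid step of it.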

Thermodynamic properties of alloy materials | QuantistryLab 

What’s next? 
Despite its many promises, machine learning is a relatively new tool in materials research. Scientists are still figuring out how to integrate it effectively into their research toolbox.  

How we do that will be the key to success. To discover the next generation of materials, we need to embrace a holistic computational approach that incorporates a wide range of tools.  

Transitioning to a new way of doing R&D will involve not just new tools, but also moving beyond trial and error toward data-driven research.

If we could ask Nikola Tesla, he would be on our side. He considered Edison’s method inefficient, writing that “a little theory and calculation would have saved him ninety per cent of his labor.”

The final goal is to simulate not just molecules, but the complex network of interactions, production processes and environmental conditions that together determine a material’s properties and performance. This will help ensure that computer simulations translate into truly innovative materials with tangible real-world applications.

If you're eager to delve further into the world of simulations, be sure to explore our article, "Computer-Aided Electrode Design for Next-Gen Batteries," or the interview with Christophe on "Computational Modelling of Polymers."


With QuantistryLab, all you need to run chemical simulations is a web browser. Our cloud-native platform redefines R&D with a holistic computational approach, from quantum to AI. Our customized Use-Case Modules offer tailored solutions to overcome your specific R&D challenges. Get in touch and start simulating today.