Benefits And Limits Of Using ML For Materials Discovery


SOURCE: SEMIENGINEERING.COM
DEC 18, 2025

December 18th, 2025 - By: Katherine Derbyshire


Machine learning tools can accelerate all stages of materials discovery, from initial screening to process development. Whether the goal is to identify new applications for known materials or to design new molecules for a particular task, these tools help materials scientists find correlations in large data libraries.

Still, machine learning tools are not magic. “Software tools are only as good as the humans using them, and how well they understand the context of the problem they are trying to solve,” said Audra Koch, data scientist at Brewer Science. “AI is still highly dependent on human understanding and cannot replace human judgment.”

The need for human expertise starts at the very beginning, with development of the training dataset. “The pre-processing steps before the data even starts the training process is even more important than the model itself,” Koch said.

For example, Edward Pyzer-Knapp and colleagues at IBM demonstrated a suite of tools to extract materials information from technical papers and similar sources. This is a challenging task, with many opportunities for selection bias. Each individual paper might include tables of material properties or process conditions, images of crystal structures and deposited films, as well as graphs showing the dependence of a property on time, temperature, or applied electric field. Results stored in tables or described in text might be relatively easy to extract, while images or graphical data might be less accessible.

Pyzer-Knapp said the information supporting the paper’s conclusions is diffused through all of these elements. For example, describing pattern collapse behavior in photoresists might require a full focus/exposure matrix. A knowledge graph that doesn’t capture all the available information will necessarily yield inaccurate results.

The problem of selection bias is further compounded because not all materials have been studied equally thoroughly, and not all studies are available under open licenses. Whether particular publications or authors contribute to open-source data libraries will affect their content, and therefore the behavior of models trained on them.

Finding better photoacid generators
Materials discovery efforts that use the existing technical literature as a starting point are more likely to succeed in well-studied domains. The IBM group, for example, examined photoacid generators (PAGs) for chemically amplified photoresists. Sulfonium- and iodonium-based PAGs are common in the industry, but are both toxic and bioaccumulative. The IBM group wanted to identify more environmentally friendly PAG cations. To start, they built a knowledge graph from about 6,000 patents, papers, and other sources, from which they extracted the chemical structures of about 5,000 sulfonium PAGs. [1]

For most of the molecules they studied, the source materials included only limited information about key material properties. As a next step, they used machine learning-assisted simulation tools to calculate UV absorption and selected sustainability parameters for several hundred promising sulfonium compounds. They used the resulting set of structure-property relationships to train a generative model.

The generative model produced 3,000 candidate sulfonium cations. However, Pyzer-Knapp noted that the training dataset did not incorporate the many constraints that apply to PAGs for semiconductor lithography. To identify promising candidates, the group used a combination of expert-defined rules and expert-in-the-loop machine learning algorithms. Those expert-in-the-loop algorithms learn by observing a human expert’s ranking of candidate materials.
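To make the expert-in-the-loop idea concrete, here is a minimal sketch of learning a scoring rule from an expert's pairwise preferences. It is not the IBM group's algorithm: the perceptron-style update, the two descriptors per candidate, and all numbers are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def learn_from_rankings(
    features: Sequence[Sequence[float]],
    preferences: Sequence[Tuple[int, int]],
    epochs: int = 50,
    lr: float = 0.1,
) -> List[float]:
    """Fit linear scoring weights from (preferred, rejected) index pairs."""
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        for better, worse in preferences:
            diff = [b - w for b, w in zip(features[better], features[worse])]
            margin = sum(wt * d for wt, d in zip(weights, diff))
            if margin <= 0:
                # The model disagrees with the expert, so nudge the weights
                # toward the preferred candidate.
                weights = [wt + lr * d for wt, d in zip(weights, diff)]
    return weights


def score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear score; higher means the model expects the expert to prefer it."""
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical candidates, each described by two made-up descriptors.
candidates = [[0.9, 0.2], [0.4, 0.8], [0.1, 0.5], [0.7, 0.6]]
# Hypothetical expert judgments: (preferred index, rejected index).
expert_pairs = [(0, 1), (3, 2), (0, 2), (3, 1)]

weights = learn_from_rankings(candidates, expert_pairs)
ranked = sorted(
    range(len(candidates)),
    key=lambda i: score(weights, candidates[i]),
    reverse=True,
)
```

Once the weights reproduce the expert's choices, the same scoring function can rank candidates the expert has not yet looked at, which is where the time savings would come from.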

Next, a Bayesian optimization process prioritized more than 400 candidates based on their expected excitation energy and oscillator strength, both key PAG characteristics. The optimized workflow found the likely best-performing PAG molecules for the 193 nm wavelength while screening, on average, only half of the candidates.
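The following is a rough sketch of how Bayesian optimization can prioritize a fixed candidate list so that only part of it needs to be evaluated. It is not the IBM workflow. It assumes a one-dimensional descriptor, a small Gaussian-process surrogate with a squared-exponential kernel, an upper-confidence-bound selection rule, and a made-up `figure_of_merit` function standing in for an expensive simulation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 2.0) -> float:
    """Squared-exponential kernel: nearby candidates get similar predictions."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(
    xs: Sequence[float], ys: Sequence[float], x_new: float, noise: float = 1e-4
) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    kernel = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    k_new = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_new, solve(kernel, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_new, solve(kernel, k_new)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(
    candidates: Sequence[float],
    evaluate: Callable[[float], float],
    budget: int,
    kappa: float = 2.0,
) -> Tuple[float, float, List[float]]:
    """Evaluate at most `budget` candidates, picking each by upper confidence bound."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")
    budget = min(budget, len(candidates))
    tried = [candidates[0], candidates[-1]]  # two seed evaluations
    values = [evaluate(x) for x in tried]
    while len(tried) < budget:
        offset = sum(values) / len(values)  # center data for the zero-mean GP
        centered = [v - offset for v in values]
        remaining = [x for x in candidates if x not in tried]

        def ucb(x: float) -> float:
            mean, std = gp_posterior(tried, centered, x)
            return mean + kappa * std  # favor high predictions and high uncertainty

        nxt = max(remaining, key=ucb)
        tried.append(nxt)
        values.append(evaluate(nxt))
    best = max(range(len(tried)), key=lambda i: values[i])
    return tried[best], values[best], tried


def figure_of_merit(x: float) -> float:
    """Hypothetical stand-in for an expensive property calculation."""
    return -((x - 13.0) ** 2) / 50.0


candidates = [float(i) for i in range(20)]
# Budget of 10 means only half of the 20 candidates are ever evaluated.
best_x, best_value, tried = bayes_opt(candidates, figure_of_merit, budget=10)
```

The surrogate model is refit after every evaluation, so each new result changes which candidate looks most worth the next expensive calculation.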

Finally, a human toxicology expert selected the most promising candidate materials for synthesis and experimental studies. Overall, the machine learning-assisted methodology enabled a 100-fold reduction in the number of candidates at this last stage.

Another challenge is that model results alone are insufficient to understand the material system in question. “The best we can do right now is to rank feature importances and build interactive profiling tools or correlation plots to show how changing a few inputs at a time can influence the model’s prediction,” said Brewer Science’s Koch.
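As a small illustration of the kind of profiling Koch describes, the sketch below perturbs one input at a time and ranks inputs by how much the prediction moves. The `toy_model` function and its input names are hypothetical placeholders for a trained model, not anything used at Brewer Science.

```python
from typing import Callable, Dict, List, Tuple


def one_at_a_time_sensitivity(
    model: Callable[[Dict[str, float]], float],
    baseline: Dict[str, float],
    delta: float = 0.1,
) -> List[Tuple[str, float]]:
    """Change each input by `delta` in turn and record the prediction shift."""
    base_prediction = model(baseline)
    effects = {}
    for name in baseline:
        perturbed = dict(baseline)
        perturbed[name] += delta
        effects[name] = model(perturbed) - base_prediction
    # Largest absolute shift first.
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


def toy_model(inputs: Dict[str, float]) -> float:
    """Hypothetical black-box model with one interaction term."""
    return 3.0 * inputs["a"] - 0.5 * inputs["b"] + inputs["a"] * inputs["c"]


baseline = {"a": 1.0, "b": 2.0, "c": 0.5}
ranking = one_at_a_time_sensitivity(toy_model, baseline)
```

A ranking like this shows which inputs the model responds to near one operating point. It says nothing about why, which is the limitation the quote points to.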

A generative model might produce a list of candidate materials that score better than an incumbent on some important parameter, but it cannot explain the physical mechanism responsible for their behavior. It therefore offers only limited guidance for process development and device integration.

Generating better magnets with less supply-chain risk
Permanent magnets are a critical component of electric motors, and are therefore essential for electric vehicles, power-generating windmills, and other elements of a post-fossil fuel industrial base. Unfortunately, the rare earth minerals commonly used in these magnets come with substantial supply-chain risks.

Claudio Zeni and colleagues at Microsoft Research used the company’s MatterGen software to generate candidate magnet materials with more readily available component materials. They used 605,000 known structures with DFT magnetic density labels to fine-tune a general-purpose model, which generated structures with the target magnetic density value. They used the Herfindahl-Hirschman index (HHI), a measure of market concentration in which higher values indicate less competition, to quantify potential supply chain risk. Constraining the model to use only materials with HHI below 1,250 completely eliminated elements like cobalt and gadolinium from the generated structures. [2]
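For readers unfamiliar with the index, the HHI is the sum of the squared market shares of all suppliers, with shares expressed in percent, so it ranges from near 0 (many small suppliers) to 10,000 (a single supplier). The sketch below computes it and applies the 1,250 cutoff mentioned above. The supplier shares and material names are invented for illustration and are not real market data, and the sketch does not show how MatterGen applies the constraint internally.

```python
import math
from typing import Dict, List, Sequence


def hhi(shares_percent: Sequence[float]) -> float:
    """Herfindahl-Hirschman index: sum of squared market shares in percent."""
    if not math.isclose(sum(shares_percent), 100.0, abs_tol=1e-6):
        raise ValueError("market shares must sum to 100 percent")
    return sum(share ** 2 for share in shares_percent)


def low_risk(supply: Dict[str, Sequence[float]], limit: float = 1250.0) -> List[str]:
    """Names of materials whose supplier concentration is below `limit`."""
    return [name for name, shares in supply.items() if hhi(shares) < limit]


# Hypothetical supplier shares (percent), not real market data.
supply = {
    "evenly_supplied": [10.0] * 10,         # ten suppliers, 10% each
    "concentrated": [70.0, 20.0, 10.0],     # one dominant supplier
}

scores = {name: hhi(shares) for name, shares in supply.items()}
allowed = low_risk(supply)
```

With these made-up shares, the evenly supplied material scores 1,000 and passes the cutoff, while the concentrated one scores 5,400 and is excluded.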

Zeni noted, though, that the model preferentially generated structures with triclinic space groups, which have no reflectional symmetry, a bias that was not present in the training data. Without further experimental validation, it’s impossible to say whether less symmetrical materials really do make better permanent magnets, or whether the structure of the model itself limited the generated results.

GPUs and advanced machine learning tools can greatly accelerate simulations. “Our latest version, which just came out in June, has between 5X and 10X speed-up on GPUs,” said Anders Blom, principal solutions engineer at Synopsys. “We used to celebrate getting 20% faster a couple of years ago.”

Still, not everything can be computed in a reasonable amount of time. For that reason, screening approaches use multiple filters to narrow the list of candidate materials in stages.
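A staged screen of this kind can be sketched as a list of filters applied in order, so that the later (and presumably more expensive) checks only run on what survived the earlier ones. The candidate fields `cheap` and `costly` and the pass criteria below are hypothetical placeholders, not properties from any of the studies discussed here.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Candidate = Dict[str, int]
Stage = Tuple[str, Callable[[Candidate], bool]]


def screen(
    candidates: Iterable[Candidate], stages: Sequence[Stage]
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply each filter in turn and record how many candidates remain."""
    survivors = list(candidates)
    history = [("start", len(survivors))]
    for name, keep in stages:
        # Only candidates that passed every earlier stage reach this filter.
        survivors = [c for c in survivors if keep(c)]
        history.append((name, len(survivors)))
    return survivors, history


# Hypothetical pool with two made-up properties per candidate.
pool = [{"id": i, "cheap": i % 5, "costly": i % 3} for i in range(30)]

stages = [
    ("cheap descriptor check", lambda c: c["cheap"] >= 2),
    ("costly simulation check", lambda c: c["costly"] == 0),
]

survivors, history = screen(pool, stages)
```

Recording the count after each stage makes it easy to see which filter does most of the narrowing and whether a criterion is too strict.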

Screening crosspoint memory selectors
Crosspoint memories are an emerging high-density memory option. Although the name suggests they can rely on row and column activation alone to select individual cells, practical devices use a transistor or diode selector element to prevent sneak path activation. Most designs combine the selector with a phase-change material that serves as the actual storage cell. According to Ha-Jun Sung and colleagues at Samsung, selector-only memories reduce manufacturing complexity and increase device density by eliminating the phase change material. However, the Ge-As-Se compounds used for conventional selectors may not be the ideal choice for selector-only memories.

In trying to identify a better selector material, the Samsung group focused on ternary compounds of the form A_xB_yX_(100-x-y), where A and B are drawn from Al, Si, P, Ga, Ge, As, In, Sn, and Sb, and X is one of S, Se, or Te. They identified 3,888 potential materials from this group and applied a series of ab initio computational screens. First, they looked at bonding characteristics like orbital hybridization and ionicity, which measures the degree of charge localization. Using devices reported in the literature to define selection criteria, they narrowed the list of candidate materials to 991. DFT simulations of amorphous structures allowed them to evaluate the cohesive energy, which is related to thermal stability. A total of 427 candidates emerged from this screen. [3]
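The article does not say how the 3,888 starting compositions were enumerated. As a plausibility check only, the sketch below shows one composition grid that happens to reproduce that count: unordered pairs of distinct A and B elements, with x, y, and the chalcogen fraction each stepping in increments of 10 and each at least 10. The step size and the minimum fraction are assumptions, not details taken from the Samsung paper.

```python
from itertools import combinations
from typing import List, Tuple

AB_ELEMENTS = ["Al", "Si", "P", "Ga", "Ge", "As", "In", "Sn", "Sb"]
CHALCOGENS = ["S", "Se", "Te"]
STEP = 10  # assumed grid spacing; not stated in the source

Composition = Tuple[str, int, str, int, str, int]


def enumerate_ternaries(step: int = STEP) -> List[Composition]:
    """List (A, x, B, y, X, 100-x-y) with every fraction at least `step`."""
    compositions = []
    for a, b in combinations(AB_ELEMENTS, 2):  # 36 unordered element pairs
        for chalcogen in CHALCOGENS:
            for x in range(step, 100, step):
                # Stop y early enough that the chalcogen fraction stays >= step.
                for y in range(step, 100 - x, step):
                    compositions.append((a, x, b, y, chalcogen, 100 - x - y))
    return compositions


ternaries = enumerate_ternaries()
```

Under these assumptions there are 36 element pairs, 3 chalcogens, and 36 compositions per system, which multiplies out to 3,888. Other grids could give the same total, so this should not be read as the authors' method.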

Fig. 1: The four-stage screening process for identifying suitable amorphous chalcogenide materials for selector-only memory applications. Source: IEDM


The next stage used density of states calculations to extract properties like activation energy and trap density, which allowed them to simulate I-V characteristics. Sixty-eight candidates, all of them Se-based chalcogenides, passed this screen as well.

Finally, they simulated the electric field behavior of the remaining candidates to identify the ones that were likely to behave as selector-only memories. In these memories, interface defects create a space charge region in the presence of a non-uniform electric field. A larger space charge region means a wider memory window. Increasing the selenium content decreases the memory window. This final screen identified 35 candidates with better characteristics than Ge20As30Se50, an incumbent material.
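The candidate counts reported for each stage can be turned into retention rates with a few lines of arithmetic. The counts below come from the article. The short stage labels are paraphrases added here for readability.

```python
from typing import List, Sequence, Tuple

# (stage label, candidates remaining after that stage), from the article.
FUNNEL = [
    ("initial pool", 3888),
    ("bonding characteristics", 991),
    ("cohesive energy", 427),
    ("I-V characteristics", 68),
    ("electric field behavior", 35),
]


def retention(funnel: Sequence[Tuple[str, int]]) -> List[Tuple[str, float, float]]:
    """Per-stage and cumulative fraction of candidates that survive."""
    start = funnel[0][1]
    rows = []
    for (_, before), (name, after) in zip(funnel, funnel[1:]):
        rows.append((name, after / before, after / start))
    return rows


for name, stage_fraction, cumulative in retention(FUNNEL):
    print(f"{name}: {stage_fraction:.1%} of previous stage, {cumulative:.2%} of pool")
```

The last row shows that fewer than 1% of the starting compositions survive all four screens.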

As these examples demonstrate, “AI” for materials discovery is not a single tool, but a rapidly evolving toolkit. To decide what tools to deploy, human engineers need to know what training data is available and relevant to the task, and to have a thorough grasp of the problem to be solved.

  1. Pyzer-Knapp, E.O., Pitera, J.W., Staar, P.W.J. et al. “Accelerating materials discovery using artificial intelligence, high performance computing and robotics.” npj Comput Mater 8, 84 (2022). https://doi.org/10.1038/s41524-022-00765-z
  2. Zeni, C., Pinsler, R., Zügner, D. et al. “A generative model for inorganic materials design.” Nature 639, 624–632 (2025). https://doi.org/10.1038/s41586-025-08628-5
  3. Sung, H.-J. et al. “Ab-Initio Screening of Amorphous Chalcogenides for Selector-Only Memory (SOM) through Electrical Properties and Device Reliability.” 2024 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, pp. 1-4 (2024). https://doi.org/10.1109/IEDM50854.2024.10873326





Katherine Derbyshire is a technical editor at Semiconductor Engineering.