AI4ChemMat Hands-On Series

Full day hands-on workshop (Online)



Zoom will be used for this virtual hands-on training session.
Alvaro Vazquez Mayagoitia (Argonne National Laboratory)

We will hold a virtual hands-on training series to create common ground in the chemistry and materials science community for discussing and learning the basic elements of AI/ML, and to prepare participants to identify opportunities for adopting this technology. Overall, the workshop will foster discussion of AI/ML among ANL and external collaborators and, more importantly, spark new interdivisional collaborations. It will promote teamwork through small projects solved in groups. The series will be divided into about 6 presentations during the summer of 2023. Each lecture will consist of an introductory presentation and a hands-on session, about 90 minutes in total.

 It will be an interactive event with various invited speakers.


Here is a link outlining the program.

The seminar series is organized by:

  • Alvaro Vazquez Mayagoitia, Computational Scientist - Chemistry and Materials Science, CPS Division, Argonne National Laboratory.
  • Ganesh Sivaraman, University of Illinois.
  • Murat Keçeli, CPS Division, Argonne National Laboratory.


The virtual seminars will tentatively begin the second week of July, 2023.

Dates: Wednesdays, August - November, at 10:30 AM CT.


Agenda and Zoom links will be sent to registered participants two days before the meeting.


Invited speakers:

  • Esther Heid, Technical University of Vienna.
  • Hyun Park, Argonne National Laboratory & University of Illinois at Urbana-Champaign.
  • Daniil Boiko, Carnegie Mellon University.
  • Lars Leon Schaaf, University of Cambridge.
  • Aikaterini Vriza, Argonne National Laboratory.
  • Venkata Surya Chaitanya Kolluru, Argonne National Laboratory.


Participants are expected to have a basic working knowledge of machine learning and scientific computing, among other topics.

AI4ChemMat Hands-On Series
    • 10:30 AM - 12:00 PM
      End-to-end AI Framework for Interpretable Prediction of Molecular and Crystal Properties 1h 30m

      Abstract: This talk is aimed at attendees who want to understand the basics of training an ML model, performing a hyperparameter search, and running inference for molecular structures such as small molecules or MOFs. With a bit of Python programming knowledge, users can also apply my AI framework to visualize learned molecular representations and highlight the atoms most important for a prediction. I will also demonstrate how an ML potential can be used to perform molecular dynamics (MD). Overall, the audience can expect to learn comprehensive ML techniques developed and applied for molecular studies.

      Speaker: Hyun Park
    • 10:30 AM - 12:00 PM
      Emergent autonomous scientific research capabilities of large language models 1h 30m

      Abstract: In this talk, we will discuss an intelligent agent system that integrates multiple large language models for autonomous design, planning, and execution of scientific experiments. We will demonstrate the Agent's scientific research abilities using several examples, with the most complex one involving the successful execution of catalyzed cross-coupling reactions. Lastly, we address the safety concerns related to such systems and suggest measures to prevent their potential misuse.

      Bio: Daniil Boiko obtained his MSc in organic chemistry from Lomonosov Moscow State University, researching machine learning applications in chemistry, including electron microscopy, mass spectrometry, and reaction discovery. He also worked at VK, developing machine learning models for web search. Now, he's pursuing a PhD in Chemical Engineering at Carnegie Mellon University, focusing on molecular machine learning, biocatalyst discovery recommender systems, and language model applications in natural sciences.

      Speaker: Daniil Boiko (Carnegie Mellon University)
    • 10:30 AM - 12:00 PM
      Deep learning of reaction properties via graph-convolutional neural nets 1h 30m

      Abstract: Machine learning models are very successful in predicting various chemical properties. Graph-convolutional neural networks (GCNNs) are routinely used for the prediction of molecular properties, but their application to chemical reactions is largely unexplored. GCNNs allow for a learned extraction of important characteristics of a molecule and enable end-to-end learning, instead of relying on expert, system-dependent knowledge. However, the properties of chemical reactions, i.e. the combination of reactant and product molecules, are not readily accessible with current GCNNs which are designed to take molecular graphs as input. Recently, GCNNs based on the condensed graph of reaction (CGR) were shown to unlock the full potential of GCNNs also for reactions, where reactants and products are merged into a single pseudo-molecular graph, i.e. an artificial graph transition state. In this workshop, the anatomy of molecular GCNNs will be discussed in detail, as well as the changes necessary to encode reactions instead of molecules, including hands-on exercises to build your own reaction GCNN. Compared to previous approaches, GCNNs on CGRs offer a comparable or better performance with a lower number of parameters. We showcase the performance on different tasks, such as the prediction of barrier heights or rate constants, as well as the chemo- and regioselectivity of reactions.
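To make the condensed graph of reaction idea concrete, here is a minimal sketch in plain Python. It is not the speaker's code or any library's API: the function names `condensed_graph` and `changed_bonds`, the bond-dictionary representation, and the toy substitution reaction are all assumptions chosen for illustration. It assumes the reaction is already atom-mapped, so the same integer labels the same atom on both sides.

```python
from typing import Dict, Tuple

Bond = Tuple[int, int]          # pair of atom-map numbers, smaller first
BondOrders = Dict[Bond, float]  # bond -> bond order


def normalize(bonds: BondOrders) -> BondOrders:
    """Store each bond with its atom-map numbers in ascending order."""
    return {(min(i, j), max(i, j)): order for (i, j), order in bonds.items()}


def condensed_graph(
    reactants: BondOrders, products: BondOrders
) -> Dict[Bond, Tuple[float, float]]:
    """Merge two atom-mapped bond sets into one pseudo-molecular graph.

    Each edge carries (order in reactants, order in products);
    0.0 means the bond is absent on that side of the reaction.
    """
    r, p = normalize(reactants), normalize(products)
    return {
        bond: (r.get(bond, 0.0), p.get(bond, 0.0))
        for bond in sorted(set(r) | set(p))
    }


def changed_bonds(
    cgr: Dict[Bond, Tuple[float, float]]
) -> Dict[Bond, Tuple[float, float]]:
    """Keep only the edges whose bond order differs between the two sides."""
    return {bond: orders for bond, orders in cgr.items() if orders[0] != orders[1]}


# Hypothetical toy substitution: atom 1 attacks atom 2, atom 3 leaves,
# and the bond between atoms 2 and 4 is a spectator.
reactant_bonds = {(2, 3): 1.0, (2, 4): 1.0}
product_bonds = {(2, 1): 1.0, (4, 2): 1.0}

cgr = condensed_graph(reactant_bonds, product_bonds)
print(cgr)
print(changed_bonds(cgr))
```

In a graph-convolutional model, the two numbers on each edge (and analogous before/after atom features) would serve as input features, so a network designed for single molecular graphs can be applied to the merged reaction graph.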

      Bio: Esther Heid obtained her Bachelor’s (2014), Master’s (2016) and PhD (2019) degrees in Chemistry from the University of Vienna, Austria. Her thesis focused on the molecular dynamics simulation of soft matter, as well as quantum mechanical calculations for obtaining force field parameters. In 2020 she joined the Massachusetts Institute of Technology, holding an Erwin-Schroedinger Postdoctoral Fellowship from the Austrian Science Fund, which enables her to conduct research on the development of computer-aided tools for finding novel multi-enzyme networks which yield a specified target molecule. The project utilizes recent developments in machine learning, bioretrosynthesis, and cheminformatics, and aims toward a more efficient, selective and environmentally favorable synthesis of compounds through the inclusion of biocatalytic transformations. A major part of the project is concerned with developing new machine learning methods for molecular and reaction property predictions. Her postdoc fellowship includes a one-year return phase in Austria to finish up the project, which she started in 2022 at the Vienna University of Technology.

      Speaker: Dr Esther Heid (Technical University of Vienna)
    • 10:30 AM - 12:00 PM
      Machine Learning Force Fields for Heterogeneous Catalysis 1h 30m

      Machine learning force fields (MLFFs) are set to become an indispensable tool in computational catalysis. In this talk, we provide a detailed walkthrough on how to train an MLFF to accurately predict energy barriers for catalytic reaction pathways. We demonstrate the capabilities of the resulting interatomic potential, which offers near ab-initio accuracy at a fraction of the cost. Specifically, we illustrate that MLFFs not only speed up routine catalytic tasks by orders of magnitude but also allow for a more realistic treatment of catalytic systems, identifying lower energy barriers and capturing finite-temperature effects. We also present a Jupyter notebook that highlights the simplicity of training a state-of-the-art many-body equivariant graph neural network, namely MACE. The capacity of MLFFs to deepen our understanding of extensively studied catalysts emphasizes the importance of fast and accurate alternatives to direct ab-initio simulations. Automated training procedures are paramount in enhancing the accessibility of MLFFs for both academic and industrial applications, and for effective use of HPC resources.

      Lars is a 4th-year PhD student specializing in machine learning force fields with an emphasis on catalysis and non-local effects. His academic background is rooted in theoretical physics, which he studied at the University of Birmingham with a concentration in astrophysics. During his internship at the Max Planck Institute for Nuclear Physics, Lars made his first contact with scientific computing while working on a high-energy camera that is set to observe X-rays emitted by cosmic particle accelerators. Moving to the University of Cambridge for his master's, Lars started focusing on condensed matter physics with his thesis on quantum information. Here he discovered his passion for computational modelling at the atomic scale.

      Speaker: Lars Leon Schaaf (University of Cambridge)
    • 10:30 AM - 12:00 PM
      Structure determination of nanoscale materials using theory and experimental characterization data 1h 30m

      The atomistic structure determines the stability and properties of a material and its potential use in applications. We develop software tools such as Ingrained and FANTASTX (Fully Automated Nanoscale To Atomistic Structure from Theory and eXperiments) to find the atomistic structure from experimental data. Ingrained can construct a grain boundary structure or a surface structure based on experimentally obtained TEM or STM images, respectively. FANTASTX is a multi-objective evolutionary algorithm that helps find the thermodynamically or kinetically stabilized structures observed experimentally. In this talk, we will show examples of the Ingrained-STM simulation tool applied to (111) Cu2O, and of CdTe grain boundary structures created using Ingrained-TEM. We will also show how FANTASTX can be used to search for the tellurene atomistic structure at the interface of a CdTe grain boundary system. These tools provide a path to understanding complex mechanisms in experimental systems using theory, and further make it possible to tailor the local structure to the required effect.

      Venkata Surya Chaitanya Kolluru is a Postdoc at the Center for Nanoscale Materials at Argonne, working with Dr. Maria Chan. He completed his Ph.D. in Materials Science and Engineering at the University of Florida in 2021. His research focuses on combining atomistic simulation methods with AI/ML and computer vision tools to address fundamental materials challenges such as structure inversion from experimental characterization data, materials discovery, and theoretical characterization of complex nanoscale materials systems.

      Joshua Paul is a Postdoc jointly appointed at Northwestern University and Argonne National Laboratory under Dr. Maria Chan. After graduating from the University of Florida in 2020 with a Ph.D. in Materials Science and Engineering, he joined the Center for Nanoscale Materials. His research focuses on high-throughput computational methods for materials discovery and characterization. By combining Density Functional Theory with experimental results, he characterizes the conditions of materials interfaces and surfaces with greater certainty than either approach allows alone.

      Speakers: Joshua Paul (Argonne National Laboratory), Venkata Surya Chaitanya Kolluru (ANL)
    • 10:30 AM - 12:00 PM
      Extracting and utilizing multimodal datasets of images and text with large language models 1h 30m

      With the recent exponential growth in publication rates, it has become impossible for a scientist to keep up with all publications related to a specific topic. Although there are notable efforts to automate text parsing from literature, there are many instances where important information is communicated through images or tables in papers (1). In this talk, I will present the latest developments in two software tools developed at the Center for Nanoscale Materials (CNM): i) EXSCLAIM! for data mining from scientific literature (2), and ii) Plot2Spectra for image segmentation related to spectral images, with the aim of creating metadata (3). EXSCLAIM! has been enhanced with Large Language Models (LLMs), i.e., ChatGPT, and appropriate prompt engineering to extract image-text pairs from scientific journals, which can be foundational for creating multimodal models and advancing semantic searches. In this presentation, I will demonstrate various applications of the extracted multimodal datasets in building knowledge graphs, conducting semantic searches, and performing topic modelling. Additionally, I will illustrate how to use the image segmentation workflow in Plot2Spectra to extract additional metadata and create datasets suitable for machine learning (ML) and high-throughput experimentation.

      (1) Olivetti, E. A.; Cole, J. M.; Kim, E.; Kononova, O.; Ceder, G.; Han, T. Y.-J.; Hiszpanski, A. M. Data-Driven Materials Research Enabled by Natural Language Processing and Information Extraction. Appl. Phys. Rev. 2020, 7 (4), 041317.
      (2) Schwenker, E.; Jiang, W.; Spreadbury, T.; Ferrier, N.; Cossairt, O.; Chan, M. K. Y. EXSCLAIM!-An Automated Pipeline for the Construction of Labeled Materials Imaging Datasets from Literature. Patterns 2023.
      (3) Jiang, W.; Li, K.; Spreadbury, T.; Schwenker, E.; Cossairt, O.; Chan, M. K. Y. Plot2Spectra: An Automatic Spectra Extraction Tool. Digital Discovery 2022, 1 (5), 719–731.

      Aikaterini Vriza is a postdoctoral appointee at the Center for Nanoscale Materials at Argonne National Laboratory. She obtained her PhD from the Materials Innovation Factory at the University of Liverpool in 2022 and a Master's in Green Chemistry and Sustainable Industrial Technology from the University of York. Prior to that she was an aviation engineer in the Hellenic Air Force. Her research expertise lies at the intersection of AI/ML, ‘green’ chemistry, and laboratory automation, and she has worked on several related projects in both industrial and academic settings.

      Speaker: Aikaterini Vriza (ANL)