The research was titled "Excited state non-adiabatic dynamics of large photoswitchable molecules using a chemically transferable machine learning potential" and was published in "Nature Communications" on June 15, 2022.

2024/05/23 06:08:33

Editor | Radish peel

Light-induced chemical processes are ubiquitous in nature and have a wide range of technological applications. For example, photoisomerization allows drugs with photoswitchable scaffolds to be activated by light. In principle, photoswitches with desired photophysical properties, such as high isomerization quantum yields, can be identified through virtual screening with reactive simulations.

In practice, however, these simulations are rarely used for screening because they require hundreds of trajectories and expensive quantum chemistry methods to account for nonadiabatic excited state effects.

Researchers at Harvard University and MIT have developed a diabatic artificial neural network (DANN), based on diabatic states, to accelerate such simulations for azobenzene derivatives. The network is six orders of magnitude faster than the quantum chemistry method used for training. DANN transfers to azobenzene molecules outside the training set, predicting quantum yields for unseen species that correlate with experiment.

The researchers used the model to virtually screen 3,100 hypothetical molecules and identified new species with high predicted quantum yields. They confirmed the model predictions using high-accuracy non-adiabatic dynamics. The results pave the way for rapid and accurate virtual screening of photoactive compounds.

Light is a powerful tool for manipulating molecular systems. It can be controlled with high spatial, spectral, and temporal precision to facilitate a variety of processes, including energy transfer, intermolecular reactions, and photoisomerization. These processes are used in areas as diverse as synthesis, energy storage, display technology, bioimaging, diagnostics, and medicine.

For example, photoactive drugs are photoswitchable compounds whose biological activity can be switched by light-induced isomerization. Precise spatiotemporal control of biological activity allows the delivery of photoactive drugs at high doses with minimal off-target activity and side effects. This therapy is a promising avenue for treating cancer, neurodegenerative diseases, bacterial infections, diabetes, and blindness.

Theory plays a key role in explaining and predicting photochemistry, because empirical heuristics learned from thermally activated ground-state processes generally do not apply to excited states. Computer simulations based on quantum mechanics can achieve impressive accuracy in predicting experimental observables. These include the isomerization efficiency and absorption spectra of photoswitchable compounds, which are key to designing photoactive drugs.

However, ab initio methods in photochemistry are severely limited by their computational cost. To collect meaningful statistics for a single molecule, hundreds of repeated simulations are required, each involving thousands of electronic structure calculations performed in series with sub-femtosecond time steps. The quantum chemistry calculations themselves are particularly demanding, since they require excited-state gradients and some treatment of multi-reference effects. In some cases, both ground-state and excited-state gradients are required at each time step. Calculating the photochemical properties of tens or hundreds of molecules using ab initio methods is therefore impractical, and photodynamics simulations have not yet been used for large-scale virtual screening.
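The need for hundreds of trajectories follows from simple counting statistics. The sketch below is illustrative only and is not taken from the paper: it assumes each trajectory is an independent trial that either ends in the isomerized product or does not, so the quantum yield is a binomial proportion. The function names `quantum_yield` and `trajectories_needed` are hypothetical.

```python
import math


def quantum_yield(outcomes):
    """Estimate an isomerization quantum yield from trajectory outcomes.

    outcomes: iterable of booleans, True if a trajectory ended in the
    isomerized product. Returns (yield, binomial standard error).
    """
    outcomes = list(outcomes)
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trajectory")
    phi = sum(outcomes) / n
    # Standard error of a binomial proportion (independent trajectories assumed).
    stderr = math.sqrt(phi * (1.0 - phi) / n)
    return phi, stderr


def trajectories_needed(phi, target_stderr):
    """Number of trajectories needed to reach a target standard error."""
    if not 0.0 <= phi <= 1.0 or target_stderr <= 0.0:
        raise ValueError("phi must be in [0, 1] and target_stderr positive")
    return math.ceil(phi * (1.0 - phi) / target_stderr ** 2)


if __name__ == "__main__":
    # Hypothetical outcome list: 25 of 100 trajectories isomerize.
    phi, err = quantum_yield([True] * 25 + [False] * 75)
    print(f"yield = {phi:.2f} +/- {err:.3f}")
    print("trajectories for +/-0.05 at yield 0.5:", trajectories_needed(0.5, 0.05))
```

Under these assumptions, a yield near 0.5 needs on the order of a hundred trajectories to reach a standard error of about 0.05, and halving the error requires four times as many.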

Multi-reference perturbation techniques are among the most accurate and most expensive electronic structure methods, but their cost and the need for manual active space selection limit their use in virtual screening.

Over the years, the photochemistry community has developed some exciting methods to overcome these two obstacles. For example, reduced-scaling techniques and graphics processing units can significantly accelerate multi-reference calculations. Density matrix renormalization group (DMRG) and multi-reference density functional theory (DFT) methods expand the size of systems that can be treated with high accuracy. DMRG has also been used to automatically select the active space for multi-reference methods. There are also less accurate but more affordable black-box methods, including spin-flip time-dependent DFT (SF-TDDFT) and hole-hole Tamm-Dancoff DFT, among others.

Despite these developments, the cost of non-adiabatic simulations remains high. Even the relatively affordable SF-TDDFT is prohibitively expensive for virtual screening. Semi-empirical methods are currently the only affordable option for large-scale screening. They provide qualitatively correct results in many systems, but are ultimately limited by their approximations, with an average energy error of 15 kcal/mol.

Another approach is to use data-driven models instead of quantum chemistry (QC) calculations. Machine learning (ML) models trained on quantum chemistry data can now routinely predict ground-state energies and forces with sub-chemical accuracy, and can make predictions in just milliseconds. These models have been successfully used in various ground-state simulations. They have also been used to speed up non-adiabatic simulations in many model systems.

However, excited-state ML has yet to provide affordable photodynamics for hundreds of realistically sized molecules, which is the ultimate goal of predictive simulations in photopharmacology. Furthermore, excited-state interatomic potentials that transfer to different compounds have not yet been developed. Existing models therefore need thousands of QC calculations as training data for every new species.

Here, the Harvard and MIT researchers have made significant progress toward affordable, large-scale photochemical simulation and virtual screening with ML. To develop a transferable potential, they focused on molecules from the same chemical family, studying derivatives of azobenzene, a prototypical photoswitch.

Illustration: Description of potential energy surfaces in azobenzene derivatives. (Source: paper)

The derivatives studied here contain up to 100 atoms, making them the largest systems to date fitted with an excited-state ML potential. By combining an equivariant neural network with a physics-based diabatic model, data generated from combinatorial exploration of chemical space, and configurational sampling through active learning, they produced a model, DANN, that is transferable to large, unseen azobenzene derivatives.
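The idea behind a diabatic model can be illustrated with a toy two-state example. In a model of this kind, the network is assumed to predict the elements of a small diabatic Hamiltonian (smooth diabatic energies plus a coupling), and the adiabatic energies used in the dynamics are obtained by diagonalizing it. The sketch below is not the authors' code: the two-state restriction, the harmonic diabats, and every name and parameter value in it are illustrative assumptions.

```python
import math


def adiabatic_energies(d0, d1, coupling):
    """Eigenvalues of the 2x2 diabatic Hamiltonian [[d0, c], [c, d1]].

    Returns (lower, upper) adiabatic energies in the same units as the inputs.
    """
    mean = 0.5 * (d0 + d1)
    half_gap = 0.5 * (d0 - d1)
    split = math.hypot(half_gap, coupling)  # sqrt(half_gap**2 + coupling**2)
    return mean - split, mean + split


def scan(coords, coupling=0.1):
    """Adiabatic energies along a 1D coordinate for two hypothetical diabats."""
    rows = []
    for x in coords:
        d0 = 0.5 * (x + 1.0) ** 2  # diabat centred at x = -1 (toy model)
        d1 = 0.5 * (x - 1.0) ** 2  # diabat centred at x = +1 (toy model)
        rows.append((x, *adiabatic_energies(d0, d1, coupling)))
    return rows


if __name__ == "__main__":
    for x, lower, upper in scan([-1.0, -0.5, 0.0, 0.5, 1.0]):
        print(f"x = {x:+.1f}  E_lower = {lower:.3f}  E_upper = {upper:.3f}")
```

The diabatic energies and coupling vary smoothly through the crossing at x = 0, while the adiabatic surfaces form a narrow avoided crossing there. Smooth functions are generally easier for a neural network to fit, which is one common motivation for learning in a diabatic basis.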

Illustration: neural network architecture and active learning loop. (Source: Paper)

The model yields computational savings of more than six orders of magnitude. Its predicted isomerization quantum yields for unseen species correlate with experimental values. The model was used to predict the quantum yields of more than 3,100 hypothetical species, revealing rare molecules with high cis-trans and trans-cis quantum yields.

Illustration: Speed and accuracy of DANN-NAMD. (Source: paper)

The DANN model shows high accuracy and transferability among azobenzene derivatives. One caveat is that the unseen species contained functional groups that were present to some extent in the training set. Model performance is generally higher for more highly represented functional groups, although some groups are highly represented but difficult to fit, while others are weakly represented and fit well.

Furthermore, the model cannot be applied to other chemical families without additional training data. It also has shortcomings within the azobenzene family: for example, it greatly overestimates the excited-state lifetime of many trans derivatives.

Semi-empirical methods, on the other hand, provide qualitatively correct predictions across a wide range of chemistries, but cannot match the in-domain accuracy of DANN and cannot be improved with more reference data. Adding features from semi-empirical calculations, as was done in the OrbNet model, may prove useful in the future. Recent developments that account for non-local effects and spin states, and thereby improve the transferability of neural networks, may also be beneficial for excited states. The model could be further improved through higher-accuracy multi-reference calculations, solvent effects, and the inclusion of the bright S2 state.

The use of spin-complete methods is especially critical, because spin contamination hindered fine-tuning of the model for the base compound. It may also have affected the accuracy of the DANN model in general. Spin-complete, affordable alternatives are therefore of particular interest. Active learning could be accelerated by differentiable sampling with adversarial uncertainty attacks, which may improve the predicted excited-state lifetimes. Transfer learning can also be used to improve performance for specific molecules: only a small number of ab initio calculations are required to fine-tune the model for an individual species.

Diabatization may also prove useful for reactive ground states. A reaction barrier can generally be understood as a transition from one diabatic state to another. A diabatic basis may make reactive surfaces easier to fit with neural networks.
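A textbook two-state model, given here only as an illustration and not taken from the paper, makes this picture concrete. Assume two diabatic energies $d_1(x)$ and $d_2(x)$ along a reaction coordinate $x$, coupled by a constant $c$ (all three symbols are assumptions introduced for this sketch). The ground-state adiabatic surface is the lower eigenvalue of the diabatic Hamiltonian:

```latex
E_{-}(x) = \frac{d_1(x) + d_2(x)}{2}
           - \sqrt{\left(\frac{d_1(x) - d_2(x)}{2}\right)^{2} + c^{2}}
% At the crossing point x^*, where d_1(x^*) = d_2(x^*) = E^*:
E_{-}(x^{*}) = E^{*} - |c|
```

Away from the crossing, $E_{-}$ follows whichever diabat is lower (the reactant-like state on one side, the product-like state on the other). Near the crossing it lies $|c|$ below the crossing energy, which is where the barrier appears in this model.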

Illustration: the results of virtual screening. (Source: Paper)

In summary, the researchers introduced a diabatic multi-state neural network potential, trained on more than 630,000 geometries at the SF-TDDFT BHHLYP/6-31G* level of theory and covering more than 8,000 unique azobenzene molecules. They used DANN-NAMD to predict the isomerization quantum yields of derivatives outside the training set, and the results correlated with experiment.

The team also identified several hypothetical compounds with high quantum yields, red-shifted excitation energies, and inverted stability. The network architecture, the diabatization approach, and the chemical and configurational diversity of the training data enable the model to produce a robust and transferable potential. The model can be readily applied to new molecules, producing results that approximate those of SF-TDDFT at a computational cost that is orders of magnitude lower.

Paper link: https://www.nature.com/articles/s41467-022-30999-w
