"The proposal of the DPA-1 model proves the feasibility of realizing the 'pre-training + small amount of data fine-tuning' process based on large models. This is a new paradigm for potential energy function production and the starting point for a series of future work." Regarding recent research results Of great significance, a Chinese team expressed this.
Pre-trained deep potential models based on machine learning are playing an increasingly important role in molecular simulation. At the same time, because existing models transfer poorly, are costly to train, and depend heavily on training data, their performance in practical applications remains unsatisfactory. Although researchers have explored and experimented extensively to solve these problems, the results are not yet conclusive.
To overcome these obstacles, and to address a key problem in molecular simulation (the need to generate large amounts of data to train a model from scratch for every new complex system), a Chinese team of researchers from DP Technology and the AI for Science Institute, Beijing, together with collaborators, launched DPA-1, a highly versatile model based on a new gated attention mechanism that can accommodate most elements in the periodic table.
Recently, a related paper titled "DPA-1: Pretraining of Attention-based Deep Potential Model for Molecular Simulation" was posted as a preprint on arXiv [1].
(Source: arXiv)
The DPA-1 model is a comprehensive upgrade of the DP series of models and offers the following advantages.
First, the model uses a gated attention mechanism, similar to the attention mechanism in natural language processing, to fully model the interactions between atoms. This lets the model learn more of the implicit atomic-interaction information present in existing data, which effectively improves its transferability across data sets and its sampling efficiency during data generation.
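As a rough illustration of the idea (not the paper's exact formulation), a gated self-attention over per-neighbor descriptors might look like the following sketch. The feature sizes, weight matrices, and the gating matrix are all invented for demonstration; in DPA-1 the gate is derived from geometric information about neighbor pairs.

```python
import numpy as np

def gated_attention(G, Wq, Wk, Wv, gate):
    """Simplified gated self-attention over neighbor features.

    G    : (N, d) per-neighbor descriptors (hypothetical input)
    gate : (N, N) gating matrix that modulates attention weights,
           e.g. built from angular information (an assumption here).
    """
    Q, K, V = G @ Wq, G @ Wk, G @ Wv
    d = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d)             # scaled dot-product
    scores -= scores.max(axis=-1, keepdims=True)
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)          # softmax over neighbors
    return (A * gate) @ V                       # gate modulates attention

rng = np.random.default_rng(0)
N, d = 5, 8
G = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
gate = rng.uniform(size=(N, N))
out = gated_attention(G, Wq, Wk, Wv, gate)
print(out.shape)  # (5, 8)
```

The output keeps one refined feature vector per neighbor, so it can replace the plain descriptor in a descriptor-based potential without changing downstream shapes.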
Second, the model encodes element types, and all elements share the same network parameters, which makes it easy to expand the model's element capacity.
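A minimal sketch of the shared-parameter idea: each element is mapped to a learned embedding vector that feeds into one common network, so supporting a new element only adds a table row rather than a separate sub-network. The table size and dimensions below are assumptions for illustration:

```python
import numpy as np

def type_embedding(Z, emb_table):
    """Look up a learned embedding for each atomic number Z, so that a
    single set of downstream weights serves all elements. Z indexing and
    the table are illustrative, not DPA-1's actual parameters."""
    return emb_table[Z]

n_elements, d_emb = 118, 16              # room for the whole periodic table
rng = np.random.default_rng(1)
emb = rng.normal(size=(n_elements + 1, d_emb))
atoms = np.array([13, 12, 29])           # Al, Mg, Cu
feats = type_embedding(atoms, emb)
print(feats.shape)  # (3, 16)
```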
At the same time, because the model is pre-trained on a large data set covering 56 elements and then transferred to multiple downstream tasks, it can greatly reduce training cost and the amount of training data required while maintaining prediction accuracy.
In addition, the model offers high inference efficiency and can support large-scale molecular dynamics simulations.
▲Figure | DPA-1 model diagram (Source: arXiv)
To probe the limitations of traditional models, the developers carried out several targeted experiments.
The developers first divided each training set into multiple subsets, then trained on some subsets while testing on the others. Note that the conformations and compositions of these subsets differ: in the AlMgCu data set, for example, the unary subset contains only single-element data, the binary subset only two-element data, and the ternary subset only three-element data.
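The subset protocol described above can be sketched as a grouping by the number of element species present in each frame. The frame representation (a tuple of per-element atom counts) is invented here for illustration:

```python
def split_by_n_species(frames):
    """Group frames by how many element species they contain
    (unary / binary / ternary), mirroring the AlMgCu subset protocol.
    `frames` is a list of (n_Al, n_Mg, n_Cu) count tuples; purely
    illustrative, not the paper's data format."""
    buckets = {1: [], 2: [], 3: []}
    for f in frames:
        n = sum(1 for c in f if c > 0)   # number of species present
        buckets[n].append(f)
    return buckets

frames = [(4, 0, 0), (2, 2, 0), (1, 1, 1), (0, 3, 0)]
b = split_by_n_species(frames)
print({k: len(v) for k, v in b.items()})  # {1: 2, 2: 1, 3: 1}
```

Training on the unary and binary buckets while testing on the ternary bucket then measures exactly the kind of compositional transfer the experiments target.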
Finally, the developers tested the two models, DPA-1 and DeepPot-SE, on three types of data sets: AlMgCu alloys, solid-state electrolytes (SSE), and high-entropy alloys (HEA). The results show that, compared with DeepPot-SE, DPA-1's test accuracy can be one to two orders of magnitude better, which fully demonstrates its transferability.
▲Figure | Results tested on different training sets (Source: arXiv)
Under the "pre-training + fine-tuning with a small amount of data" production paradigm, the developers designed a transfer-learning scheme for DPA-1: first pre-train the model on large-scale data, then use the statistics of the new data set to correct the energy bias of the last layer, and use the result as the starting point for training on new tasks.
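The statistics-based bias correction can be approximated as a least-squares fit of per-element energy biases to the new data set's total energies, which then re-initializes the output layer's bias. The function name and data layout below are ours, not the paper's:

```python
import numpy as np

def reset_energy_bias(compositions, energies):
    """Estimate per-element energy biases for a new data set by least
    squares (a sketch of the statistics-based correction step).

    compositions : (n_frames, n_types) element counts per frame
    energies     : (n_frames,) total energies of the new data set
    """
    bias, *_ = np.linalg.lstsq(compositions, energies, rcond=None)
    return bias

# Toy two-element data set: E is exactly -2 per atom of type 0
# and -3 per atom of type 1, so the fit recovers those biases.
comps = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
E = np.array([-4.0, -5.0, -6.0])
print(reset_energy_bias(comps, E))  # [-2. -3.]
```

Starting fine-tuning from these fitted biases means the pre-trained network only has to learn the new system's interaction details, not its overall energy scale.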
For example, they pre-trained on the unary and binary data in the AlMgCu data set and tested on the ternary data. Next, they pre-trained on the OC2M data set and transferred the model to the HEA and AlCu data sets respectively. The results show that DPA-1 not only achieves higher accuracy in scenarios with only ternary data, but also effectively reduces the dependence on downstream training data.
▲Figure | Learning curves of DPA-1 and DeepPot-SE on different data sets (Source: arXiv)
The developers also applied PCA dimensionality reduction and visualization to the learned element embeddings in DPA-1. The results show that in the latent space the elements are distributed along a spiral: elements in the same period follow the downward spiral, while elements in the same group are arranged perpendicular to it. This distribution corresponds neatly to the elements' positions in the periodic table, which speaks to the interpretability of the model.
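PCA itself can be sketched in a few lines via SVD. The embedding matrix here is random stand-in data, not the trained DPA-1 parameters, so no spiral structure will appear:

```python
import numpy as np

def pca(X, k=3):
    """Project the rows of X onto the top-k principal components,
    as done when visualizing the learned element embeddings."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # scores in k dimensions

rng = np.random.default_rng(2)
emb = rng.normal(size=(56, 32))   # e.g. 56 pre-training elements
proj = pca(emb, k=3)
print(proj.shape)  # (56, 3)
```

Plotting the three projected coordinates of the real embeddings, colored by period and group, is what reveals the spiral pattern reported in the paper.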
▲Figure | PCA dimensionality reduction and visualization performance graph (Source: arXiv)
Currently, the team has open-sourced DPA-1, which supports both training and molecular dynamics simulation, on its scientific computing cloud platform Bohrium; it has also been open-sourced within the DeePMD-kit project of the DeepModeling open-source community.
The team said: "In the future, we will continue to work on automated production and automated testing of potential energy functions, and will keep focusing on multi-task training, unsupervised learning, and model compression and distillation. In addition, combining larger and more complete databases, downstream tasks, and the dflow workflow framework is also a focus of development."
Reference:
1. Zhang, D., Bi, H. et al. DPA-1: Pretraining of Attention-based Deep Potential Model for Molecular Simulation. arXiv (2022). https://arxiv.org/abs/2208.08236