Introduction

Finding high-affinity ligands, while ensuring safety and biological function, is the goal of small-molecule drug development. Accurate prediction of binding free energy has therefore long been an important direction in computer-aided drug design, and Monte Carlo or molecular dynamics simulation methods based on rigorous physics have long been considered the most rigorous approach to this problem. However, high computational cost and the limitations of sampling algorithms and force fields still hinder the widespread application of such methods. Over the past decade, with the continuous development of hardware and algorithms, more and more pharmaceutical companies have incorporated relative free energy calculation tools (such as Schrödinger's FEP+) into their drug development processes.

Recently, researchers from Merck in Germany reported a large-scale prospective study of FEP+ that has been running since 2016. In this study, the authors aimed to apply FEP+ to all suitable internal drug discovery projects, with three purposes: (1) prospective blind evaluation of the computational tool, (2) evaluating how the time, resource, and information constraints prevalent in drug development affect the utility of the method, and (3) benchmarking new features added to FEP+ since 2015. From 2016 to 2019, the authors prospectively applied FEP+ to 12 targets and 23 compound series, performing more than 35,000 independent perturbation calculations. Ultimately, the authors obtained valid predictions for more than 6,000 chemical entities and synthesized and tested more than 400 predicted novel molecules. The resulting large body of prospective data provides a detailed assessment of the accuracy of the method in the actual practice of typical small-molecule drug discovery.

Free energy calculation workflow in the project

Over the past three years, the authors have established a workflow for deploying free energy calculations in projects (Figure 1). First, the authors assess the general feasibility of applying FEP to a given target and compound series of interest by collecting the available protein structure data and experimentally measured binding affinities. At this stage, the authors usually require at least one high-resolution co-crystal structure whose ligand is an analog of the target compounds. This strict requirement stems from the authors' unsuccessful experience in three projects in which they attempted to use homology models in the absence of X-ray structures, with no satisfactory results. When the authors instead combined a protein crystal structure with docking to obtain the complex structure, one of two projects succeeded and the other failed. In the failed project, the predictions could not be brought into good agreement with the experimental data; co-crystal structures obtained later showed considerable protein flexibility at the binding site, even though the predicted binding pose was relatively similar to the crystal structure. In the successful project, the co-crystal structure obtained later agreed well with the prediction.

Figure 1 Free energy calculation workflow at Merck, Germany.

Image source: Journal of Chemical Information and Modeling

Once sufficient structural data are available, the authors collect a data set of similar ligands with experimental binding affinities (at least 10 ligands, preferably 20), together with all available information about the biochemical and biophysical assays. Note that the recommended ligand data set size is a "rule of thumb", and larger data sets are usually difficult to obtain in early projects. If the ligand data set is large enough, it can be split according to the R-group being modified, since different sites may be predicted with different accuracy. Retrospective free energy calculations are then performed on these data sets, and the predicted values are compared with the experimental values. The authors call these retrospective computational experiments System Validation. At this stage, different input structures and system settings are often evaluated to find the best parameters for future prospective calculations. In practice, because of time constraints, usually only about 3 candidate models can be evaluated at this stage.

A detailed analysis is required to understand the causes of large outliers (|ΔGpred - ΔGexp| greater than 2 kcal/mol). If the resulting RMSEpw is below 1.3 kcal/mol and the large outliers can be fully explained, the validation study can usually be considered successful. The exact accuracy required of FEP depends on the specific application; for example, an accuracy of 2 kcal/mol is generally considered sufficient for scoring large compound libraries. Nevertheless, the authors found that applying a stricter threshold (RMSEpw below 1.3 kcal/mol) in validation studies makes it more likely that sufficient accuracy (RMSE below 2 kcal/mol) will be obtained in prospective predictions.
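The success criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: RMSEpw is taken here to mean the root-mean-square error over all pairwise affinity differences, the function names `pairwise_rmse`, `large_outliers`, and `validation_successful` are hypothetical, and the ligand values are invented for the example rather than taken from the study.

```python
import math
from itertools import combinations


def pairwise_rmse(pred, exp):
    """RMSE over all ligand pairs of (ddG_pred - ddG_exp), in kcal/mol.

    Assumption: this is what "RMSEpw" refers to in the text.
    """
    squared = [
        ((pred[i] - pred[j]) - (exp[i] - exp[j])) ** 2
        for i, j in combinations(range(len(pred)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


def large_outliers(names, pred, exp, cutoff=2.0):
    """Ligands whose absolute error |dG_pred - dG_exp| exceeds the cutoff."""
    return [n for n, p, e in zip(names, pred, exp) if abs(p - e) > cutoff]


def validation_successful(names, pred, exp, explained=frozenset(),
                          rmse_cutoff=1.3, outlier_cutoff=2.0):
    """True if RMSEpw is below the cutoff and every large outlier is explained."""
    outliers = large_outliers(names, pred, exp, outlier_cutoff)
    return (pairwise_rmse(pred, exp) < rmse_cutoff
            and all(n in explained for n in outliers))


if __name__ == "__main__":
    # Invented example data (kcal/mol), not from the publication.
    names = ["L1", "L2", "L3", "L4", "L5"]
    exp = [-9.0, -8.5, -10.2, -7.8, -9.6]
    pred = [-9.3, -8.1, -10.0, -8.4, -7.2]
    print("RMSEpw:", round(pairwise_rmse(pred, exp), 2))
    print("large outliers:", large_outliers(names, pred, exp))
    print("successful:", validation_successful(names, pred, exp))
```

The set `explained` stands in for the manual analysis step: a series passes only when each large outlier has been traced to a known cause.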

Ideally, if the dynamic range of the data set is adequate, the FEP predictions should also produce a good ranking. In reality, however, often only data sets with a limited dynamic range are available. In this case, the authors enter the prospective prediction stage in "trial" mode: they predict all molecules that are about to be synthesized and evaluate the accuracy of these prospective predictions once the molecules have been made and tested. Only then do the authors decide whether to apply FEP to the project in production mode.

After the validation stage has been completed successfully, the FEP project enters production mode, in which prospective calculations are performed for newly designed compounds. These new compounds must be sufficiently similar to the compounds used in the validation stage. New validation studies must be conducted for new scaffolds and whenever new crystal structure information becomes available. Throughout the project, the authors closely monitor the accuracy of the prospective predictions and track which compounds have been synthesized. All data are stored in a database and processed by automated workflows. In the authors' experience, this continuous monitoring of synthesized molecules and timely updating of the prospective prediction accuracy is essential, both for building initial trust within the project team and for later detecting when derivative compounds fall outside the scope of the model.

Feasibility and validation of FEP in internal drug R&D projects

Over the course of three years, the authors evaluated the feasibility of FEP on 28 targets (Figure 2A). They conducted validation studies on 18 targets and 44 compound series, and prospective calculations on 14 targets and 25 compound series. For most of the targets on which no validation study could be carried out, the main reason was the lack of relevant structural data (7 targets). Overall, once sufficient structural and binding affinity data were available for a validation study, the authors observed a relatively low failure rate. Figure 2B shows the accuracy obtained for the 18 targets in the validation studies. Overall, the authors obtained predictions of high accuracy (RMSEpw below 1 kcal/mol) or acceptable accuracy (RMSEpw below 1.3 kcal/mol) for 14 targets and 21 compound series. In the early stages of the program, the authors applied looser criteria for a successful validation study, so some series with RMSEpw greater than 1.3 kcal/mol also entered production mode. It was subsequently found that low accuracy in the validation study always led to low accuracy in the prospective predictions. The prediction accuracy varies not only between targets but also between different compound series for the same target protein. Furthermore, when FEP is used prospectively in research projects, the authors often face a variety of challenges that may affect the accuracy of the method. Figure 2C shows a qualitative assessment of these challenges. Almost every project has at least one aspect that may affect the application of free energy calculations. Real-life drug discovery projects are clearly not ideal test cases.

In the projects for which the authors attempted validation studies, the most common challenges were uncertainty in the binding mode of at least some of the ligands and uncertainty in the protein structure due to suspected protein conformational changes (66% and 44%, respectively). In six projects, the authors found that the source of the experimental data affected the judgment of whether the validation study was successful. In one case, the authors initially compared the predicted affinities with the results of a functional assay and found large deviations. However, when the same predicted affinities were compared with SPR data, the agreement was good, and the authors therefore decided to move the series into production mode. In four projects, the authors found that the small-molecule force field parameters might not describe the interactions accurately. In two of these projects, using a later version of the OPLS3e force field improved the accuracy for the ligand set used in the validation study. In one project, the force field change was related to partial charges. In the other, the compounds contained a substituted aliphatic ring, and the more recent version of Force Field Builder reparameterized the torsional potentials in the ring, improving the accuracy of the predictions. However, for three projects with larger outliers, recalculation with a newer version of the force field did not improve the results.

Figure 2 FEP feasibility, validation results, and challenges in internal projects. (A) Results of the FEP feasibility assessment for 28 targets. (B) Results of the validation studies using FEP+. (C) Challenges to prediction accuracy encountered across the projects.

Data source: Journal of Chemical Information and Modeling

Prospective FEP+ prediction results for internal projects

For 19 compound series across 12 targets, the authors obtained prospective prediction data sets containing at least five data points (Figure 3). Several common reasons explain why the prospective errors are larger than those in the validation studies. First, the prospective data sets are larger than the original validation sets, and the larger the sample size, the more reliable the estimate of the RMSE. Second, over the course of a project, newly designed compounds tend to become less similar both to the compounds used in the validation study and to the ligands in the crystal structures. This leads to higher uncertainty in ligand binding modes and protonation states. Third, the authors found that many molecules in their internal compound library remain challenging for small-molecule force fields. For almost every new chemical series, the authors had to refit some torsion parameters, even with the newer OPLS3e force field and its improved torsion parameter coverage. Fourth, when using prospective free energy predictions, the authors tend to focus on extreme predictions (for example, the highest-ranked compounds). The bias introduced by focusing on extreme predicted values can be mitigated with a selection bias correction; however, the authors found that this had little effect on the largest outliers. Finally, many of the chemical modifications of greatest interest to the authors are inherently challenging for the method (for example, changes from aromatic ring systems to aliphatic chains, charge changes, or the introduction of new functional groups through flexible linkers). The last of these in particular poses a major challenge for conformational sampling, yet it occurs frequently in early-stage compound optimization and fragment optimization.
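The point that a larger sample gives a more reliable RMSE estimate can be illustrated with a bootstrap confidence interval. The sketch below is an assumption-laden illustration: the text does not say how the confidence intervals were computed, so a simple percentile bootstrap is used here, and the helper name `bootstrap_rmse_ci` and the error values are hypothetical.

```python
import math
import random


def rmse(errors):
    """Root-mean-square of a list of prediction errors (kcal/mol)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_rmse_ci(errors, level=0.90, n_boot=2000, seed=0):
    """Percentile-bootstrap confidence interval for the RMSE.

    Assumption: a plain percentile bootstrap; the original study may
    have used a different procedure.
    """
    rng = random.Random(seed)
    n = len(errors)
    stats = sorted(
        rmse([errors[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Invented errors (pred - exp, kcal/mol) for a small and a larger series.
    rng = random.Random(42)
    small = [rng.gauss(0.0, 1.2) for _ in range(6)]
    large = [rng.gauss(0.0, 1.2) for _ in range(60)]
    for label, errs in (("6 compounds", small), ("60 compounds", large)):
        lo, hi = bootstrap_rmse_ci(errs)
        print(f"{label}: RMSE={rmse(errs):.2f}, 90% CI=({lo:.2f}, {hi:.2f})")
```

With only a handful of compounds the interval is typically wide, which is why an apparently good RMSE on a small validation set should be read with caution.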

Figure 3 Prospective FEP+ results for 19 chemical series from 12 targets. The superscripts and subscripts on each value give the 90% confidence interval.

Data source: Journal of Chemical Information and Modeling

Comparing the rankings obtained from FEP+ with those obtained from Glide docking and Prime MM-GBSA scoring, the authors found that FEP+ performed better overall than these conventional structure-based drug design methods. In four cases (target 1/series 4, target 4/series 1, target 5/series 3, and target 6/series 1), Prime MM-GBSA appears to produce a better ranking than FEP+. In two of these cases (target 1/series 4 and target 4/series 1), even the Glide scores are better than FEP+. However, because these data sets are very small (10 ligands), the confidence intervals are very large, and it is difficult to draw a firm conclusion about the relative performance of the methods. Nevertheless, this may still indicate that simpler scoring methods can be used in some cases, with computationally expensive methods such as FEP+ reserved for situations in which other methods cannot rank the ligands accurately. The authors also compared the performance of FEP+ on the prospective internal data sets with rankings by simple descriptors such as molecular weight and log P; FEP+ also outperformed these "null models".

Construction and evaluation of the new benchmark dataset

Because the authors' extensive experience with free energy calculations comes from internal projects that cannot be disclosed, they decided to build a new benchmark data set consisting of 8 challenging, recently published, relevant targets with drug-like small molecules, comprising 264 ligands in total. The protein targets and the ligand chemical space in this benchmark are representative of the authors' internal projects, but the ligands do not overlap with the internal data sets. Overall, this data set illustrates well many of the challenges the authors face in their internal projects. Compared with previously published benchmark data sets, the ligand structural changes include changes in the net charge and charge distribution of the molecules, as well as ring opening and core (scaffold) changes (Figure 4). The ligand sets also show slightly greater structural diversity. Overall, the authors achieved good correlations on these data sets (Figure 5), which is a notable result given the challenges the data set contains. When analyzing the different types of structural changes, the authors found that transformations involving a change in net charge or charge position, or a change in the molecular core/scaffold, showed lower accuracy. The FEP+ software already treats these changes as inherently more difficult to calculate and applies special sampling settings to them. Nevertheless, the results for such changes still show large deviations. Interestingly, for the remaining types of structural change, the authors did not find a strong correlation between the error and the size of the change (that is, the number of heavy atoms changed).

Figure 4 Examples of different types of transformations in the new benchmark set. (A) Addition of flexible chains in Eg5. (B) Ring-closing transformation in HIF-2α. (C) Movement of a charged amine in SHP-2.

Data source: Journal of Chemical Information and Modeling

Figure 5 FEP+ results for the new benchmark set.

Data source: Journal of Chemical Information and Modeling

Interestingly, and surprisingly, according to the authors' earlier success criteria for validation studies, only one of the 8 compound series in the benchmark can be judged successful. The authors found that increasing the sampling time per λ window from 5 ns to 20 ns reduced the RMSEpw, which raised the number of successful series to 3, but it had no effect on the correlation between predicted and experimental affinities. To examine whether the latest version of FEP+ performs better on the new benchmark data set, the authors recalculated the PFKFB3 and SYK data sets with Schrödinger 2020-1. For PFKFB3 they obtained slightly improved performance, but for SYK the accuracy was lower. The FEP+ results for c-Met and SYK are discussed in more detail below.

The accuracy on the c-Met data set is moderate. For c-Met, the FEP+ predictions reproduced the structure-activity relationship for the change from the carbamate structure to various aromatic heterocycles (Figure 6). FEP+ correctly predicted that the pyrimidine is more active than the two thiazoles, the imidazole, the oxadiazoles, and the pyridazines. However, it failed to reproduce the gain in activity when the pyridine is changed to the pyrimidine. The authors initially assumed that a protonated state of the pyridine compound would negatively affect binding to the hinge region, so that a difference in the protonation states of the pyridine and pyrimidine compounds might be responsible for the difference in activity. However, the pKa of the pyridine compound calculated with Jaguar is 4.5, which makes protonation unlikely. Another possible explanation for the deviation between the predicted and experimental values for the pyridine compound is insufficient sampling of the rotamers of the heterocycle in the binding site. Because of its symmetry, the pyrimidine compound can bind in two rotational states, in each of which an N atom contacts the hinge region of the protein. Pyridine, in contrast, can establish this interaction in only one conformation. Therefore, if only one conformation is sampled during the simulation, the binding affinity of the pyrimidine is underestimated. Indeed, the trajectories of the pyrimidine and pyridine compounds showed that the rotation was hindered in the complex simulations, whereas both states were sampled in the solvent.
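The size of the sampling effect described here can be estimated with a short calculation. As a rough sketch, assume the two rotational states of the pyrimidine are equivalent and equally populated when bound; missing one of them in the complex simulation then costs RT ln 2 of binding free energy. The function name `missing_state_penalty` is hypothetical, and the idealized two-state picture is an assumption made for illustration, not an analysis from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def missing_state_penalty(n_equivalent_states, n_sampled=1, temperature=298.15):
    """Binding free energy (kcal/mol) lost when only n_sampled of
    n_equivalent_states degenerate binding modes are visited in the complex.

    Assumption: all states are equivalent and equally populated, so the
    correction is RT * ln(n_equivalent_states / n_sampled).
    """
    return R_KCAL * temperature * math.log(n_equivalent_states / n_sampled)


if __name__ == "__main__":
    # Pyrimidine: two equivalent hinge-binding orientations, one sampled.
    penalty = missing_state_penalty(2, 1)
    print(f"Underestimation of binding: about {penalty:.2f} kcal/mol")
```

Under these assumptions the effect is roughly 0.4 kcal/mol at room temperature, so incomplete rotamer sampling alone may account for only part of a larger deviation.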

Figure 6 FEP+ results for the c-Met benchmark case. (A) Good correlation between predicted and experimental affinities. (B) Histogram of the |ΔΔGpred - ΔΔGexp| errors. (C) Net charge changes in the data set. (D) A series of different aromatic heterocycle substitutions.

Data source: Journal of Chemical Information and Modeling

For the SYK benchmark case, the authors found low accuracy, which led to a poor correlation between predicted and experimental affinities. One set of outliers was associated with compound CHEMBL3265015. The authors observed that four of the six perturbations involving this compound showed an absolute error greater than 1 kcal/mol (one example is shown in Figure 7C). Compound CHEMBL3265015 has a methyl group at the ortho position of the phenyl ring, which may affect the rotation of the ring. As in the c-Met case, sampling along this torsion angle is incomplete in the complex. The authors also noticed a large outlier for the transformation from compound CHEMBL3265009 to CHEMBL3265003 (Figure 7C). Here, the molecule is grown by two aromatic rings that extend into the solvent. Similarly, the predicted relative affinity from CHEMBL3264999 to CHEMBL3265003 was overestimated by more than 2 kcal/mol compared with experiment. The same result was observed with 20 ns of sampling per λ window. However, FEP+ accurately predicted the perturbation from CHEMBL3265003 to the similarly sized molecule CHEMBL3265004. The authors observed similar overestimation in several other projects. This overestimation for groups that are grown at the entrance of the binding pocket and displace solvent may be caused by the water model overestimating water mobility or underestimating water-protein interactions.

Figure 7 FEP+ results for the SYK benchmark case. (A) Poor correlation between predicted and experimental affinities. (B) Histogram of the |ΔΔGpred - ΔΔGexp| errors. (C) Example of a 1.5 kcal/mol error.

Data source: Journal of Chemical Information and Modeling

Impact of FEP+ on projects and practical challenges

Although the prospective FEP+ results were encouraging overall and showed clear advantages over simple SBDD methods, it became clear to the authors over the course of the program that the usefulness of the predictions, and the prediction accuracy needed to obtain a meaningful ranking, depend heavily on the range of compounds studied and on the range of the experimentally measured affinities. For example, for Target 3/Series 1 and Target 3/Series 3, the authors found that FEP+ gave a good ranking even though the RMSE was greater than 1.5 kcal/mol. This helped prioritize compounds for synthesis and was considered valuable by the chemists on the project. For Target 1/Series 1-3, on the other hand, the authors did not obtain any predictive ranking despite a similar RMSE. In the authors' experience, relatively low prediction accuracy can be tolerated in projects in which the explored chemical space spans a wide range of activities. Once the optimization reaches an affinity "canyon" in chemical space, where small structural changes no longer have a large effect on affinity, the value of applying free energy calculations to the project is very limited (as for Target 1/Series 1-3 and Target 5/Series 1). Interestingly, the authors found that this situation (the affinity canyon) often occurs in (late-stage) lead optimization projects. At the beginning of the program, the authors believed that lead optimization would be the main application scenario for FEP, because the small chemical modifications typically made at this stage are best suited to the method. Contrary to these initial expectations, however, FEP performs better when the chemical space to be explored is wider and activity is still the main optimization target, as in hit-to-lead optimization and fragment optimization.

Furthermore, in these cases the synthesis of the compounds is often the bottleneck, and the synthetic challenges faced by the chemists limit the number of molecules that can be tested experimentally. Ranking candidate molecules so that effort is focused on the most promising ones is then considered very valuable, and the use of these computationally expensive methods is not seen as excessive.
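The dependence of ranking quality on the affinity range can be made concrete with a small simulation. The sketch below assumes Gaussian prediction errors of fixed size added to uniformly distributed "experimental" affinities, and compares Kendall's tau for a wide and a narrow affinity range. All names (`kendall_tau`, `simulated_tau`) and numbers are hypothetical and chosen only for illustration.

```python
import random


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def simulated_tau(affinity_range, noise_sd=1.0, n=200, seed=1):
    """Rank correlation between synthetic experimental affinities spread
    uniformly over affinity_range (kcal/mol) and predictions that carry
    Gaussian errors with standard deviation noise_sd (kcal/mol)."""
    rng = random.Random(seed)
    exp = [rng.uniform(-affinity_range / 2, affinity_range / 2) for _ in range(n)]
    pred = [e + rng.gauss(0.0, noise_sd) for e in exp]
    return kendall_tau(exp, pred)


if __name__ == "__main__":
    for width in (6.0, 1.0):
        tau = simulated_tau(width)
        print(f"affinity range {width} kcal/mol: Kendall tau = {tau:.2f}")
```

With the same error level, the wide-range series is ranked far better than the narrow-range one, which mirrors the observation that a similar RMSE can give a useful ranking in one series and none in another.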

Strong communication is also crucial for the successful implementation of free energy calculations in projects. The authors experienced first-hand how important it is that the project team fully understands the capabilities and limitations of the method. This helps in selecting compounds that fall within the domain of applicability and can be ranked by FEP+. Initially, when the authors used FEP+ in projects, they repeatedly found that the molecules the drug designers wanted calculated were outside the domain of applicability. In the later stages of the program, the authors focused on ranking custom-designed libraries whose design takes the domain of applicability into account from the start. The disadvantage of this approach is that, in some cases, the project team uses the results for such libraries to design new molecules. Because the exact molecules predicted by FEP are then not synthesized, it is difficult to evaluate the impact of the method. To avoid this problem, the authors then also calculate these newly designed molecules with FEP+, so that the quality of the predictions can be evaluated through the automated workflow (see above). Overall, the authors found that chemists are more willing to accept FEP predictions when the results can be interpreted or rationalized, for example by analyzing interactions or ligand flexibility. The authors therefore recommend providing such analyses together with the FEP predictions, with particular attention to the compounds predicted to be best and worst.

In short, making full use of free energy calculations in a project requires carefully balancing prediction accuracy, domain of applicability, key optimization goals, synthetic accessibility of the compound series of interest, and project timelines. The authors found that filtering large custom libraries is an effective way to increase the added value of FEP. For these libraries, the authors screened at least 50 to 100 newly designed compounds, typically 5 to 10 times the number that can actually be synthesized.

Conclusion

Free energy calculations are becoming more and more common in the pharmaceutical industry and have become a powerful tool in the computational chemist's toolbox. Here, the authors describe a general workflow established for using free energy calculations in projects and report valuable data on the prospective use of FEP+ calculations in multiple internal drug discovery projects. The authors also provide a new benchmark data set that is available to other researchers and may promote further development of related methods. Beyond prediction accuracy, the authors identified several important practical factors that influence the effectiveness of the method in projects. The authors expect that, in the near future, FEP+ will serve as an expert tool supporting drug research and development projects through large-scale calculations.

Benchmark dataset address

www.github.com/MCompChem/fep-benchmark

References

Schindler, Christina EM, et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. Journal of Chemical Information and Modeling (2020). (Article ASAP). DOI: 10.1021/acs.jcim.0c00900
