Ultra-large virtual molecular libraries throw open chemical space

A library of 350 million drug-like molecules points to potential drugs.

Libraries of virtual compounds could help uncover new drugs. Credit: Laurence Dutton/Getty

Drug discovery is a notoriously tough process. Pharmaceutical companies tend to prize efficiency, so many potential lead compounds are merely iterations of what the companies already have, dictated by what they already know, and rely on already exploited molecular scaffolds (the core structure of a molecule).

The need to diversify molecular scaffolds to improve the chances of success in drug discovery has been referred to as escaping from ‘flatland’ — the reliance on synthetic methods that build flat molecules. Another way to investigate the unexplored potential in the molecular universe is to find a way to reveal what is hidden in the shadows. Some estimates say that there are at least 10⁶⁰ different drug-like molecules: a novemdecillion of possibilities. How, then, to open up more of this dark chemical space?

A paper published this week demonstrates the power of ultra-large virtual libraries in helping researchers to look into the unknown (J. Lyu et al. Nature https://doi.org/10.1038/s41586-019-0917-9; 2019). In it, the authors built a virtual library of around 350 million drug-like molecules. They used this to simulate the ways that these molecules could interact with two therapeutically relevant proteins — AmpC β-lactamase, a target for antibiotics, and the D₄ dopamine receptor, linked to several neurological disorders and a member of the pharmacologically important family of G protein-coupled receptors.

After this virtual screening, the team synthesized the top-scoring compounds and tested them against the two targets. One of the compounds turned out to be the most potent inhibitor of AmpC β-lactamase known, and is chemically distinct from all other known inhibitors.

Of the 500 or so molecules the group made that targeted the D₄ dopamine receptor, one had an unprecedented ability to stimulate it. This compound’s selectivity over other dopamine receptor types, and its preferential activation of the G protein signalling pathway, are both important properties that might help to minimize unwanted side effects if and when it is used as a drug.

Others have already demonstrated the potential of smaller virtual libraries to aid drug design. But as the accompanying News and Views (D. E. Gloriam Nature https://doi.org/10.1038/d41586-019-00145-6; 2019) shows, the increased library size makes an important difference. The publicly available library (http://zinc15.docking.org) is anticipated to increase to more than a billion molecules within two years.

Going from a promising compound to an approved drug is still a tortuous and uncertain process. But by having access to a greater portion of the chemical universe, the chances of discovering a star should be greater too.

Nature 566, 7 (2019)

doi: 10.1038/d41586-019-00482-6


Bigger is better in virtual drug screens

A system has been devised that computationally screens hundreds of millions of drug candidates — all of which can be made on demand — against biological targets. This could help to make drug discovery more efficient.

Screening for effective drugs is tremendously expensive and inefficient. High-throughput screens can cover up to a few million compounds, but this is just a minute fraction of the total number (10⁶³) of ‘drug-like’ molecular structures thought to exist¹,². Moreover, typically, less than 0.5% of compounds tested in screens turn out to have activity at the chosen biological target³. There is therefore much interest in expanding the number of molecules that can be explored in the early screening stages of drug-discovery programmes, while limiting the number that need to be synthesized and assayed in the laboratory. Writing in Nature, Lyu et al.⁴ achieve both these goals by computationally screening ultra-large compound libraries to prioritize compounds to be synthesized and assayed.

Physical drug-screening libraries are predominantly limited to compounds that are available in-house or off-the-shelf from commercial catalogues. By contrast, Lyu et al. docked (that is, computationally simulated the binding of) 170 million compounds that could be made on demand by a commercial supplier. More than 97% of these compounds were not available from other vendors’ collections. The authors’ make-on-demand library has since grown, and is projected to contain 1 billion compounds within 2 years. The authors have made this library available as a public database of 3D molecular structures (see go.nature.com/2sywxlt), which can be used by any researchers for virtual screening.

To evaluate how well virtual screening works with this extremely high number of chemical structures, Lyu and colleagues first investigated whether a few tens to hundreds of known ligand molecules could be distinguished within the full library of 170 million members using docking scores — which quantify how strongly compounds bind to given biological targets. The authors virtually screened the libraries against two targets: the enzyme AmpC β-lactamase and the D₄ dopamine receptor. The top-scoring molecules did indeed include known ligands for these targets and their close structural analogues.
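A retrospective check of this kind is often summarized as an enrichment factor: how much more common known ligands are among the best-ranked compounds than in the library as a whole. The following is a minimal sketch of that calculation, not the authors' code. The function name, the `top_n` parameter and the convention that lower (more negative) docking scores mean better predicted binding are all assumptions made for illustration.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_known_ligand: Sequence[bool],
                      top_n: int) -> float:
    """Known-ligand rate among the top_n best-scored compounds,
    divided by the known-ligand rate in the whole library."""
    n_total = len(scores)
    if n_total != len(is_known_ligand):
        raise ValueError("scores and labels must have the same length")
    if not 1 <= top_n <= n_total:
        raise ValueError("top_n must be between 1 and the library size")
    total_known = sum(is_known_ligand)
    if total_known == 0:
        raise ValueError("the library contains no known ligands")

    # Assumption: lower (more negative) score = better predicted binding.
    ranked = sorted(zip(scores, is_known_ligand), key=lambda pair: pair[0])
    known_in_top = sum(label for _, label in ranked[:top_n])

    return (known_in_top / top_n) / (total_known / n_total)


if __name__ == "__main__":
    # Toy library (invented numbers): 10 known ligands among 1,000 compounds.
    toy_scores = [-55.0] * 8 + [-10.0] * 2 + [-(i % 50) for i in range(990)]
    toy_labels = [True] * 10 + [False] * 990
    print(enrichment_factor(toy_scores, toy_labels, top_n=10))
```

A value well above 1 means that the scoring function pulls known ligands towards the top of the ranking; a value near 1 means that it does no better than random selection.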

The authors went on to synthesize the top-scoring compounds that had not previously been identified as ligands, as well as some analogues of these compounds. Many of these were found to be pharmacologically active in assays. Impressively, one of the compounds is the most potent AmpC inhibitor known among those that do not bind irreversibly to the enzyme (potency describes the biological response of a target to its ligand, rather than the binding affinity of the ligand for the target).

One of the D₄-stimulating compounds has unprecedented affinity for D₄ and selectivity for it over the related D₂ and D₃ receptor subtypes. Moreover, some of the other identified D₄ ligands were functionally selective: they preferentially activated either the Gᵢ protein or the β-arrestin cellular signalling pathways that lead from D₄. It has been proposed that drugs that exhibit such functional selectivity might be safer than currently available ones⁵. Altogether, these results show how compounds that have biological activities comparable to those of highly optimized drugs can be identified simply by expanding the size and structural diversity of screening libraries.

Lyu et al. found that it was essential to screen the full library to discover the most biologically active compounds, thereby raising the bar for future virtual-screening studies — the best compounds will be missed if smaller libraries are screened, and the identified compounds will not be as good. Huge virtual screens have high computational demands, but can be fast, given sufficient calculation resources (screening for D₄ took just 1.2 days using processing power roughly equivalent to 190 desktop computers).
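The figures quoted here allow a rough back-of-envelope estimate of docking throughput. The sketch below assumes that the full 170-million-compound library was docked in the 1.2-day D₄ run and that the time needed scales linearly with library size on fixed hardware. Both are simplifying assumptions, and the function names are illustrative only.

```python
LIBRARY_SIZE = 170_000_000  # compounds docked, as quoted in the text
WALL_DAYS = 1.2             # reported duration of the D4 screen
MACHINES = 190              # "roughly equivalent to 190 desktop computers"


def per_machine_rate(n_compounds: float, wall_days: float, machines: int) -> float:
    """Compounds docked per machine per day."""
    return n_compounds / (wall_days * machines)


def wall_days_needed(n_compounds: float, rate_per_machine_day: float, machines: int) -> float:
    """Wall-clock days to dock n_compounds, assuming linear scaling."""
    return n_compounds / (rate_per_machine_day * machines)


rate = per_machine_rate(LIBRARY_SIZE, WALL_DAYS, MACHINES)
print(f"{rate:,.0f} compounds per machine-day")
print(f"{rate / 86_400:.1f} compounds per machine-second")
print(f"{wall_days_needed(1_000_000_000, rate, MACHINES):.1f} days "
      "for 1 billion compounds on the same hardware")
```

Under these assumptions, each machine docks several hundred thousand compounds per day, and a billion-compound library would occupy the same hardware for about a week.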

One of the main obstacles in docking studies is that compounds cannot be ranked as precisely as in real assays, which raises the question of how well docking scores correlate with biological activity. Lyu and colleagues’ study provides the largest and one of the most systematic and useful assessments of this issue. The authors show that experimentally determined hit rates — the percentage of assayed compounds that have activity at the target — range from about 25% for the highest-scoring compounds to 0% for the bottom scorers. On this basis, Lyu et al. estimated the total number of D₄ ligands (compounds that have a minimum affinity of 1 micromolar for the receptor) in their database to be an impressive 158,000.
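One simple way to turn a hit-rate curve into a library-wide estimate is to split the library into docking-score bins, multiply the number of compounds in each bin by the hit rate measured for that bin, and add up the results. The sketch below illustrates only this arithmetic. The bin boundaries, counts and intermediate hit rates are invented placeholders, not data from the study, so the total deliberately does not reproduce the 158,000 figure, and the authors' actual procedure may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ScoreBin:
    label: str
    n_compounds: int  # library members whose docking score falls in this bin
    hit_rate: float   # fraction of assayed compounds from this bin that were active


def estimate_total_ligands(bins: Iterable[ScoreBin]) -> float:
    """Expected number of active compounds across all score bins."""
    return sum(b.n_compounds * b.hit_rate for b in bins)


# Placeholder values for illustration only (NOT taken from the study).
# Only the endpoints, about 25% and 0%, echo the hit rates quoted in the text.
EXAMPLE_BINS = [
    ScoreBin("best scores", 100_000, 0.25),
    ScoreBin("intermediate scores", 1_000_000, 0.10),
    ScoreBin("poor scores", 10_000_000, 0.01),
    ScoreBin("worst scores", 158_900_000, 0.0),
]

if __name__ == "__main__":
    print(f"{estimate_total_ligands(EXAMPLE_BINS):,.0f} estimated ligands")
```

Because the bins that score poorly contain by far the most compounds, even small errors in their measured hit rates can shift the total substantially, which is one reason such estimates should be read as approximate.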

Nearly all virtual-screening studies supplement machine selection of compounds with human visual inspection: scientists ‘eyeball’ the high-scoring compounds and use their own knowledge of drug discovery to prioritize which ones to test. Unexpectedly, Lyu and colleagues’ study revealed that, although the hit rate of compounds selected with human input (about 24%) was similar to that of compounds selected using docking scores alone, the human-inspected compounds had higher affinities, efficacies and potencies. This demonstrates the benefits of human expertise. Nevertheless, users with limited expertise would benefit from the virtual-screening approach, because scoring alone identified good compounds that, in many cases, enabled even better compounds to be found through subsequent testing of analogues.

Although Lyu and co-workers’ computational approach is powerful, it does have some limitations. First, screening of the 1 billion molecules expected soon to be in their database will inevitably demand substantial computational resources. Cloud computing and grid computing (the use of a widely distributed network of computers) can solve this problem (see ref. 6, for example), but might not be affordable for all laboratories. The screening time will depend on the docking software used, and some research groups might not be able to use their own software because of licensing issues associated with its use (the authors used a freely available docking program).

Patenting of drugs developed from virtual screening could also become an issue — Lyu and colleagues’ database effectively makes public all the synthetically accessible structural analogues of library members, which might prevent certain types of patent from being filed for drugs developed from compounds identified in the screens. More generally, it is unclear whether it would be possible for a company to obtain exclusive rights to produce compounds listed in make-on-demand libraries, because there are few precedents for this.

Furthermore, the 24% and 11% experimental hit rates obtained for compounds physically tested in the D₄ and AmpC assays, respectively, show that it is still necessary to synthesize a large number of compounds to find potent molecules straight away. The lower hit rate for AmpC reflects the general difficulty of finding compounds that have high affinity for binding sites that lack a confined cavity. In principle, this problem can be overcome by screening peptides — but peptide docking has significantly lower throughput and accuracy than does small-molecule virtual screening⁷.

Future development of docking methods should explore whether aspects of human prioritization can be incorporated into the automated scoring function, to improve the quality of hits. The scientists who eyeballed the top-scoring compounds mainly filtered out molecules whose docking conformations were strained, and favoured compounds that formed certain interactions with their targets — an approach that can also be used to distinguish between molecules that stimulate or inhibit certain receptors⁸. Studies are also needed to test the feasibility of using other docking software in huge screens, and how well Lyu and colleagues’ method works for a wide variety of other molecular targets.

**Figure 1 | Virtual screening of ultra-large libraries can improve the efficiency of drug discovery.** a, In conventional drug discovery, libraries containing a few million compounds are screened in high-throughput assays to find compounds that can be used as leads for further development. These compounds typically have low potency and specificity for the biological target. Analogues of the lead compounds are then iteratively made and assayed in a medicinal-chemistry programme, to develop advanced compounds that have high potency and specificity. The whole process typically takes several years. b, Lyu *et al*.⁴ computationally screened 170 million compounds in a virtual library in about a day. Several of the resulting virtual lead compounds were then synthesized and tested in physical assays, and some were found to have biological profiles as good as those of advanced compounds developed conventionally. The overall process took only a few weeks.

Nonetheless, the new work clearly demonstrates the advantages of using ultra-large libraries to discover potent and selective molecules suitable as leads for drug development. Notably, the compounds identified by Lyu et al. are on a par with those that have been highly optimized in drug-discovery programmes or for use in biological experiments (Fig. 1). Moreover, the number of experimentally determined structures of target–ligand complexes is on the rise — obtained, for example, using cryo-electron microscopy⁹, or by using crystallography to capture series of complexes formed by repeatedly exchanging the ligand in a protein for another one¹⁰. These structures will provide many templates for use in docking studies, thereby increasing the utility of virtual screening as a key tool for drug discovery.

doi: 10.1038/d41586-019-00145-6