{"id":3775,"date":"2019-06-17T14:03:04","date_gmt":"2019-06-17T05:03:04","guid":{"rendered":"http:\/\/163.180.4.222\/lab\/?p=3775"},"modified":"2019-06-17T14:03:04","modified_gmt":"2019-06-17T05:03:04","slug":"the-digitization-of-organic-synthesis","status":"publish","type":"post","link":"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=3775","title":{"rendered":"The digitization of organic synthesis"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<section aria-labelledby=\"Abs1\">\n<div id=\"Abs1-section\" class=\"c-article-section js-article-section\">\n<p id=\"Abs1\" class=\"c-article-section__title js-section-title js-c-reading-companion-sections-item\"><strong>Abstract<\/strong><\/p>\n<div id=\"Abs1-content\" class=\"c-article-section__content js-collapsible-section\">\n<p>Organic chemistry has largely been conducted in an ad hoc manner by academic laboratories that are funded by grants directed towards the investigation of specific goals or hypotheses. Although modern synthetic methods can provide access to molecules of considerable complexity, predicting the outcome of a single chemical reaction remains a major challenge. Improvements in the prediction of \u2018above-the-arrow\u2019 reaction conditions are needed to enable intelligent decision making to select an optimal synthetic sequence that is guided by metrics including efficiency, quality and yield. Methods for the communication and the sharing of data will need to evolve from traditional tools to machine-readable formats and open collaborative frameworks. 
This will accelerate innovation and require the creation of a chemistry commons with standardized data handling, curation and metrics.<\/p>\n<\/div>\n<\/div>\n<\/section>\n<section aria-labelledby=\"Sec1\">\n<div id=\"Sec1-section\" class=\"c-article-section js-article-section\">\n<p id=\"Sec1\" class=\"c-article-section__title js-section-title js-c-reading-companion-sections-item\"><strong>Main<\/strong><\/p>\n<div id=\"Sec1-content\" class=\"c-article-section__content js-collapsible-section\">\n<div id=\"i1\" class=\"c-article-section__illustration-right c-article-section__illustration\" data-test=\"illustration\">\n<div class=\"c-article-section__figure c-article-section__figure--1-border\"><span style=\"color: #82868b; font-size: 1rem;\">The preparation of oxalic acid and urea by W\u00f6hler almost 200 years ago established the field that we call organic synthesis<\/span><sup style=\"color: #82868b;\"><a id=\"ref-link-section-d44411e340\" title=\"W\u00f6hler, F. Ueber k\u00fcnstliche bildung des harnstoffs. Ann. Phys. 88, 253\u2013256 (1828).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR1\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 1\">1<\/a><\/sup><span style=\"color: #82868b; font-size: 1rem;\">. Human insight from reactivity explored in the interim can now lead to beautifully organized campaigns of complex natural products and bioactive molecules, which represent the pinnacle of synthetic design<\/span><sup style=\"color: #82868b;\"><a id=\"ref-link-section-d44411e344\" title=\"Whitesides, G. M. Complex organic synthesis: structure, properties, and\/or function? Isr. J. Chem. 
58, 142 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR2\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 2\">2<\/a><\/sup><span style=\"color: #82868b; font-size: 1rem;\">. The idea of a synthesis machine that can build any molecule dates from the 1960s. However, although the first computer programs to design organic syntheses emerged around this time<\/span><sup style=\"color: #82868b;\"><a id=\"ref-link-section-d44411e348\" title=\"Corey, E. J. &amp; Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178\u2013192 (1969).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR3\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\">3<\/a>,<a id=\"ref-link-section-d44411e351\" title=\"Corey, E. J., Wipke, W. T., Cramer, R. D. III &amp; Howe, W. J. Computer-assisted synthetic analysis. Facile man\u2013machine communication of chemical structure by interactive computer graphics J. Am. Chem. Soc. 94, 421\u2013430 (1972).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR4\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\">4<\/a><\/sup><span style=\"color: #82868b; font-size: 1rem;\">, they failed to capture the imagination of chemists. Synthesis laboratories have remained sceptical of the ability of computer programs to learn the \u2018art\u2019 of organic chemistry, and have continued their tried and true approaches in their laboratories.<\/span><\/div>\n<\/div>\n<p>Now, the scepticism of synthetic chemists seems to be on the verge of changing. 
Using computer-aided synthesis planning (CASP), it is now possible to take the molecular structure of a desired product and output a detailed list of reaction schemes that connect the target molecule to known and often purchasable starting materials through a sequence of intermediates that are likely to be unknown<sup><a id=\"ref-link-section-d44411e365\" title=\"Szymku\u0107, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904\u20135937 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR5\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\">5<\/a>,<a id=\"ref-link-section-d44411e368\" title=\"Coley, C. W., Green, W. H. &amp; Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281\u20131289 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR6\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\">6<\/a><\/sup>\u00a0(Box\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Sec2\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\">1<\/a>). For example, the decision-tree-like search engine\u00a0<i>Chematica<\/i>\u2014which has a user-friendly graphical user interface and has been coded with human-curated rules over the past decade\u2014has received laboratory validation of the predicted synthesis of medicinally relevant targets<sup><a id=\"ref-link-section-d44411e378\" title=\"Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. 
Chem 4, 522\u2013532 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR7\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\">7<\/a><\/sup>. Approaches towards such programs usually reflect the priorities and prejudices of the programmers, and others have used different approaches\u2014for example, using machine-learning algorithms or Monte Carlo Tree Search (as in AlphaGo<sup><a id=\"ref-link-section-d44411e382\" title=\"Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354\u2013359 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR8\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 8\">8<\/a><\/sup>) to guide the search, and a filter network to pre-select the most promising retrosynthetic steps that is trained on essentially all reactions ever published in organic chemistry<sup><a id=\"ref-link-section-d44411e387\" title=\"Schwaller, P., Gaudin, T., Lanyi, D., Bekas, C. &amp; Laino, T. \u201cFound in translation\u201d: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091\u20136098 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+nature%2Frss%2Fcurrent+%28Nature+-+Issue%29#ref-CR9\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">9<\/a>,<a id=\"ref-link-section-d44411e387_1\" title=\"Segler, M. H. S. &amp; Waller, M. P. Modelling chemical reasoning to predict and invent reactions. Chem. Eur. J. 
23, 6118\u20136128 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+nature%2Frss%2Fcurrent+%28Nature+-+Issue%29#ref-CR10\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">10<\/a>,<a id=\"ref-link-section-d44411e390\" title=\"Segler, M. H. S., Preuss, M. &amp; Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604\u2013610 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR11\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 11\">11<\/a><\/sup>. In the future, it will be substantially faster for such programs to learn automatically from the primary data rather than rely on extracted rules and hand-designed heuristics, in analogy to the differences in strategy between Stockfish and AlphaZero in learning chess<sup><a id=\"ref-link-section-d44411e394\" title=\"Kasparov, G. Chess, a Drosophila of reasoning. Science 362, 1087 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR12\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\">12<\/a><\/sup>.<\/p>\n<p>The digitization of multistep organic synthesis is fast approaching, and the automation of the synthesis planning is just the first component that must be considered before automated reaction prediction can become a reality. The selection of reaction conditions is a key element of automated reaction prediction and is potentially a far more challenging task<sup><a id=\"ref-link-section-d44411e401\" title=\"Cernak, T. A machine with chemical intuition. 
Chem 4, 401\u2013403 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR13\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 13\">13<\/a><\/sup>\u00a0(Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig1\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">1<\/a>). This Perspective surveys the current prospects for the prediction of above-the-arrow conditions and addresses the challenges that are involved in integrating them into optimal methods of synthesis. For one, it has been stated that \u201csyntheses are reported in prose\u201d<sup><a id=\"ref-link-section-d44411e408\" title=\"Steiner, S. et al. Organic synthesis in a modular robotic system driven by a chemical programming language. Science 363, eaav2211 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR14\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\">14<\/a><\/sup>. Not only are the reaction conditions often poorly communicated, but details are also omitted when explaining exactly how operations were carried out, meaning that many assumptions are made about the skills of the researcher repeating the synthesis. The prediction problem must then consider an even broader range of variables in order to master or fully execute a synthesis or optimization, depending on the context of academic research and medicinal or process chemistry.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"figure-1\" class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\">\n<figure><figcaption><b id=\"Fig1\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
1: Above-the-arrow conditions and the digitization of organic synthesis.<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y\/figures\/1\" rel=\"nofollow\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\"><picture><source srcset=\"\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig1_HTML.png?as=webp\" type=\"image\/webp\" \/><img decoding=\"async\" src=\"https:\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig1_HTML.png\" alt=\"figure1\" aria-describedby=\"figure-1-desc\" \/><\/picture><\/a><\/div>\n<div id=\"figure-1-desc\" class=\"c-article-section__figure-description\" data-test=\"bottom-caption\">\n<p>To perform an organic chemical reaction in a laboratory, the conditions listed above the arrow are required to run the synthesis and isolate the desired product.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<aside>\n<div class=\"c-article-box\" data-expandable-box-container=\"true\">\n<div id=\"box-Sec2\" class=\"c-article-box__container\" data-expandable-box=\"true\" aria-hidden=\"true\">\n<p id=\"Sec2\" class=\"c-article-box__container-title js-expandable-title\"><strong>Box 1 Computer-aided synthesis planning<\/strong><\/p>\n<div 
class=\"c-article-box__content\">\n<p>Computer-aided synthesis planning software was first described in the late 1960s<sup><a id=\"ref-link-section-d44411e442\" title=\"Corey, E. J. &amp; Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178\u2013192 (1969).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR3\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\">3<\/a>,<a id=\"ref-link-section-d44411e445\" title=\"Corey, E. J., Wipke, W. T., Cramer, R. D. III &amp; Howe, W. J. Computer-assisted synthetic analysis. Facile man\u2013machine communication of chemical structure by interactive computer graphics J. Am. Chem. Soc. 94, 421\u2013430 (1972).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR4\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\">4<\/a><\/sup>. Recently, machine-learning-based tools have been developed that provide information on route planning for a target molecule<sup><a id=\"ref-link-section-d44411e449\" title=\"Szymku\u0107, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904\u20135937 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR5\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\">5<\/a>,<a id=\"ref-link-section-d44411e452\" title=\"Coley, C. W., Green, W. H. &amp; Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281\u20131289 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR6\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\">6<\/a><\/sup>. 
These algorithms are trained on the chemical literature, learning the \u2018rules and reasoning\u2019 of synthesis, and then predict a suitable synthetic route. They have been shown to be comparable to suggested routes from trained chemists towards medicinally relevant targets<sup><a id=\"ref-link-section-d44411e456\" title=\"Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522\u2013532 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR7\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\">7<\/a><\/sup>.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"figure-a\" class=\"c-article-section__figure c-article-section__figure--no-border\" data-test=\"figure\" data-container-section=\"figure\">\n<figure>\n<div id=\"Figa\" class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure--no-border-item\">\n<div class=\"c-article-section__figure--no-border-item-content\"><picture><source srcset=\"\/\/media.springernature.com\/full\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Figa_HTML.jpg?as=webp\" type=\"image\/webp\" \/><img decoding=\"async\" src=\"https:\/\/media.springernature.com\/lw900\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Figa_HTML.jpg\" alt=\"figurea\" aria-describedby=\"figure-a-desc\" \/><\/picture><\/div>\n<\/div>\n<div id=\"figure-a-desc\" class=\"c-article-section__figure-description\" data-test=\"bottom-caption\">\n<p>A route is predicted from commercial materials to give the desired target molecule.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>These critical advances in machine-aided synthesis are still limited in their application to more complex molecules such as natural products, as well as in dealing with the 
intricacies of medicinal and process chemistry. They rely on the datasets published in journal articles, which represent only a fraction of the raw data collected in a given research project or company portfolio. The continued advancement and proliferation of machine learning require that methods of sharing and communicating information change and move to open collaborative frameworks with fully published machine-readable datasets that are more transparent, contextualized and traceable.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/aside>\n<\/div>\n<\/div>\n<\/section>\n<section aria-labelledby=\"Sec3\">\n<div id=\"Sec3-section\" class=\"c-article-section js-article-section\">\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p id=\"Sec3\" class=\"c-article-section__title js-section-title js-c-reading-companion-sections-item\"><strong>Challenges in culture and data reporting<\/strong><\/p>\n<div id=\"Sec3-content\" class=\"c-article-section__content js-collapsible-section\">\n<p>Proposing specific reactions to a given target on the basis of the literature and canonical rules may seem to be a mysterious and daunting task to most, but it is considered a routine activity for practitioners of organic synthesis who begin to grasp the principles as chemistry undergraduates<sup><a id=\"ref-link-section-d44411e492\" title=\"Garg, N. K. Empowering students to innovate: engagement in organic chemistry teaching. Angew. Chem. Int. Ed. 57, 15612\u201315613 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR15\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\">15<\/a><\/sup>. 
Throughout a career, these skills are improved, and the well-trained chemist often uses rules and patterns of chemical reactivity that they have developed by immersion in the field. With a new synthetic problem at hand, the chemist tries to compare it to a known one before making sense of it\u2014a similar concept to that used by deep-learning algorithms. Historically, having spent a day reading the literature or conducting database searches, the chemist absorbs the precedents and sets off for the laboratory. Within a modern chemistry setting, predicting the starting point for experimentation\u2014especially for complex molecular environments\u2014is now challenging for even the best-educated of chemists. The yield and the selectivity (chemo-, regio-, diastereo- and enantioselectivity) of any transformation in the field of catalysis can be controlled by millions of permutations\u2014including temperature, solvent, ligand and ancillary reagents\u2014even before other metrics of quality are applied. Simply using a large number of experiments in (electronic) notebooks to select the above-the-arrow conditions has been unsuccessful so far, as the data are fractured and collected without diversity of starting materials\u2014often because organizations have experience with molecules that were influenced by a target area in biology. Another obstacle related to human nature is that when reactions fail, the experimentalist is often not concerned with complete documentation and moves on to another task. In the area of medicinal chemistry, in which enormous numbers of experiments are performed, it has been stated that there are only two yields that matter: enough and not enough<sup><a id=\"ref-link-section-d44411e496\" title=\"Engkvist, O. et al. Computational prediction of chemical reactions: current status and outlook. Drug Discov. 
Today 23, 1203\u20131218 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR16\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\">16<\/a><\/sup>. Overall, the current approaches used to record experiments fail to capture the \u2018messiness\u2019 of organic synthesis, as well as the continuous nature of the solutions in the real world.<\/p>\n<p>To advance the field of machine learning in organic synthesis, enormous improvements will be required to enable the prediction of the discrete and continuous variables in the reaction conditions that appear above the arrow (Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig1\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">1<\/a>). This will be possible only if it is accompanied by advances in the reporting of cases in which syntheses are captured in the form of digital code that can be published, versioned and transferred flexibly between platforms to enhance reproducibility. Despite the abundant incentives for academic and industrial scientists to share synthetic data via publication, the data published in most journal articles represent only a fraction of the raw data collected in a given research project. As a community we rely on outdated means that are mere facsimiles rather than machine-readable formats. A stumbling block is not only how to uniformly collect, clean and label data that are of use for training inside an organization or laboratory, but also how to align incentives to make data broadly available via new data intermediaries.<\/p>\n<p>Further challenges for machine learning concern the identification and scoring of the criteria for the efficiency of the overall synthetic sequence, as there are currently no clear criteria on which this can be judged. 
It is already impossible for a human to assess all available options, whether by recalling synthetic methods or by searching online. The formulation of such rules has primarily occurred in an academic setting around the definition of an ideal synthesis<sup><a id=\"ref-link-section-d44411e509\" title=\"Gaich, T. &amp; Baran, P. S. Aiming for the ideal synthesis. J. Org. Chem. 75, 4657\u20134673 (2010).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+nature%2Frss%2Fcurrent+%28Nature+-+Issue%29#ref-CR17\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">17<\/a>,<a id=\"ref-link-section-d44411e509_1\" title=\"Trost, B. M. The atom economy\u2014a search for synthetic efficiency. Science 254, 1471\u20131477 (1991).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+nature%2Frss%2Fcurrent+%28Nature+-+Issue%29#ref-CR18\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">18<\/a>,<a id=\"ref-link-section-d44411e512\" title=\"Burns, N. Z., Baran, P. S. &amp; Hoffmann, R. W. Redox economy in organic synthesis. Angew. Chem. Int. Ed. 48, 2854\u20132867 (2009).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR19\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\">19<\/a><\/sup>. The \u2018fit-for-purpose\u2019 rule of academia or medicinal chemistry will certainly be unacceptable in the fine- and commodity-chemical sectors of the industry, in which efficiency, quality and safety are all a necessity. 
The reaction steps, time to a workable answer, speed and throughput, availability of diverse raw materials, process economics, sustainability and energy consumption all need to be included in assessing digitization of the multistep synthesis to define an answer that is beyond the output of a detailed list of potential reaction schemes.<\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/section>\n<section aria-labelledby=\"Sec4\">\n<div id=\"Sec4-section\" class=\"c-article-section js-article-section\">\n<p id=\"Sec4\" class=\"c-article-section__title js-section-title js-c-reading-companion-sections-item\"><strong>Complexity in the execution of synthesis<\/strong><\/p>\n<div id=\"Sec4-content\" class=\"c-article-section__content js-collapsible-section\">\n<p>The total synthesis of maoecrystal V (Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig2\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">2<\/a>) is a good illustration of the level of above-the-arrow complexity in contemporary natural-product synthesis. In the preparation of this compound, which was completed by the Baran laboratory<sup><a id=\"ref-link-section-d44411e528\" title=\"Cernijenko, A., Risgaard, R. &amp; Baran, P. S. 11-step total synthesis of (\u2212)-maoecrystal V. J. Am. Chem. Soc. 
138, 9425\u20139428 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR20\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\">20<\/a><\/sup>, what is essentially an aldol reaction\u2014taught in first-year organic chemistry classes\u2014proved to be the most challenging step.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"figure-2\" class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\">\n<figure><figcaption><b id=\"Fig2\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 2: Optimizing one step in the total synthesis of maoecrystal V.<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y\/figures\/2\" rel=\"nofollow\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\"><picture><source srcset=\"\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig2_HTML.png?as=webp\" type=\"image\/webp\" \/><img decoding=\"async\" src=\"https:\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig2_HTML.png\" alt=\"figure2\" aria-describedby=\"figure-2-desc\" \/><\/picture><\/a><\/div>\n<div id=\"figure-2-desc\" class=\"c-article-section__figure-description\" data-test=\"bottom-caption\">\n<p>The natural product is prepared in a longest linear sequence of 11 steps. Step 7 is a reaction of enone\u00a0<b>1<\/b>\u00a0and formaldehyde to provide hydroxymethylketone\u00a0<b>2<\/b>. 
In order to perform this reaction in a laboratory, at least 16 conditions\u2014including workup procedures\u2014are listed above the arrow.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>The enolate-based installation of the hydroxymethyl group overcame the challenges of chemo- and regioselectivity. Over 1,000 experiments were carried out in order to optimize the reaction conditions, changing every conceivable variable; as a result, the optimized reaction has at least 16 conditions listed above the arrow. Conditions such as solvent and temperature changes and those used in workups are rarely considered in this context, but are essential for the successful repetition of the experiment. The desired product\u00a0<b>2<\/b>\u00a0was obtained with complete chemoselectivity, although the diastereoselectivity (2:1) and the yield (84%) resisted further improvement. Although far from optimal, the intermediate hydroxymethylketone\u00a0<b>2<\/b>\u00a0was carried forward to maoecrystal V to provide sufficient material to answer the key biological questions presented by this molecule.<\/p>\n<p>Different challenges prevail in the field of medicinal chemistry, in which molecules are designed to engage with increasingly complex biological targets. Hundreds or thousands of molecules are required to advance from a hit compound to a drug candidate, and the synthetic route provides a platform from which to optimize for molecular function and explore biology. 
A consideration for any reaction used in medicinal chemistry is its level of tolerance to the polar functional groups and nitrogen heteroatoms that are typically found in biologically active molecules. As artificial intelligence and big data are increasingly used in medicinal chemistry for compound prediction and prioritization, it will become even more important to make the right compound the first time<sup><a id=\"ref-link-section-d44411e572\" title=\"Griffen, E. J., Dossetter, A. G., Leach, A. G. &amp; Montague, S. Can we accelerate medicinal chemistry by augmenting the chemist with Big Data and artificial intelligence? Drug Discov. Today 23, 1373\u20131384 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR21\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\">21<\/a><\/sup>. It is clear that even for well-precedented reactions and obvious retrosynthetic disconnections (that is, breaking a molecule up into simpler starting materials), there are fundamental practical limitations when considering the conditions needed to make sufficient material for biological testing<sup><a id=\"ref-link-section-d44411e576\" title=\"Kutchukian, P. S. et al. Chemistry informer libraries: a chemoinformatics enabled approach to evaluate and advance synthetic methods. Chem. Sci. 7, 2604\u20132613 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR22\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\">22<\/a><\/sup>. Even within the context of the late-stage functionalization of a drug-like molecule, the individual conditions in that single step can still profoundly affect selectivity<sup><a id=\"ref-link-section-d44411e580\" title=\"Yao, H. et al. 
Enabling efficient late-stage functionalization of drug-like molecules with LC-MS and reaction-driven data processing. Eur. J. Org. Chem. 2017, 7122\u20137126 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR23\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\">23<\/a><\/sup>.<\/p>\n<p>As with natural-product synthesis, process chemistry has often been described as an \u2018art\u2019<sup><a id=\"ref-link-section-d44411e587\" title=\"Yasuda, N. (ed.) The Art of Process Chemistry (Wiley-VCH, 2010).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR24\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\">24<\/a><\/sup>. Well-trained organic chemists read literature and generate the reaction sequence that in their best estimate meets their goals; however, these estimates are often biased by cultural- and company-based specific information on route selection, approaches to reject impurities, and the preparation of salts to improve the crystallinity, solubility and stability of intermediates or the active pharmaceutical compound. Process chemists have developed an intuition as to how well a reaction is likely to scale to obtain a high yield, high concentration, and low catalyst loading with good impurity rejection, and this informs the choice of a synthesis. This informal knowledge, acquired over many years by real-world reinforcement, is rarely captured in any form besides institutional knowledge.<\/p>\n<p>Additionally, only a few well-conceived ideas can currently be pursued by process chemists in the laboratory. Commercial and regulatory pressures ensure that, among the range of potential routes identified early on, a single approach will be taken forward for validation and commercialization. 
These decisions are made largely with contradictory\u2014or, at best, missing\u2014data concerning the future potential efficiency of the route. This critical selection process is performed in the absence of quantitative efficiency data, and is often influenced by judgements on risk mitigation to product filing, or by broad assumptions around supply chain and tax and treasury. Although a considerable financial impact can be achieved by minimizing the costs of reagents and solvents and by optimizing the conditions for small improvements in yield or product quality, this impact cannot overcome the selection of a suboptimal route. It is highly desirable to understand all of the viable options before beginning full-scale development<sup><a id=\"ref-link-section-d44411e595\" title=\"Li, J., Albrecht, J., Borovika, A. &amp; Eastgate, M. D. Evolving green chemistry metrics into predictive tools for decision making and benchmarking analytics. ACS Sustainable Chem. Eng. 6, 1121\u20131132 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR25\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 25\">25<\/a><\/sup>.<\/p>\n<p>Further predictions of conditions that are not historically included above the arrow must be used to narrow the range of options for further exploration. In process chemistry, the crystallinity and solubility, physical attributes of crystallization kinetics, particle-size reduction, flow ability, and solid-state stability are all key to understanding chemical intermediates and pharmaceutical properties. Thus, machine-learning algorithms should ideally be tailored to different criteria than those in other areas of synthesis. 
We will need to advance our ability to predict the organic-solvent solubility, the crystal phase and the morphology of compounds if we are to develop viable options without a priori knowledge.<\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/section>\n<section aria-labelledby=\"Sec5\">\n<div id=\"Sec5-section\" class=\"c-article-section js-article-section\">\n<p id=\"Sec5\" class=\"c-article-section__title js-section-title js-c-reading-companion-sections-item\"><strong>Emerging examples of innovation using enhanced data<\/strong><\/p>\n<div id=\"Sec5-content\" class=\"c-article-section__content js-collapsible-section\">\n<p>The goal of building a synthesis machine that can provide high-quality reagents for biology\u2014beyond peptides and oligonucleotides\u2014has been championed as a way of freeing up chemists for creative thinking by removing the bottleneck of synthesis<sup><a id=\"ref-link-section-d44411e611\" title=\"Trobe, M. &amp; Burke, M. D. The molecular industrial revolution: automated synthesis of small molecules. Angew. Chem. Int. Ed. 57, 4192\u20134214 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR26\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\">26<\/a><\/sup>. However, a general commoditization of synthetic medicinal chemistry is not likely to emerge until we have made these orders-of-magnitude improvements in above-the-arrow prediction. Ultimately, machine learning will enable the field to predict individual conditions by moving along the spectrum of individual chemistry experiments, run one at a time, through large data assimilation and then back to individual conditions. 
A chemist can then, with a high degree of confidence, guarantee that sufficient product will be obtained in a single experiment to test the function of a molecule.<\/p>\n<p>Scientists at Merck recognized this problem and systematically built tools, using high-throughput experimentation and analysis, to address the gaps in data<sup><a id=\"ref-link-section-d44411e618\" title=\"Buitrago Santanilla, A. et al. Nanomole-scale high-throughput chemistry for the synthesis of complex molecules. Science 347, 49\u201353 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR27\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\">27<\/a><\/sup>. Using the ubiquitous palladium-catalysed Suzuki\u2013Miyaura cross-coupling reaction as a test case, they developed automation-friendly reactions that could operate at room temperature by using robotics employed in biotechnology coupled with emerging high-throughput analysis techniques. More than 1,500 chemistry experiments can be carried out in a day with this setup, using as little as 0.02 mg of starting material per reaction. This has since been expanded to allow for the in situ analysis of structure\u2013activity relationships (nanoSAR)<sup><a id=\"ref-link-section-d44411e622\" title=\"Gesmundo, N. et al. Nanoscale synthesis and affinity ranking. Nature 557, 228\u2013232 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR28\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\">28<\/a><\/sup>. The authors note that, in the future, machine learning may aid the navigation of both reaction conditions and biological activity. 
Complementary approaches, such as inverse molecular design using machine learning, may also generate models for the rational design of prospective drugs<sup><a id=\"ref-link-section-d44411e626\" title=\"Sanchez-Lengeling, B. &amp; Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360\u2013365 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR29\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\">29<\/a>,<a id=\"ref-link-section-d44411e629\" title=\"Schneider, G. Automating drug discovery. Nat. Rev. Drug Discov. 17, 97\u2013113 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR30\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\">30<\/a><\/sup>.<\/p>\n<p>In order to reduce analysis time, ultra-high-throughput chemistry can be coupled to an advanced mass spectrometry method (such as matrix-assisted laser desorption ionization\u2013time-of-flight spectrometry; MALDI\u2013TOF) to enable the classification of thousands of experiments in minutes<sup><a id=\"ref-link-section-d44411e636\" title=\"Lin, S. et al. Mapping the dark space of chemical reactions with extended nanomole synthesis and MALDI-TOF MS. Science 361, eaar6236 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR31\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\">31<\/a><\/sup>. 
This classification approach may at first be slightly uncomfortable for synthetic chemists who hold stock in obtaining a hard yield, but it will surely become commonplace as more statistical methods and predictive models are deployed.<\/p>\n<p>Machine learning has recently been used to predict the performance of a reaction on a given substrate in the widely used Buchwald\u2013Hartwig C\u2013N coupling reaction<sup><a id=\"ref-link-section-d44411e643\" title=\"Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. &amp; Doyle, A. G. Predicting reaction performance in C\u2013N cross-coupling using machine learning. Science 360, 186\u2013190 (2018). This article demonstrates machine learning in prediction of the performance of a catalytic reaction using data obtained via high-throughput experimentation.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR32\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\">32<\/a><\/sup>. The Doyle laboratory used a robot-enabled simultaneous evaluation method with three 1,536-well plates that consisted of a full matrix of aryl halides, Buchwald ligands, bases and additives, giving a total of 4,608 reactions. The yields of these reactions were used as the model output and provided a clean, structured dataset containing substantially more reaction dimensions than have previously been examined with machine learning. Approximately 30% of the reactions failed to deliver any product, with the remainder spread relatively evenly over the range of non-zero yields. Using concepts popularized by the Sigman group<sup><a id=\"ref-link-section-d44411e647\" title=\"Zhao, S. et al. Enantiodivergent Pd-catalyzed C\u2013C bond formation enabled through ligand parameterization. 
Science 362, 670\u2013674 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR33\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\">33<\/a><\/sup>, scripts were built to compute and extract atomic, molecular and vibrational descriptors for the components of the cross-coupling. Using these descriptors as inputs and reaction yield as the output, a random forest algorithm was found to afford high predictive performance. This model was also successfully applied to sparse training sets and out-of-sample reaction outcome prediction, suggesting that a systematic reaction-profiling capability and machine learning will have general value for the survey and navigation of reaction space for other reaction types.<\/p>\n<p>It has been suggested by Chuang and Keiser that this experimental design failed classical controls in machine learning, as it cannot distinguish chemically trained models from those trained on random features<sup><a id=\"ref-link-section-d44411e655\" title=\"Chuang, K. V. &amp; Keiser, M. J. Comment on \u201cPredicting reaction performance in C\u2013N cross-coupling using machine learning\u201d. Science 362, eaat8603 (2018). This article illustrates the need to incorporate random-control procedures when applying machine learning to new scientific domains and the importance of experimental design.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR34\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\">34<\/a><\/sup>. As they noted, flexible and powerful machine-learning models have become widespread, and their use can become problematic without some understanding of the underlying theoretical frameworks behind the models. 
The ability to distinguish models that merely reflect peculiarities of the layout of an experiment from those that extract meaningful and actionable patterns also needs to be developed. Regardless, it is clear that the approach taken by Doyle\u2014publishing a complete dataset and aligned code on GitHub\u2014enables a clear demonstration of the scientific method of testing and generating hypotheses in independent laboratories.<\/p>\n<p>The application of machine learning to the prediction of reactions has also been demonstrated for the conversion of alcohols to fluorides, the products of which are high-value targets in medicinal chemistry<sup><a id=\"ref-link-section-d44411e662\" title=\"Nielsen, M. K., Ahneman, D. T., Riera, O. &amp; Doyle, A. G. Deoxyfluorination with sulfonyl fluorides: navigating reaction space with machine learning. J. Am. Chem. Soc. 140, 5004\u20135008 (2018). This paper demonstrates the use of machine learning on a relatively small dataset obtained by traditional laboratory experimentation.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR35\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\">35<\/a><\/sup>\u00a0(Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig2\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">2<\/a>). In order to train a model for this reaction, descriptors for the substrates and reagents used in 640 screening reactions were tabulated. These included computed atomic and molecular properties as well as binary categorical identifiers (such as primary, secondary, cyclic). A random forest algorithm was used and was trained on 70% of the screening entries. The model was evaluated using a test set comprising the remaining 192 reactions and was validated on five structurally different substrates from outside the training set. 
The yields of these reactions were predicted with reasonable accuracy, which is more than sufficient to enable synthetic chemists to evaluate the feasibility of a reaction and to select initial reaction conditions. In comparison to previous studies, this training set was 80% smaller, encompassed much broader substrate diversity and incorporated multiple mechanisms. The expansion of the training set for this deoxyfluorination reaction to include additional variables (that is, stoichiometry, concentration, solvent and temperature) could lead to more accurate and comprehensive coverage of the complex reaction space.<\/p>\n<p>Flow chemistry presents another opportunity for accelerated reaction development<sup><a id=\"ref-link-section-d44411e672\" title=\"Reizman, B. J. &amp; Jensen, K. F. Feedback in flow for accelerated reaction development. Acc. Chem. Res. 49, 1786\u20131796 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR36\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\">36<\/a><\/sup>. A recent publication by a Pfizer team<sup><a id=\"ref-link-section-d44411e676\" title=\"Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429\u2013434 (2018). 
This article illustrates that a flow apparatus can accelerate reaction optimization earlier in the drug-discovery process and also provides reliable data that enables other laboratories to build machine-learning algorithms.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR37\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\">37<\/a><\/sup>\u00a0demonstrated high-throughput reaction screening of the Suzuki\u2013Miyaura coupling with multiple discrete (catalyst, ligand and base) and continuous (temperature, residence time and pressure) variables (5,760 reactions in total), overcoming a common problem in which limited amounts of material do not allow for the application of flow reaction screening in medicinal chemistry (Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig3\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">3a, b<\/a>). Quinolines (<b>3a<\/b>\u2013<b>g<\/b>) and indazole acids (<b>4a<\/b>\u2013<b>d<\/b>) were used to validate the platform. In an important demonstration of the capability of the platform for the preparation of useful quantities of material, the team programmed the injection of 100 consecutive segments based on optimal conditions from screening, enabling the preparation of approximately 100 mg of a target molecule per hour.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"figure-3\" class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\">\n<figure><figcaption><b id=\"Fig3\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
3: Reaction prediction of a deoxyfluorination, a high-value transformation in medicinal chemistry, using machine learning.<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y\/figures\/3\" rel=\"nofollow\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\"><picture><source srcset=\"\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig3_HTML.png?as=webp\" type=\"image\/webp\" \/><img decoding=\"async\" src=\"https:\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig3_HTML.png\" alt=\"figure3\" aria-describedby=\"figure-3-desc\" \/><\/picture><\/a><\/div>\n<div id=\"figure-3-desc\" class=\"c-article-section__figure-description\" data-test=\"bottom-caption\">\n<p>Six hundred and forty screening reactions were performed to train a machine-learning model (yields presented as a heat map). This was used for the successful prediction of the yield and conditions for structurally different substrates that do not appear in the training set. This figure was adapted with permission from ref.\u00a0<sup><a id=\"ref-link-section-d44411e710\" title=\"Nielsen, M. K., Ahneman, D. T., Riera, O. &amp; Doyle, A. G. Deoxyfluorination with sulfonyl fluorides: navigating reaction space with machine learning. J. Am. Chem. Soc. 140, 5004\u20135008 (2018). 
This paper demonstrates the use of machine learning on a relatively small dataset obtained by traditional laboratory experimentation.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR35\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\">35<\/a><\/sup>, copyright 2018 American Chemical Society.<\/p>\n<\/div>\n<\/div>\n<div class=\"u-text-right u-hide-print\"><a class=\"c-article__pill-button\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y\/figures\/3\" rel=\"nofollow\" data-test=\"article-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"button\" data-track-action=\"view figure\" data-track-dest=\"link:Figure3 Full size image\">Full size image<\/a><\/div>\n<\/figure>\n<\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>The Jamison and Jensen groups have described an automated flow-based platform<sup><a id=\"ref-link-section-d44411e725\" title=\"Bedard, A.-C. et al. Reconfigurable system for automated optimization of diverse chemical reactions. Science 361, 1220\u20131225 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR38\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\">38<\/a><\/sup>\u00a0to optimize above-the-arrow conditions to improve the yield, selectivity and reaction scope of a diverse range of reactions; this is typically a tedious and labour-intensive task in the laboratory. By using feedback from online analytics, the system converges on optimal conditions that can then be repeated or transferred with high fidelity as needed. These automated systems in academic laboratories may also play a part in the rapid collection of large, standardized datasets<sup><a id=\"ref-link-section-d44411e729\" title=\"Caramelli, D. et al. Networking chemical robots for reaction multitasking. Nat. Commun. 
9, 3406 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR39\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\">39<\/a><\/sup>.<\/p>\n<p>Chemical synthesis may no longer be solely a human activity. In a recent study, the Cronin laboratory demonstrated that a robotic reaction-handling system controlled by a machine-learning algorithm might be able to explore organic reactions an order of magnitude faster than a manual process<sup><a id=\"ref-link-section-d44411e736\" title=\"Granda, J. M., Donina, L., Dragone, V., Long, D.-L. &amp; Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377\u2013381 (2018). This article predicts the reactivity of about 1,000 reaction combinations with accuracy greater than 80 per cent after considering the outcomes of slightly over 10 per cent of the dataset and, notably, the approach was also used to calculate the reactivity of published datasets.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR40\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\">40<\/a><\/sup>. The robotic approach enabled the capture of information on failed or non-reactive experiments in a structured fashion, making it useful for reaction mapping. 
The powerful machine-learning algorithm was able to predict the reactivity of 1,000 reaction combinations from the above Pfizer dataset (Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig4\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">4a<\/a>), with greater than 80% accuracy, after considering the outcomes of around 10% of the dataset.<\/p>\n<p>In this machine-learning analysis of the Pfizer work, one-hot encoding of the reaction conditions\u2014in which the variables were assigned binary representations\u2014and the clean standardized yield data were used to explore the prediction of yields by a neural network (catalyst loading and temperature were not included). In this approach, a random selection of 10% (<i>n<\/i>\u00a0=\u00a0576) of the Suzuki\u2013Miyaura reactions is used to train the neural net, and the remaining reactions are then scored by the model (Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig4\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">4b<\/a>). The candidates with the highest predicted yield are then added to the performed reactions, and the performance of the neural network is evaluated by calculating the mean of the true yield and the standard deviation of the yield. The neural network is then retrained, and the whole cycle is repeated until the entire space is explored in panels of 100 to demonstrate the alignment with the high-throughput experimentation as well as to evaluate the performance of the neural net. 
Such rapid evaluation is markedly enabled by the publication of reliable clean data.<\/p>\n<p>A common theme in these three machine-learning examples is that predictions can be made with relatively small datasets: in some cases, with only 10% of the total number of reactions it is possible to predict the outcomes of the remaining 90%, without the need to physically conduct the experiments (Fig.\u00a0<a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#Fig4\" data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\">4<\/a>). The high-fidelity data can originate from ultra-high-throughput screening, from flow chemistry or from an individual scientist, but the most important feature is the contextualized, internally consistent source that provides effective, secure and accurate data. This is important because it is currently not known how large these datasets need to be in order to predict across the molecules that represent drug-like space. Naturally, some reactivity trends may be reflective of how the individual experiments are conducted and not truly informative of a particular catalyst or ligand. A diagnostic approach using small libraries of curated drug-like molecules\u2014known as \u2018informer libraries\u2019\u2014has been presented as a way to better capture reaction scope and evolve synthetic models, but this should be viewed as an intermediary step as the field moves forward<sup><a id=\"ref-link-section-d44411e759\" title=\"Kutchukian, P. S. et al. Chemistry informer libraries: a chemoinformatics enabled approach to evaluate and advance synthetic methods. Chem. Sci. 
7, 2604\u20132613 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR22\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\">22<\/a><\/sup>.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"figure-4\" class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\">\n<figure><figcaption><b id=\"Fig4\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 4: Accelerated reaction development in flow and reaction prediction.<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y\/figures\/4\" rel=\"nofollow\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\"><picture><source srcset=\"\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig4_HTML.png?as=webp\" type=\"image\/webp\" \/><img decoding=\"async\" src=\"https:\/\/media.springernature.com\/m685\/springer-static\/image\/art%3A10.1038%2Fs41586-019-1288-y\/MediaObjects\/41586_2019_1288_Fig4_HTML.png\" alt=\"figure4\" aria-describedby=\"figure-4-desc\" \/><\/picture><\/a><\/div>\n<div id=\"figure-4-desc\" class=\"c-article-section__figure-description\" data-test=\"bottom-caption\">\n<p><b>a<\/b>, A Suzuki\u2013Miyaura reaction optimized in flow. 
A heat map of yields of the 5,760 reactions run is shown (<b>3a<\/b>\u2013<b>d<\/b>\u00a0with\u00a0<b>4a<\/b>\u2013<b>c<\/b>\u00a0and the reaction of\u00a0<b>3e<\/b>\u2013<b>g<\/b>\u00a0with\u00a0<b>4d<\/b>), evaluated across a matrix of 11 ligands (plus one blank)\u00a0\u00d7\u00a07 bases (plus one blank)\u00a0\u00d7\u00a04 solvents (ref.\u00a0<sup><a id=\"ref-link-section-d44411e801\" title=\"Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429\u2013434 (2018). This article illustrates that a flow apparatus can accelerate reaction optimization earlier in the drug-discovery process and also provides reliable data that enables other laboratories to build machine-learning algorithms.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR37\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\">37<\/a><\/sup>).\u00a0<b>b<\/b>, These data were used for one-hot encoding of reactants\u00a0<b>3<\/b>, reactants\u00a0<b>4<\/b>, ligands, bases and solvents as a test set for prediction of yield from the test set (30% of the reactions). Predictions for the full dataset are also shown. Panel\u00a0<b>a<\/b>\u00a0is adapted from ref.\u00a0<sup><a id=\"ref-link-section-d44411e818\" title=\"Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429\u2013434 (2018). 
This article illustrates that a flow apparatus can accelerate reaction optimization earlier in the drug-discovery process and also provides reliable data that enables other laboratories to build machine-learning algorithms.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR37\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\">37<\/a><\/sup>, reprinted with permission from AAAS; panel\u00a0<b>b<\/b>\u00a0is adapted from ref.\u00a0<sup><a id=\"ref-link-section-d44411e825\" title=\"Granda, J. M., Donina, L., Dragone, V., Long, D.-L. &amp; Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377\u2013381 (2018). This article predicts the reactivity of about 1,000 reaction combinations with accuracy greater than 80 per cent after considering the outcomes of slightly over 10 per cent of the dataset and, notably, the approach was also used to calculate the reactivity of published datasets.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR40\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\">40<\/a><\/sup>.<\/p>\n<\/div>\n<\/div>\n<div class=\"u-text-right u-hide-print\"><a class=\"c-article__pill-button\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y\/figures\/4\" rel=\"nofollow\" data-test=\"article-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"button\" data-track-action=\"view figure\" data-track-dest=\"link:Figure4 Full size image\">Full size image<\/a><\/div>\n<\/figure>\n<\/div>\n<p>There have also been important advances in predictive catalysis<sup><a id=\"ref-link-section-d44411e840\" title=\"Harper, K. C. &amp; Sigman, M. S. 
Predicting and optimizing asymmetric catalyst performance using the principles of experimental design and steric parameters. Proc. Natl Acad. Sci. USA 108, 2179\u20132183 (2011).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR41\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\">41<\/a>,<a id=\"ref-link-section-d44411e843\" title=\"Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR42\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\">42<\/a><\/sup>. This is an exciting, emerging field that uses parameterization and analysis of catalysts to enable the forecast of an attainable improvement\u2014for example, the enantioselectivity of a transformation or improved turnover in a biocatalytic reaction<sup><a id=\"ref-link-section-d44411e847\" title=\"Matsuda, T. (ed.) Future Directions in Biocatalysis 2nd edn (Elsevier, 2017).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+nature%2Frss%2Fcurrent+%28Nature+-+Issue%29#ref-CR43\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">43<\/a>,<a id=\"ref-link-section-d44411e847_1\" title=\"Kan, S. B. J., Russell, D., Lewis, R. D., Chen, K. &amp; Arnold, F. H. Directed evolution of cytochrome c for carbon\u2013silicon bond formation: bringing silicon to life. 
Science 354, 1048\u20131051 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+nature%2Frss%2Fcurrent+%28Nature+-+Issue%29#ref-CR44\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\">44<\/a>,<a id=\"ref-link-section-d44411e850\" title=\"Arnold, F. H. Innovation by evolution: bringing new chemistry to life \u2013 Nobel lecture. Nobel Media AB 2019 \n                    https:\/\/www.nobelprize.org\/prizes\/chemistry\/2018\/arnold\/lecture\/\n\n                   (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR45\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\">45<\/a><\/sup>\u2014to provide confidence for route selection. For example, in the synthesis of letermovir<sup><a id=\"ref-link-section-d44411e854\" title=\"Mets\u00e4nen, T. T. et al. Combining traditional 2D and modern physical organic-derived descriptors to predict enhanced enantioselectivity for the key aza-Michael conjugate addition in the synthesis of Prevymis\u2122 (letermovir). Chem. Sci. 9, 6922\u20136927 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR46\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\">46<\/a><\/sup>, a series of new catalysts was identified that provided the desired product in improved enantioselectivity and facilitated faster route optimization. The models are currently limited in scope, requiring a focused solvent screen on the best-performing catalysts, and process optimization had already taken place for the desired starting material. 
However, these models will greatly improve with the availability of enhanced datasets, which encompass a full range of activity from diverse sources<sup><a id=\"ref-link-section-d44411e858\" title=\"Gedeck, P., Skolnik, S. &amp; Rodde, S. Developing collaborative QSAR models without sharing structures. J. Chem. Inf. Model. 57, 1847\u20131858 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR47\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 47\">47<\/a><\/sup>.<\/p>\n<p>Extending these early successes to the prediction of the impurity profile of a reaction becomes especially difficult for catalysis, because many on-cycle and off-cycle events can markedly alter the optimum yield and because impurities do not always track with conversion. The current machine-learning systems do not yet take the mechanism of byproduct formation into account. However, process chemists will need information in order to predict and understand both the fate of impurities formed during each step in the process and where impurities are removed in the overall sequence; this is necessary not only to improve performance but also, and often more importantly, to meet regulatory requirements. Almost all of this information currently resides with corporations and is elusive internally and hidden externally. The messiness of data in our broad field of organic synthesis remains a challenge, and we should seek more engagement and demand more focused attention than we have in the past 50 years<sup><a id=\"ref-link-section-d44411e865\" title=\"Donoho, D. 50 years of data science. J. Comput. Graph. Stat. 
26, 745\u2013766 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR48\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 48\">48<\/a><\/sup>.<\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/section>\n<section aria-labelledby=\"Sec6\">\n<div id=\"Sec6-section\" class=\"c-article-section js-article-section\">\n<p id=\"Sec6\" class=\"c-article-section__title js-section-title js-c-reading-companion-sections-item\"><strong>Accelerating future innovation<\/strong><\/p>\n<div id=\"Sec6-content\" class=\"c-article-section__content js-collapsible-section\">\n<p>There is a recent trend for organic chemists to publish ever larger numbers of examples in methodology papers. However, these reports remain focused on the knowledge and the dataset published in a journal article, which represents only a small portion of the raw data collected. These data have not yet been collected in a standardized manner, and highly complex substrates are often not included. In more general terms, in the 200-year history of organic synthesis, we have not yet developed methods to collect, clean and label data in a way that makes it useful for training in the context of new reaction optimization, especially in the areas of catalysis design and development. Existing datasets in the public or private domain have simply not been built with this in mind.<\/p>\n<p>We have seen that large datasets or even ultra-high-throughput experimentation are not a prerequisite to machine learning. Biopharma deals with hundreds of millions of documents\u2014including laboratory data and clinical trial reports, publications and patent filings, as well as billions of database records. Companies and not-for-profit alliances are working to provide solutions to data management. 
Despite the quantity of data, chemical structure information is essentially captured as an image in a book and is therefore effectively unusable, whereas above-the-arrow and other data are currently considered out of scope and the vast amounts of historic data in paper and electronic notebooks remain orphaned. Consequently, to avoid repeating current synthetic methods in the field, we need to embrace modern approaches and pay attention to future needs. This will avoid simply restating the master data problem. Metrics for similarity calculations<sup><a id=\"ref-link-section-d44411e881\" title=\"Bajusz, D., Racz, A. &amp; Heberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminf. 7, 20 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR49\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\">49<\/a><\/sup>\u00a0use the fingerprints of molecules to compare how similar they are to each other and will ensure that we avoid bias introduced by human-curated examples for machine learning. We need our data to emerge beyond the positive results and the publication- or career-driven biases. A published data point should be one click away from raw experimental data, all the way from the weighing of materials to analytical data, enabled by the Internet of Things<sup><a id=\"ref-link-section-d44411e885\" title=\"Martinot, T. Could Internet-of-Things be the next step in the evolution of chemistry. 
TetraScience Blog \n                    https:\/\/blog.tetrascience.com\/blog\/could-internet-of-things-be-the-next-step-in-the-evolution-of-chemistry\/\n\n                   (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR50\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\">50<\/a><\/sup>.<\/p>\n<p>Before we do that, we need to provide a framework in which to enable the collection and publication of new data as it is generated, much like the Bermuda Accord<sup><a id=\"ref-link-section-d44411e892\" title=\"Contreras, J. L. Bermuda\u2019s legacy: policy, patents, and the design of the genome commons. Minn. J. Law Sci. Technol. 12, 61\u2013125 (2011).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR51\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\">51<\/a><\/sup>. This established that all the DNA sequence information from large-scale human genomic projects should be freely available and in the public domain. With increasing exploration of new research areas that cross disciplines\u2014for example, chemical biology or proteomics\u2014it is becoming common for very different traditions towards data sharing to coexist in the same laboratory<sup><a id=\"ref-link-section-d44411e896\" title=\"Amann, R. I. et al. Toward unrestricted use of public genomic data. Science 363, 350\u2013352 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR52\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\">52<\/a><\/sup>. 
In the field of organic synthesis the intensity of the work, the amount of capital allocation required and the degree of specialization in data rather than \u2018art\u2019 will lead to the creation of a new kind of chemist\u2014one whose principal objective is the generation of high-quality datasets. These datasets will go on to be the foundation for a new partnership of hypothesis-driven and hypothesis-free discovery based on big data in chemistry<sup><a id=\"ref-link-section-d44411e900\" title=\"Lander, E. S. The heroes of CRISPR. Cell 164, 18\u201328 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR53\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\">53<\/a><\/sup>. This distinction exists today in biology as the number of data-generating projects advances, in which medical breakthroughs such as CRISPR often emerge from unpredictable origins. The field has adapted, and enables academic data-generating researchers to continue obtaining grant funding for their work as well as advancing their careers through publication and peer recognition. Governmental agencies and large independent global charities can clearly influence the funding of the new data-generation projects, science policy, intellectual property and regulation.<\/p>\n<p>Synthetic chemistry has emerged relatively unscathed from the narrative of poor reproducibility in science and has not yet faced a crisis of confidence<sup><a id=\"ref-link-section-d44411e907\" title=\"Baker, M. Is there a reproducibility crisis? Nature 533, 452\u2013454 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR54\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\">54<\/a><\/sup>. 
There have been important calls regarding reproducibility<sup><a id=\"ref-link-section-d44411e911\" title=\"Bergman, R. G. &amp; Danheiser, R. L. Reproducibility in chemical research. Angew. Chem. Int. Ed. 55, 12548\u201312549 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR55\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\">55<\/a><\/sup>\u00a0and the discussion will remain contemporary, as the quality, reproducibility and traceability of the raw data and models are essential. As in several of the machine-learning examples discussed above<sup><a id=\"ref-link-section-d44411e915\" title=\"Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. &amp; Doyle, A. G. Predicting reaction performance in C\u2013N cross-coupling using machine learning. Science 360, 186\u2013190 (2018). This article demonstrates machine learning in prediction of the performance of a catalytic reaction using data obtained via high-throughput experimentation.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR32\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\">32<\/a>,<a id=\"ref-link-section-d44411e918\" title=\"Granda, J. M., Donina, L., Dragone, V., Long, D.-L. &amp; Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377\u2013381 (2018). 
This article predicts the reactivity of about 1,000 reaction combinations with accuracy greater than 80 per cent after considering the outcomes of slightly over 10 per cent of the dataset and, notably, the approach was also used to calculate the reactivity of published datasets.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR40\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\">40<\/a><\/sup>, the availability of reliable data and the code enables others to verify and retest alternative hypotheses. This helps to demonstrate the effect of data that is findable, accessible, interoperable and reusable (FAIR)<sup><a id=\"ref-link-section-d44411e922\" title=\"Brock, J. \u201cA love letter to your future self\u201d: what scientists need to know about FAIR data. Nature Index \n                    https:\/\/www.natureindex.com\/news-blog\/what-scientists-need-to-know-about-fair-data\n\n                   (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR56\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\">56<\/a><\/sup>. For example, the Cronin group was able to rapidly model the data for the Suzuki\u2013Miyaura reaction presented from the Pfizer flow platform<sup><a id=\"ref-link-section-d44411e926\" title=\"Granda, J. M., Donina, L., Dragone, V., Long, D.-L. &amp; Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377\u2013381 (2018). 
This article predicts the reactivity of about 1,000 reaction combinations with accuracy greater than 80 per cent after considering the outcomes of slightly over 10 per cent of the dataset and, notably, the approach was also used to calculate the reactivity of published datasets.\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR40\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\">40<\/a><\/sup>. As we report more complete datasets that include reactions that fail to give products in the expected yield or quality, we need to be cautious. A failure may not represent a true reflection of the reactivity profile of a current method, and we need to ensure that it does not limit exploration or utilization of a newly developed reaction. In the future, machine learning will therefore need to become a partner in order to elucidate reaction concepts, elusive high-value transformations and problems of which chemists are not currently aware (unknown unknowns), as well as to rapidly identify unanticipated observations or rare events.<\/p>\n<p>It is exciting to consider the potential societal impact of innovations similar to AlphaGo Zero in the chemical space. Commercial software packages are emerging and, although it is clear that these approaches will advance in sophistication, it is not necessary for the end user to understand the underlying complexity as long as the answers satisfy their needs. Unlike in closed systems such as chess or Go, there are no clearly defined rules for winning, and explainable artificial intelligence will be an ongoing issue<sup><a id=\"ref-link-section-d44411e934\" title=\"Preece, A., Harborne, D., Braines, D., Tomsett, R. &amp; Chakraborty, S. Stakeholders in explainable AI. 
Preprint at \n                    https:\/\/arxiv.org\/abs\/1810.00184\n\n                   (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR57\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\">57<\/a>,<a id=\"ref-link-section-d44411e937\" title=\"Caliskan, A., Bryson, J. J. &amp; Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183\u2013186 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y#ref-CR58\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\">58<\/a><\/sup>. It remains to be seen whether the machines can become experts or merely expert tools.<\/p>\n<p>Future advances in the digitization of chemistry will not come at an equal pace, and some areas of organic synthesis will be affected much sooner than others. Computing power is no longer a limitation, and much more sophisticated algorithms that can handle fuzzy datasets are being developed in fields that have more direct monetization. Although the technology is not yet reliable enough, it is clear that the field of synthesis and optimization in applications such as medicinal and process chemistry will become a more evidence-led practice. 
Some organic chemists will ignore the signals of this transformation, some will improve and make incremental progress, and some will be the innovators, embracing these tools to augment their scientific intuition and creativity.<\/p>\n<\/div>\n<\/div>\n<\/section>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>(Original article: click <a href=\"https:\/\/www.nature.com\/articles\/s41586-019-1288-y?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+nature%2Frss%2Fcurrent+%28Nature+-+Issue%29\">here<\/a>.)<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; &nbsp; Abstract Organic chemistry has largely been conducted in an ad hoc manner by academic laboratories that are funded by grants directed towards the<a href=\"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=3775\" class=\"more-link\">(more&#8230;)<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[34,35,29,30],"tags":[],"class_list":["post-3775","post","type-post","status-publish","format-standard","hentry","category-lets-do-chemistry","category-lets-do-computer-science","category-lets-do-scie
nce","category-recent-science-news"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":2551,"url":"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=2551","url_meta":{"origin":3775,"position":0},"title":"Synthetic innovation in drug development","author":"biochemistry","date":"January 19, 2019","format":false,"excerpt":"\u00a0 \u00a0 Chemical synthesis plays a key role in pharmaceutical research and development. Campos\u00a0et al.\u00a0review some of the advantages that have come from recent innovations in synthetic methods. In particular, they highlight small-molecule catalysts stimulated by visible light, enzymes engineered for versatility beyond their intrinsic function, and bio-orthogonal reactions to\u2026","rel":"","context":"In &quot;Let's Do Chemistry!&quot;","block_context":{"text":"Let's Do Chemistry!","link":"https:\/\/biochemistry.khu.ac.kr\/lab\/?cat=34"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2995,"url":"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=2995","url_meta":{"origin":3775,"position":1},"title":"The construction of supramolecular systems","author":"biochemistry","date":"March 29, 2019","format":false,"excerpt":"\u00a0 \u00a0 Self-assembly by intermolecular noncovalent interactions directed by self-recognition created the field of supramolecular chemistry (1). 
However, the word \u201cself\u201d appears to limit this field to mixing components in one assembly step where most of the complexity is inherent in the covalently synthesized reactants, rather than the result of\u2026","rel":"","context":"In &quot;Let's Do Chemistry!&quot;","block_context":{"text":"Let's Do Chemistry!","link":"https:\/\/biochemistry.khu.ac.kr\/lab\/?cat=34"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2940,"url":"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=2940","url_meta":{"origin":3775,"position":2},"title":"Charting a course for chemistry","author":"biochemistry","date":"March 23, 2019","format":false,"excerpt":"\u00a0 \u00a0 To mark the occasion of\u00a0Nature Chemistry\u00a0turning 10 years old, we asked scientists working in different areas of chemistry to tell us what they thought the most exciting, interesting or challenging aspects related to the development of their main field of research will be \u2014 here is what they\u2026","rel":"","context":"In &quot;Essays on Science&quot;","block_context":{"text":"Essays on Science","link":"https:\/\/biochemistry.khu.ac.kr\/lab\/?cat=32"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3411,"url":"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=3411","url_meta":{"origin":3775,"position":3},"title":"Automation: Chemistry shoots for the Moon","author":"biochemistry","date":"April 23, 2019","format":false,"excerpt":"\u00a0 \u00a0 A new class of chemical instrumentation seeks to alleviate the tedium and complexity of organic syntheses. \u00a0 A machine for synthesizing small molecules at the University of Illinois at Urbana\u2013Champaign relies on syringe pumps to push reagents into reaction stations.Credit: L. Brian Stauffer, Univ. 
Illinois \u00a0 \u00a0 In\u2026","rel":"","context":"In &quot;Let's Do Chemistry!&quot;","block_context":{"text":"Let's Do Chemistry!","link":"https:\/\/biochemistry.khu.ac.kr\/lab\/?cat=34"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4189,"url":"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=4189","url_meta":{"origin":3775,"position":4},"title":"Double-click enables synthesis of chemical libraries for drug discovery","author":"biochemistry","date":"October 6, 2019","format":false,"excerpt":"\u00a0 \u00a0 Operationally simple chemical reactions, termed click reactions, are widely used in many scientific fields. A streamlined synthesis of compounds called azides looks set to expand the role of click chemistry still further. \u00a0 \u00a0 Generating molecules and materials that have desirable functional properties is arguably the central goal\u2026","rel":"","context":"In &quot;Let's Do Chemistry!&quot;","block_context":{"text":"Let's Do Chemistry!","link":"https:\/\/biochemistry.khu.ac.kr\/lab\/?cat=34"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1126,"url":"https:\/\/biochemistry.khu.ac.kr\/lab\/?p=1126","url_meta":{"origin":3775,"position":5},"title":"Engineering multilayered cellular structures","author":"biochemistry","date":"July 17, 2018","format":false,"excerpt":"\u00a0 \u00a0 (\uc6d0\ubb38: \uc5ec\uae30\ub97c \ud074\ub9ad\ud558\uc138\uc694~) \u00a0 \u00a0 Science\u00a0\u00a013 Jul 2018: Vol. 361, Issue 6398, pp. 141-143 DOI: 10.1126\/science.361.6398.141-k \u00a0 \u00a0 \u00a0 The ability to program the manufacture of biological structures may yield new biomaterials or synthetic tissues and organs. 
Toda\u00a0et al.\u00a0engineered mammalian \u201csender\u201d and \u201creceiver\u201d cells with synthetic cell surface\u2026","rel":"","context":"In &quot;Let's Do Biology!&quot;","block_context":{"text":"Let's Do Biology!","link":"https:\/\/biochemistry.khu.ac.kr\/lab\/?cat=33"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":false,"jetpack_shortlink":"https:\/\/wp.me\/p9Xo1j-YT","_links":{"self":[{"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=\/wp\/v2\/posts\/3775","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3775"}],"version-history":[{"count":1,"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=\/wp\/v2\/posts\/3775\/revisions"}],"predecessor-version":[{"id":3776,"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=\/wp\/v2\/posts\/3775\/revisions\/3776"}],"wp:attachment":[{"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3775"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3775"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/biochemistry.khu.ac.kr\/lab\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3775"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}