Bringing machine learning to the masses

A machine learning tool called Northstar lets users play with data visually.

PHOTO: MELANIE GONICK

Yang-Hui He, a mathematical physicist at the University of London, is an expert in string theory, one of the most abstruse areas of physics. But when it came to artificial intelligence (AI) and machine learning, he was naïve. “What is this thing everyone is talking about?” he recalls thinking. Then his go-to software program, Mathematica, added machine learning tools that were ready to use, no expertise required. He began to play around, and realized AI might help him choose the plausible geometries for the countless multidimensional models of the universe that string theory proposes.

In a 2017 paper, He showed that, with just a few extra lines of code, he could enlist the off-the-shelf AI to greatly speed up his calculations. “I don’t have to get down to the nitty gritty,” He says. Now, He says he is “on a crusade” to get mathematicians and physicists to use machine learning, and gives about 20 talks a year on the power of these new user-friendly versions.

AI used to be the specialized domain of data scientists and computer programmers. But companies such as Wolfram Research, which makes Mathematica, are trying to democratize the field, so scientists without AI skills can harness the technology for recognizing patterns in big data. In some cases, they don’t need to code at all. Insights are just a drag-and-drop away. Computational power is no longer much of a limiting factor in science, says Juliana Freire, a computer scientist at New York University in New York City who is developing a ready-to-use AI tool with funding from the Defense Advanced Research Projects Agency (DARPA). “To a large extent, the bottleneck to scientific discoveries now lies with people.”

One of the latest systems is software called Ludwig, first made open-source by Uber in February and updated last week. Uber used Ludwig for projects such as predicting food delivery times before releasing it publicly. Ludwig can train itself when fed two files: a spreadsheet with the training data and a file specifying which columns are the inputs and outputs. Once it learns to recognize associations, the software can process new data to label images, answer questions, or make numerical estimates.
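The two-file setup described above can be sketched in a few lines. This is an illustration only: the column names, file names, and toy rows are hypothetical, and the `input_features` and `output_features` keys with `image` and `category` types follow Ludwig's documented declaration format but should be checked against the installed version. Ludwig's documentation presents the declaration as YAML; JSON is used here only to stay within the standard library.

```python
import csv
import json
from pathlib import Path

# Hypothetical toy rows: an image path and the label to learn (invented for illustration).
ROWS = [
    {"image_path": "img/0001.png", "label": "class_a"},
    {"image_path": "img/0002.png", "label": "class_b"},
]

# The declaration: which columns are inputs, which are outputs, and their types.
CONFIG = {
    "input_features": [{"name": "image_path", "type": "image"}],
    "output_features": [{"name": "label", "type": "category"}],
}


def write_training_files(out_dir):
    """Write the data spreadsheet (CSV) and the column declaration side by side."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    data_path = out / "train.csv"
    config_path = out / "config.json"
    with data_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image_path", "label"])
        writer.writeheader()
        writer.writerows(ROWS)
    config_path.write_text(json.dumps(CONFIG, indent=2))
    return data_path, config_path
```

Training itself would then be handed to Ludwig with these two files; that step is omitted here because its exact invocation depends on the Ludwig release.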

At least a dozen startups are using it, plus big companies such as Apple, IBM, and Nvidia, says Piero Molino, Ludwig’s lead developer at Uber AI Labs in San Francisco, California. Scientists are using it to analyze images from telescopes and microscopes. Tobias Boothe, a biologist at the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany, uses it to visually distinguish the thousands of species of flatworms, a difficult task even for experts. To train Ludwig, he just uploads images and labels. “Just to get something started and get a result was superstraightforward,” he says.

The AI tools are more than mere toys for nonprogrammers, says Tim Kraska, a computer scientist at the Massachusetts Institute of Technology in Cambridge who leads Northstar, a machine learning tool supported by the $80 million DARPA program called Data-Driven Discovery of Models. Wade Shen, who leads the DARPA program, says the tools can outperform data scientists at building models, and they’re even better with a subject matter expert in the loop.

In a demo for Science, Kraska showed how easy it was to use Northstar’s drag-and-drop interface for a serious problem. He loaded a freely available database of 60,000 critical care patients that includes details on their demographics, lab tests, and medications. In a couple of clicks, Kraska created several heart failure prediction models, which quickly identified risk factors for the condition. One model fingered ischemia—a poor blood supply to the heart—which doctors know is often codiagnosed with heart failure. That was “almost like cheating,” Kraska said, so he dragged ischemia off the list of inputs and the models immediately began to retrain to look for other predictive factors.
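The step Kraska describes, removing an input that is nearly a restatement of the outcome and then looking again, can be sketched with a deliberately crude stand-in for a model. The rows below are invented toy data, `factor_a` and `factor_b` are hypothetical placeholders, and the scoring rule (difference in outcome rate with and without a feature) is an assumption for illustration, not Northstar's method.

```python
# Invented toy rows: 1 means present, 0 means absent. Not real patient data.
ROWS = [
    {"ischemia": 1, "factor_a": 1, "factor_b": 0, "heart_failure": 1},
    {"ischemia": 1, "factor_a": 1, "factor_b": 1, "heart_failure": 1},
    {"ischemia": 1, "factor_a": 0, "factor_b": 0, "heart_failure": 1},
    {"ischemia": 1, "factor_a": 1, "factor_b": 1, "heart_failure": 1},
    {"ischemia": 0, "factor_a": 0, "factor_b": 1, "heart_failure": 0},
    {"ischemia": 0, "factor_a": 1, "factor_b": 0, "heart_failure": 0},
    {"ischemia": 0, "factor_a": 0, "factor_b": 1, "heart_failure": 0},
    {"ischemia": 0, "factor_a": 0, "factor_b": 0, "heart_failure": 0},
]
FEATURES = ["ischemia", "factor_a", "factor_b"]


def risk_gap(rows, feature, target):
    """Outcome rate among rows with the feature minus the rate among rows without it."""
    with_f = [r[target] for r in rows if r[feature] == 1]
    without = [r[target] for r in rows if r[feature] == 0]
    if not with_f or not without:
        return 0.0
    return sum(with_f) / len(with_f) - sum(without) / len(without)


def rank_features(rows, features, target):
    """Rank features by the size of their risk gap, largest first."""
    scored = [(f, abs(risk_gap(rows, f, target))) for f in features]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# First pass: the codiagnosed condition dominates the ranking.
first_pass = rank_features(ROWS, FEATURES, "heart_failure")

# Second pass: drop it from the inputs and rank again to surface other factors.
remaining = [f for f in FEATURES if f != "ischemia"]
second_pass = rank_features(ROWS, remaining, "heart_failure")
```

In this toy data the first pass puts `ischemia` on top, and the second pass promotes `factor_a`, which mirrors the drag-off-and-retrain loop in the demo.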

Maciej Baranski, a physicist at the Singapore-MIT Alliance for Research & Technology Centre, says the group plans to use Northstar to explore cell therapies for fighting cancer or replacing damaged cartilage. The system will help biologists combine the optical, genetic, and chemical data they’ve collected from cells to predict their behavior.

The Wolfram computer language, which powers Mathematica, does require some coding to tap into its machine learning tools, but it makes thousands of complex functions available through simple languagelike commands. In a demo for Science, Jon McLoone, a strategist at Wolfram in Oxford, U.K., trained his computer’s camera to recognize when he was forming a rock, paper, or scissors with his hand—without specifying what type of algorithm to use. In this case, it chose a neural network—an algorithm made of interconnected layers inspired by the brain.
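Letting the software pick the algorithm can be approximated by trying several candidates and keeping whichever scores best on held-out examples. The sketch below assumes that generic approach; the article does not describe how Wolfram's tools actually choose, and the two candidates, the feature vectors, and all names are hypothetical.

```python
from collections import Counter


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbor(train):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    def predict(x):
        nearest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return nearest[1]
    return predict


def accuracy(predict, data):
    """Fraction of examples in data that the predictor labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


def auto_select(train, valid, candidates):
    """Fit every candidate on train and return the name that scores best on valid."""
    scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Invented two-number feature vectors standing in for camera frames.
TRAIN = [((0.0, 0.0), "rock"), ((0.0, 1.0), "rock"), ((0.5, 0.5), "rock"),
         ((5.0, 5.0), "paper"), ((5.0, 6.0), "paper")]
VALID = [((0.2, 0.1), "rock"), ((5.2, 5.1), "paper")]
CANDIDATES = {"majority": fit_majority, "nearest_neighbor": fit_nearest_neighbor}

best_name, scores = auto_select(TRAIN, VALID, CANDIDATES)
```

The user supplies only labeled examples; the choice of method falls out of the comparison rather than being specified up front.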

Marco Thiel, an applied mathematician at the University of Aberdeen in the United Kingdom, was so enamored with the program that he used it to train an algorithm to distinguish cats from toddlers, and connected the software to a camera and a garden sprinkler. Now, the sprinkler soaks the neighbor’s pet when it intrudes—but isn’t triggered by his own daughter. He also works with drug companies that are sifting through patient data in search of early signs of dementia or the triggers of epileptic seizures. To search for telltale patterns, Thiel feeds Mathematica patient data from home cameras, appliances, and wearable devices like Fitbits.

The trend toward off-the-shelf AI has risks. Machine learning algorithms are often called black boxes, their inner workings shrouded in mystery, and the prepackaged versions can be even more opaque. Novices who don’t bother to look under the hood might not recognize problems with their data sets or models, leading to overconfidence in biased or inaccurate results.

But Kraska says Northstar has a safeguard against misuse: more AI. It includes a module that anticipates and counteracts typical rookie mistakes, such as assuming any pattern an algorithm finds is statistically significant. “In the end it actually tries to mimic what a data scientist would do,” he says.
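One standard guard against treating every pattern as significant is to tighten the threshold as more patterns are tested. The sketch below uses the Bonferroni correction as an example of that idea; the article does not say which procedure Northstar applies, and the pattern names and p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each finding as significant only if its p-value clears alpha / number of tests."""
    if not p_values:
        return {}
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical p-values from three patterns found while exploring one data set.
P_VALUES = {"pattern_1": 0.001, "pattern_2": 0.03, "pattern_3": 0.2}

# Naive reading: anything under 0.05 counts, so pattern_2 would pass.
naive = {name: p <= 0.05 for name, p in P_VALUES.items()}

# Corrected reading: with three tests the bar drops to 0.05 / 3, and pattern_2 fails.
corrected = bonferroni_significant(P_VALUES)
```

The more patterns a user clicks through, the stricter the bar becomes, which is the kind of bookkeeping a novice is likely to skip.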
