Machine Learning Is Causing a ‘Reproducibility Crisis’ in Science

Yet Kapoor and Narayanan warn that AI’s impact on scientific research has been less than stellar in many instances. When the pair surveyed areas of science where machine learning was applied, they found that other researchers had identified errors in 329 studies that relied on machine learning, across a range of fields.

Kapoor says that many researchers are rushing to use machine learning without a comprehensive understanding of its techniques and their limitations. Dabbling with the technology has become much easier, in part because the tech industry has rushed to offer AI tools and tutorials designed to lure newcomers, often with the goal of promoting cloud platforms and services. “The idea that you can take a four-hour online course and then use machine learning in your scientific research has become so overblown,” Kapoor says. “People have not stopped to think about where things can potentially go wrong.”

Excitement around AI’s potential has prompted some scientists to bet heavily on its use in research. Tonio Buonassisi, a professor at MIT who researches novel solar cells, uses AI extensively to explore novel materials. He says that while it is easy to make mistakes, machine learning is a powerful tool that should not be abandoned. Errors can often be ironed out, he says, if scientists from different fields develop and share best practices. “You don’t need to be a card-carrying machine-learning expert to do these things right,” he says.

Kapoor and Narayanan organized a workshop late last month to draw attention to what they call a “reproducibility crisis” in science that makes use of machine learning. They were hoping for 30 or so attendees but received registrations from over 1,500 people, a surprise that they say suggests issues with machine learning in science are widespread.

During the event, invited speakers recounted numerous examples of situations where AI had been misused, from fields including medicine and social science. Michael Roberts, a senior research associate at Cambridge University, discussed problems with dozens of papers claiming to use machine learning to fight Covid-19, including cases where data was skewed because it came from a variety of different imaging machines. Jessica Hullman, an associate professor at Northwestern University, compared problems with studies using machine learning to the phenomenon of major results in psychology proving impossible to replicate. In both cases, Hullman says, researchers are prone to using too little data, and misreading the statistical significance of results.

Momin Malik, a data scientist at the Mayo Clinic, was invited to speak about his own work tracking down problematic uses of machine learning in science. Besides common errors in implementation of the technique, he says, researchers sometimes apply machine learning when it is the wrong tool for the job.

Malik points to a prominent example of machine learning producing misleading results: Google Flu Trends, a tool developed by the search company in 2008 that aimed to use machine learning to identify flu outbreaks more quickly from logs of search queries typed by web users. Google won positive publicity for the project, but it failed spectacularly to predict the course of the 2013 flu season. An independent study would later conclude that the model had latched onto seasonal terms that have nothing to do with the prevalence of influenza. “You couldn’t just throw it all into a big machine-learning model and see what comes out,” Malik says.