DeepMind’s New AI Can Predict Genetic Diseases

About 10 years ago, Žiga Avsec was a physics PhD student who found himself taking a crash course in genomics via a university module on machine learning. He was soon working in a lab that studied rare diseases, on a project aiming to pin down the exact genetic mutation that caused an unusual mitochondrial disease.

Avsec describes the problem as finding a needle in a haystack. The genetic code offers an enormous number of places where a DNA mutation could occur, and some of those mutations can seriously disrupt a person’s biology. The project focused on missense variants: changes to a single letter of the genetic code that cause a protein to be built with a different amino acid. Amino acids are the building blocks of proteins, which perform a wide range of functions in the body, so even a single substitution can have significant, far-reaching consequences.
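To make the definition concrete, here is a minimal Python sketch (not from the article or from DeepMind) that classifies a single-letter DNA change inside one three-letter codon using the standard genetic code. The function name `classify_substitution` is hypothetical. The example uses the well-known sickle cell mutation in the beta-globin gene, where the codon GAG (glutamic acid, E) becomes GTG (valine, V).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order,
# with "*" marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-letter DNA change at `position` (0-2) within one codon.

    Hypothetical helper for illustration only.
    """
    ref_codon = codon.upper()
    alt_codon = ref_codon[:position] + new_base.upper() + ref_codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        # Same amino acid: the protein is unchanged.
        return f"synonymous ({ref_codon}->{alt_codon}, {ref_aa} unchanged)"
    if alt_aa == "*":
        # The change introduces a stop codon and truncates the protein.
        return f"nonsense ({ref_codon}->{alt_codon}, {ref_aa} becomes a stop)"
    # A different amino acid is built into the protein: a missense variant.
    return f"missense ({ref_codon}->{alt_codon}, {ref_aa}->{alt_aa})"


# Sickle cell example: the middle letter of GAG changes from A to T.
print(classify_substitution("GAG", 1, "T"))
```

Here `classify_substitution("GAG", 1, "T")` returns `"missense (GAG->GTG, E->V)"`. A change that leaves the amino acid the same is called synonymous, and one that creates a stop codon is called nonsense. Only the case that swaps one amino acid for another counts as missense, which is the kind of variant AlphaMissense scores.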

There are 71 million possible missense variants in the human genome, and the average person carries more than 9,000 of them. Most are harmless, but some have been implicated in genetic diseases such as sickle cell anemia and cystic fibrosis, as well as more complex conditions like type 2 diabetes, which may be caused by a combination of small genetic changes. Avsec started asking his colleagues: “How do we know which ones are actually dangerous?” The answer: “Well largely, we don’t.”

Of the 4 million missense variants observed in humans, only about 2 percent (roughly 80,000) have been classified as either harmful or harmless, after extensive and costly research. Working out the effect of a single missense variant can take months.

Today, Google DeepMind, where Avsec now works as a staff research scientist, released a tool called AlphaMissense that can greatly speed up this process. It predicts the probability that a given missense variant causes disease with 90 percent accuracy, outperforming existing tools.

It’s built on AlphaFold, DeepMind’s groundbreaking model that predicted the structures of hundreds of millions of proteins from their amino acid sequences, but it works differently. Instead of predicting a protein’s structure, AlphaMissense operates more like a large language model such as OpenAI’s ChatGPT.

It has been trained on the language of human (and primate) biology, so it knows what normal sequences of amino acids in proteins should look like. When it’s presented with a sequence gone awry, it can take note, as with an incongruous word in a sentence. “It’s a language model but trained on protein sequences,” says Jun Cheng, who, with Avsec, is co-lead author of a paper published today in Science that announces AlphaMissense to the world. “If we substitute a word from an English sentence, a person who is familiar with English can immediately see whether these substitutions will change the meaning of the sentence or not.”
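The word-substitution analogy can be illustrated with a deliberately simple stand-in. The Python sketch below is not AlphaMissense, which the article describes as a language model built on AlphaFold. Instead it uses a much cruder technique: a per-position amino acid frequency profile with add-one smoothing. This shows the general idea of scoring a substitution by how unexpected it is. The aligned fragments and the function names are made up for illustration. The score is the log-likelihood ratio log P(alt) - log P(ref) at one position, and the more negative it is, the more the new residue looks like an "incongruous word."

```python
import math
from collections import Counter

# Made-up aligned protein fragments standing in for the "normal" sequences
# a model would learn from (hypothetical data, for illustration only).
ALIGNED_SEQUENCES = [
    "MVHLTPEEK",
    "MVHLTPEEK",
    "MVHLTGEEK",
    "MVHLSPEEK",
    "MVHFTPEDK",
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_log_probs(sequences, position, pseudocount=1.0):
    """Log-probability of each amino acid at one 0-based position, with add-one smoothing."""
    counts = Counter(seq[position] for seq in sequences)
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    return {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}


def substitution_score(sequences, position, ref_aa, alt_aa):
    """Log-likelihood ratio log P(alt) - log P(ref); more negative means more surprising."""
    log_probs = column_log_probs(sequences, position)
    return log_probs[alt_aa] - log_probs[ref_aa]


# E -> V at a position where E never varies, versus L -> F where F already occurs once.
print(substitution_score(ALIGNED_SEQUENCES, 6, "E", "V"))
print(substitution_score(ALIGNED_SEQUENCES, 3, "L", "F"))
```

With these toy sequences, replacing the glutamic acid (E) at 0-based position 6, which never varies, with valine (V) scores log(1/6) ≈ -1.79. Swapping L for F at position 3, where F already appears once, scores log(2/5) ≈ -0.92, so the toy model flags the first change as the more surprising one. A trained language model conditions on the surrounding sequence rather than scoring each position in isolation.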

Pushmeet Kohli, DeepMind’s vice president of research, uses the analogy of a recipe book. If AlphaFold was concerned with exactly how ingredients might bind together, AlphaMissense predicts what might happen if you use the wrong ingredient entirely.