In a new study, researchers at MIT and Tufts University have developed a machine learning (ML) method that could speed up the drug discovery process.
Drug libraries contain billions of different compounds, some of which could potentially treat conditions ranging from cancer to heart disease. The question is, how do we find them?
Drug discovery has traditionally been labor-intensive: scientists have had to test each candidate compound against every possible target, a time-consuming and costly endeavor.
To address this, researchers have turned to computational methods for screening drug compound libraries. However, most of these methods are still slow because they first calculate each target protein’s three-dimensional structure from its amino-acid sequence.
The team at MIT and Tufts instead devised an approach based on a protein language model, a type of large language model (LLM) related to the models that power AI chatbots such as ChatGPT. Just as ChatGPT-style models analyze huge volumes of text to learn which words tend to appear together, a protein language model analyzes vast amounts of protein sequence data to learn which amino acids are likely to occur together.
The resulting model, named ConPLex, matches target proteins with potential drug molecules without the computationally expensive step of calculating three-dimensional structures. Using it, the researchers could screen more than 100 million compounds in a single day.
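How can proteins and drugs be matched without structures? One way, assumed here purely for illustration, is to turn each protein and each drug into a vector, project both into a shared space, and score every pair by how closely their vectors align. In the minimal sketch below, the vector sizes, the random projection matrices `W_protein` and `W_drug`, the ReLU, and cosine-similarity scoring are all hypothetical choices, not ConPLex’s actual architecture or trained weights, and the toy inputs stand in for real protein embeddings and drug fingerprints.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a protein embedding (e.g. from a protein language model),
# a drug fingerprint vector, and the shared space both are projected into.
PROTEIN_DIM, DRUG_DIM, SHARED_DIM = 1024, 2048, 128

# Hypothetical projection matrices; a trained model would learn these.
W_protein = rng.normal(size=(PROTEIN_DIM, SHARED_DIM)) / np.sqrt(PROTEIN_DIM)
W_drug = rng.normal(size=(DRUG_DIM, SHARED_DIM)) / np.sqrt(DRUG_DIM)

def project(x, W):
    """Map input rows into the shared space, then L2-normalize each row."""
    z = np.maximum(x @ W, 0.0)  # linear layer followed by ReLU
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-12)

def score_library(protein_vec, drug_matrix):
    """Cosine similarity between one protein and every drug in the library."""
    p = project(protein_vec[None, :], W_protein)  # shape (1, SHARED_DIM)
    d = project(drug_matrix, W_drug)              # shape (n_drugs, SHARED_DIM)
    return (d @ p.T).ravel()                      # shape (n_drugs,)

# Toy data standing in for a real protein embedding and a drug library.
protein = rng.normal(size=PROTEIN_DIM)
library = (rng.random((2_000, DRUG_DIM)) < 0.05).astype(float)  # sparse 0/1 "fingerprints"

scores = score_library(protein, library)
top10 = np.argsort(scores)[::-1][:10]  # indices of the 10 highest-scoring drugs
print(top10)
```

Because the drug projections in this sketch do not depend on the protein, they could be computed once and reused for every new target, leaving roughly one matrix-vector product per protein. That is one plausible reason embedding-based screens scale to very large libraries, although the article does not describe how ConPLex is implemented.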
Bonnie Berger, the Simons Professor of Mathematics and head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), said, “This work addresses the need for efficient and accurate in silico screening of potential drug candidates, and the scalability of the model enables large-scale screens for assessing off-target effects, drug repurposing, and determining the impact of mutations on drug binding.”
AI streamlines drug screening and development
A key hurdle in the field has been that existing models are easily fooled by ‘decoy’ compounds: molecules that resemble successful drugs but don’t actually interact well with the target. To rule out decoys, the team added a training stage based on contrastive learning, which teaches the model to distinguish real drugs from imposters.
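The idea behind a contrastive training stage can be sketched with a triplet margin loss, one common contrastive objective: the model is penalized unless a protein’s true drug sits closer to it than a decoy does, by at least some margin. This is an illustrative assumption rather than ConPLex’s confirmed loss function; the two-dimensional vectors, the cosine distance, and the `margin=0.5` value are all hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triplet_loss(protein, true_drug, decoy, margin=0.5):
    """Zero only when the true drug is closer to the protein than the decoy
    is, by at least `margin` (a hypothetical value); positive otherwise."""
    d_pos = cosine_distance(protein, true_drug)
    d_neg = cosine_distance(protein, decoy)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings standing in for learned vectors.
protein = np.array([1.0, 0.0])
true_drug = np.array([1.0, 0.0])
unrelated = np.array([0.0, 1.0])
look_alike_decoy = np.array([0.9, 0.1])  # resembles the true drug

print(triplet_loss(protein, true_drug, unrelated))         # 0.0
print(triplet_loss(protein, true_drug, look_alike_decoy))  # roughly 0.49
```

The look-alike decoy still produces a positive loss, so training on such triplets would push the model to separate decoys from real binders even when they look similar.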
The researchers tested the ConPLex model by screening a library of about 4,700 candidate drug molecules against a set of 51 enzymes known as protein kinases.
After experimentally testing 19 of the most promising drug-protein pairs, they found that 12 showed strong binding affinity for their targets.
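For a sense of scale, the numbers above can be worked through directly. The sketch assumes every one of the roughly 4,700 molecules was scored against each of the 51 kinases; the library size is approximate, so the pair count is too.

```python
n_drugs, n_kinases = 4_700, 51
pairs_screened = n_drugs * n_kinases  # every molecule scored against every kinase

tested, strong_binders = 19, 12
hit_rate = strong_binders / tested    # fraction of tested pairs that bound strongly

print(f"{pairs_screened:,} candidate pairs scored")                   # 239,700 candidate pairs scored
print(f"{hit_rate:.0%} of experimentally tested pairs bound strongly")  # 63% of ...
```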
While this study focused mainly on small-molecule drugs, the team is exploring how to apply this approach to other types of drugs, such as therapeutic antibodies.
The model could also be used to run toxicity screens on potential drug compounds, helping to check that they don’t produce unwanted side effects before they are tested in animal models.
Rohit Singh, a CSAIL research scientist, said, “Part of the reason why drug discovery is so expensive is because it has high failure rates. If we can reduce those failure rates by saying upfront that this drug is not likely to work out, that could go a long way in lowering the cost of drug discovery.”
Eytan Ruppin, chief of the Cancer Data Science Laboratory at the National Cancer Institute, lauds the approach as a “significant breakthrough in drug-target interaction prediction.”
In late May, another research team that included MIT researchers also used machine learning to screen compounds, in that case to search for new antibiotics.
That team used machine learning to screen thousands of existing compounds and identified one that was effective against an antibiotic-resistant superbug.