AI startup EvolutionaryScale has released ESM3, a 98B-parameter generative LLM for “programming biology”.
The company is focused on proteomics, the study of the interactions, function, composition, and structures of proteins and their cellular activities.
While multimodal models like GPT-4 can generate text or images, ESM3 is an AI tool for prototyping and creating new proteins.
When a ribosome creates a protein, it reads mRNA, which carries the code for making that specific protein.
Nearly every living organism uses the same genetic code to specify the same 20 standard amino acids. If you could read and understand that code, you could program the ribosome to make a protein on demand.
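To make the idea of a shared genetic code concrete, here is a minimal Python sketch of how a ribosome's reading of mRNA can be mimicked in software. It uses the standard codon table; the function name `translate` and the example mRNA string are illustrative assumptions and have nothing to do with ESM3's own code.

```python
# Build the standard genetic code: 64 three-letter codons over the bases U, C, A, G.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES), AMINO_ACIDS
    )
}


def translate(mrna: str) -> str:
    """Translate mRNA into one-letter amino acid codes.

    Reading starts at the first AUG (start codon) and ends at the first
    stop codon, roughly as a ribosome would. Returns "" if there is no AUG.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein is finished
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example message, not a real gene.
print(translate("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"))
```

The table maps 61 codons onto 20 amino acids, so several codons spell the same amino acid. Reading the code is the easy part; predicting what the resulting chain of amino acids will fold into and do is the problem models like ESM3 aim to address.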
EvolutionaryScale says ESM3 “understands all of this biological data, translates it, and speaks it fluently to be used as a generative tool.”
Instead of a painstaking and expensive process of trial and error in a lab, ESM3 can predict the shape and function of a protein in a simulation.
We have trained ESM3 and we’re excited to introduce EvolutionaryScale.
ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins.
Read more: https://t.co/iAC3lkj0iV pic.twitter.com/AhWtC4vxlF
— Alex Rives (@alexrives) June 25, 2024
ESM3 is trained on billions of proteins found in nature. One of the biggest challenges in creating the model was to tokenize the three-dimensional protein structure and its functions.
This required the development of a way to write every three-dimensional structure and function as a sequence of letters using discrete alphabets.
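As a rough illustration of what writing structure as letters from a discrete alphabet can mean, the toy Python sketch below bins backbone angles into 12 letters. This is not ESM3's tokenizer, which is not described in this article; the alphabet, the bin width, and all function names are assumptions chosen only to show the general idea of turning continuous geometry into tokens.

```python
import string

# Hypothetical toy alphabet: 12 letters, each covering a 30-degree bin.
ALPHABET = string.ascii_uppercase[:12]
BIN_WIDTH = 360 / len(ALPHABET)


def angle_to_letter(angle_deg: float) -> str:
    """Map an angle in degrees to one letter of the toy alphabet."""
    wrapped = (angle_deg + 180.0) % 360.0  # shift [-180, 180) onto [0, 360)
    return ALPHABET[int(wrapped // BIN_WIDTH)]


def tokenize_angles(angles) -> str:
    """Write a list of angles as a string of discrete letters."""
    return "".join(angle_to_letter(a) for a in angles)


def letter_to_angle(letter: str) -> float:
    """Decode a letter to the centre of its bin (lossy by design)."""
    return ALPHABET.index(letter) * BIN_WIDTH + BIN_WIDTH / 2 - 180.0


# Made-up angles for three residues, in degrees.
tokens = tokenize_angles([-60.0, -45.0, 120.0])
print(tokens)
print([letter_to_angle(t) for t in tokens])
```

Once geometry is expressed as tokens like these, a language model can treat it the same way it treats the letters of an amino acid sequence. The cost is that decoding only recovers the bin centre, so some precision is lost.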
Once trained on billions of proteins, ESM3 speaks the language of nature fluently and can reason over the sequence, structure, and function of proteins.
As a demonstration of ESM3’s abilities, EvolutionaryScale used it to generate a novel green fluorescent protein (GFP). GFPs are responsible for the beautiful fluorescence we see in some lifeforms like jellyfish or corals.
GFPs are incredibly rare in nature. The company estimates that the novel protein it calls esmGFP “represents an equivalent of over 500 million years of natural evolution performed by an evolutionary simulator.”
EvolutionaryScale is making the ESM3 model openly available and hopes it will “allow scientists to explore the frontiers of protein design and synthetic biology, and invent new solutions for some of the most important problems facing our world.”
The dual-use and open-source nature of a tool like ESM3 raises potential risks that the company says it will mitigate with its Responsible Development Framework.
Using AI to program biology predictably could lead to new medicines and to proteins that capture carbon or break down stubborn pollutants like plastics.
Advances in AI tools like ESM3 and AlphaFold, combined with gene-editing techniques like CRISPR, may soon help eradicate diseases and solve environmental problems that have challenged scientists for decades.