EvolutionaryScale’s ESM3: a generative model for biology

June 26, 2024

  • AI-for-proteomics startup EvolutionaryScale released ESM3, a frontier generative model for biology
  • ESM3 is a generative language model for programming biology and creating new proteins
  • ESM3 can follow prompts to generate new proteins with specific structures and functions

AI startup EvolutionaryScale has released ESM3, a 98B-parameter generative LLM for “programming biology”.

The company is focused on proteomics, the study of the interactions, function, composition, and structures of proteins and their cellular activities.

While multimodal models like GPT-4 can generate text or images, ESM3 is an AI tool for prototyping and creating new proteins.

When a ribosome creates a protein, it reads mRNA, which carries the code for making that specific protein.

Nearly every living organism uses the same genetic code to specify the same 20 standard amino acids. If you could read and understand that code, you could program the ribosome to make a protein on demand.
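To make the idea of a shared genetic code concrete, here is a minimal Python sketch of how an mRNA string maps to amino acids under the standard codon table. It is an illustration only and has nothing to do with ESM3's internals; the function name `translate` and the example sequence are made up for this sketch.

```python
# Standard genetic code, written compactly: codons are enumerated with
# bases in the order U, C, A, G, and each maps to a one-letter amino acid
# code ("*" marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Read an mRNA string three bases at a time until a stop codon.

    Simplification: reading starts at position 0 rather than searching
    for a start codon, and trailing bases that do not fill a codon are
    ignored.
    """
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence: AUG (Met), GCU (Ala), UAA (stop).
print(translate("AUGGCUUAA"))
```

The table contains 64 codons but only 20 distinct amino acids plus the stop signal, which is why several codons can encode the same amino acid.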

EvolutionaryScale says ESM3 “understands all of this biological data, translates it, and speaks it fluently to be used as a generative tool.”

Instead of a painstaking and expensive process of trial and error in a lab, ESM3 can predict the shape and function of a protein in a simulation.

ESM3 is trained across billions of proteins found in nature. One of the biggest challenges in creating the model was to tokenize the three-dimensional protein structure and its functions.

This required the development of a way to write every three-dimensional structure and function as a sequence of letters using discrete alphabets.
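As a rough illustration of what writing biology in "discrete alphabets" can mean, the Python sketch below turns an amino acid sequence into integer tokens and bins continuous angles into a small set of symbols. This is a toy example under our own assumptions, not EvolutionaryScale's tokenizer: the function names, the 12-bin choice, and the use of angles as a stand-in for 3D structure are all hypothetical.

```python
# The 20 standard amino acids, one letter each; a token is the letter's index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SEQ_VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize_sequence(seq: str) -> list[int]:
    """Map each amino acid letter to its integer token."""
    return [SEQ_VOCAB[aa] for aa in seq]


def tokenize_angles(angles_deg: list[float], n_bins: int = 12) -> list[int]:
    """Quantize angles in degrees into n_bins equal-width bins.

    Angles are shifted from [-180, 180) to [0, 360) and wrapped, so a
    continuous geometric quantity becomes one symbol from a small alphabet.
    """
    width = 360 / n_bins
    return [int(((angle + 180) % 360) // width) for angle in angles_deg]


print(tokenize_sequence("ACD"))
print(tokenize_angles([-180.0, 0.0, 179.0]))
```

Once sequence and geometry are both expressed as tokens, a language model can in principle be trained on them the same way it is trained on text.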

Once trained on billions of proteins, ESM3 speaks the language of nature fluently and can reason over the sequence, structure, and function of proteins.

As a demonstration of ESM3’s abilities, EvolutionaryScale used it to generate a novel green fluorescent protein (GFP). GFPs are responsible for the beautiful fluorescence we see in some lifeforms like jellyfish or corals.

A rendering of esmGFP, a new green fluorescent protein generated by ESM3. Source: EvolutionaryScale

GFPs are incredibly rare in nature. The company estimates that the novel protein it calls esmGFP “represents an equivalent of over 500 million years of natural evolution performed by an evolutionary simulator.”

EvolutionaryScale is making the ESM3 model openly available and hopes it will “allow scientists to explore the frontiers of protein design and synthetic biology, and invent new solutions for some of the most important problems facing our world.”

The dual-use and open-source nature of a tool like ESM3 raises potential risks that the company says it will mitigate with its Responsible Development Framework.

Using AI to program biology predictably could lead to new medicines and to proteins that capture carbon or break down stubborn pollutants like plastics.

Advancements in AI tools like ESM3 and AlphaFold, combined with gene-editing technologies like CRISPR, may soon help solve disease and environmental problems that have challenged scientists for decades.


Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.

