AI Evo: Advancing Genome Design and Biological Research

Scientists from the Arc Institute, Stanford University, and the University of California have developed an artificial intelligence model named Evo that could accelerate biological research. Evo can design new genomes on its own. It works on the same principle as large language models such as ChatGPT, which produce answers by predicting the next piece of text in a sequence. Instead of words, Evo strings together nucleotides, the building blocks of DNA, to generate DNA sequences, including ones that encode RNA and proteins, and even entire genomes. It was trained on 2.7 million genomes, mainly from bacteria and the viruses that infect them.
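The idea of generating DNA the way a language model generates text can be illustrated with a deliberately tiny sketch. This is not Evo's architecture or code: it assumes a simple lookup model that counts which nucleotide tends to follow each three-letter context and then samples one base at a time. The function names, the toy training sequences, and the context length of 3 are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def train_kmer_model(sequences, k=3):
    """Count which nucleotide follows each k-letter context (toy stand-in for training)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts


def generate(counts, prompt, length, k=3, seed=0):
    """Extend the prompt one nucleotide at a time, sampling by observed frequency."""
    rng = random.Random(seed)
    out = prompt
    while len(out) < length:
        followers = counts.get(out[-k:])
        if not followers:
            break  # context never seen in training: stop early
        bases, weights = zip(*followers.items())
        out += rng.choices(bases, weights=weights)[0]
    return out


# Hypothetical toy "genomes"; real training data would be millions of genomes.
toy_genomes = ["ATGGCGTACGTTAGC", "ATGGCGTTCGTAAGC"]
model = train_kmer_model(toy_genomes)
designed = generate(model, prompt="ATG", length=12)
print(designed)
```

A real genomic language model replaces the lookup table with a neural network that can take far longer contexts into account, but the generation loop (predict the next nucleotide, append it, repeat) follows the same pattern.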

Patrick Hsu, an assistant professor at Stanford, explains, “Evo deciphers patterns written in DNA over billions of years of evolution. Just as generative AI has revolutionized our interaction with texts, audio, and video, these creative possibilities can now be applied to the fundamental codes of life.”

To test the AI, researchers tasked Evo with designing a CRISPR system, a molecular tool that cuts DNA strands at specific points so that new DNA can be inserted or the strands modified. CRISPR systems consist of proteins and RNA, and new ones are typically discovered by searching through natural systems. Evo instead proposed several new designs, and by the eleventh attempt the researchers had a working result. “EvoCas9-1” shares only about 73% of its sequence with the natural CRISPR-Cas9 protein and otherwise differs significantly, yet it performs similar DNA-cutting activity.
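The article does not say how the figure of about 73% was calculated. One common way to express how much two proteins have in common is percent identity over an alignment, sketched below. The sketch assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it skips gapped positions; other conventions for the denominator exist. The function name and the short example sequences are hypothetical, not real Cas9 data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of non-gap aligned positions where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 7 comparable positions, 6 of them identical.
identity = percent_identity("MKRLA-GT", "MKKLAYGT")
print(f"{identity:.1f}% identical")
```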

Hsu emphasizes, “Creating functional CRISPR systems requires complex coordination of proteins and RNA. Evo’s ability to harmonize these components and make them work efficiently shows a new level of sophistication in biological engineering tools.”

Evo was also used to design DNA sequences that can move within genomes. One complex family of these “jumping genes” is the IS200/IS605 transposons. Despite the difficulty of the task, Evo generated a new set of transposons distinct from any found in nature.

Evo can currently generate genome-scale sequences more than a million base pairs long. Researchers aim to increase this length significantly to better understand multicellular organisms. Hsu states, “We are working towards creating a new field of genome design, where we perform cellular groundwork and potentially create new organisms.”

Another goal is to reduce and eventually eliminate Evo's hallucinations. These errors, common in many AI models, show up in Evo as non-functional CRISPR systems.