OpenAI’s Protein Engineering Model Aims to Advance Longevity Research
OpenAI has developed a language model to aid in the creation of new proteins. The model is designed to “dream up” proteins capable of transforming regular cells into stem cells, a task at which it reportedly outperforms human designers. This marks OpenAI’s first venture into biological data and the first time the company has publicly claimed its models can yield new scientific results. The company considers this step significant on the path toward Artificial General Intelligence (AGI).

OpenAI’s project began a year ago when Retro Biosciences, a company focused on longevity research, approached OpenAI for collaboration. Sam Altman, OpenAI’s CEO, personally invested $180 million in Retro. Retro’s goal is to extend human lifespan by at least ten years by studying Yamanaka factors, a set of proteins that can turn a human skin cell into a stem cell capable of producing any tissue in the body.

This cellular reprogramming is seen as a potential starting point for rejuvenating animals, constructing human organs, or providing replacement cells. However, the process is inefficient, taking weeks, and fewer than one percent of cells complete the reprogramming journey. OpenAI’s new model, GPT-4b micro, was trained to suggest improvements to the protein factors, reportedly enhancing their effectiveness by more than 50-fold according to preliminary measurements.

John Hallman, an OpenAI researcher, said the resulting proteins appear superior to those developed by scientists, though the results remain to be verified by external scientists once published. The model is currently a custom demonstration and is not available for broader use.

The project aims to demonstrate OpenAI’s commitment to scientific contribution. Whether these capabilities will be released as a separate model or integrated into the company’s main models has yet to be decided. Unlike Google DeepMind’s AlphaFold, which predicts protein structures, OpenAI’s model applies large-language-model techniques to the Yamanaka factors, which are relatively unstructured proteins.

The model was trained on protein sequences from many species, along with information on protein interactions. Although these datasets are large, they are smaller than those used for OpenAI’s flagship chatbots, making GPT-4b micro an example of a “small language model” trained on a targeted dataset.

Retro scientists initially used the model to propose redesigns of the Yamanaka proteins. The model often suggests changing as much as a third of the amino acids in a protein. Joe Betts-Lacroix, CEO of Retro, believes the model’s suggestions frequently yield improvements over the original Yamanaka factors.

Vadim Gladyshev, an aging researcher at Harvard University advising Retro, sees the need for better stem cell creation methods. He notes that while skin cells are easily reprogrammed, other cells are not, and results can vary significantly across species.

Understanding how GPT-4b micro generates its suggestions is still a work in progress. The model’s behavior has been compared to that of AlphaGo, which defeated the best human Go players even though it took time to understand why its moves worked. OpenAI and Retro are still exploring the model’s potential applications.

No money changed hands in the collaboration, but the work could benefit Retro, inviting criticism given Altman’s investment. Altman’s stakes in tech startups have been described as creating potential conflicts of interest, since some of those companies also collaborate with OpenAI. OpenAI insists that Altman is not directly involved in these projects and that company decisions are made independently of his other investments.