Post methodology: Claude 3.7 via custom Dust assistant @TDep-SubstackPost with the system prompt: Please read the text of the podcast transcript in the prompt and write a short post that summarizes the main points and incorporates any recent news articles, substack posts or X posts that provide helpful context for the interview. Please make the post as concise as possible and avoid academic language or footnotes. please put any linked articles or tweets inline in the text. Please refer to Podcast guests by their first names after the initial mention. Light editing and reformatting for the Substack editor.
Patrick Hsu, co-founder of the Arc Institute and Berkeley professor of bioengineering, brings a refreshing perspective on AI and biology that extends far beyond the typical focus on drug discovery. In this conversation with Sequoia Capital's Josephine Chen and Pat Grady, Patrick explains how his team's groundbreaking Evo 2 model is enabling a fundamental shift in our understanding of biological systems.
Beyond Drug Discovery: AI for Understanding Life Itself
While most discussions about AI in biology center on accelerating drug development, Patrick emphasizes a broader vision: using AI to understand and program fundamental biological systems. As he puts it, "ML for bio is not just drug design." Evolution serves as the unifying theory for biology, acting across all scales from entire planets to individual molecules, and Evo 2 leverages this evolutionary data to build a foundation model for biology.
Released in February 2025, Evo 2 is the largest AI model for biology to date, trained on over 9.3 trillion nucleotides from more than 128,000 genomes across all domains of life. The model can process genetic sequences up to 1 million nucleotides at once, enabling it to understand relationships between distant parts of a genome that would take researchers years to uncover experimentally.
Reading and Writing the Code of Life
One of Evo 2's most impressive capabilities is accurately predicting which genetic mutations might cause disease. When tasked with identifying potentially pathogenic variants in the breast cancer-associated BRCA1 gene, Evo 2 achieved over 90% accuracy compared to lab-validated results. As Patrick explains, this addresses a critical challenge in genomics: interpreting "variants of unknown significance" that leave doctors and patients uncertain about their health implications.
Beyond analysis, Evo 2 can also generate DNA sequences as long as simple bacterial genomes. This generative capability points toward a future where scientists might design entirely new biological systems from scratch.
An App Store for Biology
The Arc Institute has made Evo 2 fully open source, releasing its code, training data, and model weights to the scientific community. Patrick envisions this as creating "an app store for biology" where researchers can build specialized applications on top of this foundation model.
"Evo 2 has a generalist understanding of the tree of life that's useful for a multitude of tasks," Patrick notes. "From predicting disease-causing mutations to designing potential code for artificial life."
This approach is already gaining traction. In April 2025, Dario Amodei cited the Evo 2 paper in his essay "The Urgency of Interpretability," highlighting the model's importance in advancing AI transparency. The Arc Institute has also collaborated with AI research lab Goodfire to develop a mechanistic interpretability visualizer that reveals the biological features Evo 2 has learned to recognize.
Integrating Different Data Streams for Health
Looking to the future, Patrick describes a vision where AI models can integrate multiple data streams—from genetics to real-time biomarkers—to provide personalized health predictions and interventions. He points to fascinating research at Arc Institute on interoception—how the body communicates with the brain and vice versa—as an example of the cross-functional insights AI could help uncover.
"We have very fragmented data sets today," Patrick explains, "but being able to collect this data at scale across populations and over time with temporal resolution will transform how we understand health."
The Path Forward
While drug discovery will remain important, Patrick's vision extends to a world where AI enables us to program biology itself, potentially altering everything from disease susceptibility to how our bodies and brains interact. As he puts it, "I think building the sort of PDB of virtual cells is something that we've been focusing a lot on at Arc."
By 2030, Patrick predicts we'll have "accurate and useful virtual cell models that make a cell biologist feel emotion," fundamentally changing how scientists approach biological research and potentially revolutionizing our approach to medicine and human health.
Hosted by Josephine Chen and Pat Grady, Sequoia Capital
BONUS ESSAY | AlphaFold vs. Evo: Two AI Paradigms Tackle Biology
Mentioned in this episode:
Sequence modeling and design from molecular to genome scale with Evo: Public pre-print of original Evo paper
Genome modeling and design across all domains of life with Evo 2: Public pre-print of Evo 2 paper
ClinVar: NIH database of the genes that are known to cause disease, and mutations in those genes causally associated with disease state
Sequence Read Archive: Massive NIH database of gene sequencing data
Machines of Loving Grace: Dario Amodei essay that Patrick cites on how AI could transform the world for the better
Arc Virtual Cell Atlas: Arc’s first step toward assembling, curating and generating large-scale cellular data for AI-driven biological discovery (among many other tools)
Protein Data Bank (PDB): a global archive of 3D structural information of biomolecules used by DeepMind to train AlphaFold
OpenAI Deep Research: The one AI app Patrick uses daily