From The Editor | November 4, 2025

OpenFold3 Released: Redefining The Limits Of Protein Prediction Models

By Ray Dogum, Chief Editor, Drug Discovery Online

AI And Drug Discovery: Betting Big On Each Other

In recent years, we’ve witnessed remarkable advancements across the drug discovery AI landscape. From de novo molecule generation to predictive toxicology, AI tools are now being embedded across the drug pipeline.  

In fact, even “traditional” LLM-based companies are heavily investing in drug discovery. For example, Anthropic recently announced Claude for Life Sciences, which allows scientists to integrate its Claude Sonnet 4.5 model with familiar drug discovery platforms such as Benchling, BioRender, PubMed, Synapse.org, and 10x Genomics.

Big pharma companies have jumped into this space with both feet. Lilly is building TuneLab, an AI factory with 1,016 NVIDIA Blackwell Ultra GPUs, and is expected to make select models accessible via a federated learning infrastructure. Federated models are a welcome step toward more collaborative insight sharing, but they are still not quite enough to push the envelope of discovery.

Open Source The Model

While models are plentiful, most remain locked behind proprietary walls, with algorithms and weights that are inaccessible to the broader scientific community.

This lack of transparency raises real concerns: data reproducibility, trust, and regulatory compliance among them. In a field built on scientific validation, closed systems can hinder collaboration and slow scientific progress.

Fortunately, open-source initiatives are gaining ground among both academic and industry researchers.

These open-source AI models include:

  • Boltz-2, developed by the MIT Jameel Clinic and the publicly traded company Recursion
  • Chai-2, developed by Chai Discovery, a San Francisco-based company that raised $70M in a Series A
  • Protenix, developed by ByteDance’s AML AI4Science
  • OpenFold3, developed by the OpenFold Consortium

The last project listed, OpenFold3, sets itself apart by offering a fully open-source protein structure prediction model.

According to its official GitHub repository:

“OpenFold3-preview is a biomolecular structure prediction model aiming to be a bitwise reproduction of DeepMind's AlphaFold3, developed by the AlQuraishi Lab at Columbia University and the OpenFold consortium. This research preview is intended to gather community feedback and allow developers to start building on top of the OpenFold ecosystem. The OpenFold project is committed to long-term maintenance and open source support, and our repository is freely available for academic and commercial use under the Apache 2.0 license.”

AlphaFold3, built by Google DeepMind, is currently accessible to academics and nonprofits for inference purposes only, with no access to the training code. OpenFold3, by contrast, is already accessible via Hugging Face, a popular community for sharing AI models and datasets, and Tamarind Bio, a web-based interface for running computational biology tools. That means anyone can access it, including commercial biotechs. OpenFold3 is close to matching AlphaFold3’s performance.

According to the OpenFold Consortium’s recent press release, the model was “trained on more than 300,000 publicly-available, experimentally determined structures, and an OpenFold-curated synthetic database of over 13 million structures. OpenFold3 provides the underpinning to significantly accelerate in silico screening of biomolecules and serves as a foundation model for next-generation protein AI tools across drug discovery, enzyme and biosensor design, and biomaterials development.”

I recently asked the team a few questions about OpenFold3. Here’s what Woody Sherman, Ph.D., OpenFold Consortium Executive Committee Chairperson and Chief Innovation Officer at Psivant Therapeutics, told me about the new OpenFold3 release and his thoughts on overcoming some of the data-sharing challenges lingering across the drug development industry.  

Why OpenFold3

Ray: What is the most exciting thing about OpenFold3?

Woody: The possibility that democratization of AI models for drug discovery will have a profound impact on human health is quite exciting. Of course, drug discovery is a long, winding road with many potential pitfalls. Co-folding is just one part of the convoluted process. OpenFold3 can already do great things, but we also need to consider chemistry, biology, and pharmacology for AI to make a transformative impact in drug discovery. Being part of the Open Molecular Software Foundation (OMSF), where we have projects focused on each of these areas, is wonderful. For now, OpenFold3 offers the opportunity for people to participate in something big and contribute to the democratization of AI for drug discovery. OpenFold3 is a state-of-the-art foundation model built by leading academics and tested in the real world by a diverse consortium of industry stakeholders.

Ray: Beyond protein structure and dynamic predictions, what new capabilities or integrations are on the roadmap for OpenFold3 or future versions?

Woody: Binding affinity, binding selectivity, generative design, improved atomic interactions, incorporation of water molecules, and higher accuracy in low data regimes.

On Data Sharing

Ray: How is federated learning in OpenFold3 changing the way pharma companies collaborate without compromising proprietary data?

Woody: Federated learning is currently being done by Apheris in the AI Structural Biology (AISB) Network. Model performance will be publicly available. Based on the findings, we will explore ways to extend the federated learning beyond a small number of large pharmaceutical companies.

Ray: Why are many biotechs and pharmas resistant to sharing their data, even in a privacy-preserving federated manner?

Woody: Data is gold in drug discovery, especially compound-specific data. There are good reasons for companies to be sensitive about data. Years of work creating value can be erased overnight if data is leaked. Fortunately, the federated learning paradigm does not require anyone to share data. Instead, software is run within a biotech/pharma environment and only the model outputs (e.g. weights) are brought back to the consortium. The models cannot be reverse engineered to generate the input data, keeping company data safe while improved models can be shared.
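
Editor's note: For readers who want to see the idea in miniature, here is a toy sketch of federated averaging, one common way to combine weights trained at separate sites. It is not the AISB Network's or Apheris's actual pipeline, which is not described in this interview. The function names (local_update, federated_average, run_round), the two-parameter linear model, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Sample = Tuple[float, float]  # (x, y) pair held privately by one site


def local_update(weights: Sequence[float], private_data: Sequence[Sample],
                 lr: float = 0.01, epochs: int = 20) -> Weights:
    """Train y ~ w0 + w1 * x by gradient descent inside one site.

    The private data is only read here; the function returns weights, not data.
    """
    w0, w1 = weights
    n = len(private_data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w0 and w1.
        g0 = sum((w0 + w1 * x - y) for x, y in private_data) * 2 / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in private_data) * 2 / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> Weights:
    """Combine per-site weights, weighting each site by its sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def run_round(global_weights: Sequence[float],
              sites: Sequence[Sequence[Sample]]) -> Weights:
    """One round: every site trains locally, then only weights are averaged."""
    updates = [local_update(global_weights, data) for data in sites]
    sizes = [len(data) for data in sites]
    return federated_average(updates, sizes)


if __name__ == "__main__":
    # Two hypothetical sites whose private data both follow y = 2x + 1.
    sites = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    ]
    weights = [0.0, 0.0]
    for _ in range(200):
        weights = run_round(weights, sites)
    print("shared weights after 200 rounds:", weights)
```

In this sketch the coordinating party only ever sees the lists returned by local_update. The raw (x, y) pairs stay inside each site's call, which is the property Woody describes above.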

Ray: Will regulatory policy be needed to drive adoption of more transparent data sharing in drug discovery?

Woody: I don’t think regulatory policy will be the lever here. Instead, at OpenFold Consortium, we think the right path is pre-competitive collaboration. Organizations like the OpenFold Consortium allow industry and academia to pool expertise and compute, and even participate in privacy-preserving training, without exposing proprietary assets. This ensures that open, community-driven foundation models can reach the same level of quality and sophistication as closed commercial models, while still respecting proprietary data.

About OpenFold

OpenFold is a nonprofit AI research consortium of academic and industry partners whose goal is to develop free and open-source software tools for biology and drug discovery, hosted as a project of the Open Molecular Software Foundation (OMSF). Membership is encouraged among biotech, pharma, synthetic biology, software/tech, and nonprofit research organizations.

About Woody Sherman, Ph.D.

Executive Committee Chairperson, OpenFold Consortium

Woody Sherman is founder and Chief Innovation Officer at Psivant Therapeutics. Prior roles included Chief Computational Scientist at Roivant Sciences, Chief Scientific Officer at Silicon Therapeutics, and Global Head of Applications Science at Schrödinger. Woody received his B.S. in Physical Chemistry and B.A. in Creative Studies from the University of California at Santa Barbara (UCSB). He then completed his Ph.D. at MIT working in Professor Bruce Tidor’s lab, jointly between the departments of Physical Chemistry and The Computer Science and Artificial Intelligence Lab (CSAIL). Woody has published over 100 peer-reviewed papers on a broad range of topics, including molecular dynamics, quantum mechanics, free energy simulations, enhanced sampling, virtual screening, induced-fit docking, protein design, small molecule optimization, machine learning, cheminformatics, hybrid ligand/structure-based methods, pharmacophore modeling, and more. Additionally, Woody is an Adjunct Professor at the University of Massachusetts at Amherst (UMass), where he gives lectures on computational chemistry, drug design, innovation, and company creation.
