Guest Column | October 14, 2025

Why Drug Discovery Needs Epistemological Wrapping For Generative AI

By Arvind Rao, Ph.D., University of Michigan-Ann Arbor


Generative AI has arrived in drug discovery with remarkable speed. Over the past two years, teams across target identification, medicinal chemistry, and preclinical development have begun deploying AI tools to accelerate traditionally time-intensive processes. Mining multi-omics data sets for novel targets; predicting compound absorption, distribution, metabolism, and excretion (ADME) properties; designing animal study protocols; and conducting virtual screens of millions of compounds can now be accomplished in hours rather than weeks or months.

The appeal is clear: faster decision cycles, reduced experimental burden, and more hypotheses tested per dollar spent. Yet in drug discovery, speed without rigor can be catastrophic.

In my work with discovery organizations, I have observed teams invest heavily in AI-generated targets that appeared compelling until independent validation revealed they were artifacts of training data biases. I have seen preclinical protocols that initially passed internal review but collapsed under Institutional Animal Care and Use Committee (IACUC) scrutiny due to missing controls or flawed randomization schemes. And I have seen IND submissions trigger extensive FDA questioning because the provenance of AI-derived safety predictions could not be adequately documented.

The risks are concrete and consequential:

  • Regulatory challenges when FDA reviewers demand reproducibility documentation that does not exist
  • Capital inefficiency when targets fail validation because they cannot be reproduced outside narrow training data sets
  • Failed preclinical studies due to experimental design flaws that were not apparent in AI-generated protocols
  • Overlooked safety signals when toxicology predictions fail to account for metabolites, species differences, or chemical space limitations
  • Eroded credibility with governance committees that have learned to question AI outputs lacking proper validation

For discovery teams making decisions worth tens or hundreds of millions of dollars, unreliable AI outputs represent more than inconvenience. They represent program-level risk.

Guardrails Are Necessary But Insufficient

Most commercial generative AI platforms emphasize their "guardrails" — mechanisms designed to reduce hallucinations, provide citation links, and prevent obviously erroneous outputs. These features are valuable but fundamentally limited in scope.

Guardrails prevent AI from generating nonsense. They do not, however, address the substantive questions that determine whether an AI output can support critical discovery decisions:

  • What is the provenance of this target hypothesis? Which data sets were used? What assumptions were made?
  • Can this ADME prediction be independently reproduced?
  • What experimental evidence would falsify this proposed mechanism of action?
  • Does this preclinical protocol genuinely comply with Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines and NIH rigor standards, or does it merely "appear" to comply?

In other words, guardrails establish minimum quality thresholds. They do not ensure scientific credibility, regulatory defensibility, or reproducibility.

Introducing Epistemological Wrapping

What is needed is a more comprehensive framework. I call this approach epistemological wrapping.

Epistemological wrapping is a structured framework that surrounds generative AI outputs with the foundational elements of credible science: explicit provenance, testable falsifiability, demonstrated reproducibility, systematic bias assessment, and documented accountability. It addresses specific, high-stakes applications in drug discovery and preclinical development.

Rather than asking "Has the AI generated a useful output?" wrapping asks "Can this output withstand independent validation, peer review, and regulatory scrutiny?"

An epistemological wrapper does not simply reformat AI-generated text. It requires every AI-assisted output — whether a target hypothesis, lead optimization recommendation, or preclinical protocol — to include the following (a minimal code sketch follows the list):

  • Provenance documentation: explicit identification of source data sets, model versions, training data characteristics, and analytical methods
  • Falsifiability criteria: clearly defined experimental conditions or observations that would disprove the claim or hypothesis
  • Bias assessments: systematic evaluation of potential gaps in training data, over-represented populations or chemical classes, and methodological limitations
  • Reproducibility specifications: sufficient technical detail (data set identifiers, parameter settings, software versions, pipeline specifications) to enable independent replication
  • Decision frameworks: formal go/revise/no-go decision points before AI-assisted outputs advance to subsequent stages

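To make these requirements concrete, here is a minimal sketch of a wrapper expressed as a data structure. The `EpistemicWrapper` class and its field names are illustrative assumptions of mine, not part of any published standard:

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    GO = "go"
    REVISE = "revise"
    NO_GO = "no-go"

@dataclass
class EpistemicWrapper:
    """Illustrative container pairing an AI output with its epistemic context."""
    claim: str                            # the AI-generated hypothesis or recommendation
    source_datasets: list[str]            # provenance: data set identifiers and versions
    model_version: str                    # provenance: model name and version
    analytical_methods: list[str]         # provenance: pipelines, parameters, assumptions
    falsifiability_criteria: list[str]    # observations that would disprove the claim
    bias_assessment: dict[str, str]       # known gaps and over-represented classes
    reproducibility_spec: dict[str, str]  # software versions, seeds, settings
    decision: Decision = Decision.REVISE  # go/revise/no-go gate before advancing

    def is_audit_ready(self) -> bool:
        """Decision-grade only if every epistemic field is populated."""
        return all([self.source_datasets, self.model_version,
                    self.analytical_methods, self.falsifiability_criteria,
                    self.bias_assessment, self.reproducibility_spec])
```

In this framing, the go/revise/no-go gate becomes a one-line check: an output whose `is_audit_ready()` returns `False` simply cannot advance to the next stage.
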
The objective is to transform generative AI from a tool that produces rapid but fragile outputs into one that generates decision-grade, audit-ready analyses.

Target Identification And Validation

When AI proposes candidate targets based on genomic or transcriptomic analyses, wrappers ensure these targets are linked to validated data sets, tested across independent cohorts, and evaluated for druggability. They surface potential issues, such as reliance on a single data modality, absence of orthogonal validation, or prior clinical failures in the same target class. This prevents substantial capital investment in targets that will not survive rigorous validation.
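
One hedged sketch of how such a gate might be automated: the structure of `evidence` and the two-cohort, two-modality thresholds below are illustrative assumptions, not a validated policy.

```python
# Illustrative gate: a candidate target advances only when supporting evidence
# spans multiple independent cohorts and more than one data modality.
def target_passes_gate(evidence: list[dict],
                       min_cohorts: int = 2,
                       min_modalities: int = 2) -> bool:
    """Each evidence item is assumed to look like
    {"cohort": "TCGA-LUAD", "modality": "RNA-seq", "supports": True}."""
    supporting = [e for e in evidence if e.get("supports")]
    cohorts = {e["cohort"] for e in supporting}
    modalities = {e["modality"] for e in supporting}
    return len(cohorts) >= min_cohorts and len(modalities) >= min_modalities

evidence = [
    {"cohort": "TCGA-LUAD", "modality": "RNA-seq", "supports": True},
    {"cohort": "internal-cohort-1", "modality": "RNA-seq", "supports": True},
]
print(target_passes_gate(evidence))  # False: two cohorts but a single modality
```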

Hit-To-Lead And Lead Optimization

AI-driven predictions of compound properties (solubility, permeability, metabolic stability, protein binding) are only reliable within the chemical space where the model was trained. Wrappers make these boundaries explicit. They document the applicability domain of predictions, flag when AI recommendations extrapolate beyond validated chemical space, and identify when structural modifications are based on limited or biased training data. This protects medicinal chemistry teams from pursuing optimization strategies that appear sound but rest on unreliable predictions.
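
As a minimal sketch of what making an applicability domain explicit could look like in practice, assuming RDKit is available: the nearest-neighbor Tanimoto check and the 0.4 threshold are illustrative choices, not validated cutoffs.

```python
# Flag predictions for compounds outside the model's validated chemical space,
# approximated here by nearest-neighbor Tanimoto similarity to the training set.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles: str):
    """Morgan (circular) fingerprint, radius 2, 2048 bits."""
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def in_applicability_domain(query_smiles: str, training_smiles: list[str],
                            threshold: float = 0.4) -> bool:
    """True if the query is close enough to at least one training compound.
    The threshold is an assumption and must be tuned per model."""
    query_fp = morgan_fp(query_smiles)
    return max(DataStructs.TanimotoSimilarity(query_fp, morgan_fp(s))
               for s in training_smiles) >= threshold

# A wrapper would report, not hide, an out-of-domain extrapolation:
training_set = ["CCO", "CCN", "CCC(=O)O"]  # placeholder SMILES
print(in_applicability_domain("c1ccc2ccccc2c1", training_set))  # naphthalene: likely False
```

Nearest-neighbor similarity is only one of several applicability-domain definitions; the point is that whichever definition a team chooses, the wrapper surfaces it alongside every prediction.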

ADME And Toxicology Predictions

Toxicology and pharmacokinetic predictions carry significant consequences for program success. When AI predicts favorable safety or ADME profiles, wrappers require documentation of training data composition (species representation, route of administration coverage, metabolite prediction capabilities), identification of chemical space limitations, and assessment of prediction confidence intervals. They prevent overreliance on predictions that may not account for critical species differences, metabolic pathways, or rare adverse outcome pathways.
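
As one hedged illustration of attaching confidence information to a prediction, here is a sketch using disagreement across an ensemble of fitted models; the `.predict()` interface and the 90% empirical interval are assumptions, not a validated uncertainty method.

```python
import numpy as np

def predict_with_interval(models, x, coverage: float = 0.90):
    """Ensemble mean plus an empirical interval from model disagreement.
    `models` is assumed to be a list of fitted predictors exposing .predict()."""
    preds = np.array([m.predict(x) for m in models])
    lo, hi = np.percentile(preds, [(1 - coverage) / 2 * 100,
                                   (1 + coverage) / 2 * 100], axis=0)
    return preds.mean(axis=0), lo, hi

def too_uncertain(lo, hi, max_width: float):
    """Flag predictions whose interval is too wide to support a go decision;
    max_width is program-specific and must be set by the team."""
    return (hi - lo) > max_width
```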

Preclinical Protocol Design

AI-generated animal study protocols may appear rigorous while containing subtle but critical flaws — inadequate randomization, missing vehicle controls, inappropriate power calculations, or failure to account for sex as a biological variable. Wrappers enforce compliance with ARRIVE guidelines, NIH rigor and reproducibility standards, and International Council for Harmonization (ICH) guidance by systematically checking for common design deficiencies. This prevents costly study failures and ensures protocols will withstand IACUC review and ultimately support regulatory submissions.
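
A minimal sketch of such a systematic check follows; the field names and rules are hypothetical stand-ins, and a real screen would be mapped to the ARRIVE 2.0 Essential 10 items and institutional SOPs.

```python
# Illustrative rule-based screen for common preclinical design deficiencies.
REQUIRED_FIELDS = {
    "randomization_method": "No randomization scheme specified",
    "vehicle_control": "Missing vehicle control group",
    "power_calculation": "No power calculation justifying group sizes",
    "blinded_assessment": "Outcome assessment is not blinded",
    "both_sexes_included": "Sex not accounted for as a biological variable",
}

def screen_protocol(protocol: dict) -> list[str]:
    """Return deficiencies; an empty list means the draft passes this
    deliberately minimal screen, not that it is IACUC-ready."""
    return [msg for key, msg in REQUIRED_FIELDS.items() if not protocol.get(key)]

draft = {"randomization_method": "block randomization", "vehicle_control": True}
for deficiency in screen_protocol(draft):
    print("FLAG:", deficiency)
```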

Biomarker Discovery And Validation

Early-stage biomarker candidates identified through AI analysis require careful validation before they can support development decisions. Wrappers ensure proposed biomarkers are tested for reproducibility across independent sample sets, validated using orthogonal assay platforms, and evaluated for potential confounding variables. They prevent premature commitment to biomarker strategies that will not survive the validation process or regulatory review.

IND-Enabling Documentation

When AI contributes to nonclinical safety assessments, pharmacology summaries, or IND documents, regulatory reviewers will require detailed documentation of AI methods, data sources, and validation approaches. Wrappers create audit-ready submissions by ensuring full traceability of AI contributions, comprehensive documentation of underlying data and methods, and explicit disclosure of model limitations. They prevent the scenario where FDA information requests cannot be adequately addressed because critical provenance information was not captured during analysis.
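
As a sketch of what capturing provenance at analysis time might look like, here is one way to content-hash each input data set so that questions about provenance can be answered years later; the record fields are hypothetical, not a regulatory schema.

```python
import hashlib
from datetime import datetime, timezone

def sha256_of(path: str) -> str:
    """Content hash of an input file, computed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def audit_record(claim: str, dataset_paths: list[str], model_version: str) -> dict:
    """Illustrative audit entry binding an AI-derived claim to exactly the
    data and model version that produced it."""
    return {
        "claim": claim,
        "model_version": model_version,
        "dataset_hashes": {p: sha256_of(p) for p in dataset_paths},
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

# Records appended to a write-once log give full traceability, e.g.:
# audit_record("hERG liability: low", ["tox_training_set.parquet"], "model-v3.2")
```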

Why Wrapping Creates Competitive Advantage

Generative AI adoption in drug discovery is inevitable. The strategic question is whether organizations will build epistemological wrapping as a core capability now or retrofit it under pressure later.

Organizations that systematically wrap their AI outputs will achieve faster progression to IND with more defensible nonclinical packages. They will make higher quality target selection and lead prioritization decisions. They will experience fewer late-stage failures attributable to nonreproducible early findings. They will establish credibility with regulatory agencies, governance committees, and investment partners.

Organizations that fail to implement wrapping will face a different trajectory: scrambling to add rigor to AI workflows after expensive failures have occurred, encountering skepticism from reviewers who have observed too many AI-related program setbacks, hitting reproducibility barriers that terminate otherwise promising programs, and managing regulatory interactions complicated by inadequate documentation.

This creates a genuine competitive advantage. Discovery organizations that can credibly state "every AI-assisted output meets defined epistemic standards and is audit-ready" will differentiate themselves in speed, capital efficiency, and regulatory success rates.

The pharmaceutical industry has demonstrated this pattern repeatedly. Good Laboratory Practice (GLP) began as voluntary best practice. ARRIVE guidelines for animal research started as recommendations. FAIR data principles were initially aspirational. Each eventually became mandatory, and organizations that adopted these frameworks early gained substantial advantages over those forced to retrofit compliance under regulatory or institutional pressure.

Epistemological wrapping is following the same trajectory. It will likely become an expected standard for AI use in drug development. Early adopters will shape those standards and establish benchmarks. Late adopters will implement them reactively, at higher cost and under greater pressure.

An Enduring, Renewable Capability

Epistemological wrapping is not a static framework. Generative AI capabilities evolve rapidly — new foundation models for protein structure prediction, new agentic workflows for experimental design, and new regulatory guidance for AI in drug development appear continuously. Similarly, new failure modes and risk patterns emerge as AI is applied to novel problems.

This requires that wrappers be updated regularly, analogous to recurring GLP training or ICH guideline updates. Each year brings new technical capabilities, new regulatory expectations, and new lessons from AI failures in the field. Wrappers must incorporate these developments to remain effective.

This creates a sustainable, renewable capability rather than a one-time implementation. Organizations that view epistemological wrapping as an ongoing integrity framework — regularly updated to align with evolving technology and regulatory standards — will maintain their competitive position as AI capabilities and requirements continue to advance.

Final Thoughts

Drug discovery depends fundamentally on reproducibility. Internal review committees require it. Regulatory agencies demand it. Investors assess programs based on it. Generative AI offers compelling advantages in speed and analytical scale, but speed without epistemic integrity represents liability rather than capability.

Epistemological wrapping ensures that AI-assisted outputs meet the standards of reproducible, bias-aware, falsifiable, and audit-ready science. It transforms generative AI from a risky accelerant into a trustworthy partner in the high-stakes decisions that define successful discovery programs.

The question facing discovery organizations is not whether to adopt epistemological wrapping, but whether to lead in establishing epistemic standards for AI or to follow under regulatory and competitive pressure.

This article was co-prepared and co-edited with LLM tools.

About The Author:

Arvind Rao is a professor in the Department of Computational Medicine and Bioinformatics and the Department of Radiation Oncology at the University of Michigan, and is affiliated with the Michigan Institute for Data and AI in Society. His group develops AI and image analysis methods to link cellular, tissue, and radiology phenotypes with genetic data. His work spans radiogenomics, drug repurposing, and spatial profiling in tissues and transcriptomics. Beyond research, he advises pharma, healthcare organizations, and biotech leaders through education and consulting, helping teams navigate AI adoption, readiness, and strategy. He aims to blend technical rigor and practical frameworks to align emerging AI tools with drug discovery and patient stratification goals.