Guest Column | November 12, 2025

Progressing Precision Medicine Discovery With AI

A conversation with Iya Khalil, Ph.D., vice president and head of data, AI, and genome sciences at Merck & Co.


Merck has created a foundation model for drug discovery that will help advance precision medicine. Its new machine learning tool, Transformers for Enabling Drug DiscoverY (TEDDY), uses a transformer architecture to map gene regulation in a new way, offering insight into the root causes of disease. TEDDY was built with the goal of getting the right drug to the right patient at the right time, faster.

In this Q&A, Life Science Connect’s Michelle Raley caught up with Iya Khalil, Ph.D., vice president and head of data, AI, and genome sciences at Merck & Co., to discuss the benefits of AI tools in drug discovery.

Why are foundation models the next leap forward in AI for disease biology?

Foundation models trained on vast, diverse “omics” data can learn generalizable representations of biology. They capture patterns in gene regulation, cell states, tissues, and diseases, so they transfer to many tasks (target ID, biomarker discovery, patient stratification) without retraining from scratch. By boosting the signal-to-noise ratio across massive data sets, they make it faster and more feasible to pinpoint causal disease drivers rather than just correlates, accelerating precision medicine. By adding large language models into drug discovery, we are enabling researchers to sift through data at a rate never achieved before, leading to faster research development. This has broad impact, speeding up the therapeutic development process by identifying new targets and precision biomarkers. This increases the probability of success and enables us to match patients to the right therapy at the right time.

Why is facilitating access to these language models pivotal to the industry’s evolution?

Easy access democratizes experimentation, which leads to greater impact. Open weights on platforms like Hugging Face allow wet lab and computational teams to integrate models directly into their pipelines, compare results across labs, and iterate quickly, raising scientific rigor and pace. Broad access also fosters ethical, transparent collaboration and aligns with the growing policy push for open, reproducible AI science. In fact, the White House recently emphasized the importance of open-source systems like Hugging Face during its announcement of the U.S. Action Plan, which makes me incredibly proud of our work at Merck.

For context, the Action Plan structures federal AI policy around three pillars: accelerating innovation; building American AI infrastructure; and leading in international diplomacy and security — all of which dovetail directly with TEDDY’s ambitions to operate at the frontier of biological modeling. The federal push for “cloud-enabled labs,” minimum standards for data quality, and supportive regulatory sandboxes can accelerate Merck’s efforts to scale and validate TEDDY models responsibly and with confidence.

Can you tell us more about the approach to creating TEDDY models and how Merck is using these models in drug discovery?

Understanding the factors that drive disease is critical for discovering new drugs. The launch of TEDDY is designed to advance how machine learning is used to understand disease biology and speed up the drug discovery process. TEDDY is a set of transformer-based foundation models trained on >116 million single cells from ~24,000 donors spanning 122 diseases. Training incorporates biological annotations (disease, tissue, cell type) as supervisory signals to make predictions biologically grounded and generalizable. Two complementary variants are released:

  • TEDDY-G (ranked gene representation) learns which genes are most implicated in a disease based on relationships with other genes.
  • TEDDY-X (grouped/category representation) learns gene group membership to capture variation in expression programs.

From that, models were scaled efficiently (≈10 million to 400 million parameters) on high-performance infrastructure to study how size and biological priors affect performance.
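To make the two representations above more concrete, here is a minimal, illustrative sketch of how a single cell's expression profile might be encoded in a rank-based way versus a grouped (binned) way. This is not Merck's implementation: the function names, the equal-width binning scheme, the bin count, and the toy expression values are all assumptions chosen for illustration, and the published TEDDY models may tokenize cells differently.

```python
import math
from typing import Dict, List


def rank_tokens(expression: Dict[str, float], top_k: int = 4) -> List[str]:
    """Rank-style encoding (hypothetical): expressed genes ordered high to low."""
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression; break ties alphabetically for determinism.
    expressed.sort(key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in expressed[:top_k]]


def bin_tokens(expression: Dict[str, float], n_bins: int = 3) -> Dict[str, int]:
    """Group-style encoding (hypothetical): each gene mapped to an expression bin.

    Bin 0 means "not expressed"; bins 1..n_bins are equal-width over (0, max].
    """
    if not expression:
        return {}
    max_value = max(expression.values())
    if max_value <= 0:
        return {gene: 0 for gene in expression}
    bins: Dict[str, int] = {}
    for gene, value in expression.items():
        if value <= 0:
            bins[gene] = 0
        else:
            raw_bin = math.ceil(value / max_value * n_bins)
            bins[gene] = max(1, min(n_bins, raw_bin))
    return bins


# Toy expression profile for one cell (made-up values, for illustration only).
cell = {"CD3E": 8.0, "GAPDH": 4.0, "IL7R": 1.0, "ALB": 0.0}

print(rank_tokens(cell))  # genes as an ordered sequence
print(bin_tokens(cell))   # genes with a coarse expression category
```

In the rank-style view, the model sees an ordered sequence of genes and the exact expression values are discarded. In the group-style view, every gene keeps a coarse expression category. Either encoding can then be fed to a transformer as a sequence of tokens.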

Beyond target identification, we use LLMs and AI agents to support lead discovery, safety assessment, and clinical trial design. This frees up scientists to focus on higher-order decisions while AI handles repetitive tasks, like data cleaning, hypothesis generation, and integration of vast datasets. In short, we view AI not as a replacement for human insight, but as an invaluable tool, enabling our teams to move faster, de-risk earlier, and explore novel biology more deeply than ever before.

I’m excited about how TEDDY allows teams to apply understanding to disease mechanisms, nominate high-value molecular targets, and develop precision biomarkers to match patients to the right therapy at the right time — improving lead optimization and de-risking early R&D.

How can drug companies avoid any pitfalls associated with using AI/LLMs in drug discovery?

In drug discovery, I’ve seen firsthand how AI and LLMs can introduce pitfalls, whether it’s data bias leading to misleading predictions, outputs that are hard to interpret, or models that just don’t generalize across different diseases or patient populations. That’s exactly why deliberate safeguards are so important.

Among the ones we focused on for TEDDY, we’re keeping biology in the loop by using biologically annotated training data and validating AI hypotheses with orthogonal assays. We also favored multimodality by combining transcriptomic, proteomic, epigenomic, imaging, and clinical data to avoid single-modality blind spots. Additionally, we utilized cross-functional teams — we paired AI engineers with computational biologists and experimentalists to ensure outputs are actionable and testable.

What strides has this tool made since its creation?

We have made exciting progress in both gene regulation and genome science through TEDDY and are eager to see how other organizations use the tool. We’ve been able to distinguish causal regulatory signals from downstream effects and improve target prioritization. We have also mapped gene–gene relationships and expression programs across tissues and disease contexts, which reveal conserved and disease-specific regulatory modules. And we’ve been able to generalize to unseen cell types and diseases, providing evidence that the models capture transferable rules of gene regulation rather than data set-specific artifacts.

These findings are significant because they deepen our understanding of gene regulation across tissues and disease contexts. By distinguishing causal signals from downstream effects, TEDDY helps identify true disease drivers for more precise target selection. Mapping gene–gene relationships and conserved regulatory modules provides a systems-level view of cellular function. Most importantly, TEDDY’s ability to generalize to unseen cell types and diseases shows that it captures fundamental biological principles, enabling predictive insights even where experimental data are limited.

On April 1 of this year, Merck and the Blavatnik Institute at Harvard Medical School also announced that they’re coming together to create multimodal foundation AI models that can interpret information from evidence knowledge graphs to find new biomarkers for immunological diseases.

About The Author

Iya Khalil, Ph.D., is the vice president and head of data, AI, and genome sciences at Merck & Co., charged with driving an AI-first, data-driven approach to early discovery biology. Prior to Merck, she was at Novartis, where she headed the first-ever AI Innovation Lab in a major pharmaceutical company, leading the development and application of best-in-class AI platforms and algorithms across discovery, translation, early research, clinical, and commercial.

Khalil has a doctorate in physics and is an AI pioneer in the techbio field with 20+ years of experience in AI and “big (and deep) data” for life sciences and healthcare. She is the cofounder and co-inventor of the proprietary AI engine that underpins Aitia Bio, the leader in Causal AI and digital twins. Khalil was named to the PharmaVOICE 100 list of the most inspiring people in the life sciences industry and was recognized as one of the Endpoints News Top Women in Biopharma R&D for her forward-thinking approach in using AI to help fuel Merck’s diverse pipeline.

She serves as an advisor for the Bill and Melinda Gates Foundation in AgTech and is on the board of directors of Invaio Sciences. Her work was recognized by then-President Obama at a White House dinner as a leading entrepreneur in genomic medicine. She was named to Inc. magazine’s list of top female founders of 2018 and to Forbes’ 2019 list of top women-led startups “crushing tech.”