Building Inclusivity Into Early-Stage Preclinical Drug Discovery
By Anuli Khairatkar

Cell lines are foundational in early-stage preclinical drug discovery, serving as well-characterized molecular systems for high-throughput screening, target validation, and toxicity testing. They provide consistent, reproducible models to investigate disease mechanisms and evaluate potential therapies before transitioning to more complex models. Typically, cell line panels within drug discovery are designed to encompass varying levels of target expression, mutational burden, gene amplification, or protein abundance, enabling evaluation of a drug candidate’s efficacy, selectivity, and mechanism of action.
Large, well-curated cell line collections have substantially expanded the experimental toolkit available for cancer research and drug development. One of the earliest and most influential collections is the NCI-60 panel, comprising cell lines representing nine major tumor types: leukemia, colon, lung, CNS, kidney, melanoma, ovary, breast, and prostate. The panel was initially intended to identify compounds with tumor type-specific cytotoxicity, and patterns of sensitivity and resistance from such screens often reflected the underlying mechanisms of drug action. Building on such foundations, large-scale resources like the Cancer Cell Line Encyclopedia (CCLE; now part of DepMap) and the Catalogue of Somatic Mutations in Cancer (COSMIC) expanded the availability of molecularly annotated cell lines for preclinical cancer research. COSMIC, for instance, curates extensive genomic data across over 1,000 cell lines, and includes diverse parameters like point mutations, gene fusions, copy number variation, structural variants, gene expression, and methylation profiles. Similarly, The Cancer Genome Atlas (TCGA) provides multi-omics data that have streamlined the identification of oncogenic drivers in human cancers. Genome-wide TCGA data have also been analyzed together with the inferred genetic ancestry of patients across 33 different cancer types.
Ancestry And Biological Sex
Despite these advances, several factors limit the translational validity of in vitro models, with cell line ancestry being a critical but often overlooked variable. One study evaluated responses to 28 chemotherapeutic agents in lymphoblastoid cell lines from individuals of Hispanic and non-Hispanic ancestry and found that variability in drug response correlated with both self-reported race and inferred genetic ancestry. These findings indicate that while many drug response pathways may be ancestry-independent, genetic ancestry can influence responses to specific chemotherapeutic agents, supporting its consideration in preclinical research when evaluating inter-individual variability. However, ancestral information is rarely incorporated into model selection, leading to panels that fail to reflect global genetic diversity. This underrepresentation is further compounded by the makeup and annotation of commonly used cell line databases. In COSMIC, ancestry was unreported for 701 of 1,018 cancer cell lines, with most inferred to be of European origin.1 Similar underrepresentation was observed in the CCLE.2 Inferred ancestry data incorporated into TCGA to determine population-specific differences in oncogenic and biomarker mutations3,4 suggest that most preclinical cell lines remain predominantly European in ancestry (approximately 63%).5 Even newer population-representative resources, such as the NCI’s Patient-Derived Models Repository, which includes xenografts, tumor cell cultures, and organoids, continue to lack diversity: 46% of samples have no reported race or ethnicity, and most others are of white/European origin.6 In some cases, this imbalance may lead to the prioritization of drug targets and biomarkers optimized for European-ancestry tumors, potentially resulting in less effective therapies and the underdevelopment of population-relevant treatment strategies.
Repositories such as ATCC provide some ancestral information for individual cell lines, but comprehensive genomic and high-throughput data come from large-scale databases like COSMIC, CCLE, and TCGA. Consequently, researchers often rely on these databases to design panels, interpret drug response, and prioritize targets, meaning that systemic ancestry biases in these widely used resources can propagate into preclinical studies despite the existence of ancestry-annotated catalogs.
In addition to ancestry, biological sex is an equally important yet under-appreciated determinant of cellular response to treatment. Growing evidence over the past decade has shown sex-based differences in cancer incidence, response to chemotherapy, and survival, even in non-reproductive cancers. For example, responses to chemotherapy are often more favorable in female melanoma patients,7 and drugs such as 5-fluorouracil are cleared more rapidly in males, altering pharmacokinetics and therapeutic exposure.8 A study that screened 81 new cytotoxins across 14 human cell lines (seven male- and seven female-derived) under identical conditions found that male-derived cells were more sensitive to 79 of 81 compounds (~97.5%),9 underscoring that cellular sex exerts a robust, nontrivial influence on drug sensitivity. These differences are thought to arise from sex chromosome-linked gene expression and hormone-mediated signaling pathways, which influence drug metabolism, DNA repair, and cancer cell death.10 Failure to account for sex-based differences can bias drug discovery pipelines in the very early stages toward compounds preferentially active in one sex.
A Workflow For Inclusive Cell Line Panel Design
Specific actions can be implemented to better account for the ancestry and sex of cell lines during early-stage preclinical development.11 Inclusive cell line selection begins with intentional design of panels and rigorous documentation of key biological metadata (KBM), followed by ongoing authentication and transparent reporting.11 Below is a workflow that can be adopted during early-stage drug testing:
1. Build A Holistic Cell Line List
Typically, researchers first compile a candidate list of cell lines relevant to the disease and mechanism of interest, then systematically annotate each line with KBM. For each cell line, it is useful to record, at minimum, the following KBM:
- Genetic ancestry, stating the major ancestral group (often referred to as the “source population”) and estimated ancestry fractions
- Donor sex, age, and tissue of origin
- Major oncogenic drivers and tumor suppressor alterations (e.g., TP53, KRAS, EGFR), copy number changes, and relevant fusions
- Authentication status (STR profile), mycoplasma status, and source repository ID (e.g., RRID, Cellosaurus accession)
These metadata can be sourced from originating repositories or biobanks (e.g., ATCC), primary publications, or specialized databases such as Cellosaurus. When using cell lines not sourced from repositories that collect information on ancestry or sex, the source ancestry population(s) and sex may be determined to a certain extent using in-house genomic pipelines or outsourced services that perform genotyping or ancestry inference. Not every lab will have the funding and resources to do so, but where it is feasible, it should be implemented.
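The annotation step above can be kept consistent with a simple record structure. The following is a minimal sketch in Python, not a standard schema: the class name `CellLineRecord`, its field names, and the helper `missing_kbm` are all hypothetical, and the example line "LINE-A" is a placeholder rather than a real cell line.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class CellLineRecord:
    """Hypothetical KBM record; field names are illustrative assumptions."""
    name: str
    accession: Optional[str] = None           # e.g., an RRID or Cellosaurus accession
    source_population: Optional[str] = None   # major ancestral group
    ancestry_fractions: dict = field(default_factory=dict)  # group -> fraction
    donor_sex: Optional[str] = None
    donor_age: Optional[int] = None
    tissue_of_origin: Optional[str] = None
    driver_alterations: list = field(default_factory=list)  # e.g., ["TP53", "KRAS"]
    str_authenticated: Optional[bool] = None
    mycoplasma_negative: Optional[bool] = None


def missing_kbm(record):
    """Return the names of fields that have not been annotated yet.

    None, an empty dict, or an empty list all count as "not annotated".
    """
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or value == {} or value == []:
            missing.append(f.name)
    return missing


# Placeholder example: only sex and tissue are known so far.
record = CellLineRecord(name="LINE-A", donor_sex="female", tissue_of_origin="lung")
print(missing_kbm(record))
```

Listing the unfilled fields per line makes it easier to see which candidates need ancestry inference or authentication before they can be considered for a panel. One limitation of this sketch is that an empty driver list is treated as missing, so a line with no known driver alterations would need an explicit marker.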
2. Select Diverse Experimental Panels
If multiple cell lines are available in the annotated list, investigators can deliberately select panels that represent both sexes and the major ancestry populations. Where possible, panels should balance male- and female-derived lines so that both sexes are adequately represented in early in vitro screening.
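One way to make this selection deliberate rather than ad hoc is to group candidates by sex and ancestry and draw from each group in turn. The sketch below assumes every candidate is already annotated; the function name `select_balanced_panel`, the dictionary keys, and the ancestry labels ("AFR", "EAS", "EUR") are illustrative assumptions, and the candidates are placeholders.

```python
from collections import defaultdict
from itertools import zip_longest


def select_balanced_panel(candidates, panel_size):
    """Pick up to panel_size lines, cycling through (sex, ancestry) groups.

    candidates: list of dicts with 'name', 'sex', and 'ancestry' keys.
    Round-robin selection covers as many groups as possible before taking
    a second line from any one group. It does not by itself guarantee an
    exact male/female balance.
    """
    strata = defaultdict(list)
    for line in candidates:
        strata[(line["sex"], line["ancestry"])].append(line)

    panel = []
    # Each "round" takes at most one line from every group, in sorted order.
    for round_ in zip_longest(*(strata[key] for key in sorted(strata))):
        for line in round_:
            if line is not None and len(panel) < panel_size:
                panel.append(line)
    return panel


# Placeholder candidates: European-ancestry, male-derived lines dominate.
candidates = [
    {"name": "A", "sex": "female", "ancestry": "EUR"},
    {"name": "B", "sex": "male", "ancestry": "EUR"},
    {"name": "C", "sex": "male", "ancestry": "EUR"},
    {"name": "D", "sex": "female", "ancestry": "AFR"},
    {"name": "E", "sex": "male", "ancestry": "EAS"},
    {"name": "F", "sex": "male", "ancestry": "EUR"},
]
panel = select_balanced_panel(candidates, panel_size=4)
print([line["name"] for line in panel])  # ['D', 'A', 'E', 'B']
```

In this toy case the four-line panel covers three ancestry groups and two lines of each sex, even though four of the six candidates are male-derived lines of European ancestry. In practice, disease relevance and target expression would constrain the candidate list before any such balancing is applied.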
3. Perform Authentication And Quality Control
To retain the specific genetic profile of cell lines across experiments, master banks should be frozen at the lowest feasible passage number after obtaining a verified cell line. Genetic and phenotypic characteristics (e.g., STR profile, key driver mutations, and basic growth parameters) should be re-verified at defined intervals. Experiments should preferentially use cells within a predefined low-passage window to limit potential phenotypic or genetic drift.
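These checks can be turned into a routine flag raised before each experiment. The sketch below is illustrative only: the function name `qc_flags` is hypothetical, and the thresholds (a maximum passage of 20 and re-verification every 180 days) are assumed placeholders that each lab would replace with its own predefined values.

```python
from datetime import date, timedelta

# Assumed placeholder thresholds, not recommendations.
MAX_PASSAGE = 20
REVERIFY_INTERVAL = timedelta(days=180)


def qc_flags(passage, last_authenticated, today,
             max_passage=MAX_PASSAGE, interval=REVERIFY_INTERVAL):
    """Return a list of QC warnings for one working stock of a cell line."""
    flags = []
    if passage > max_passage:
        flags.append("outside low-passage window")
    if today - last_authenticated > interval:
        flags.append("re-authentication due")
    return flags


# A high-passage stock that was authenticated recently.
print(qc_flags(25, date(2024, 1, 1), date(2024, 3, 1)))
# A low-passage stock whose last authentication was a year ago.
print(qc_flags(5, date(2023, 1, 1), date(2024, 1, 1)))
```

An empty list means the stock is within both the passage window and the verification interval under the assumed thresholds.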
4. Report In Manuscripts
To make studies reusable and comparable, ancestry and sex for new or uncommonly used cell lines, along with other KBM, should be clearly reported in the methods or supplementary materials. For all cell lines, reporting stable identifiers (e.g., Cellosaurus accessions) and authentication dates supports traceability and allows other groups to reconstruct or extend inclusive panels in future work. When drug screening is performed using panels that include ancestry- and sex-diverse models, any observed ancestry- or sex-associated differences in drug response should be documented and reported where possible.
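As a starting point for documenting group-level differences, screening results can be summarized per sex or per ancestry group. The following is a minimal descriptive sketch: the function name `summarize_by`, the `ic50_um` key, and all values are hypothetical placeholders, and a real analysis would add appropriate statistical testing and control for confounders such as tissue type and mutation status.

```python
from collections import defaultdict
from statistics import median


def summarize_by(results, key):
    """Group IC50 values by a metadata key and report n and median per group."""
    groups = defaultdict(list)
    for row in results:
        groups[row[key]].append(row["ic50_um"])
    return {
        group: {"n": len(values), "median_ic50_um": median(values)}
        for group, values in groups.items()
    }


# Placeholder screening results for one compound (values are made up).
results = [
    {"line": "A", "sex": "female", "ancestry": "EUR", "ic50_um": 1.0},
    {"line": "D", "sex": "female", "ancestry": "AFR", "ic50_um": 3.0},
    {"line": "B", "sex": "male", "ancestry": "EUR", "ic50_um": 2.0},
    {"line": "E", "sex": "male", "ancestry": "EAS", "ic50_um": 4.0},
    {"line": "F", "sex": "male", "ancestry": "EUR", "ic50_um": 6.0},
]
print(summarize_by(results, "sex"))
print(summarize_by(results, "ancestry"))
```

Reporting the group sizes alongside the medians lets readers judge how much weight a given difference can bear, which matters when some groups are represented by only one or two lines.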
Conclusion
Systematically reporting both ancestry and sex as standard biological variables, in addition to parameters like tissue type or mutation status, may foster more representative and translationally relevant approaches to drug discovery. It is important to introduce inclusivity in the early design and annotation of in vitro drug discovery models to avoid selecting for compounds that are preferentially active in specific genetic groups or in one biological sex. Ultimately, addressing these gaps is essential not only for scientific accuracy but also for advancing equity in biomedical research and precision therapeutics.
“Conscious inclusion of genetic ancestry in in vitro models will enable the identification of the root cause of disparity in treatment, pharmacological variables and even tumor biology. If we continue to ignore this variable as an inconvenience, we are willfully contributing to a system of perpetuating inequity.” - Raghavan S. (May 2022)12
References:
1. Kessler, M. D., Bateman, N. W., Conrads, T. P., Maxwell, G. L., Dunning Hotopp, J. C. and O'Connor, T. D. (2019). Ancestral characterization of 1018 cancer cell lines highlights disparities and reveals gene expression and mutational differences. Cancer 125, 2076-2088. 10.1002/cncr.32020
2. Dutil, J., Chen, Z., Monteiro, A. N., Teer, J. K. and Eschrich, S. A. (2019). An interactive resource to probe genetic diversity and estimated ancestry in cancer cell lines. Cancer Res. 79, 1263-1273. 10.1158/0008-5472.CAN-18-2747
3. Carrot-Zhang, J., Chambwe, N., Damrauer, J. S., Knijnenburg, T. A., Robertson, A. G., Yau, C., Zhou, W., Berger, A. C., Huang, K. L., Newberg, J. Y. et al. (2020). Comprehensive analysis of genetic ancestry and its molecular correlates in cancer. Cancer Cell 37, 639-654.e6. 10.1016/j.ccell.2020.04.012
4. Yuan, J., Hu, Z., Mahal, B. A., Zhao, S. D., Kensler, K. H., Pi, J., Hu, X., Zhang, Y., Wang, Y., Jiang, J. et al. (2018). Integrated analysis of genetic ancestry and genomic alterations across cancers. Cancer Cell 34, 549-560.e9. 10.1016/j.ccell.2018.08.019
5. Nguyen, P. B. H., Ohnmacht, A. J., Sharifli, S., Garnett, M. J. and Menden, M. P. (2021). Inferred ancestral origin of cancer cell lines associates with differential drug response. Int. J. Mol. Sci. 22, 10135. 10.3390/ijms221810135
6. Guerrero, S., López-Cortés, A., Indacochea, A., García-Cárdenas, J. M., Zambrano, A. K., Cabrera-Andrade, A., Guevara-Ramirez, P., González, D. A., Leone, P. E. and Paz, Y.-M. C. (2018). Analysis of racial/ethnic representation in select basic and applied cancer research studies. Sci. Rep. 8, 13978. 10.1038/s41598-018-32264-x
7. Rampen, F. H. J. (1982). Malignant melanoma: sex differences in response to chemotherapy? Eur. J. Cancer Clin. Oncol. 18, 107-110. 10.1016/0277-5379(82)90033-5
8. Mueller, F., Büchel, B., Köberle, D., Schürch, S., Pfister, B., Krähenbühl, S., Froehlich, T. K., Largiader, C. R. and Joerger, M. (2013). Gender-specific elimination of continuous-infusional 5-fluorouracil in patients with gastrointestinal malignancies: results from a prospective population pharmacokinetic study. Cancer Chemother. Pharmacol. 71, 361-370. 10.1007/s00280-012-2018-4
9. Nunes, L. M., Robles-Escajeda, E., Santiago-Vazquez, Y., Ortega, N. M., Lema, C., Muro, A., Almodovar, G., Das, U., Das, S., Dimmock, J. R., Aguilera, R. J. and Varela-Ramirez, A. (2014). The gender of cell lines matters when screening for novel anti-cancer drugs. AAPS J. 16, 872-874. 10.1208/s12248-014-9617-4
10. Lopes-Ramos, C. M., Kuijjer, M. L., Ogino, S., Fuchs, C. S., DeMeo, D. L., Glass, K. and Quackenbush, J. (2018). Gene regulatory network analysis identifies sex-linked differences in colon cancer drug metabolism. Cancer Res. 78, 5538-5547. 10.1158/0008-5472.CAN-18-0454 (Erratum: Cancer Res. 79, 2084, 2019. 10.1158/0008-5472.CAN-19-0678)
11. Zaaijer, S. and Capes-Davis, A. (2021). Ancestry matters: building inclusivity into preclinical study design. Cell 184, 2525-2531. 10.1016/j.cell.2021.03.041
12. Raghavan, S. (2022). How inclusive are cell lines in preclinical engineered cancer models? Dis. Model. Mech. 15, dmm049520. 10.1242/dmm.049520
About The Author
Anuli Khairatkar is a biomedical scientist and science communicator specializing in cancer immunology and preclinical drug development. She began her career in a CRO setting, where she independently executed preclinical assays using oncolytic virology platforms and patient-derived organoids, earning recognition for exceeding revenue milestones. She later spent three years in the biotech industry contributing to immunotherapy pipeline development, with a focus on T-cell engagers and translational preclinical strategy. She is currently pursuing a Ph.D. in microbiology, genetics, and immunology, where she studies tumor–immune interactions within the tumor microenvironment to better understand mechanisms of immune evasion and therapeutic resistance.