Guest Column | June 18, 2026

Aligning Standards, Validation, And Governance For Digital Preclinical Research

Stefano Gaburro, CSO, and Szczepan W. Baran, CEO, Digital Preclinical Society


The instrumentation of preclinical research has advanced substantially over the past decade. Automated home cage monitoring systems capture activity, circadian rhythm, body composition, and social interaction continuously and without the confounding effects of handling stress.¹ Digital biomarkers derived from such systems offer longitudinal, high-resolution endpoints.² Parallel developments in computational modeling, microphysiological systems, and machine learning have further increased the quantity and diversity of data available to support decision-making.

These advances have not yet produced a measurable improvement in the reliability of preclinical findings or in their predictive value for clinical outcomes. The present perspective examines the basis for this disparity. We contend that the deficits in reproducibility and translation are attributable primarily to structural factors, including inconsistent endpoints, inadequate statistical power, incomplete metadata, and misaligned incentives, rather than to insufficient measurement capacity. We then describe how a coordinating professional body could address these factors.

Reproducibility Deficits In Preclinical Research

The reproducibility of preclinical research has been documented as limited across multiple independent assessments. In an early analysis, Begley and Ellis³ reported that investigators at a single pharmaceutical company were able to confirm the principal findings of only six of 53 landmark preclinical oncology studies, corresponding to a reproducibility rate of approximately 11%. The Reproducibility Project: Cancer Biology subsequently attempted systematic replication and was able to complete only 50 of 193 planned experiments; the information required to calculate effect sizes and conduct power analyses was available for four of the 193 experiments.⁴ Freedman et al.⁵ estimated that more than half of preclinical research is not reproducible and associated this with approximately $28 billion in annual expenditure on irreproducible preclinical research in the United States, a figure presented as a modeled estimate rather than a measured value.
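The proportions behind these figures can be checked with simple arithmetic. The sketch below is illustrative only: it uses the counts quoted in the paragraph above, and the helper name `share` is an assumption introduced here, not something drawn from the cited studies.

```python
def share(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts as quoted in the text above.
confirmed = share(6, 53)     # landmark oncology studies whose findings were confirmed
completed = share(50, 193)   # planned replication experiments that were completed
analyzable = share(4, 193)   # experiments with effect-size and power information available

print(f"confirmed:  {confirmed:.1f}%")
print(f"completed:  {completed:.1f}%")
print(f"analyzable: {analyzable:.1f}%")
```

The first value corresponds to the approximately 11% rate reported by Begley and Ellis; the other two express the Reproducibility Project counts on the same scale.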

The recurring contributors identified across these analyses are methodological and structural. They include the use of weak or poorly defined endpoints, underpowered experimental designs, incomplete reporting of experimental detail and metadata, and incentive systems that reward novelty over verification.⁶,⁷ The addition of higher-resolution measurement does not address these contributors directly.
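To make the notion of an underpowered design concrete, the following sketch approximates statistical power for a two-group comparison using a normal approximation. It is a simplified illustration, not a method taken from the cited analyses: the function name `approx_power`, the standardized effect size of 0.5, and the group sizes are all hypothetical choices, and the normal approximation somewhat overstates power relative to a t-test when groups are small.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference / SD).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical group sizes, chosen only to show how power changes with n.
for n in (8, 16, 64):
    print(f"n per group = {n:3d}: approximate power = {approx_power(0.5, n):.2f}")
```

Under these assumptions, small groups leave most true effects of this size undetected, which is the sense in which a design is described as underpowered.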

Attrition And The Interpretation Of Clinical Failure Rates

The limited predictive value of preclinical research is reflected in clinical attrition. Using a data set of more than 400,000 trial records, Wong et al.⁸ estimated the overall probability that a compound entering clinical development would obtain approval at 13.8%, with a corresponding probability of 3.4% in oncology. Hay et al.⁹ reported a comparable overall likelihood of approval of approximately 10%, and earlier analyses identified similar patterns of attrition concentrated in later development phases.¹⁰

The widely cited estimate that approximately 90% of drug candidates fail in clinical development is consistent with these data: overall approval probabilities of roughly 10% to 14% correspond to failure rates of roughly 86% to 90%. The interpretation of this figure warrants precision. A substantial proportion of late-stage attrition is attributable to insufficient efficacy rather than to safety findings. Efficacy failure in clinical testing following a positive preclinical signal indicates that the preclinical models employed did not adequately represent clinically relevant biology. The financial consequences of this pattern are considerable: DiMasi et al.¹¹ estimated the capitalized pre-approval cost of a new therapeutic at approximately $2.6 billion. Under these conditions, incremental improvements in the predictive validity of preclinical evidence carry substantial economic value.
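The relationship between the approval probabilities cited above and the corresponding failure rates is a one-line calculation. The sketch below uses only the figures quoted in this section; the function names are assumptions introduced for illustration, and the "entrants per approval" value is a simple reciprocal that treats every candidate as having the same probability of approval, which is a simplification.

```python
def failure_rate(approval_probability: float) -> float:
    """Probability that a candidate entering clinical development is not approved."""
    return 1.0 - approval_probability


def entrants_per_approval(approval_probability: float) -> float:
    """Average number of entrants per approval, assuming identical odds for all."""
    return 1.0 / approval_probability


# Approval probabilities as quoted in the text.
estimates = {
    "Wong et al., overall": 0.138,
    "Wong et al., oncology": 0.034,
    "Hay et al., overall (approximate)": 0.10,
}

for label, p in estimates.items():
    print(
        f"{label}: failure rate {failure_rate(p):.1%}, "
        f"about {entrants_per_approval(p):.1f} entrants per approval"
    )
```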

Existing Standards And Their Fragmentation

A set of standards and frameworks capable of addressing the structural contributors to irreproducibility and poor translation already exists. The FAIR Guiding Principles define data that are findable, accessible, interoperable, and reusable, establishing a precondition for the reliable application of artificial intelligence to preclinical data sets.¹² The ARRIVE 2.0 guidelines specify reporting requirements for animal research and directly address the underpowered designs and incomplete reporting associated with irreproducibility.¹³ The Standard for Exchange of Nonclinical Data (SEND), maintained by the Clinical Data Interchange Standards Consortium and required by the United States FDA for nonclinical regulatory submissions, standardizes the structure of nonclinical data.¹⁴ On the validation of non-animal methods specifically, dedicated authorities are already established. In the United States, the Interagency Coordinating Committee on the Validation of Alternative Methods and the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods coordinate interagency evaluation of new approach methodologies.¹⁵ In the European Union, the European Union Reference Laboratory for alternatives to animal testing performs the equivalent function under Directive 2010/63/EU.¹⁶

Complementary developments address measurement quality. Continuous home cage monitoring and associated digital biomarkers provide endpoints with reduced confounding and greater longitudinal resolution.¹,² A validation framework for in vivo digital measures has been described, adapting the verification, analytical validation, and clinical validation paradigm developed for digital medicine to a preclinical context of use.¹⁷ The digitalization of toxicology has been examined as a route to improved preclinical-to-clinical translation.¹⁸ Work on biological variation indicates that reproducibility can be enhanced through the deliberate introduction of controlled heterogeneity rather than through rigid standardization alone.¹⁹

These resources are developed and maintained in separate domains. Data interoperability standards are associated principally with data science communities; reporting guidelines with journals and animal welfare organizations; nonclinical data standards with regulatory affairs functions; and digital biomarkers with technology providers and a limited number of academic groups. The validation of non-animal methods is governed by separate national and intergovernmental authorities. No single body integrates these components into a coherent discipline supported by shared standards, common terminology, and collective accountability.

Regulatory Context And The Role Of Non-Animal Methods

The regulatory framework governing preclinical evidence has been the subject of frequent misinterpretation, particularly with respect to the relationship between animal and non-animal methods. The FDA Modernization Act 2.0, enacted within the Consolidated Appropriations Act of 2023, amended the relevant provisions of the Federal Food, Drug, and Cosmetic Act by replacing the phrase “preclinical tests (including tests on animals)” with “nonclinical tests” and by listing examples such as cell-based assays, microphysiological systems, and computer models.²⁰

The act did not prohibit animal testing, nor did it remove an existing requirement for animal testing, and the FDA possessed the authority to consider non-animal data prior to its enactment.²¹ The effect of the amendment is to direct evidentiary expectations toward methods that are fit for a defined purpose within a specified context of use. The governing question concerns whether a given method, applied in a given context, produces a clinically relevant signal. This orientation is consistent with the principles of replacement, reduction, and refinement articulated by Russell and Burch,²² interpreted as objectives to be achieved through improved scientific methodology.

Digital endpoints are relevant to this reframing because they are independent of the underlying model system. A validated digital measure may be applied to an animal model, a microphysiological readout, or a computational prediction, and data standards apply irrespective of the originating modality. A discipline organized around digital preclinical measurement is therefore positioned to support context-of-use evidence across method types.

The Function Of A Professional Society

Several consortia already address components of the digital preclinical landscape. The Pistoia Alliance coordinates precompetitive data standards across the biopharmaceutical industry.²³ The Digital Medicine Society developed the validation framework subsequently adapted for preclinical application.²⁴ The Digital In Vivo Alliance, a precompetitive consortium of preclinical and data scientists, advances the development, validation, and adoption of digital biomarkers for preclinical in vivo research.²⁵ The Clinical Data Interchange Standards Consortium maintains nonclinical data formats.¹⁴ Organizations including the National Centre for the Replacement, Refinement and Reduction of Animals in Research and the 3Rs Collaborative address welfare and method development,²⁶,²⁷ and additional consortia address specific technical domains. Each of these organizations addresses a defined component, and none constitutes a professional society for the digital preclinical discipline considered as a whole.

A professional society performs functions that are not readily performed by a consortium, a commercial entity, an individual journal, or a regulatory authority. A society can establish standards with a neutrality that a commercial actor cannot provide, since standards advanced by a single vendor are unlikely to be adopted by competitors. It can convene academic, industrial, and regulatory participants on equal terms rather than within transactional relationships. It can provide training and certification, which support the formation of a recognized profession rather than a collection of techniques. It can maintain a durable and impartial venue for the validation of digital biomarkers, such that a measure validated once may be accepted broadly rather than reevaluated within each organization. It can also provide professional identity and development pathways for early-career scientists in a field that currently lacks defined professional designations.

These functions depend on neutrality, continuity, and membership, and are therefore characteristic of a professional society rather than of an individual commercial or research organization.

The Proposed Digital Preclinical Society

The authors are cofounders of the Digital Preclinical Society (DiPS). Baran serves as its chief executive officer and Gaburro as its chief scientific officer and president. This perspective is presented with that interest declared.

DiPS is being established as a United States 501(c)(6) membership organization. It is not yet incorporated and is described here as a forming organization. Its intended function is integrative rather than duplicative. The consortia described above operate predominantly at the level of components: data standards, validation methods, reporting requirements, and domain-specific tools. DiPS is positioned at the application layer, where these components are deployed together against a specific preclinical decision and where methods, measures, and models must be made coherent for a defined context of use. The existing consortia constitute established assets, and an organization that sought to displace them would be neither warranted nor effective. The proposed contribution of DiPS is to federate these components within a shared professional structure and to define common expectations for trustworthy digital preclinical data, thereby addressing an integrative gap that the existing organizations were not designed to fill.

Discussion: NAMs, Context Of Use, And Fit For Purpose

The classification of preclinical methods has increasingly been framed around new approach methodologies (NAMs), a term encompassing in vitro systems, microphysiological systems, computational and in silico models, and related non-animal approaches.²⁸,²⁹ The present analysis treats NAMs and animal-based methods as method classes to be evaluated on equivalent terms, without an a priori assumption that either class is preferable. The relevant determinant of utility is not the category to which a method belongs but the evidence that the method generates a reliable and decision-relevant signal for a defined application.

Two related constructs support this position. The context of use specifies the role a method plays within a particular decision, including the question being addressed, the stage of development, and the regulatory or scientific consequence of the result. Fit for purpose denotes the degree to which a method’s demonstrated performance satisfies the requirements of that context. A method that is fit for purpose in one context of use may be inadequate in another, and this conditionality applies equally to animal models, in vitro systems, microphysiological systems, and computational approaches. Neither construct privileges a method class; each requires that performance be characterized against an explicitly defined application.²⁸
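The conditionality described above can be expressed as a small data structure. The sketch below is purely illustrative and assumes Python 3.9 or later: the class names, field names, metric names, and numeric values are all hypothetical placeholders introduced here, not part of any existing standard or of the framework cited in the text.

```python
from dataclasses import dataclass


@dataclass
class ContextOfUse:
    """The decision a method is meant to inform (all fields hypothetical)."""
    question: str
    development_stage: str
    consequence: str
    # Minimum acceptable value for each performance metric in this context.
    required_performance: dict[str, float]


@dataclass
class MethodEvidence:
    """Documented performance of one method, regardless of method class."""
    method_name: str
    method_class: str
    demonstrated_performance: dict[str, float]


def is_fit_for_purpose(evidence: MethodEvidence, context: ContextOfUse) -> bool:
    """True only if every required metric is documented and meets its minimum."""
    return all(
        metric in evidence.demonstrated_performance
        and evidence.demonstrated_performance[metric] >= minimum
        for metric, minimum in context.required_performance.items()
    )


# Placeholder values for illustration, not measured data.
activity_measure = MethodEvidence(
    method_name="home cage activity measure",
    method_class="animal model",
    demonstrated_performance={"sensitivity": 0.85, "specificity": 0.70},
)
early_screen = ContextOfUse(
    question="Rank candidates for follow-up",
    development_stage="discovery",
    consequence="internal prioritization",
    required_performance={"sensitivity": 0.80},
)
safety_decision = ContextOfUse(
    question="Support a safety assessment",
    development_stage="late preclinical",
    consequence="regulatory submission",
    required_performance={"sensitivity": 0.80, "specificity": 0.90},
)

print(is_fit_for_purpose(activity_measure, early_screen))     # True
print(is_fit_for_purpose(activity_measure, safety_decision))  # False
```

The same evidence record passes in one context and fails in the other, and the check never inspects `method_class`, which mirrors the point that neither construct privileges a method class.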

This framing has direct implications for the standards and validation functions discussed above. Context of use and fit-for-purpose criteria can be operationalized only where method performance is documented in interoperable, well-annotated data and assessed through validation procedures appropriate to the intended application. The verification, analytical validation, and clinical validation paradigm described for in vivo digital measures is, in principle, applicable across method classes, since it characterizes measurement performance rather than method origin.¹⁷ Digital endpoints are themselves method agnostic and may be applied to animal models, microphysiological readouts, or computational predictions, which positions a digital preclinical discipline to support comparison and integration across method classes rather than substitution of one class by another.

A method-agnostic, context-of-use orientation also accommodates regulatory developments without overstating their effect. The statutory expansion of recognized nonclinical method types broadens the set of approaches that may be considered but does not establish the adequacy of any specific method for any specific purpose; adequacy remains an empirical determination tied to the context of use.²⁰,²¹ The community function required to support this determination, namely the consistent definition of contexts of use and the transparent documentation of fit-for-purpose evidence across method classes, is not at present held by any single organization.

Conclusion

The reproducibility and translation deficits documented in preclinical research are attributable principally to structural and infrastructural factors rather than to limitations in measurement capacity. The relevant technologies are largely available, and the relevant standards have, in substantial part, been defined. The principal deficiency is the absence of an integrative structure capable of coordinating these elements into a coherent discipline and of orienting the field toward context-of-use evidence rather than toward categorical distinctions between method types.

A professional society cannot by itself resolve the challenges of preclinical-to-clinical translation, and no single instrument or framework can do so. The establishment of such a society is, however, among the more tractable interventions available to the community and is one that the field is positioned to undertake directly. The Digital Preclinical Society is proposed on this basis, with the authors’ interests disclosed and with the recognition that its development remains in progress.

References

1. Voikar, V., & Gaburro, S. (2020). Three pillars of automated home-cage phenotyping of mice: Novel findings, refinement, and reproducibility based on literature and experience. Frontiers in Behavioral Neuroscience, 14, 575434. https://doi.org/10.3389/fnbeh.2020.575434
2. Baran, S. W., Bratcher, N., Dennis, J., Gaburro, S., Karlsson, E. M., Maguire, S., Makidon, P., Noldus, L. P. J. J., Potier, Y., Rosati, G., Ruiter, M., Schaevitz, L., Sweeney, P., & LaFollette, M. R. (2021). Emerging role of translational digital biomarkers within home cage monitoring technologies in preclinical drug discovery and development. Frontiers in Behavioral Neuroscience, 15, 758274. https://doi.org/10.3389/fnbeh.2021.758274
3. Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531–533. https://doi.org/10.1038/483531a
4. Errington, T. M., Mathur, M., Soderberg, C. K., Denis, A., Perfito, N., Iorns, E., & Nosek, B. A. (2021). Investigating the replicability of preclinical cancer biology. eLife, 10, e71601. https://doi.org/10.7554/eLife.71601
5. Freedman, L. P., Cockburn, I. M., & Simcoe, T. S. (2015). The economics of reproducibility in preclinical research. PLoS Biology, 13(6), e1002165. https://doi.org/10.1371/journal.pbio.1002165
6. Begley, C. G., & Ioannidis, J. P. A. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circulation Research, 116(1), 116–126. https://doi.org/10.1161/CIRCRESAHA.114.303819
7. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
8. Wong, C. H., Siah, K. W., & Lo, A. W. (2019). Estimation of clinical trial success rates and related parameters. Biostatistics, 20(2), 273–286. https://doi.org/10.1093/biostatistics/kxx069
9. Hay, M., Thomas, D. W., Craighead, J. L., Economides, C., & Rosenthal, J. (2014). Clinical development success rates for investigational drugs. Nature Biotechnology, 32(1), 40–51. https://doi.org/10.1038/nbt.2786
10. Kola, I., & Landis, J. (2004). Can the pharmaceutical industry reduce attrition rates? Nature Reviews Drug Discovery, 3(8), 711–716. https://doi.org/10.1038/nrd1470
11. DiMasi, J. A., Grabowski, H. G., & Hansen, R. W. (2016). Innovation in the pharmaceutical industry: New estimates of R&D costs. Journal of Health Economics, 47, 20–33. https://doi.org/10.1016/j.jhealeco.2016.01.012
12. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
13. Percie du Sert, N., Hurst, V., Ahluwalia, A., Alam, S., Avey, M. T., Baker, M., Browne, W. J., Clark, A., Cuthill, I. C., Dirnagl, U., Emerson, M., Garner, P., Holgate, S. T., Howells, D. W., Karp, N. A., Lazic, S. E., Lidster, K., MacCallum, C. J., Macleod, M., … Würbel, H. (2020). The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. PLoS Biology, 18(7), e3000410. https://doi.org/10.1371/journal.pbio.3000410
14. CDISC. (n.d.). SEND: Standard for Exchange of Nonclinical Data. Clinical Data Interchange Standards Consortium. Retrieved June 11, 2026, from https://www.cdisc.org/standards/foundational/send
15. ICCVAM. (2018). A strategic roadmap for establishing new approaches to evaluate the safety of chemicals and medical products in the United States. Interagency Coordinating Committee on the Validation of Alternative Methods, National Toxicology Program. https://ntp.niehs.nih.gov/go/natl-strategy
16. EURL ECVAM. (n.d.). EU Reference Laboratory for alternatives to animal testing. European Commission, Joint Research Centre. Retrieved June 11, 2026, from https://joint-research-centre.ec.europa.eu/eu-reference-laboratory-alternatives-animal-testing-eurl-ecvam_en
17. Baran, S. W., Bolin, S. E., Gaburro, S., van Gaalen, M. M., LaFollette, M. R., Liu, C.-N., Maguire, S., Noldus, L. P. J. J., Bratcher-Petersen, N., & Berridge, B. R. (2025). Validation framework for in vivo digital measures. Frontiers in Toxicology, 6, 1484895. https://doi.org/10.3389/ftox.2024.1484895
18. Berridge, B. R., Baran, S. W., Kumar, V., Bratcher-Petersen, N., Ellis, M., Liu, C.-N., & Robertson, T. L. (2024). Digitalization of toxicology: Improving preclinical to clinical translation. Frontiers in Toxicology, 6, 1377542. https://doi.org/10.3389/ftox.2024.1377542
19. Voelkl, B., Altman, N. S., Forsman, A., Forstmeier, W., Gurevitch, J., Jaric, I., Karp, N. A., Kas, M. J., Schielzeth, H., Van de Casteele, T., & Würbel, H. (2020). Reproducibility of animal research in light of biological variation. Nature Reviews Neuroscience, 21(7), 384–393. https://doi.org/10.1038/s41583-020-0313-3
20. FDA Modernization Act 2.0, S. 5002, 117th Cong. (2022). https://www.congress.gov/bill/117th-congress/senate-bill/5002/text
21. National Association for Biomedical Research. (2023). NABR’s press statement on the FDA Modernization Act 2.0. https://www.nabr.org/about-nabr/news/nabrs-press-statement-fda-modernization-act-20
22. Russell, W. M. S., & Burch, R. L. (1959). The principles of humane experimental technique. Methuen.
23. Pistoia Alliance. (n.d.). About the Pistoia Alliance. Retrieved May 31, 2026, from https://pistoiaalliance.org/membership/about/
24. Goldsack, J. C., Coravos, A., Bakker, J. P., Bent, B., Dowling, A. V., Fitzer-Attas, C., Godfrey, A., Godino, J. G., Gujar, N., Izmailova, E., Manta, C., Peterson, B., Vandendriessche, B., Wood, W. A., Wang, K. W., & Dunn, J. (2020). Verification, analytical validation, and clinical validation (V3): The foundation of determining fit-for-purpose for Biometric Monitoring Technologies (BioMeTs). npj Digital Medicine, 3, 55. https://doi.org/10.1038/s41746-020-0260-4
25. DIVA. (n.d.). Digital In Vivo Alliance. Retrieved June 11, 2026, from https://diva.bio/
26. National Centre for the Replacement, Refinement and Reduction of Animals in Research. (n.d.). Who we are. Retrieved June 11, 2026, from https://nc3rs.org.uk/who-we-are
27. 3Rs Collaborative. (n.d.). Advancing the 3Rs of animal research. Retrieved June 11, 2026, from https://www.3rc.org/
28. Schmeisser, S., Miccoli, A., von Bergen, M., Berggren, E., Braeuning, A., Busch, W., Desaintes, C., Gourmelon, A., Grafström, R., Harrill, J., Hartung, T., Herzler, M., Kass, G. E. N., Kleinstreuer, N., Leist, M., Luijten, M., Marx-Stoelting, P., Poetz, O., van Ravenzwaay, B., … Tralau, T. (2023). New approach methodologies in human regulatory toxicology – Not if, but how and when! Environment International, 178, 108082. https://doi.org/10.1016/j.envint.2023.108082
29. Stucki, A. O., Barton-Maclaren, T. S., Bhuller, Y., Henriquez, J. E., Henry, T. R., Hirn, C., Miller-Holt, J., Nagy, E. G., Perron, M. M., Ratzlaff, D. E., Stedeford, T. J., & Clippinger, A. J. (2022). Use of new approach methodologies (NAMs) to meet regulatory requirements for the assessment of industrial chemicals and pesticides for effects on human health. Frontiers in Toxicology, 4, 964553. https://doi.org/10.3389/ftox.2022.964553

About The Authors

Stefano Gaburro and Szczepan W. Baran are cofounders of the now-forming Digital Preclinical Society. Stefano Gaburro serves as chief scientific officer of DiPS. He is an independent scientific consultant and former scientific director of digital solutions at Tecniplast S.p.A. Szczepan W. Baran serves as chief executive officer of DiPS and is chief scientific officer of Instem.