Avoiding Protein Purification Artifacts That Still Undermine Drug Design
By Saeed Binsabaan, Ph.D.

Purity is one of the most commonly used, yet potentially misleading, words in protein science labs. Researchers often invest considerable time in a long purification process, aiming to obtain high protein concentration and see a clean band on sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) before moving on to the next step. Once this aim is achieved, the team moves ahead with structure determination, docking tests, screening experiments, and data collection.
In drug discovery, however, a purified protein may not represent the form that exists in a cellular environment. As a protein scientist, I can confidently say that many promising projects have quietly failed because the target protein, although highly pure and well-behaved, was not in its native state. Proteins can undergo minor misfolding, adopt non-physiological conformations, suffer partial degradation, form artificial oligomers, or lose essential binding partners. Even so, the affected protein may remain stable and homogeneous and still yield high-resolution structural data.1-3 This is largely because most structural methods select for ordered, homogeneous protein populations rather than biologically relevant ones. For this reason, many drug discovery programs fail not because of poor chemistry but because the protein target does not accurately represent its native state.4
What Are Protein Purification Artifacts?
Protein purification artifacts are common technical issues. They can be defined as structural or chemical changes introduced during protein expression or purification. These changes cause the purified protein to deviate from its native cellular state and typically do not present as poor purity.5 Purification artifacts occur regularly and may represent an overlooked source of error in many biological assays. When they go unnoticed, they can distort structural data and later lead to unreliable experimental results, including binding and activity assays. Given that drug discovery programs depend heavily on structure-guided approaches, the cost of purification artifacts is especially high.
The most common examples of protein purification artifacts include:
- misfolded protein that is still soluble but lacks its native fold
- aggregates or micro-aggregates that can change binding behavior
- truncations caused by proteolysis
- artificial oligomerization
- loss of binding partners, lipids, metals, or cofactors
- non-native states artificially stabilized during protein preparation
- post-translational modifications that are not present in the cellular context.5-7
The presence of the protein in these non-physiological states does not prevent it from passing standard quality control steps. The real concern, however, is not purity but rather biological relevance.
How Purification Artifacts Mislead Drug Design
Once the 3D structure of the target protein has been determined by conventional experimental methods or predicted with available computational tools, the protein is ready to enter the drug discovery process. Drug discovery programs typically assume that this structure is a reliable representation of the biological target. That assumption forms the basis of structure-based drug design and binding kinetics analysis. If the protein is not in its native form, however, the structural and biochemical information derived from it becomes questionable, which in turn may lead to poor downstream decisions. In this context, purification artifacts can mislead drug discovery programs in several ways:
1. Artificial pockets and binding sites
Aggregated proteins can expose non-native pockets or binding sites that do not exist in cellular environments. Ligands designed to fit these pockets may even succeed in in vitro testing, yet they can produce misleading results in cellular systems.8
2. Altered native states
Proteins are dynamic by nature and can exist in multiple transient conformations. Many targets are only druggable during these transient states.9,10 During purification, proteins may become biased toward a stable but non-native conformational state. This means that the biologically relevant conformation is no longer accessible. Drug design efforts may then focus on a protein adopting a non-native conformation, resulting in wasted resources and failed outcomes.
3. Distorted affinity and kinetics
A purified protein does not always exist as a single homogeneous species. It may consist of correctly folded monomers alongside dimers, higher-order oligomers, and partially degraded fragments. These species can behave differently in protein–small molecule interaction studies, potentially leading to biased or contradictory results. Kinetic parameters can also be affected, because different species may associate with or dissociate from small molecules at different rates. The resulting binding data can be misleading because they reflect a mixture of interactions across multiple structural forms of the protein.
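To illustrate how a mixed population can distort an affinity measurement, the following minimal Python sketch models an equilibrium titration of a sample containing two non-interconverting species, each binding the ligand at a single site. The species fractions and dissociation constants are hypothetical values chosen for illustration, not measurements, and the model assumes ligand in excess so that free and total ligand concentrations are approximately equal.

```python
def mixture_occupancy(ligand_conc, species):
    """Fraction of total protein bound at a given free ligand concentration.

    species: list of (fraction, kd) pairs whose fractions sum to 1.
    Each species is assumed to bind independently with a 1:1 isotherm.
    """
    return sum(frac * ligand_conc / (kd + ligand_conc) for frac, kd in species)


def apparent_kd(species, lo=1e-12, hi=1e3, iterations=200):
    """Ligand concentration giving 50% overall occupancy, found by bisection.

    This is the value a naive single-site reading of the curve would report.
    """
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5  # geometric midpoint suits a log-scaled axis
        if mixture_occupancy(mid, species) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


# Hypothetical sample (concentrations in uM):
# 60% native monomer with Kd = 0.1, 40% non-native oligomer with Kd = 10
sample = [(0.6, 0.1), (0.4, 10.0)]
kd_app = apparent_kd(sample)
print(f"Apparent Kd of the mixture: {kd_app:.3f} uM")
```

With these illustrative numbers, the apparent value falls between the two true constants and matches neither, which is one way a heterogeneous preparation can make a compound look weaker or stronger than it is against the native form.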
4. Poor reproducibility
Proteins may behave differently when purified in different labs, or even across different batches within the same lab. Differences in expression host, buffer composition, temperature, purification tools, or purification duration can introduce subtle shifts in protein conformation or oligomeric state.11 If the protein behaves differently from lab to lab or batch to batch, the structure–activity relationship becomes unreliable, and compounds that are not real hits may appear promising.
Why Standard Quality Control Is Not Enough
Many protein science labs rely on standard quality control methods, such as SDS-PAGE, western blotting, UV absorbance, and basic size-exclusion chromatography (SEC). Although these techniques are valuable for confirming molecular weight, purity, and yield, they do not provide a complete picture of protein integrity. They cannot, for example, distinguish correctly folded from misfolded protein, and they are not reliable for detecting micro-aggregation or conformational heterogeneity. As a result, a protein can appear clean on SDS-PAGE and show a single sharp SEC peak yet remain in a non-native conformation. This gap between apparent purity and actual structural integrity is potentially a major source of misinterpretation in drug discovery studies.
Identifying And Preventing Protein Purification Artifacts
Several biophysical and structural techniques now offer a more accurate assessment of protein quality than standard quality control methods. These include size-exclusion chromatography with multi-angle light scattering (SEC-MALS), mass photometry, and native mass spectrometry, which provide valuable information on molecular mass, oligomeric state, and the presence of aggregation.12,13 Other supporting methods, such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM) heterogeneity analysis, enable scientists to investigate overall protein shape, conformational flexibility, and the presence of multiple structural populations in solution.14 Local stability, unfolding transitions, and dynamics can be assessed using hydrogen-deuterium exchange mass spectrometry (HDX-MS) and nano differential scanning fluorimetry (nanoDSF), while the cellular thermal shift assay (CETSA) can provide insight into structural integrity within the cellular environment.15-17
While these techniques provide valuable insight into protein quality, no single method is sufficient alone. They should be used in combination to obtain a reliable and comprehensive assessment of protein quality. By examining mass, oligomeric state, dynamics, and stability, scientists can better determine whether a purified protein represents the native biological target. This helps identify hidden structural issues early, saving time and effort and supporting a more reliable drug discovery process.
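As one small piece of such a combined assessment, a native mass measurement (for example from SEC-MALS or mass photometry) can be compared against the sequence-derived monomer mass. The Python sketch below is a hedged illustration only: the function names, the 10% tolerance, and the 42 kDa example protein are all hypothetical, and a real workflow would weigh this result alongside the other methods described above.

```python
def oligomeric_state(measured_kda, monomer_kda, tolerance=0.10):
    """Estimate subunit count from a measured native mass.

    Returns (n, fits): n is the nearest whole number of subunits, and fits is
    True when the measured mass is within `tolerance` (fractional) of
    n * monomer_kda. The 10% default is an illustrative assumption.
    """
    n = max(1, round(measured_kda / monomer_kda))
    deviation = abs(measured_kda - n * monomer_kda) / (n * monomer_kda)
    return n, deviation <= tolerance


def assess(measured_kda, monomer_kda, expected_n):
    """Return a list of plain-language warnings for one mass measurement."""
    n, fits = oligomeric_state(measured_kda, monomer_kda)
    warnings = []
    if not fits:
        warnings.append("mass does not match a whole number of subunits "
                        "(possible truncation, mixture, or bound partner)")
    elif n != expected_n:
        warnings.append(f"mass matches {n} subunit(s), expected {expected_n}")
    return warnings


# Hypothetical 42 kDa protein expected to be monomeric in the cell
for mass in (42.5, 85.0, 60.0):
    print(mass, assess(mass, monomer_kda=42.0, expected_n=1))
```

An empty warning list here means only that the mass is consistent with the expected state. It says nothing about fold or conformation, which is why orthogonal stability and dynamics measurements remain necessary.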
Many purification artifacts can be minimized or even avoided through careful practice from construct design through post-purification handling. Constructs can be optimized by removing aggregation-prone regions, applying rational truncations when needed, and selecting affinity tags carefully.18,19 These steps can significantly improve the likelihood of obtaining a native-like protein. During protein expression, factors such as temperature, inducer concentration, host strain, and expression duration can all influence protein folding.20 Throughout purification, parameters such as buffer composition, ionic strength, cofactors, temperature, purification duration, and detergent selection should be optimized to limit effects on protein folding, stability, or oligomeric state.20,21 After purification, the protein should be flash-frozen immediately, and it is good practice to compare fresh and stored samples and to assess batch-to-batch consistency to ensure reproducibility. Applying this rigor at each stage is key to maintaining structural integrity and minimizing misleading outcomes in downstream assays.
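The comparison of fresh and stored samples and of different batches can be made explicit with a simple check. The Python sketch below assumes replicate melting temperatures (Tm) from a thermal stability measurement such as differential scanning fluorimetry. The sample labels, Tm values, and the 1 °C acceptance window are hypothetical and would need to be set for each protein and instrument.

```python
from statistics import mean


def batch_consistency(tm_by_sample, max_spread_c=1.0):
    """Compare mean Tm across samples.

    tm_by_sample: dict mapping a sample label to a list of replicate Tm
    values in degrees Celsius.
    Returns (means, spread, consistent), where spread is the difference
    between the highest and lowest sample mean and consistent is True when
    spread does not exceed max_spread_c (an illustrative threshold).
    """
    means = {label: mean(values) for label, values in tm_by_sample.items()}
    spread = max(means.values()) - min(means.values())
    return means, spread, spread <= max_spread_c


# Hypothetical replicate Tm values (degrees Celsius)
tm_data = {
    "batch_A_fresh": [52.1, 52.3, 52.0],
    "batch_A_stored": [51.9, 52.2, 52.1],
    "batch_B_fresh": [49.8, 50.1, 49.9],
}
means, spread, consistent = batch_consistency(tm_data)
for label, value in means.items():
    print(f"{label}: mean Tm = {value:.2f} C")
print(f"Spread = {spread:.2f} C, consistent = {consistent}")
```

In this made-up data set, storage has little effect on batch A, but batch B melts about 2 °C lower, so the set fails the check and batch B would warrant a closer look before it is used in binding or structural work.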
Conclusion
While the structure–function relationship remains fundamentally valid, the reliance on purified protein in drug discovery is a reminder that protein structure is not static. The dynamic nature of many proteins is essential for maintaining their function within the biological environment.18 Proteins may therefore undergo structural changes, especially when removed from their native environment. Hence, protein purification artifacts are not rare or accidental; they represent an inherent risk in recombinant protein production. When these artifacts are overlooked, the cost of addressing them later in drug design or binding assays can be substantial. A good drug candidate is evaluated using high-quality structural data and tested against well-purified protein that reflects its native state.
References
1. Halder, R., Nissley, D.A., Sitarik, I. et al. How soluble misfolded proteins bypass chaperones at the molecular level. Nat Commun 14, 3689 (2023).
2. Rinauro DJ, Chiti F, Vendruscolo M, Limbocker R. Misfolded protein oligomers: mechanisms of formation, cytotoxic effects, and pharmacological approaches against protein misfolding diseases. Mol Neurodegener. 2024 Feb 20;19(1):20.
3. Pukala TL. Mass spectrometric insights into protein aggregation. Essays Biochem. 2023 Mar 29;67(2):243-253.
4. Sun D, Gao W, Hu H, Zhou S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm Sin B. 2022 Jul;12(7):3049-3062. doi: 10.1016/j.apsb.2022.02.002. Epub 2022 Feb 11.
5. Niedzialkowska E, Gasiorowska O, Handing KB, Majorek KA, Porebski PJ, Shabalin IG, Zasadzinska E, Cymborowski M, Minor W. Protein purification and crystallization artifacts: The tale usually not told. Protein Sci. 2016 Mar;25(3):720-33. doi: 10.1002/pro.2861. Epub 2016 Jan 26.
6. Sidoryk-Węgrzynowicz, M.; Adamiak, K.; Strużyńska, L. Targeting Protein Misfolding and Aggregation as a Therapeutic Perspective in Neurodegenerative Disorders. Int. J. Mol. Sci. 2024, 25.
7. Bhatwa A, Wang W, Hassan YI, Abraham N, Li XZ, Zhou T. Challenges Associated With the Formation of Recombinant Protein Inclusion Bodies in Escherichia coli and Strategies to Address Them for Industrial Applications. Front Bioeng Biotechnol. 2021 Feb 10;9:630551.
8. Murugan NA, Nordberg A, Agren H. Cryptic sites in tau fibrils explain the preferential binding of AV-1451 PET tracer toward Alzheimer’s tauopathy. ACS Chem Neurosci. 2021;12(13):2437–2447.
9. Meller, A., Ward, M., Borowsky, J. et al. Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network. Nat Commun 14, 1177 (2023).
10. Trezza, A.; Visibelli, A.; Roncaglia, B.; Barletta, R.; Iannielli, S.; Mahboob, L.; Spiga, O.; Santucci, A. Unveiling Dynamic Hotspots in Protein–Ligand Binding: Accelerating Target and Drug Discovery Approaches. Int. J. Mol. Sci. 2025, 26, 3971.
11. Ventouri, I. K., Malheiro, D. B. A., Voeten, R. L. C., Kok, S., Honing, M., Somsen, G. W., & Haselberg, R. (2020). Probing protein denaturation during size-exclusion chromatography using native mass spectrometry. Analytical Chemistry, 92(8), 4292-4300.
12. Soltermann F, Foley EDB, Pagnoni V, Galpin M, Benesch JLP, Kukura P, Struwe WB. Quantifying Protein-Protein Interactions by Molecular Counting with Mass Photometry. Angew Chem Int Ed Engl. 2020 Jun 26;59(27):10774-10779.
13. Benesch, J. L. P., & Robinson, C. V. (2006). Mass spectrometry of macromolecular assemblies: Preservation and dissociation. Current Opinion in Structural Biology, 16(2), 245-251.
14. Ziegler, S. J., Mallinson, S. J. B., St. John, P. C., & Bomble, Y. J. (2021). Advances in integrative structural biology: Towards understanding protein complexes in their cellular context. Computational and Structural Biotechnology Journal, 19, 214-225.
15. Narang, D.; Lento, C.; J. Wilson, D. HDX-MS: An Analytical Tool to Capture Protein Motion in Action. Biomedicines 2020, 8, 224.
16. Kim SH, Yoo HJ, Park EJ, Na DH. Nano Differential Scanning Fluorimetry-Based Thermal Stability Screening and Optimal Buffer Selection for Immunoglobulin G. Pharmaceuticals (Basel). 2021 Dec 25;15(1):29.
17. Dai, L., Prabhu, N., Yu, L. Y., Bacanu, S., Ramos, A. D., & Nordlund, P. (2019). Horizontal cell biology: Monitoring global changes of protein interaction states with the proteome-wide cellular thermal shift assay (CETSA). Annual Review of Biochemistry, 88, 383-408.
18. Dyson HJ, Wright PE. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol. 2005 Mar;6(3):197-208.
19. Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics. 2017 Oct 1;33(19):3098-3100.
20. Rosano GL, Ceccarelli EA. Recombinant protein expression in Escherichia coli: advances and challenges. Front Microbiol. 2014 Apr 17;5:172.
21. Bhat, E. A., Abdalla, M., & Rather, I. A. (2018). Key factors for successful protein purification and crystallization. Global Journal of Biotechnology and Biomaterial Science, 4(1), 1-7.
About The Author
Saeed Binsabaan, Ph.D., is a biochemist and structural biologist with broad experience in protein sciences. His expertise includes understanding how proteins behave, their structures and structure–function relationships, and how they can be targeted therapeutically. He earned his Ph.D. in biological sciences from the University of Pittsburgh, where he focused on studying protein structure and function, X-ray crystallography, and phage-host interactions. His postdoctoral work involved ubiquilin- and ubiquitin-mediated protein quality control and GPCR signaling, using a combination of structural, biochemical, computational, and cell-based approaches. He has contributed to drug discovery efforts and developed new assays and structural strategies to investigate dynamic protein complexes.