The Economics of Reproducibility in Preclinical Research

Affiliations.

  • 1 Global Biological Standards Institute, Washington, D.C., United States of America.
  • 2 Boston University School of Management, Boston, Massachusetts, United States of America.
  • 3 Boston University School of Management, Boston, Massachusetts, United States of America; Council of Economic Advisers, Washington, D.C., United States of America.
  • PMID: 26057340
  • PMCID: PMC4461318
  • DOI: 10.1371/journal.pbio.1002165

Low reproducibility rates within life science research undermine cumulative knowledge production and contribute to both delays and costs of therapeutic drug development. An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible, in the United States alone. We outline a framework for solutions and a plan for long-term improvements in reproducibility rates that will help to accelerate the discovery of life-saving therapies and cures.

  • Biomedical Research / economics*
  • Biomedical Research / standards
  • Reproducibility of Results*




Open Access

Perspective

The Perspective section provides experts with a forum to comment on topical or controversial issues of broad interest.


The Economics of Reproducibility in Preclinical Research

  • Leonard P. Freedman (Global Biological Standards Institute, Washington, D.C., United States of America)
  • Iain M. Cockburn (Boston University School of Management, Boston, Massachusetts, United States of America)
  • Timothy S. Simcoe (Boston University School of Management, Boston, Massachusetts, United States of America; Council of Economic Advisers, Washington, D.C., United States of America)

* E-mail: [email protected]


Published: June 9, 2015

  • https://doi.org/10.1371/journal.pbio.1002165

10 Apr 2018: The PLOS Biology Staff (2018) Correction: The Economics of Reproducibility in Preclinical Research. PLOS Biology 16(4): e1002626. https://doi.org/10.1371/journal.pbio.1002626 View correction


Citation: Freedman LP, Cockburn IM, Simcoe TS (2015) The Economics of Reproducibility in Preclinical Research. PLoS Biol 13(6): e1002165. https://doi.org/10.1371/journal.pbio.1002165

Copyright: © 2015 Freedman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors received no specific funding for this work.

Competing interests: Dr. Simcoe currently works as a Senior Economist for the Council of Economic Advisers (CEA). The CEA disclaims responsibility for any of the views expressed herein and these views do not necessarily represent the views of the CEA or the United States.

Abbreviations: AAAS, American Association for the Advancement of Science; FDA, US Food and Drug Administration; GBSI, Global Biological Standards Institute; IETF, Internet Engineering Task Force; NIH, National Institutes of Health; PI, principal investigator; STR, short tandem repeat; W3C, World Wide Web Consortium

Introduction

Much has been written about the alarming number of preclinical studies that were later found to be irreproducible [ 1 , 2 ]. Flawed preclinical studies create false hope for patients waiting for lifesaving cures; moreover, they point to systemic and costly inefficiencies in the way preclinical studies are designed, conducted, and reported. Because replication and cumulative knowledge production are cornerstones of the scientific process, these widespread accounts are scientifically troubling. Such concerns are further complicated by questions about the effectiveness of the peer review process itself [ 3 ], as well as the rapid growth of postpublication peer review (e.g., PubMed Commons, PubPeer), data sharing, and open access publishing that accelerate the identification of irreproducible studies [ 4 ]. Indeed, there are many different perspectives on the size of this problem, and published estimates of irreproducibility range from 51% [ 5 ] to 89% [ 6 ] ( Fig 1 ). Our primary goal here is not to pinpoint the exact irreproducibility rate, but rather to identify root causes of the problem, estimate the direct costs of irreproducible research, and develop a framework to address the highest priorities. Based on examples from within the life sciences, the application of economic theory, and lessons learned from other industries, we conclude that community-developed best practices and standards must play a central role in improving reproducibility going forward.

Fig 1. Published estimates of the irreproducibility rate in preclinical research.

Source: Begley and Ellis [ 6 ], Prinz et al. [ 7 ], Vasilevsky [ 8 ], Hartshorne and Schachner [ 5 ], and Glasziou et al. [ 9 ].

https://doi.org/10.1371/journal.pbio.1002165.g001

Defining Reproducibility

Studies of reproducibility define the phenomenon in a number of ways [ 10 ]. For example, some studies define reproducibility as the ability to replicate the same results demonstrated in a particular study using precisely the same methods and materials [ 11 ]; others evaluate whether the study’s methodology and results were presented in sufficient detail to allow replication or reanalysis [ 8 ]. The definition of reproducibility may also vary depending upon whether a particular study is confirmatory (designed to test basic theories through rigorous study design and analysis) or exploratory (primarily aimed at developing theories and frameworks for further study) [ 12 ]. For this paper, we adopt an inclusive definition of irreproducibility that encompasses the existence and propagation of one or more errors, flaws, inadequacies, or omissions (collectively referred to as errors) that prevent replication of results. Clearly, perfect reproducibility across all preclinical research is neither possible nor desirable. Attempting to achieve total reproducibility would dramatically increase the cost of such studies and radically curb their volume. Our assumption that current irreproducibility rates exceed a theoretically (and perhaps indeterminable) optimal level is based on the tremendous gap between the conventional 5% false positive rate (i.e., statistical significance level of 0.05) and the estimates reported below and elsewhere (see S1 Text and Fig 1 ). Although the optimal statistical power of each study will depend on its objectives, this large gap suggests that published preclinical study results are often less reliable than claimed. From an economic perspective, the system is highly inefficient. While there are several root causes, one overarching source of inefficiency is the continued emphasis on placing responsibility with the researcher—despite the fact that a significant portion of the costs of irreproducibility are ultimately borne by downstream parties in the translation of bench discoveries to bedside therapies [ 13 ].

Analysis of Four Categories of Irreproducibility

Many studies have concluded that the prevalence of irreproducible biomedical research is substantial [ 1 ]. The wide range of published estimates reflects the challenges of accurately quantifying and subsequently addressing the problem. Multiple systemic causes contribute to irreproducibility and many can ultimately be traced to an underlying lack of a standards and best practices framework [ 13 ]. However, it is reasonable to state that cumulative errors in the following broad categories—as well as underlying biases that could contribute to each problem area [ 14 ] or even result in entire studies never being published or reported [ 15 ]—are the primary causes of irreproducibility [ 16 ]: (1) study design, (2) biological reagents and reference materials, (3) laboratory protocols, and (4) data analysis and reporting. Fig 2 , S1 Text , S1 and S2 Datasets show the results of our analysis, which estimates the prevalence (low, high, and midpoint estimates) of errors in each category and builds up to a cumulative (total) irreproducibility rate that exceeds 50%. Using a highly conservative probability bounds approach [ 17 ], we estimate that the cumulative rate of preclinical irreproducibility lies between 18% (the maximum of the low estimates, assuming maximum overlap between categories), and 88.5% (the sum of the high estimates, assuming minimal overlap). A natural point estimate of the cumulative irreproducibility rate is the midpoint of the upper and lower bounds, or 53.3%.
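To make the aggregation concrete, here is a minimal Python sketch of the probability-bounds logic described above. The function and the per-category prevalence values are illustrative placeholders (the authors' actual inputs are in S1 Dataset); the values are chosen only so that the aggregate reproduces the bounds quoted in the text.

```python
# Minimal sketch (not the authors' S1 Dataset) of the probability-bounds
# aggregation described above. The per-category prevalences are hypothetical
# placeholders chosen only so the aggregate reproduces the quoted bounds.

def cumulative_bounds(low, high):
    """Return (lower, upper, midpoint) cumulative irreproducibility rates.

    Lower bound: maximum overlap between categories, so the cumulative rate
    cannot fall below the largest single low estimate.
    Upper bound: minimal overlap, so the high estimates simply add
    (capped at 100%).
    """
    lower = max(low.values())
    upper = min(sum(high.values()), 1.0)
    return lower, upper, (lower + upper) / 2

# Hypothetical low/high prevalence estimates per category (fractions).
low = {"study design": 0.12, "reagents and materials": 0.18,
       "laboratory protocols": 0.05, "analysis and reporting": 0.10}
high = {"study design": 0.30, "reagents and materials": 0.36,
        "laboratory protocols": 0.10, "analysis and reporting": 0.125}

lower, upper, midpoint = cumulative_bounds(low, high)
print(f"cumulative irreproducibility: {lower:.1%} to {upper:.1%} "
      f"(midpoint {midpoint:.1%})")
# With these placeholder inputs: 18.0% to 88.5%, midpoint ~53%.
```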

Fig 2. Relative contribution of each category of irreproducibility.

Note that the percentage value of error for each category is the midpoint of the high and low prevalence estimates for that category divided (weighted) by the sum of all midpoint error rates (see S1 Dataset ). Source: Chakma et al. [ 18 ] and the American Association for the Advancement of Science (AAAS) [ 19 ].

https://doi.org/10.1371/journal.pbio.1002165.g002

Limitations of the Analysis

This analysis is subject to a number of important limitations, including (1) the small number of studies we were able to identify that provide or support the determination of low, high, and midpoint estimates of prevalence rates for one or more categories of irreproducibility; (2) the lack of consistency as to how reproducibility and irreproducibility are defined across studies; and (3) in some cases, extrapolating from a clinical environment to the preclinical setting when no suitable preclinical studies were available. For these reasons, a rigorous meta-analysis or systematic review was also not feasible. To estimate a theoretically optimal baseline rate of irreproducibility, we would also need data on the financial and opportunity costs of irreproducibility and how these costs (and benefits) vary within the population of preclinical studies. Nonetheless, even simple calculations of direct costs can show that irreproducible preclinical research is a significant problem in terms of lost dollars and lost opportunities for scientific discovery.

Economic Impact of Irreproducibility

Extrapolating from 2012 data, an estimated US$114.8B is spent annually on life sciences research in the United States [ 18 ], with the pharmaceutical industry being the largest funder at 61.8%, followed by the federal government (31.5%), nonprofits (3.8%), and academia (3.0%) [ 20 ]. Of this amount, an estimated US$56.4B (49%) is spent on preclinical research, with government sources providing the majority of funding (roughly US$38B) [ 19 ]. Using a conservative cumulative irreproducibility rate of 50% means that approximately US$28B/year is spent on research that cannot be replicated (see Fig 2 and S2 Dataset ). Of course, uncertainty remains about the precise magnitude of the direct economic costs—the conservative probability bounds approach reported above suggests that these costs could plausibly be much smaller or much larger than US$28B. Nevertheless, we believe a 50% irreproducibility rate, leading to direct costs of approximately US$28B/year, provides a reasonable starting point for further debate. To be clear, this does not imply that there was no return on that investment. As noted in a recent paper by Stern et al. [ 21 ], even in cases of retracted publications due to scientific misconduct, which is not a major source of irreproducibility [ 13 , 22 ], “it is conceivable that some of the research resulting in a retracted article still provides useful information for other nonretracted studies.” However, it does suggest that, even under our relatively conservative assumptions, the impact of the reproducibility problem is economically significant.
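As a back-of-the-envelope check, the headline figure and the plausible range around it follow directly from the spending and prevalence numbers quoted above; a minimal sketch:

```python
# Back-of-the-envelope check on the figures quoted above (US, 2012 basis).
preclinical_spend = 56.4e9            # annual US preclinical research spend [19]
point_rate = 0.50                     # conservative cumulative irreproducibility rate
lower_rate, upper_rate = 0.18, 0.885  # probability bounds from the analysis above

print(f"point estimate: ~${preclinical_spend * point_rate / 1e9:.0f}B per year")
print(f"plausible range: ~${preclinical_spend * lower_rate / 1e9:.0f}B "
      f"to ~${preclinical_spend * upper_rate / 1e9:.0f}B per year")
# point estimate: ~$28B per year; range roughly $10B to $50B per year
```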

Irreproducibility also has downstream impacts in the drug development pipeline. Academic research studies with potential clinical applications are typically replicated within the pharmaceutical industry before clinical studies are begun, with each study replication requiring between 3 and 24 months and between US$500,000 and US$2,000,000 of investment [ 23 ]. While industry will continue to replicate external studies for its own drug discovery process, a substantially improved preclinical reproducibility rate would de-risk such investments or increase their hit rate, both increasing the productivity of life science research and improving the speed and efficiency of the therapeutic drug development process. The annual value added to the return on investment from taxpayer dollars would be in the billions in the US alone.

The Role of Best Practices and Standards

Many key stakeholder groups are developing and piloting a range of solutions to help increase reproducibility in preclinical research. For example, the National Institutes of Health (NIH) recently announced a list of Principles and Guidelines for Reporting Preclinical Research [ 24 ], which over 100 journals have joined as cosignatories and which builds on previous recommendations by Landis et al. [ 25 ] to improve methodological reporting of animal studies in grant applications and publications. Despite the emergence of a wide variety of guidelines to improve the reporting of biomedical research methods and results, to date, compliance levels and their impact on reproducibility have been disappointing [ 26 ]. Given the size, scale, and complexity of the challenge of reproducibility in preclinical research, there is no single magic-bullet solution to the problem. However, one approach that has shown demonstrable impact on similar challenges in other settings is the expanded development and adoption of standards and best practices [ 13 ].

In the information and communication technology industries, several standards development organizations have moved beyond simply defining technical interfaces to assume the role of a governing body for critical pieces of shared infrastructure. The Internet is a prime example. The evolution of the Web has been messy, constrained by patent claims, the financial benefits of controlling standards, and confusion over its evolutionary model. However, two organizations, the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), emerged to develop Web standards and maintain the Web's interoperability as a universal space. The W3C is an excellent example of a successful, internally driven, and self-regulating international consortium built on a public and private partnership. Similarly, the IETF operates as a noncommercial, not-for-profit, nongovernmental organization that runs a large number of working groups and informal discussion groups, which work on specific, timely issues and then disband once those issues are addressed. In the early days of the Internet, both groups successfully steered major global players toward common standards, requiring each to compromise and adapt in the short term but ultimately delivering tremendous benefits over the longer horizon.

Although neither example focuses directly on reproducibility, they highlight the importance for the life sciences to engage all stakeholders in a dynamic, collaborative effort to standardize common scientific processes. In the clinical research arena, where the stakes are high and oversight by the US Food and Drug Administration (FDA) is stringent, irreproducibility has been reduced to rates that are generally considered to be scientifically and commercially appropriate [ 1 ]. However, this level of stringent oversight often precludes the direct application of clinical methods, practices, and procedures to preclinical research [ 27 ]. Furthermore, in a clinical setting, the number of assays and interventions is tightly controlled, which is not typically possible in a basic or preclinical research environment without incurring a significant increase in time and cost. Nonetheless, economic research also has shown that standardization and auditing of biological materials—through biological resource centers—can enhance cumulative production of scientific knowledge by improving both availability and reliability of research inputs [ 28 ].

An illustrative example is the use and misuse of cancer cell lines. The history of cell lines used in biomedical research is riddled with misidentification and cross-contamination events [ 29 ], which have been estimated to range from 15% to 36% [ 30 ]. Yet despite the availability of short tandem repeat (STR) analysis as an accepted standard to authenticate cell lines, and its relatively low cost (approximately US$200 per assay), only one-third of labs typically test their cell lines for identity [ 31 ]. For an NIH-funded academic researcher receiving an average US$450,000 four-year grant, purchasing cell lines from a reputable vendor (or validating their own stock) and then authenticating them annually will cost only about US$1,000, or 0.2% of the award. A search of NIH RePORTER for projects using "cell line" or "cell culture" suggests that NIH currently spends about US$3.7B annually on research using cell lines. Given that a quarter of these research projects apparently use misidentified or contaminated cell lines, reducing this to even 10% through a broader application of the STR standard—a very realistic goal—would ensure a more effective use of nearly three-quarters of a billion dollars and ultimately speed the progress of research and the development of new treatments for disease.
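The cell-line arithmetic above can be checked in a few lines; this sketch uses only the figures quoted in this paragraph, and the variable names are illustrative:

```python
# Cell-line figures from this paragraph: authentication cost as a share of a
# typical NIH award, and the NIH spend potentially affected by misidentified
# or contaminated lines (contamination rates are the published 15%-36% range).
grant_size = 450_000          # average four-year NIH award (US$)
authentication_cost = 1_000   # reputable vendor sourcing plus annual STR checks (US$)
print(f"authentication overhead: {authentication_cost / grant_size:.1%} of the award")
# -> 0.2% of the award

nih_cell_line_spend = 3.7e9   # annual NIH funding on research using cell lines (US$)
for rate in (0.15, 0.25, 0.36):
    affected = nih_cell_line_spend * rate
    print(f"{rate:.0%} of projects affected -> ~${affected / 1e9:.2f}B per year")
```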

The economics literature on standardization posits that unless there is a clearly dominant platform leader willing to impose a solution, complex challenges such as irreproducibility that require a coordinated response are best solved by internally organized and driven, dynamic, and self-regulating collaborations of key stakeholders who establish and enforce their respective rules of engagement [ 32 , 33 ]. What is needed is not another list of unfunded mandates, but rather community consensus on priorities for improvement and a commitment to the additional funding needed for implementation. This includes training that focuses specifically on the importance of standards and best practices in basic research in graduate and postdoctoral programs, as well as quality management systems to ensure that best practices are implemented throughout the research process. There is no doubt that improving training and increasing quality control measures will add costs to the preclinical research enterprise. One estimate in a clinical setting suggests that the adoption of mandated quality control procedures would increase costs by 15% to 25% above current spending levels [ 34 ]. However, the societal benefits garnered from an increase in reproducible life science research far outweigh the cost. Assuming that we could recover even half of the approximately US$28 billion spent annually on irreproducible preclinical research in the US alone by applying best practices and standards, the savings would be roughly US$14B/year. Moreover, because our analysis indicates that errors in study design and in biological reagents and materials contribute to a majority of this spend (see Fig 2 ), implementing steps to improve preclinical reproducibility should be a priority in these two areas (see Box 1 ).

Box 1. Investing in Practical Solutions

Taking immediate steps in two areas where there will be significant return on investment—study design and biological reagents and reference materials—will yield substantial improvements in preclinical reproducibility rates.

Study Design

  • Improve training programs at academic institutions to ensure that best practices are reinforced in the areas of core skills, methods, technology, and tools.
  • Establish targeted training, coaching, and certification of established principal investigators (PIs) to reinforce application of best practices throughout the research process.
  • Establish research funder policies, including funders such as NIH and leading disease foundations, requiring successful completion of training courses at all levels.

Biological Reagents and Reference Materials

  • Promote broad adoption by vendors to offer only validated reagents (e.g., antibodies and cell lines) and broad utilization of these reagents by PIs as a documented best practice in the research process.
  • Ensure that research funder policies require documented use of validated and noncontaminated reagents, annual reagent authentication throughout the research study, and adequate funding to cover these additional costs.
  • Ensure that procedures to document reagent validation and lack of contamination are required by publishers.
  • Incentivize the continued development of tools for reagent validation using improved genomics data.
  • Define standard operating procedures for biological materials handling throughout the material’s lifecycle.

In order to change practices throughout the preclinical research community, all invested stakeholders (academia, journals, industry, and government) must work in partnership to develop, institutionalize, and reward (or even sanction) behaviors, working within a mutually agreed upon set of rules and guiding principles. Such dynamic collaborations could more efficiently represent the needs of all stakeholders and provide unifying guidance and funding suggestions to facilitate meaningful change. Establishing effective collaborative efforts is no simple feat, but we can look to other industries that have been successful in the past as models for the life science community.

Conclusions

Although differing perspectives on the irreproducibility rate in preclinical research may persist, one fact remains clear: the challenge of increasing reproducibility, and of addressing the costs associated with the lack of it in life science research, is simply too important and too costly to ignore. Lifesaving therapies are being delayed, research budgets face increasing pressure, and drug development and treatment costs are rising. Improving reproducibility remains a critical cornerstone to solving each of these challenges. There are no easy answers to this problem. Real solutions, such as addressing errors in study design and using high-quality biological reagents and reference materials, will require time, resources, and collaboration among diverse stakeholders, and that collaboration will be a key precursor to change. Millions of patients are waiting for therapies and cures that must first survive preclinical challenges. Although any effort to improve reproducibility levels will require a measured investment of capital and time, the long-term benefits to society that are derived from increased scientific fidelity will greatly exceed the upfront costs.

Supporting Information

S1 Text. Analysis to determine the irreproducibility rate of preclinical research.

https://doi.org/10.1371/journal.pbio.1002165.s001

S1 Dataset. Analysis.

https://doi.org/10.1371/journal.pbio.1002165.s002

S2 Dataset. Economic impact.

https://doi.org/10.1371/journal.pbio.1002165.s003

Acknowledgments

We thank A. Gerstein and S. Rosenfield of Genesis Revenue Partners and the staff of the Global Biological Standards Institute (GBSI) for their support of this project.

References
  • 13. GBSI (2013) The Case for Standards in Life Science Research: Seizing Opportunities at a Time of Critical Need. Washington, D.C.: Global Biological Standards Institute (GBSI). 41 p.
  • 17. Manski CF (2003) Partial Identification of Probability Distributions. Springer Series in Statistics. New York, New York, USA: Springer. 178 p.
  • 19. AAAS (2013) AAAS Report XXXVIII: Research and Development FY 2014. Washington, DC, USA: American Association for the Advancement of Science (AAAS). 315 p.
  • 20. Battelle (2013) 2014 Global R&D Funding Forecast. Columbus, Ohio, USA: Battelle. 36 p.
  • 23. PhRMA (2013) 2013 Biopharmaceutical Research Industry Profile. Washington, DC, USA: Pharmaceutical Research and Manufacturers of America (PhRMA). 78 p.
  • 32. Farrell J, Simcoe T (2012) Four Paths to Compatibility. In: Peitz M, Waldfogel J, editors. The Oxford Handbook of the Digital Economy. New York, New York, USA: Oxford University Press. pp. 34–58.
  • 33. Ostrom E (1990) Governing the Commons: The Evolution of Institutions for Collective Action. New York, New York, USA: Oxford University Press. 298 p.
  • 34. Berte LM, Daley AT (2014) Understanding the Cost of Quality in the Laboratory; A Report. Wayne, Pennsylvania, USA: Clinical and Laboratory Standards Institute (CLSI). QMS20-R. 94 p.

Irreproducible biology research costs put at $28 billion per year

  • Monya Baker  

Nature (2015)


Subjects: Peer review, Research data, Research management

Study calculates cost of flawed biomedical research in the United States.


Scientists in the United States spend $28 billion each year on basic biomedical research that cannot be repeated successfully. That is the conclusion of a study published on 9 June in PLoS Biology [1] that attempts to quantify the causes, and costs, of irreproducibility.

John Ioannidis, an epidemiologist at Stanford University in California who studies scientific robustness, says that the analysis is sure to prompt discussion about the problem — but should be taken with a pinch of salt, given that its estimates carry great uncertainty.

But Len Freedman, the study’s lead author and head of the non-profit Global Biological Standards Institute in Washington DC, says that the work is of value, even though it cannot pin down the size of the problem. “Clearly, there are tremendous inefficiencies [in research], and this is putting a spotlight on that,” says Freedman, whose group seeks to develop best practices for biological experiments.

Tracing trouble

Freedman and his colleagues defined irreproducibility broadly, as any errors or omissions that prevent attempts to replicate experimental findings. The researchers sought to determine the relative contributions of four sources of irreproducibility identified in an earlier analysis [2]: study design; laboratory protocols; biological reagents and reference materials; and data analysis and reporting.

The team did not examine whether the results of any individual paper were reproducible, however. Instead, the researchers surveyed existing analyses of factors that contribute to irreproducibility, using these to estimate an overall reproducibility rate — and to weigh the relative influences of various factors.

Iain Cockburn, an economist at Boston University in Massachusetts and a co-author of the study, says that the analysis was limited by the quality and amount of data available. For example, assessments of poor materials were based on estimates of how often cell lines are misidentified. In other cases, the team extrapolated from reports from clinical work.

Overall, the team found that poor materials made the largest contribution to reproducibility problems, at 36%, followed by study design at 28% and data analysis at 26%. The team estimates the overall rate of irreproducibility at 53%, but cautions that the true rate could be anywhere between 18% and 89%. That puts the potential economic cost of irreproducibility anywhere from $10 billion to $50 billion per year.

“The four categories are decent, but the estimates are off,” says Ioannidis. “I would put a much higher rate on the data analysis and reporting component.”

The analysis is further limited by the fact that the four contributing factors that it uses are interdependent, says Melissa Haendel, an information scientist at Oregon Health and Science University in Portland. For example, research materials and the controls used to assess them are part of laboratory protocols, and study design affects data analysis.

Seeking solutions

Concerns about reproducibility in biomedical research have increased over the past decade, partly in response to a 2005 study by Ioannidis that found scientific journals to be biased towards publishing flashy, positive results [3]. And researchers at pharmaceutical companies have reported that their attempts to replicate the conclusions of peer-reviewed papers fail at rates upwards of 75% [4,5].

Several institutions are taking steps to address the problem. In November, for example, the US National Institutes of Health urged journals to adopt guidelines aimed at boosting reproducibility. (Nature Publishing Group is among many scientific publishers that have done so.)

The major cost of irreproducibility does not lie with a faulty initial study, but with the wasted follow-on work that it inspires, says Lee Ellis, a cancer biologist at the University of Texas MD Anderson Cancer Center in Houston. Scientists may lose time pursuing false leads, which can delay discoveries that lead to new therapies.

Freedman hopes that his latest work will help to convince scientists that taking small steps, such as better documenting of protocols and the use of certified reagents, could produce large gains in reproducibility. “The message is less about ‘Oh my god, we’re flushing $20 billion down the toilet’,” he says, “and more about ‘Here is an opportunity to increase efficiencies to get more bang for the buck’.”

References

1. Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. PLoS Biol. 13, e1002165 (2015).

2. Freedman, L. P. & Inglese, J. Cancer Res. 74, 4024–4029 (2014).

3. Ioannidis, J. P. A. PLoS Med. 2, e124 (2005).

4. Begley, C. G. & Ellis, L. M. Nature 483, 531–533 (2012).

5. Prinz, F., Schlange, T. & Asadullah, K. Nature Rev. Drug Discov. 10, 712 (2011).

Cite this article: Baker, M. Irreproducible biology research costs put at $28 billion per year. Nature (2015). Published 9 June 2015. https://doi.org/10.1038/nature.2015.17711




The Art of Smart Science: Weaving Theory and Risky Study Design into Psychopathology Research and RDoC

Uma Vaidyanathan (a), Scott I. Vrieze (b), William G. Iacono (a)

a University of Minnesota

b University of Colorado Boulder

Uma Vaidyanathan: Department of Psychology, University of Minnesota, N218 Elliot Hall, 75 East River Road, Minneapolis, MN 55455.

Scott I. Vrieze: Department of Psychology & Neuroscience, Institute for Behavioral Genetics, University of Colorado Boulder, 1480 30th Street, Boulder, CO 80303.

William G. Iacono: Department of Psychology, University of Minnesota, N218 Elliot Hall, 75 East River Road, Minneapolis, MN 55455.

Yes, the art. We are grateful for the opportunity to respond to commentaries on our target article. Like the commentators themselves, the commentaries were excellent, expanding on the target article in important ways while simultaneously taking us to task in others. On one thing each commentator agreed: there is a lot of room to improve how we conduct research to advance knowledge. However, there was less agreement on how best to do that, reflecting the complex and interrelated nature of experimental design, statistics, multi-method inference, and causal inference, which were the focus of our target article. The commentaries all expounded on these issues in important ways, and many took the topic further, emphasizing the importance of clinical application, tradition, and health policy.

Special Issue Recap and Overview of Response

We have summarized in Table 1 the chief arguments for our collective main theses and the solutions proposed to move the field forward. We delve further into these themes in this article, cutting across them to emphasize commonalities and grouping topics in our discussion to tease out philosophical and policy perspectives. It was unfortunately not feasible to cover in our response all the themes that were raised, in no small part due to the extensive experience and expertise represented by our reviewers. Our goal in this response is to weave these thoughtful perspectives into a bigger picture and propose a way forward for mental health science.

Conducting Psychopathology Research: Problems and Solutions

Theme 1: Philosophy, Data, Theory, and the Puzzle of How Best to Integrate Them

Our target article dealt with issues relating to study design and evaluation, as well as what constitutes strong evidence for and against particular etiological theories. Most of the responses we received were likewise focused on these themes.

I. Validity and Measurement

A number of commentators, including Lilienfeld and Pinto (this issue) and Markon (this issue), explicated the role of various types of validity and emphasized the importance of measurement issues in study design. Lilienfeld and Pinto (this issue) agreed with our argument that the measures and methods we use are imperfect indicators of the phenomena we are attempting to study. We wholeheartedly concur that it is important to understand the measurement properties of a construct of interest, and to ask whether the methods scientists use provide a stringent means of testing substantive hypotheses, methods that go beyond a mere focus on a t-test or ANOVA result (as Miller and Yee (this issue) noted). Moreover, as emphasized by Lilienfeld and Pinto (this issue), if the construct we are measuring is a robust one and has some real-world meaning, results from different studies should converge, regardless of the statistic, questionnaire, or methodology used.

Markon (this issue), on the other hand, emphasized more the roles of statistics, ontology, and parsimony in shaping scientific discourse and inference. We agree there is great value in quantification and precise measurement; our point was that focusing on these qualities in the absence of a thorough understanding of their limits, a good research design, and a strong theory will not yield substantial etiological insight. An example comes from a recent meta-analysis of almost every twin study undertaken from 1958 to 2012 (Polderman et al., 2015), which investigated the heritability of virtually every medical and psychiatric condition using 14.5 million twin pairs. Clearly, this undertaking represents an important integration and summary of decades of twin research that will be useful as a reference for years to come. The results suggested that most traits fit an additive genetic model, with about 49% of variation attributable to additive genetic factors on average. Some exceptions were noted for psychiatric conditions such as conduct disorder and recurrent major depression, where the pattern of twin correlations suggested shared environment or non-additive genetic factors. However, as the authors pointed out, the study could not pinpoint the reason for the "missing heritability" problem, whereby genome-wide association studies show that observed molecular genetic variation cannot fully account for the heritability estimates derived from twin studies, nor could it identify additional sources of genetic variation, such as non-additive genetic effects. In fact, they opine that resolving these questions requires data "…for example, from large population samples with extensive phenotypic and DNA sequence information, detailed measures of environmental exposures and larger pedigrees including non-twin relationships" (p. 7). In other words, despite the large sample size and sophisticated statistics, because no risky tests or hypotheses are involved, it is difficult to further our knowledge about the etiology of any of these disorders beyond what the twin correlation patterns tell us.
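
To make concrete what twin correlations alone can and cannot reveal, here is a minimal sketch of the classical ACE decomposition (Falconer's formulas). It is a simplification of the formal model fitting used in the meta-analysis discussed above, and the input correlations are invented for illustration.

```python
# Minimal sketch of the ACE logic behind twin designs (Falconer's formulas).
# The correlations below are invented for illustration; they are not values
# from Polderman et al. (2015), which used formal model fitting.

def ace_from_twin_correlations(r_mz: float, r_dz: float) -> dict:
    """Decompose trait variance into additive genetic (A), shared-environment (C),
    and nonshared-environment (E) components from MZ and DZ twin correlations
    (rounded to two decimals for readability)."""
    a = 2 * (r_mz - r_dz)   # additive genetic variance (narrow-sense heritability)
    c = 2 * r_dz - r_mz     # shared (common) environment
    e = 1 - r_mz            # nonshared environment plus measurement error
    return {"A": round(a, 2), "C": round(c, 2), "E": round(e, 2)}


print(ace_from_twin_correlations(r_mz=0.50, r_dz=0.25))
# {'A': 0.5, 'C': 0.0, 'E': 0.5} -- a pattern consistent with a purely additive model.

print(ace_from_twin_correlations(r_mz=0.50, r_dz=0.35))
# {'A': 0.3, 'C': 0.2, 'E': 0.5} -- the same kind of summary cannot, by itself,
# arbitrate among finer etiological hypotheses, which is the point made above.
```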

II. Research design versus statistical inference and how they affect causal inference

While we emphasized a risky test and a good research design as being of central importance, several commentators noted the need for large sample sizes and replicability as a fundamental issue. Ioannidis' (this issue) commentary is an exemplar of this perspective; Lilienfeld and Pinto (this issue) likewise noted the need for replicability and highlighted the importance of convergence of indicators. These are all additional important components of risky test taking, because they all increase the likelihood of disconfirming, or at least pointing to the limitations of, a theory. But what type of replication is most useful under what circumstances, and is it enough simply to ask for large samples? How large is large enough – hundreds, thousands, tens of thousands, millions, or simply large enough to reflect accurate capture of a predicted effect size related to the question at hand? One way to answer this question (without specifying some arbitrary N) would be to require studies to provide a rationale, based on the state of prior knowledge, for why a sample should be adequately powered to detect some predicted effect size (see Miller and Yee (this issue) for a similar point). This would avoid publication of results from small studies that could be statistically significant based on chance alone (e.g., Button et al. (2013) found that the median statistical power in neuroscience studies is 21%), or that are so underpowered that a null finding is uninformative. Genetic association studies of complex traits and diseases must be very large because the expected effect sizes are small, with "large" effects accounting for a fraction of a percent of variance for continuous traits, whether those traits are questionnaire responses, EEG recordings, or brain scans.
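
As a concrete example of the kind of power rationale described above, the sketch below performs a prospective power calculation for a simple two-group comparison using statsmodels; the effect size, alpha, and power targets are illustrative choices, not recommendations tied to any particular study.

```python
# Prospective power calculation for a simple two-group comparison (illustrative values).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Hypothetical design targets: Cohen's d = 0.2 (a small effect), alpha = .05, 80% power.
n_per_group = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.80,
                                   alternative="two-sided")
print(f"~{n_per_group:.0f} participants per group needed")  # roughly 390-400 per group

# Conversely, the power achieved by a typical small study (n = 20 per group):
achieved_power = analysis.solve_power(effect_size=0.2, alpha=0.05, nobs1=20,
                                      alternative="two-sided")
print(f"power with n = 20 per group: {achieved_power:.0%}")  # under 10%
```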

Our point is that while all design elements (e.g., experimental design, extremely large samples, replicability, etc.) work together to provide useful additional evidence regarding a theory’s robustness and the limits of its applicability, none of these factors alone is sufficient to confirm or disprove a theory. Consider a simple example – every day millions of people observe the sun moving from east to west from different parts of the world; multiple repeated measurements are obtained from a very large sample, but as we all know, the sun does not revolve around the earth. In and of itself, a large sample does not help here – and actually leads to the incorrect conclusion in this case. Simply put, it is not risky enough a test on its own. And yet, the geocentric or Ptolemaic system was dominant for hundreds of years. It was only with the addition of other information that could not be confirmed with the naked eye and theoretical postulates which were later supported (e.g., elliptical orbits, phases of Venus, Jupiter’s moons) that the test became riskier, the data failed to fit the geocentric theory, and the findings were deemed to fit the Copernican theory better. It was the combination of multiple elements of theory testing that falsified the geocentric theory.

III. How does one decide what is sufficient evidence for a theory?

Agrawal and Bogdan (this issue), Ioannidis (this issue), and Widiger, Crego, and Oltmanns (this issue) all noted that there are no objective standards to determine how to optimally interpret findings in such a way that technology, statistics, and multiple converging lines of evidence from different levels of analysis can be used to decide what constitutes sufficient empirical support. Because there is no specific numerical cutoff or entirely objective criterion, we should strive to develop and use research designs that put competing theories at risk. We contend (despite Markon's (this issue) or Ioannidis' (this issue) assertions) that almost any criterion used to select theories, such as "good fit", "parsimony", or "harm minimization", contains an element of subjectivity that renders it difficult to adopt in a universally accepted, objective manner. This is where the art of science applies – in positing theories that may sometimes go beyond what seems reasonable given existing knowledge (e.g., as in the example provided above about whether the geocentric or Copernican models reflected reality), while evaluating them using well-thought-out research designs that provide risky tests and narrow the number of interpretative possibilities.

Agrawal and Bogdan (this issue) present a compelling example of the successful application of a risky test: they review evidence from quasi-experimental research showing that early marijuana use results in higher risk for use of other substances later on. While the results they describe are consistent with a "gateway" model of causation, follow-up studies can test the gateway interpretation more directly and rule out alternative interpretations. For example, it may be that deviant peers affect both early marijuana use and later hard substance use, a possibility that could be explored using twins concordant for early marijuana use who are discordant for deviant peer relationships. Unfortunately, such twin pairs are not typical, making it difficult to ascertain a sufficiently large sample and possibly leaving unanswered questions about the generalizability of results. However, our point is that, when possible, we should capitalize on and value the results of such studies and continually look for complementary ways to put the theory at further risk. In this case, for instance, we may rely on a longitudinal study of more readily ascertainable discordant siblings instead of twins, or evaluate in a purely observational sample the effect of naturalistic switching of peer groups among early marijuana users as an instrumental variable. Far more clever designs, whether observational, quasi-experimental, or even experimental, are surely possible.

We also acknowledge Klein and Hajcak's (this issue) commentary in this regard. They not only expand upon our example of recurrent depression, providing a more comprehensive and in-depth investigation of the topic (differences in correlates of recurrent vs. single-episode depression) that draws on additional indices of neurobiology, self-report, diagnostic data, and functional outcomes, but they also discuss research using observational and quasi-experimental designs. Together these two commentaries (Agrawal & Bogdan, this issue; Klein & Hajcak, this issue) represent precisely the kind of integrative thinking and risky testing we intended our article to spur. In our opinion, these are examples of the path our field should take to make substantive gains in knowledge.

Theme 2. Implications for Science Policy and Incorporating the Human Element into Research

A second theme in many of the commentaries is to redesign the incentive system in science to better support the accumulation of knowledge, rather than focusing on publishing alone as an endpoint. As scientists, we have a responsibility to ourselves, the research community, and the public at large to put honesty and accuracy at the forefront of our research and to communicate results accordingly. However, as much as science is considered an objective profession, certain fundamental concepts in our field – e.g., "parsimony", "harm", "clinical significance", "utility" – are not, as we have repeatedly attempted to underscore throughout our discussion, quantifiable using a universally accepted metric. Neither do we operate in a purely scientific vacuum unhindered by concerns such as job security, funding, acceptance by peers, desire for fame and prestige, and so on. Our current system of publications and peer review all but ignores such human elements and focuses simply on outcomes such as the number of publications, the impact factors of the journals we publish in, the amount of grant funding, and so on. This set of external contingencies affects science in two ways: at the group level, where fields like psychiatry embrace a certain school of thought (currently, e.g., neurobiology and genetics), and at the individual level, where scientists face pressure to produce results – any result, not necessarily accurate ones.

I. Science as a political process

A couple of our commentators (Widiger et al., this issue; Zachar, this issue) alluded to this issue, pointing out correctly that science is a competitive, political process. They noted that self-critical examination of pet theories by scientists is essential, and that competing viewpoints need to be acknowledged and discussed. This is perhaps especially true for major decisions that affect healthcare worldwide, such as the classification of personality disorders by the DSM-5 workgroup (as outlined by Widiger et al., this issue). We concur with our commentators that risky tests are a good way to arbitrate between competing models even in such political decisions. Our commentators also stressed that convergence between results from disparate methodologies and domains is a key component in building robust theories with explanatory potential. However, as we noted earlier, questions of utility can be distinct from questions of etiology. One may not need a rigorous experimental design to test whether people get better after receiving psychotherapy, for example, although whether the psychotherapy caused the improvement is a question that requires quasi-experimental and experimental research. Similarly, knowledge of a mechanistic causal relationship between variants in a nicotinic receptor gene and increased cigarette smoking may have zero impact on clinical treatment if the causal chain explains only a tiny fraction of the risk for smoking. That said, knowledge of etiology in both cases is useful when making informed decisions about how to allocate resources for treatment. If psychotherapy has no causal relationship with improved outcome, then one could imagine replacing psychotherapy with a less expensive alternative, with no detriment to the patient. If all genetic effects within a candidate gene (or genes) are known to be very near zero, then targeting that system for therapeutic development may not be appropriate.

Political decisions are, well, political, but certainly can be informed by scientific understanding. In turn, we contend that scientific understanding is accelerated when scientists undertake research programs with study designs that permit risky tests of etiological theories from multiple angles – whether those risky tests involve large samples, multiple methodologies, (quasi-)experimental designs, longitudinal data, or some variation and/or combination of all the above.

II. Scientists as faulty human beings

Aside from the political issues noted above, we think there is a broader issue of responsible science at the individual level. Currently, as most readers are aware, research involves obtaining resources to fund research, collecting data, performing some sort of study (e.g., in a lab setting, running a statistical model), getting a (statistically significant) result, writing up said result in a manuscript, submitting it for peer review and hopefully publishing it, and repeating the cycle.

The number of publications affects job security as well as future grant funding, regardless of their scientific quality, reproducibility, and actual contribution to the state of knowledge in a particular field. One of our commentators (Ioannidis, 2011) has already shown very elegantly that there are more studies with statistically significant results in the volumetric brain imaging literature than would be expected from power calculations using the sample sizes in those studies alone. Others (Fanelli, 2010, 2011) have similarly noted that the frequency of positive findings in published scientific papers across all fields increased by 22% from 1990 to 2007; this increase was especially marked for the social sciences, in which "…the odds of reporting a positive result were around 5 times higher among papers in the disciplines of Psychology and Psychiatry and Economics and Business compared to Space Science" (p. e10068). This hyper-emphasis on the number of publications with positive findings (and grants) is occurring in the context of decreasing funding for research, as noted by the head of NIH, Francis Collins (Szabo, 2014), leading to decreasing numbers of academic positions for young investigators (Harris, 2014).
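
The logic of the excess-significance observation can be illustrated with a toy calculation: given each study's statistical power, the expected number of significant results is simply the sum of those powers, and an observed count far above that expectation is a warning sign. The sketch below uses a rough normal approximation and invented numbers; it is not Ioannidis's published procedure.

```python
# Toy 'excess significance' check: compare the observed number of statistically
# significant studies with the number expected from each study's power.
# All numbers here are invented for illustration.
import math

powers = [0.21] * 30          # e.g., 30 studies, each with ~21% power
observed_significant = 20     # hypothetical count of studies reporting positive results

expected = sum(powers)                          # expected count if every tested effect is real
variance = sum(p * (1 - p) for p in powers)     # variance of that count (Poisson-binomial)
z = (observed_significant - expected) / math.sqrt(variance)

print(f"expected ~{expected:.1f} significant studies, observed {observed_significant}")
print(f"z ~= {z:.1f}  (large positive values suggest excess significance)")
```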

An additional problem that has been gaining attention in recent years is that of researchers falsifying, or more commonly being unknowingly careless with, their data and analysis. Recent high-profile cases include those of Diederik Stapel (Levelt Committee, Noort Committee, & Drenth Committee, 2015), Marc Hauser (Department of Health and Human Services, 2012), and Andrew Wakefield (Dominus, 2011). The true incidence of falsification is unknown, though estimates of irreproducible research from fields such as preclinical research exceed 50%, at a cost of about $28 billion per year (Freedman, Cockburn, & Simcoe, 2015). Likewise, the frequency of retractions has been found to be strongly correlated with the impact factor of a journal (Fang & Casadevall, 2011). While speculative, results such as these suggest that the pressure to publish positive findings might motivate some to engage in questionable data analytic practices, knowingly or unknowingly. An uber-competitive system that emphasizes the number of publications and promotes a winner-takes-all approach (all funding, all jobs, all big ideas, all credit) does little to foster responsible research, scientific practice, and, by corollary, the accumulation of knowledge. It is impossible to prevent any researcher from ever falsifying data or engaging in questionable research practices. However, it is possible to modify the current incentive system to overcome or mitigate some of these challenges.

How can we design such a system? As a first step, we can stop relying exclusively on the number of publications, or publications in high-tier journals as a primary metric of good research. What would be a good alternative? Perhaps whether an investigator enters into multi-site, team-based collaborations, or perhaps the number of times they share their data for replication with other investigators, or even the number of times they attempt to undertake replications of their or others’ findings. Note that such metrics are independent of whether some researcher gets a positive or negative result. In other words, we reward acts like collaboration and data sharing rather than exclusively focusing on the outcome, which would result in larger datasets for analyses, greater transparency in procedures used, and multiple theoretical perspectives to analyzing data. The 1000 Genomes Project is a great example of how the power of combined datasets and public data release can lead to greater knowledge about the topic they are focused on ( The 1000 Genomes Project Consortium, 2012 ). 1000 Genomes was especially powerful because all raw data is public – anyone can download it. While more restrictive than 1000 Genomes, several data repositories have been formed in recent years such as the Database of Genotypes and Phenotypes (dbGAP; http://www.ncbi.nlm.nih.gov/gap ), the National Database of Autism Research (NDAR; https://ndar.nih.gov/ ), and Research Domain Criteria database (RDoCdb; http://rdocdb.nimh.nih.gov/ ). What is needed at this point are incentives to submit to, contribute to, and utilize such databases. In this regard, it is encouraging that NIH and other organizations have set out guidelines and started various initiatives to encourage replicable work and replication amongst researchers ( Bobrow, 2015 ; Collins & Tabak, 2014 ; NIH, 2015 ).

Second, as Miller and Yee (this issue) suggested, we could build good theory testing as a peer review criterion for a grant proposal or manuscript submitted for publication. For example, NIH uses a peer review system to evaluate grants where each reviewer is asked to take into account the following five criteria: significance, suitability of investigators, innovation, research approach, and suitability of the environment. Ioannidis, in the recent past, has been a vocal critic of this evaluative system (see Nicholson & Ioannidis, 2012 and series of responses from the Office of Portfolio Analysis at NIH and others in Nature ), noting that NIH rarely ever funds truly innovative research. Perhaps the problem here in part is relying on a subjective criterion such as innovation, which as Miller and Yee (this issue) posited, may involve little more than application of novel technology. On the other hand, if the grant review process assigned greater value to a research design that provided a risky test of an influential theory, then both investigators and peer reviewers would be more likely to recognize the merit of research that proposes a risky test involving a well conceptualized theory.

Third, journals could prioritize publishing adequately powered and designed replications, and mirroring this, granting institutions and departments could also weigh heavily the value of replications in their grant review and/or tenure process. To some extent, we cannot blame investigators for not wanting to attempt replication studies if journal editors do not want to publish them, and research departments and institutions do not view replication as nearly as important as a faculty member’s ability to establish an independent line of research. While one objection to encouraging a program of replication would be that it would slow down progress, we believe the opposite would happen instead: It would ensure that major studies are replicated by different scientists, thereby providing the solid foundation needed to justify subsequent investment of resources to build on the findings. In this context, it is encouraging to see replication ventures such as the Reproducibility Project: Psychology and Many Labs receive much positive publicity and editorials in conventional journals like Nature ( Baker, 2015 ; Yong, 2013 ).

We have attempted to synthesize and draw out the common themes from among the varied perspectives provided by our commentators. We were pleased to see that all our commentators supported our general conclusion that the way research in psychology and psychiatry is currently conducted is not satisfactory; the way forward, though, was not as clear. In our rejoinder, we have proposed that the key is to design a system of science that rewards undertaking risky, collaborative, and replicable research, rather than focusing merely on a particular methodology, or feature of research such as statistics, novel technologies, or large sample sizes.

We are not alone in such calls for reforming research in psychology. As mentioned earlier, Ioannidis, Fanelli, and their colleagues have been very active in this field. Likewise, Brian Nosek and Yoav Bar-Anan (2012) have published in this same journal (see Psychological Inquiry Vol. 23(3) for the target article and commentaries), arguing the need for a revamped system of scientific communication – especially one that is undergirded by openness and transparency at all levels, including the availability of data, the peer review process, and continuous post-publication review. Nosek has taken such calls one step further and founded the Center for Open Science (COS), which attempts to foster exactly the kind of work he and his co-author outlined in their target article (Nosek & Bar-Anan, 2012). We are in complete agreement with their efforts and find it commendable that he and his colleagues not only "talked the talk" but are "walking the walk"!

Another initiative that was mentioned often throughout the commentaries was the Research Domain Criteria (RDoC; Insel et al., 2010). Several commentators offered perspectives on RDoC and provided suggestions for its improvement. Regier (this issue) stressed the critical need for systems such as RDoC while urging NIMH to rely not just on basic science research but to focus on clinical, epidemiological, and health services research as well. Similarly, Zachar (this issue) emphasized that "convergence seeking is what RDoC should evolve into". Likewise, Kagan's (this issue) central point is also highly pertinent to RDoC: against the backdrop of biology and genetics, it is nevertheless the case that the environment profoundly impacts what is or is not perceived as a disorder. Miller and Yee (this issue), based on their communications with RDoC workgroup members, noted that this is indeed the case – that, in actuality, RDoC is not reductionistic and that it does incorporate levels of analysis ranging from the biological to the psychological, as can be seen from the RDoC matrix. RDoC has the potential to improve research in ways that we and many of our commentators would likely endorse. In a recent blog post, the Director of NIMH, Thomas Insel, refers to RDoC as "convergent science" and as "bringing together many levels of analysis" (Insel, 2015). Bruce Cuthbert, the Director of the RDoC Unit at NIMH, has likewise noted that constructs included in the RDoC matrix had to be defined in terms of some behavioral or cognitive process, be linked to a neural circuit, and be relevant to psychopathology (Cuthbert, 2015), thus emphasizing the importance of a conceptual (if not quite theoretical) connection across research domains.

It is heartening to see that our field is starting to evolve in the various directions proposed by our commentators, including replicability, large-scale research, and encouraging convergence among various types of measures. We would still contend, however, that none of these addresses explicitly what we consider a linchpin of good research that holds all these elements in place – i.e., risky tests.

We started our target article with a quote, and would like to book-end our response with another one:

"After a certain high level of technical skill is achieved, science and art tend to coalesce in esthetics, plasticity, and form. The greatest scientists are always artists as well." —Albert Einstein
References

  • Agrawal A, Bogdan R. Risky business: Pathways to progress in biologically informed studies of psychopathology. Psychological Inquiry. (this issue)
  • Baker M. First results from psychology’s largest reproducibility test. Nature. 2015.
  • Bobrow M. Funders must encourage scientists to share. Nature. 2015;522(7555):129.
  • Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience. 2013;14(5):365–376.
  • Collins FS, Tabak LA. NIH plans to enhance reproducibility. Nature. 2014;505(7485):612–613.
  • Cuthbert BN. Research Domain Criteria: toward future psychiatric nosologies. Dialogues in Clinical Neuroscience. 2015;17(1):89.
  • Department of Health and Human Services. Case summary: Hauser, Marc. 2012. Retrieved from https://ori.hhs.gov/content/case-summary-hauser-marc
  • Dominus S. The crash and burn of an autism guru. 2011. Retrieved June 14, 2015, from http://www.nytimes.com/2011/04/24/magazine/mag-24Autism-t.html?_r=0
  • Fanelli D. “Positive” results increase down the hierarchy of the sciences. PLoS ONE. 2010;5(4):e10068.
  • Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2011;90(3):891–904.
  • Fang FC, Casadevall A. Retracted science and the retraction index. Infection and Immunity. 2011;79(10):3855–3859.
  • Freedman LP, Cockburn IM, Simcoe TS. The economics of reproducibility in preclinical research. PLoS Biol. 2015;13(6):e1002165.
  • Harris R. Too few university jobs for America’s young scientists. NPR.org. 2014.
  • Insel T. Crowdsourcing RDoC. 2015. Retrieved from http://www.nimh.nih.gov/about/director/2015/crowdsourcing-rdoc.shtml
  • Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, Wang P. Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. American Journal of Psychiatry. 2010;167(7):748–751.
  • Ioannidis JPA. Excess significance bias in the literature on brain volume abnormalities. Archives of General Psychiatry. 2011;68(8):773–780.
  • Ioannidis JPA. Research and theories on the etiology of mental diseases: doomed to failure? Psychological Inquiry. (this issue)
  • Kagan J. Amen. Psychological Inquiry. (this issue)
  • Klein DN, Hajcak G. Heterogeneity of depression: Clinical considerations and psychophysiological measures. Psychological Inquiry. (this issue)
  • Levelt Committee, Noort Committee, & Drenth Committee. Stapel investigation. 2015. Retrieved from https://www.commissielevelt.nl/
  • Lilienfeld SO, Pinto MA. Risky tests of etiological models in psychopathology research: The need for meta-methodology. Psychological Inquiry. (this issue)
  • Markon KE. Ontology, measurement, and other fundamental problems of scientific inference. Psychological Inquiry. (this issue)
  • Miller GA, Yee CM. Moving psychopathology forward. Psychological Inquiry. (this issue)
  • Nicholson JM, Ioannidis JPA. Research grants: Conform and be funded. Nature. 2012;492(7427):34–36. doi:10.1038/492034a
  • NIH. NOT-OD-15-103: Enhancing reproducibility through rigor and transparency. 2015. Retrieved from http://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-103.html
  • Nosek BA, Bar-Anan Y. Scientific utopia: I. Opening scientific communication. Psychological Inquiry. 2012;23(3):217–243.
  • Polderman TJ, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics. 2015.
  • Regier DA. Potential DSM-5 and RDoC synergy for mental health research, treatment, and health policy advances. Psychological Inquiry. (this issue)
  • Szabo L. NIH director: Budget cuts put U.S. science at risk. 2014. Retrieved from http://www.usatoday.com/story/news/nation/2014/04/23/nih-budget-cuts/8056113/
  • The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65. doi:10.1038/nature11632
  • Widiger TA, Crego C, Oltmanns JR. The validation of a classification of psychopathology. Psychological Inquiry. (this issue)
  • Yong E. Psychologists strike a blow for reproducibility. Nature. 2013;11:26.
  • Zachar P. Popper, Meehl, and progress: The evolving concept of risky test in the science of psychopathology. Psychological Inquiry. (this issue)
