National Institutes of Health (NIH) - Turning Discovery into Health

Research Methods Resources

Methods at a Glance

This section provides information and examples of methodological issues to be aware of when working with different study designs. Virtually all studies face methodological issues regarding the selection of the primary outcome(s), sample size estimation, missing outcomes, and multiple comparisons. Randomized studies face additional challenges related to the method for randomization. Other studies face specific challenges associated with their study design such as those that arise in effectiveness-implementation research; multiphase optimization strategy (MOST) studies; sequential, multiple assignment, randomized trials (SMART); crossover designs; non-inferiority trials; regression discontinuity designs; and paired availability designs. Some face issues involving exact tests, adherence to behavioral interventions, noncompliance in encouragement designs, evaluation of risk prediction models, or evaluation of surrogate endpoints.
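Two of these broadly applicable issues, sample size estimation and multiple comparisons, interact directly: each additional primary outcome tightens the significance threshold and raises the required sample size. The sketch below pairs the standard two-sample formula for comparing means (normal approximation) with a Bonferroni adjustment. The function name and example values are illustrative, not drawn from NIH materials.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80, n_comparisons=1):
    """Per-arm sample size for comparing two means (normal approximation).

    A Bonferroni correction divides alpha by the number of primary
    comparisons -- one common, if conservative, multiplicity adjustment.
    """
    z = NormalDist().inv_cdf
    adj_alpha = alpha / n_comparisons
    z_alpha = z(1 - adj_alpha / 2)          # two-sided test
    z_beta = z(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Detect a 5-point difference (SD 10) at 80% power:
# one primary outcome vs. three Bonferroni-adjusted comparisons.
single = n_per_arm(5, 10)                   # 63 per arm
triple = n_per_arm(5, 10, n_comparisons=3)  # 84 per arm
```

Guarding the familywise error rate across three primary comparisons raises the requirement from 63 to 84 participants per arm, which is why outcome selection and multiplicity plans belong in the design stage rather than the analysis stage.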

Learn more about broadly applicable methods

Experiments, including clinical trials, differ considerably in the methods used to assign participants to study conditions (or study arms) and to deliver interventions to those participants.

This section provides information related to the design and analysis of experiments in which (1) participants are assigned in groups (or clusters) and individual observations are analyzed to evaluate the effect of the intervention, (2) participants are assigned individually but receive at least some of their intervention with other participants or through an intervention agent shared with other participants, and (3) participants are assigned in groups (or clusters) but groups cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups receive the intervention.

This material is relevant for both human and animal studies as well as basic and applied research. And while it is important for investigators to become familiar with the issues presented on this website, it is even more important that they collaborate with a methodologist who is familiar with these issues.

In a parallel group-randomized trial (GRT), also called a parallel cluster-randomized trial, groups or clusters are randomized to study conditions, and observations are taken on the members of those groups with no crossover of groups or clusters to a different condition or study arm during the trial.

Learn more about GRTs
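Because members of the same group tend to be more alike than members of different groups, analyzing individual observations from a parallel GRT requires inflating the sample size by the design effect, 1 + (m − 1) × ICC, where m is the group size and ICC is the intraclass correlation. A minimal sketch (function names and example values are illustrative):

```python
import math

def design_effect(m, icc):
    """Variance inflation from randomizing intact groups of size m
    when members share an intraclass correlation icc."""
    return 1 + (m - 1) * icc

def grt_groups_per_arm(n_individual, m, icc):
    """Inflate an individually randomized per-arm sample size for a
    parallel GRT, then round up to whole groups per arm."""
    n_members = n_individual * design_effect(m, icc)
    return math.ceil(n_members / m)

# Even a modest ICC of 0.05 in groups of 20 nearly doubles the
# required sample: 63 participants per arm becomes 7 groups of 20.
groups = grt_groups_per_arm(63, m=20, icc=0.05)  # 7 groups per arm
```

The qualitative point is robust even if the inputs change: ignoring the ICC in a GRT analysis understates the variance of the intervention effect and inflates the type I error rate, which is one reason collaboration with a methodologist matters for these designs.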

In an individually randomized group-treatment (IRGT) trial, also called a partially clustered design, individuals are randomized to study conditions but receive at least some of their intervention with other participants or through an intervention agent shared with other participants.

Learn more about IRGTs

In a stepped wedge group-randomized trial (SWGRT), also called a stepped wedge cluster-randomized trial, groups or clusters are randomized to sequences that cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups receive the intervention.

Learn more about SWGRTs
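The crossover pattern described above can be written down as a simple treatment-indicator matrix, with one row per randomized sequence and one column per measurement period. This sketch assumes one shared baseline period and one sequence crossing over per period; the function name is illustrative:

```python
def stepped_wedge_schedule(n_sequences, baseline_periods=1):
    """0/1 intervention indicators: rows are sequences, columns are periods.

    Every sequence starts in the control condition; sequence k crosses
    over at period baseline_periods + k and stays in the intervention
    thereafter, so by the final period all groups receive the intervention.
    """
    n_periods = n_sequences + baseline_periods
    return [
        [1 if period >= baseline_periods + seq else 0
         for period in range(n_periods)]
        for seq in range(n_sequences)
    ]

# Three sequences, four periods:
# [[0, 1, 1, 1],
#  [0, 0, 1, 1],
#  [0, 0, 0, 1]]
schedule = stepped_wedge_schedule(3)
```

Laying the design out this way makes the SWGRT's defining feature visible: intervention status is confounded with calendar time, so the analysis must model period effects as well as clustering.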

NIH Clinical Trial Requirements

The NIH launched a series of initiatives to enhance the accountability and transparency of clinical research. These initiatives target key points along the entire clinical trial lifecycle, from concept to reporting the results.

Check out the Frequently Asked Questions section or send us a message.

Disclaimer: Substantial effort has been made to provide accurate and complete information on this website. However, we cannot guarantee that there will be no errors. Neither the U.S. Government nor the National Institutes of Health (NIH) assumes any legal liability for the accuracy, completeness, or usefulness of any information, products, or processes disclosed herein, or represents that use of such information, products, or processes would not infringe on privately owned rights. The NIH does not endorse or recommend any commercial products, processes, or services. The views and opinions of authors expressed on NIH websites do not necessarily state or reflect those of the U.S. Government, and they may not be used for advertising or product endorsement purposes.


U.S. Food and Drug Administration

Clinical Trials: What Patients Need to Know

What Are the Different Types of Clinical Research?

Different types of clinical research are used depending on what the researchers are studying. Below are descriptions of some different kinds of clinical research.

Treatment Research generally involves an intervention such as medication, psychotherapy, new devices, or new approaches to surgery or radiation therapy. 

Prevention Research looks for better ways to prevent disorders from developing or returning. Different kinds of prevention research may study medicines, vitamins, vaccines, minerals, or lifestyle changes. 

Diagnostic Research refers to the practice of looking for better ways to identify a particular disorder or condition. 

Screening Research aims to find the best ways to detect certain disorders or health conditions. 

Quality of Life Research explores ways to improve comfort and the quality of life for individuals with a chronic illness. 

Genetic studies aim to improve the prediction of disorders by identifying and understanding how genes and illnesses may be related. Research in this area may explore ways in which a person’s genes make him or her more or less likely to develop a disorder. This may lead to development of tailor-made treatments based on a patient’s genetic make-up. 

Epidemiological studies seek to identify the patterns, causes, and control of disorders in groups of people. 

An important note: some clinical research is “outpatient,” meaning that participants do not stay overnight at the hospital. Some is “inpatient,” meaning that participants will need to stay for at least one night in the hospital or research center. Be sure to ask the researchers what their study requires. 

Phases of Clinical Trials: When Clinical Research Is Used to Evaluate Medications and Devices

Clinical trials are a kind of clinical research designed to evaluate and test new interventions such as psychotherapy or medications. Clinical trials are often conducted in four phases. The trials at each phase have a different purpose and help scientists answer different questions.

Phase I trials

Researchers test an experimental drug or treatment in a small group of people for the first time. The researchers evaluate the treatment’s safety, determine a safe dosage range, and identify side effects.

Phase II trials

The experimental drug or treatment is given to a larger group of people to see if it is effective and to further evaluate its safety.

Phase III trials

The experimental study drug or treatment is given to large groups of people. Researchers confirm its effectiveness, monitor side effects, compare it to commonly used treatments, and collect information that will allow the experimental drug or treatment to be used safely.

Phase IV trials

Post-marketing studies, which are conducted after a treatment is approved for use by the FDA, provide additional information including the treatment or drug’s risks, benefits, and best use.

Examples of Other Kinds of Clinical Research

Many people believe that all clinical research involves testing of new medications or devices. This is not true, however. Some studies do not involve testing medications, and a person’s regular medications may not need to be changed. Healthy volunteers are also needed so that researchers can compare their results to results of people with the illness being studied. Some examples of other kinds of research include the following:

A long-term study that involves psychological tests or brain scans

A genetic study that involves blood tests but no changes in medication

A study of family history that involves talking to family members to learn about people’s medical needs and history.

Research methods & reporting

Articles in this collection include:

  • Process guide for inferential studies using healthcare data from routine clinical practice to evaluate causal effects of drugs
  • Updated recommendations for the Cochrane rapid review methods guidance for rapid reviews of effectiveness
  • Avoiding conflicts of interest and reputational risks associated with population research on food and nutrition
  • The estimands framework: a primer on the ICH E9(R1) addendum
  • Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study
  • Evaluation of clinical prediction models (part 2): how to undertake an external validation study
  • Evaluation of clinical prediction models (part 1): from development to external validation
  • Emulation of a target trial using electronic health records and a nested case-control design
  • ROB-ME: a tool for assessing risk of bias due to missing evidence in systematic reviews with meta-analysis
  • Enhancing reporting quality and impact of early phase dose-finding clinical trials: CONSORT dose-finding extension (CONSORT-DEFINE) guidance
  • Enhancing quality and impact of early phase dose-finding clinical trial protocols: SPIRIT dose-finding extension (SPIRIT-DEFINE) guidance
  • Understanding how health interventions or exposures produce their effects using mediation analysis
  • A guide and pragmatic considerations for applying GRADE to network meta-analysis
  • A framework for assessing selection and misclassification bias in Mendelian randomisation studies: an illustrative example between BMI and covid-19
  • Practical thematic analysis: a guide for multidisciplinary health services research teams engaging in qualitative analysis
  • Selection bias due to conditioning on a collider
  • The imprinting effect of covid-19 vaccines: an expected selection bias in observational studies
  • A step-by-step approach for selecting an optimal minimal important difference
  • Recommendations for the development, implementation, and reporting of control interventions in trials of self-management therapies
  • Methods for deriving risk difference (absolute risk reduction) from a meta-analysis
  • Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses
  • CONSORT Harms 2022 statement, explanation, and elaboration: updated guideline for the reporting of harms in randomised trials
  • Transparent reporting of multivariable prediction models: explanation and elaboration
  • Transparent reporting of multivariable prediction models: TRIPOD-Cluster checklist
  • Bias by censoring for competing events in survival analysis
  • CODE-EHR best practice framework for the use of structured electronic healthcare records in clinical research
  • Validation of prediction models in the presence of competing risks
  • Reporting guideline for the early stage clinical evaluation of decision support systems driven by artificial intelligence
  • Searching clinical trials registers: guide for systematic reviewers
  • How to design high quality acupuncture trials—a consensus informed by evidence
  • Early phase clinical trials extension to guidelines for the content of statistical analysis plans
  • Incorporating dose effects in network meta-analysis
  • Consolidated Health Economic Evaluation Reporting Standards 2022 statement
  • Strengthening the reporting of observational studies in epidemiology using Mendelian randomisation (STROBE-MR): explanation and elaboration
  • A new framework for developing and evaluating complex interventions
  • Adapting interventions to new contexts—the ADAPT guidance
  • Recommendations for including or reviewing patient reported outcome endpoints in grant applications
  • CONSORT extension for the reporting of randomised controlled trials conducted using cohorts and routinely collected data (CONSORT-ROUTINE): checklist with explanation and elaboration
  • CONSORT extension for the reporting of randomised controlled trials conducted using cohorts and routinely collected data
  • Guidance for the design and reporting of studies evaluating the clinical performance of tests for present or past SARS-CoV-2 infection
  • The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
  • PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews
  • Preferred reporting items for journal and conference abstracts of systematic reviews and meta-analyses of diagnostic test accuracy studies (PRISMA-DTA for abstracts): checklist, explanation, and elaboration
  • Designing and undertaking randomised implementation trials: guide for researchers
  • START-RWE: structured template for planning and reporting on the implementation of real world evidence studies
  • Methodological standards for qualitative and mixed methods patient centered outcomes research
  • GRADE approach to drawing conclusions from a network meta-analysis using a minimally contextualised framework
  • GRADE approach to drawing conclusions from a network meta-analysis using a partially contextualised framework
  • Use of multiple period, cluster randomised, crossover trial designs for comparative effectiveness research
  • When to replicate systematic reviews of interventions: consensus checklist
  • Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension
  • Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension
  • Preferred reporting items for systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA): explanation, elaboration, and checklist
  • Non-adherence in non-inferiority trials: pitfalls and recommendations
  • The Adaptive designs CONSORT extension (ACE) statement: a checklist with explanation and elaboration guideline for reporting randomised trials that use an adaptive design
  • Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness
  • Calculating the sample size required for developing a clinical prediction model
  • SPIRIT extension and elaboration for n-of-1 trials: SPENT 2019 checklist
  • Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline
  • Alternative approaches for confounding adjustment in observational studies using weighting based on the propensity score: a primer for practitioners
  • A guide to prospective meta-analysis
  • RoB 2: a revised tool for assessing risk of bias in randomised trials
  • CONSORT 2010 statement: extension to randomised crossover trials
  • When and how to use data from randomised trials to develop or validate prognostic models
  • Guide to presenting clinical prediction models for use in clinical settings
  • A guide to systematic review and meta-analysis of prognostic factor studies
  • When continuous outcomes are measured using different scales: guide for meta-analysis and interpretation
  • The reporting of studies conducted using observational routinely collected health data statement for pharmacoepidemiology (RECORD-PE)
  • Reporting of stepped wedge cluster randomised trials: extension of the CONSORT 2010 statement with explanation and elaboration
  • DELTA2 guidance on choosing the target difference and undertaking and reporting the sample size calculation for a randomised controlled trial
  • Outcome reporting bias in trials: a methodological approach for assessment and adjustment in systematic reviews
  • Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians
  • How to use FDA drug approval documents for evidence syntheses
  • How to avoid common problems when using ClinicalTrials.gov in research: 10 issues to consider
  • TIDieR-PHP: a reporting guideline for population health and policy interventions
  • Analysis of cluster randomised trials with an assessment of outcome at baseline
  • Key design considerations for adaptive clinical trials: a primer for clinicians
  • Population attributable fraction
  • How to estimate the effect of treatment duration on survival outcomes using observational data
  • Concerns about composite reference standards in diagnostic research
  • Statistical methods to compare functional outcomes in randomized controlled trials with high mortality
  • CONSORT-Equity 2017 extension and elaboration for better reporting of health equity in randomised trials
  • Handling time varying confounding in observational research
  • Four study design principles for genetic investigations using next generation sequencing
  • AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both
  • Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples
  • STARD for Abstracts: essential items for reporting diagnostic accuracy studies in journal or conference abstracts
  • Statistics Notes: percentage differences, symmetry, and natural logarithms
  • Statistics Notes: what is a percentage difference?
  • GRIPP2 reporting checklists: tools to improve reporting of patient and public involvement in research
  • Enhancing the usability of systematic reviews by improving the consideration and description of interventions
  • How to design efficient cluster randomised trials
  • CONSORT 2010 statement: extension checklist for reporting within person randomised trials
  • Life expectancy difference and life expectancy ratio: two measures of treatment effects in randomised trials with non-proportional hazards
  • Standards for Reporting Implementation Studies (StaRI) statement
  • Meta-analytical methods to identify who benefits most from treatments: daft, deluded, or deft approach
  • A guide to systematic review and meta-analysis of prediction model performance
  • CONSORT 2010 statement: extension to randomised pilot and feasibility trials
  • ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions



Principles of Research Methodology

A Guide for Clinical Investigators

Editors: Phyllis G. Supino and Jeffrey S. Borer, Cardiovascular Medicine, SUNY Downstate Medical Center, Brooklyn, USA

Based on a highly regarded and popular lecture series on research methodology

Comprehensive guide written by experts in the field

Emphasizes the essentials and fundamentals of research methodologies



Table of contents (13 chapters)

Front Matter

Overview of the Research Process

Phyllis G. Supino

Developing a Research Problem

  • Phyllis G. Supino, Helen Ann Brown Epstein

The Research Hypothesis: Role and Construction

Design and Interpretation of Observational Studies: Cohort, Case–Control, and Cross-Sectional Designs

  • Martin L. Lesser

Fundamental Issues in Evaluating the Impact of Interventions: Sources and Control of Bias

Protocol Development and Preparation for a Clinical Trial

  • Joseph A. Franciosa

Data Collection and Management in Clinical Research

  • Mario Guralnik

Constructing and Evaluating Self-Report Measures

  • Peter L. Flom, Phyllis G. Supino, N. Philip Ross

Selecting and Evaluating Secondary Data: The Role of Systematic Reviews and Meta-analysis

  • Lorenzo Paladino, Richard H. Sinert

Sampling Methodology: Implications for Drawing Conclusions from Clinical Research Findings

  • Richard C. Zink

Introductory Statistics in Medical Research

  • Todd A. Durham, Gary G. Koch, Lisa M. LaVange

Ethical Issues in Clinical Research

  • Eli A. Friedman

How to Prepare a Scientific Paper

Jeffrey S. Borer

Back Matter

Principles of Research Methodology: A Guide for Clinical Investigators is the definitive, comprehensive guide to understanding and performing clinical research. Designed for medical students, physicians, basic scientists involved in translational research, and other health professionals, this indispensable reference also addresses the unique challenges and demands of clinical research and offers clear guidance in becoming a more successful member of a medical research team and critical reader of the medical research literature. The book covers the entire research process, beginning with the conception of the research problem and ending with publication of findings. Principles of Research Methodology: A Guide for Clinical Investigators comprehensively and concisely presents concepts in a manner that is relevant and engaging to read. The text combines theory and practical application to familiarize the reader with the logic of research design and hypothesis construction, the importance of research planning, the ethical basis of human subjects research, the basics of writing a clinical research protocol and scientific paper, the logic and techniques of data generation and management, and the fundamentals and implications of various sampling techniques and alternative statistical methodologies. Organized in thirteen easy-to-read chapters, the text emphasizes the importance of clearly defined research questions and well-constructed hypotheses (reinforced throughout the various chapters) for informing methods and guiding data interpretation. Its authors are prominent medical scientists and methodologists with extensive personal experience in biomedical investigation and in teaching key aspects of research methodology to medical students, physicians, and other health professionals; they expertly integrate theory with examples and employ language that is clear and useful for a general medical audience.
A major contribution to the methodology literature, Principles of Research Methodology: A Guide for Clinical Investigators is an authoritative resource for all individuals who perform research, plan to perform it, or wish to understand it better.


Book Title: Principles of Research Methodology

Book Subtitle: A Guide for Clinical Investigators

Editors: Phyllis G. Supino, Jeffrey S. Borer

Publisher: Springer New York, NY

eBook Packages: Medicine, Medicine (R0)

Copyright Information: Springer Science+Business Media, LLC 2012

Hardcover ISBN: 978-1-4614-3359-0 (published 22 June 2012)

Softcover ISBN: 978-1-4939-4292-3 (published 23 August 2016)

eBook ISBN: 978-1-4614-3360-6 (published 22 June 2012)

Edition Number: 1

Number of Pages: XVI, 276

Topics: Oncology, Cardiology, Internal Medicine, Endocrinology, Neurology


Clinical research methods for treatment, diagnosis, prognosis, etiology, screening, and prevention: A narrative review


  • 1 Department of Oncology, McMaster University, Hamilton, Ontario, Canada.
  • 2 Center for Clinical Practice Guideline Conduction and Evaluation, Children's Hospital of Fudan University, Shanghai, P.R. China.
  • 3 Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada.
  • 4 Department of Pediatrics, University of Antioquia, Colombia.
  • 5 Editorial Office, Chinese Journal of Evidence-Based Pediatrics, Children's Hospital of Fudan University, Shanghai, P.R. China.
  • 6 Division of Thoracic Surgery, Xuanwu Hospital, Capital Medical University, Beijing, P.R. China.
  • 7 Division of Neuropsychiatry and Behavioral Neurology and Clinical Psychology, Beijing Tiantan Hospital, Capital Medical University, Beijing, P.R. China.
  • 8 Division of Respirology, Tongren Hospital, Capital Medical University, Beijing, P.R. China.
  • 9 Division of Respirology, Xuanwu Hospital, Capital Medical University, Beijing, P.R. China.
  • 10 Division of Orthopedic Surgery, Juravinski Cancer Centre, McMaster University, Hamilton, Ontario, Canada.
  • PMID: 32445266
  • DOI: 10.1111/jebm.12384

This narrative review is an introduction for health professionals on how to conduct and report clinical research in six categories: treatment, diagnosis/differential diagnosis, prognosis, etiology, screening, and prevention. It explains the importance of beginning with an appropriate clinical question and of exploring, through a literature search, how appropriate that question is. Three methodological directives can assist clinicians in conducting their studies: (1) how to conduct an original study or a systematic review, (2) how to report an original study or a systematic review, and (3) how to assess the quality or risk of bias of a previous relevant original study or systematic review. This overview provides readers with key points and resources for performing high-quality research in the six main clinical categories.

Keywords: clinical research methods; diagnosis; literature search; prognosis; treatment.

© 2020 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.

MeSH terms

  • Biomedical Research / methods*
  • Biomedical Research / standards
  • Mass Screening
  • Preventive Medicine / methods*
  • Systematic Reviews as Topic
  • Therapeutics / methods*



Clinical Research Methods

Director: Todd Ogden, PhD

The Mailman School offers the degree of Master of Science in Biostatistics, with an emphasis on issues in the statistical analysis and design of clinical studies. The Clinical Research Methods track was conceived and designed for clinicians who are pursuing research careers in academic medicine. Candidacy in the CRM program is open to anyone who holds a medical/doctoral degree and/or has several years of clinical research experience.


In addition to achieving the MS in Biostatistics core competencies, graduates of the 30-credit MS Clinical Research Methods track develop specific competencies in data analysis and computing, public health and collaborative research, and data management. MS/CRM graduates will be able to:

Data Analysis and Computing

  • Apply the basic tenets of research design and analysis for the purpose of critically reviewing research and programs in disciplines outside of biostatistics;
  • Differentiate between quantitative problems that can be addressed with standard methods and those requiring input from a professional biostatistician.

Public Health and Collaborative Research

  • Formulate and prepare a written statistical plan for analysis of public health research data that clearly reflects the research hypotheses of the proposal in a manner that resonates with both co-investigators and peer reviewers;
  • Prepare written summaries of quantitative analyses for journal publication, presentations at scientific meetings, grant applications, and review by regulatory agencies;

Data Management

  • Identify the uses to which data management can be put in practical statistical analysis, including the establishment of standards for documentation, archiving, auditing, and confidentiality; guidelines for accessibility; security; structural issues; and data cleaning;
  • Differentiate between analytical and data management functions through knowledge of the role and functions of databases, different types of data storage, and the advantages and limitations of rigorous database systems in conjunction with statistical tools;
  • Describe the different types of database management systems, the ways these systems can provide data for analysis and interact with statistical software, and methods for evaluating technologies pertinent to both; and
  • Assess database tools and the database functions of statistical software, with a view to explaining the impact of data management processes and procedures on their own research. 

Required Courses

The required courses enable degree candidates to gain proficiency in study design, application of commonly-used statistical procedures, use of statistical software packages, and successful interpretation and communication of analysis results. A required course may be waived for students with demonstrated expertise in that field of study. If a student places out of one or more required courses, that student must substitute other courses, perhaps a more advanced course in the same area or another elective course in biostatistics or another discipline, with the approval of the student’s faculty advisor.

The program, which consists of 30 credits of coursework and research, may be completed in one year, provided the candidate begins study during the summer semester of his or her first year. If preferred, candidates may pursue the MS/CRM on a part-time basis. The degree program must be completed within five years of the start date.

The curriculum, described below, comprises 24 credits of required courses, including a 3-credit research project (the “Master’s essay”) to be completed during the final year of study, and 6 credits of electives. Note that even if a course is waived, students must still complete a minimum of 30 credits to be awarded the MS degree.

Commonly chosen elective courses include:

Master's Essay

As part of MS/CRM training, each student is required to register for the 3-credit Master's essay course (P9160). This course provides direct support and supervision for the completion of the required research project, or Master's essay, consisting of a research paper of publishable quality. CRM candidates should register for the Master's essay during the spring semester of their final year of study. Students are required to come to the Master's essay course with research data in hand for analysis and interpretation.

CRM graduates have written excellent Master's essays over the years, many of which were ultimately published in the scientific literature. Some titles include:

  • A Comprehensive Analysis of the Natural History and the Effect of Treatment on Patients with Malignant Pleural Mesothelioma
  • Prevalence and Modification of Cardiovascular Risk Factors in Early Chronic Kidney Disease: Data from the Third National Health and Nutrition Examination Survey
  • Perspectives on Pediatric Outcomes: A Comparison of Parents' and Children's Ratings of Health-Related Quality of Life
  • Clinical and Demographic Profiles of Cancer Discharges throughout New York State Compared to Corresponding Incidence Rates, 1990-1994

Sample Timeline

Candidates may choose to complete the CRM program track on a part-time basis, or complete all requirements within one year (July through May). To complete the degree in one year, coursework must commence during the summer term. 

Note that course schedules change from year to year, so class days/times in future years will differ from the sample schedule below; you must check the current course schedule for each year on the course directory page.

Paul McCullough
Director of Academic Programs
Department of Biostatistics
Columbia University
[email protected]
212-342-3417

More information on Admission Requirements and Deadlines.

Types of Primary Medical Research

Medical research may be classified as either primary or secondary research. Primary research entails conducting studies and collecting raw data. Secondary research evaluates or synthesizes data collected during primary research.

Primary medical research is categorized into three main fields: laboratory (basic), clinical, and epidemiological. Laboratory scientists analyze the fundamentals of diseases and treatments. Clinical researchers collaborate with participants to test new and established forms of treatment. Epidemiologists focus on populations to identify the cause and distribution of diseases.

Hierarchy of primary medical research

Basic/Laboratory Research

Laboratory, or basic, research involves scientific investigation and experimentation in a controlled environment to establish or confirm an understanding of chemical interactions, genetic material, cells, and biologic agents—more specifically, the agent’s relationships, behaviors, or properties. Basic science forms the knowledge-base and foundation upon which other types of research are built. Laboratory scientists investigate specific hypotheses which contribute to the development of new medical treatments.

An advantage of this type of research is that scientists can control the variables within a laboratory setting. Such a high level of control is often not possible outside of the laboratory. This control leads to greater internal validity and allows the testing of various aspects of disease and potential treatments. The key to laboratory research is to vary at least one independent variable while holding all others constant. The standardized conditions of a laboratory setting also support the development of new medical imaging and diagnostic tools.

Applied research aims to solve practical problems, such as treating a particular disease. There are a number of different study types within applied research, including:

  • Animal studies: Animals are often induced to have a particular disease model so that the disease and potential treatments can be better understood for use with humans.
  • Biochemistry: Focuses upon the chemical processes that occur within the body; biochemistry also explores the metabolic basis of disease.
  • Cell study: Examines how cells develop and each cell type’s potential role in disease or treatment.
  • Genomics: Explores how all genes interact to influence the growth, health, and potential disease development of an organism/human.
  • Pharmacogenetics: Seeks to better understand the influence genes have on how a patient might respond to treatment. 3


Clinical Research

Clinical research is conducted to improve the understanding, treatment, or prevention of disease. Clinical studies examine individuals within a selected patient population. This type of research is usually interventional, but may also be observational or preventive. In order to categorize clinical research, it is useful to look at two factors: 1) the timing of the data collection (whether the study is retrospective or prospective) and 2) the study design (e.g., case-control, cohort). 4 Study integrity is improved through randomization, blinding, and statistical analysis. Researchers often test the efficacy and safety of drugs in clinical drug studies. Many clinical trials have a pharmacological basis. In addition, clinical studies may examine surgical, physical, or psychological procedures as well as new or conventional uses for medical devices. Researchers may perform diagnostic, retrospective, or case series observational studies to diagnose, treat, and monitor patients.

Treatments, dosages, and population can be exactly specified to control or minimize internal differences aside from the treatment.

Interventional/Clinical Trials

Clinical trials are defined by phases, with the first phase (Phase I) being the introduction of a new drug into the human population. Before Phase I, animal testing will have been undertaken. 5 Phase I is conducted to assess the safety and the maximum dosage that a significant portion of patients can tolerate. The following list describes the key elements of each clinical trial phase. 7

  • Phase I: As the initial step in drug development, a Phase I clinical trial includes a small number of people (usually 20-100) to determine the safety of a drug and the appropriate dosage.
  • Phase II: After success at Phase I, Phase II trials include larger groups of individuals (~100-300) and work to determine both efficacy as well as potential adverse reactions.
  • Phase III: At this stage, larger numbers of individuals (~300-3,000) with a specific condition are included within the trial. Trials seek to establish intervention effectiveness in treating a condition under normal use and to establish more robust safety and side effect data.
  • Phase IV: Following approval for public use, Phase IV trials are undertaken to understand the long-term impact of an intervention. At this stage, the drug may also be tested on “at-risk” populations, such as the elderly, to make sure that it is safe for a broader population.


In observational studies, the researcher does not seek to control any variables. Instead, the researcher observes participants (often retrospectively) over a specified period of time. In contrast to controlled and randomized interventional studies, treatment decisions are left to the doctor and patient. Comparisons may be made between individuals given two different types of therapy or having different prognostic variables (e.g. a particular condition). Diagnostic studies evaluate the accuracy of a diagnostic test or method in predicting or identifying a specific condition. Once a number of studies have undertaken an analysis of a single variable, a secondary analysis can take place either via a meta-analysis or literature review in order to see if there is consistency across study results.

Epidemiological Research

Epidemiologists investigate the causes, distribution, and historical changes in the frequency of disease. For example, researchers have looked for trends in cancer or flu outbreaks to determine their cause and ways to prevent or reduce the spread of these diseases. These studies can be interventional, but are usually observational due to ethical, social, political, and health risk factors.


  • Intervention Study: These studies explore changes in health or disease outcomes after the introduction of a specific intervention. For example, the effect of adding fluoride to drinking water was studied through interventional epidemiologic studies in the United States in the 1940s. Another study undertaken in the U.S. sought to assess how a diet high in fruit and vegetables and low in red meat and processed food might impact sodium levels of individuals when compared with a traditional American diet. 6
  • Cohort (Follow-up) Study: Observational studies can include many thousands of individuals and because of this, they can be time-consuming and expensive to undertake. To overcome some of these costs, researchers may choose to focus upon a particular group of people (known as a cohort) and explore the health of this group in relation to specific variables. For example, studies have sought to understand how different levels of exercise improve health outcomes.
  • Case control: Particularly useful when seeking to explore rare diseases because the population with the disease has already been identified. The group of individuals identified with the disease is then compared to individuals without the disease with the purpose of exploring how the health outcomes differ between the two groups.
  • Cross-sectional: Used to explore the levels of disease within a population (prevalence). Cross-sectional studies provide a snapshot of what is happening within a particular population at one period of time.
  • Ecological: Tend to analyze data from previously published sources in order to explore the health of populations and the potential causes of ill health.
  • Monitoring/Surveillance: Many countries record and survey populations in order to fully understand the health of their populations.
  • Description with registry data: In the United States, cancer registries collect data about the numbers of cases of site-specific cancers each year. This information can then be used to explore rates of cancer at a local level to examine whether incidence and prevalence are changing over time.
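The prevalence estimated in a cross-sectional study is a simple proportion of the population affected at a single point in time. A minimal illustration, with hypothetical registry numbers (none of these figures come from the article):

```python
def point_prevalence(cases: int, population: int) -> float:
    """Point prevalence: proportion of a population with the disease
    at a single point in time (a cross-sectional snapshot)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population

# Hypothetical registry snapshot: 150 prevalent cases in a county of 50,000.
prev = point_prevalence(150, 50_000)
print(f"prevalence = {prev:.4f} ({prev * 100_000:.0f} per 100,000)")
```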
References

  • Röhrig, B., du Prel, J.-B., Wachtlin, D. & Blettner, M. Types of study in medical research: part 3 of a series on evaluation of scientific publications. Dtsch Arztebl Int 106, 262–268 (2009).
  • Haidich, A. B. Meta-analysis in medical research. Hippokratia 14, 29–37 (2010).
  • Ma, Q. & Lu, A. Y. H. Pharmacogenetics, pharmacogenomics, and individualized medicine. Pharmacol. Rev. 63, 437–459 (2011).
  • Sessler, D. I. & Imrey, P. B. Clinical Research Methodology 1: Study Designs and Methodologic Sources of Error. Anesth. Analg. 121, 1034–1042 (2015).
  • Umscheid, C. A., Margolis, D. J. & Grossman, C. E. Key concepts of clinical trials: a narrative review. Postgrad Med 123, 194–204 (2011).
  • Svetkey, L. P. et al. The DASH Diet, Sodium Intake and Blood Pressure Trial (DASH-Sodium). Journal of the American Dietetic Association 99, S96–S104 (1999).
  • U.S. Food & Drug Administration. The Drug Development Process.


Vanessa Gordon-Dseugo, MPH, PhD; Grace Satterfield, MS

Published: January 17, 2019 Revised: September 2, 2020

Open access. Published: 16 February 2024

Comparing Bayesian hierarchical meta-regression methods and evaluating the influence of priors for evaluations of surrogate endpoints on heterogeneous collections of clinical trials

Willem Collier, Benjamin Haaland, Lesley A. Inker, Hiddo J.L. Heerspink & Tom Greene

BMC Medical Research Methodology, volume 24, Article number: 39 (2024)


Surrogate endpoints, such as those of interest in chronic kidney disease (CKD), are often evaluated using Bayesian meta-regression. Trials used for the analysis can evaluate a variety of interventions for different sub-classifications of disease, which can introduce two additional goals in the analysis. The first is to infer the quality of the surrogate within specific trial subgroups defined by disease or intervention classes. The second is to generate more targeted subgroup-specific predictions of treatment effects on the clinical endpoint.

Using real data from a collection of CKD trials and a simulation study, we contrasted surrogate endpoint evaluations under different hierarchical Bayesian approaches. Each approach we considered induces different assumptions regarding the relatedness (exchangeability) of trials within and between subgroups. These include partial-pooling approaches, which allow subgroup-specific meta-regressions and, yet, facilitate data adaptive information sharing across subgroups to potentially improve inferential precision. Because partial-pooling models come with additional parameters relative to a standard approach assuming one meta-regression for the entire set of studies, we performed analyses to understand the impact of the parameterization and priors, with the overall goals of comparing precision in estimates of subgroup-specific meta-regression parameters and predictive performance.

In the analyses considered, partial-pooling approaches to surrogate endpoint evaluation improved accuracy of estimation of subgroup-specific meta-regression parameters relative to fitting separate models within subgroups. A random rather than fixed effects approach led to reduced bias in estimation of meta-regression parameters and in prediction in subgroups where the surrogate was strong. Finally, we found that subgroup-specific meta-regression posteriors were robust to use of constrained priors under the partial-pooling approach, and that use of constrained priors could facilitate more precise prediction for clinical effects in trials of a subgroup not available for the initial surrogacy evaluation.

Partial-pooling modeling strategies should be considered for surrogate endpoint evaluation on collections of heterogeneous studies. Fitting these models comes with additional complexity related to choosing priors. Constrained priors should be considered when using partial-pooling models when the goal is to predict the treatment effect on the clinical endpoint.


There is broad interest in the use of validated surrogate endpoints to expedite clinical trials in areas of slowly progressing disease, such as chronic kidney disease (CKD) [ 1 , 2 , 3 , 4 , 5 ]. A surrogate endpoint is typically a measure of disease progression captured earlier than an established clinical endpoint and should have the property that the treatment effect on the surrogate accurately predicts the treatment effect on the clinical endpoint [ 6 , 7 , 8 ]. This predictive potential is commonly established in a meta-regression analysis of previously conducted trials, where the meta-regression quantifies the strength of the association between treatment effects on the clinical and surrogate endpoints [ 3 , 4 , 5 , 6 , 7 , 8 ]. Accurate estimation of the meta-regression parameters requires variability in the treatment effects on the surrogate and clinical endpoints across trials used for analysis. To achieve this, the collection of trials can contain heterogeneity in terms of interventions and sub-classifications of disease [ 3 , 4 ]. There is often interest among entities such as regulatory agencies regarding the performance of the surrogate in pre-specified, clinically or biologically motivated, and mutually exclusive subgroups defined by intervention or disease classes [ 1 ]. These interests introduce two specific goals the analytical approach must facilitate: The first is accurate estimation of subgroup-specific meta-regression parameters. The second is accurate prediction of treatment effects on the clinical endpoint, either for subgroups used in model fitting or for those not available for model fitting (e.g., for a novel intervention).

One meta-regression methodology involves a Bayesian hierarchical model, which can be used to account for estimation error of the treatment effects on both endpoints as well as the correlation of the sampling errors (a frequently used weighted generalized linear regression approach accounts only for sampling error of the effect estimate on one of the two endpoints) [ 6 , 8 , 9 ]. Under the hierarchical Bayesian approach, it is common to assume all trials used in the analysis to be fully exchangeable despite underlying differences in interventions or diseases across trials [ 4 , 5 , 6 , 8 ]. In effect, this is accomplished by fitting a model with a single meta-regression relating treatment effects on the clinical endpoint to those of the surrogate endpoint to all trials available for the analysis, which we refer to as the “full-pooling” approach. Alternatively, distinct meta-regressions can be fit within subgroups in what we will refer to as the “no-pooling” approach [ 4 , 7 ]. There are often too few trials and insufficient variability in treatment effects within subgroups to estimate the meta-regression parameters with satisfactory precision under a strict no-pooling strategy. An additional limitation of the full- and no-pooling strategies is that each restricts model-based prediction of the treatment effect on the clinical endpoint in a future trial. This is especially the case when there is interest in prediction for a trial of a “new subgroup”, one that was not available for the initial surrogacy evaluation. After all, in the ideal scenario a surrogate can be used for a trial evaluating a novel intervention or when applying an approved indication in a new patient population. Use of a full-pooling model requires the assumption that any future trial is fully exchangeable with the previous trials. Use of a no-pooling approach requires the future trial to be of a subgroup used for the surrogacy evaluation (“existing subgroup”).

Bayesian hierarchical meta-regression lends itself naturally to a “partial-pooling” compromise between these earlier approaches, where a between-subgroup distribution is assumed for some or all subgroup-specific model parameters [ 7 ]. The partial-pooling approach relaxes the assumption of full exchangeability of all trials used for the analysis, can improve precision of inference on subgroup-specific parameters due to data adaptive information sharing across subgroups, and provides a framework for model-based prediction of an effect on a clinical endpoint for a trial of either an existing or a new subgroup. However, critical decisions needed to fit models of this class are without empirical guidance in the literature. For example, fixed and random effects approaches are used interchangeably when employing full-pooling models, and the implications of these two approaches are not well understood under a partial-pooling model [ 8 ]. To our knowledge, there is also not yet work evaluating the impact of the choice of priors under partial-pooling strategies, even though the role of certain prior distributions is likely to be amplified in the common scenario in which the number of subgroups is small.

In this paper, we provide results from a series of analyses intended to help guide practical decision making for surrogate endpoint evaluations on collections of heterogeneous studies. We explore the extent to which partial-pooling approaches improve precision in key posteriors of interest in surrogate evaluation, the extent to which bias occurs, contrast fixed and random effects variants of the models described, and explore the impact of priors. In the Methods section, we describe the modeling approaches evaluated, priors, and how these methods can be used for prediction. In the Results section, we provide results of a limited simulation study and of an applied analysis of CKD trials. We then conclude with the Discussion section.

Modeling approaches to the trial-level analysis of a surrogate

For the trial-level evaluation of a surrogate endpoint, a two stage approach to the analysis is often used [ 6 , 7 , 8 ]. In the first stage, treatment effects on both the clinical and surrogate endpoint as well as standard errors and a within-study correlation between the error of the estimated effects are calculated for each trial. These trial-level measures are used as the data input in the meta-regression evaluation (the second stage). A two-level hierarchical model for the meta-regression can be used to account for within-study estimation error for both treatment effects [ 4 , 5 , 6 , 7 , 8 ].

Under the two-stage approach, one key distinction between commonly used second-stage models involves whether true treatment effects on the surrogate endpoint are viewed as fixed or random [ 6 , 8 ]. Under the fixed effects approach, the true treatment effects on the surrogate endpoint are fixed and the true effects on the clinical endpoint are regressed on the true effects on the surrogate assuming Gaussian residuals. Under the random effects approach, the true treatment effects on both the surrogate and the clinical endpoints are assumed to follow a bivariate normal distribution [ 4 , 5 , 8 ]. The within-study joint distribution can be reasonably approximated with a bivariate normal distribution due to asymptotic normality, but the bivariate normality assumption for the between-study model is made for modeling convenience. Bujkiewicz et al. contrast the predictive performance of a surrogate under fixed and random effects approaches when using the full-pooling approach, but do not summarize differences in estimates of key parameters such as the meta-regression slope [ 8 ]. Papanikos et al. evaluate and contrast different fixed effects approaches in subgroup analyses of a surrogate, but do not compare fixed and random effects approaches [ 7 ]. We hypothesized that the fixed and random effects approaches could produce differing results because there may be more or less shrinkage in the true effects on the surrogate across trials (the “x-axis” variable in the regression) depending on the method used.

We next introduce the full pooling random and fixed effects models, which are applicable when the clinical trials being analyzed can be regarded as exchangeable. Let there be N total clinical trials, each of which compares an active treatment to a control. For trials \(j = 1, \dots , N\) , \((\widehat{\theta }_{1j}, \widehat{\theta }_{2j})'\) jointly represents the suitably scaled within study estimates of treatment effects on the clinical and surrogate endpoints for trial j . The pair \((\theta _{1j}, \theta _{2j})'\) represents the latent joint true treatment effects on the clinical and surrogate endpoints in study j . We let \(\Sigma _j\) denote a within study variance-covariance matrix for study j ( \(\Sigma _{j1,1} = SE(\widehat{\theta }_{1j})^2\) is the squared standard error of the estimated clinical effect, \(\Sigma _{j2,2} = SE(\widehat{\theta }_{2j})^2\) the squared standard error of the estimated surrogate effect, \(\widehat{r}_j\) is the estimated within trial correlation for study j , implying \(\Sigma _{j1,2} = \Sigma _{j2,1} = \widehat{r}_j SE(\widehat{\theta }_{1j}) SE(\widehat{\theta }_{2j})\) ). When the standard errors and within study correlation are available, it is customary to consider all entries of \(\Sigma _j\) fixed and known [ 6 , 7 , 8 , 10 , 11 ]. For the random effects model, \(\mu _s\) represents a population average true treatment effect on the surrogate, and \(\sigma _s^2\) the between trial variance in true effects on the surrogate. We parameterize the model such that \(\alpha\) denotes the meta-regression intercept, \(\beta\) the slope, and \(\sigma _e\) the residual standard deviation. The following represents the full-pooling random effects model (FP-RE): \((\widehat{\theta }_{1j}, \widehat{\theta }_{2j})' \sim \text {N}_2\big((\theta _{1j}, \theta _{2j})', \Sigma _j\big)\) , \(\theta _{2j} \sim \text {N}(\mu _s, \sigma _s^2)\) , \(\theta _{1j} \mid \theta _{2j} \sim \text {N}(\alpha + \beta \theta _{2j}, \sigma _e^2)\) .
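As an illustration of this generative structure, the following is a minimal simulation sketch of the FP-RE model. All numeric values (hyperparameters, standard errors, the within-trial correlation) are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters (assumptions, not estimates from the paper).
mu_s, sigma_s = -1.0, 0.5               # mean / SD of true surrogate effects
alpha, beta, sigma_e = 0.1, -0.8, 0.2   # meta-regression intercept, slope, residual SD
N = 25                                  # number of trials

theta_2 = rng.normal(mu_s, sigma_s, size=N)             # true surrogate effects
theta_1 = rng.normal(alpha + beta * theta_2, sigma_e)   # true clinical effects

# Within-trial sampling layer: estimated effects drawn around the truths,
# with Sigma_j treated as fixed and known (SEs and correlation r assumed).
se1, se2, r = 0.15, 0.10, 0.5
est = np.empty((N, 2))
for j in range(N):
    Sigma_j = np.array([[se1**2, r * se1 * se2],
                        [r * se1 * se2, se2**2]])
    est[j] = rng.multivariate_normal([theta_1[j], theta_2[j]], Sigma_j)
```

With a negative slope, the simulated estimated effects on the two endpoints are negatively associated, mirroring the CKD application described later.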

To fit a full-pooling fixed effects model (FP-FE), rather than assuming the Gaussian distribution with estimated parameters for \(\theta _{2j}\) as above, an independent prior is assigned directly to each \(\theta _{2j}\) .

Next, suppose that the N trials are to be divided into I total subgroups because exchangeability is plausible for the trials within each subgroup but not necessarily between trials in different subgroups. In our experience, regulatory agencies have expressed concern about heterogeneity in surrogate quality across pre-specified subgroups present in the data being used to evaluate CKD-relevant surrogate endpoints. The models discussed throughout the remainder of this paper are thus intended for similar scenarios where: the I subgroups which motivate concern over the full exchangeability of trials (i.e., there might be a different association between treatment effects on the clinical and surrogate endpoint depending on the subgroup a trial pertains to) are presented to the statistical analyst independent of any statistical criteria, subgroup assignment for the trials available for model fitting is not ambiguous (e.g., the inclusion and exclusion criteria of a trial would easily determine the subgroup assignment if disease-based subgroups are of interest), and there cannot be misclassification of trials into the wrong subgroups. When such an analytical scenario is presented, we might first consider fitting separate models within each subgroup. For \(i = 1,\dots , I\) , the following represents what we refer to as a no-pooling random effects (NP-RE) model for the \(j^{\text {th}}\) trial within the \(i^{\text {th}}\) subgroup: \((\widehat{\theta }_{1ji}, \widehat{\theta }_{2ji})' \sim \text {N}_2\big((\theta _{1ji}, \theta _{2ji})', \Sigma _{ji}\big)\) , \(\theta _{2ji} \sim \text {N}(\mu _{si}, \sigma _{si}^2)\) , \(\theta _{1ji} \mid \theta _{2ji} \sim \text {N}(\alpha _i + \beta _i \theta _{2ji}, \sigma _{ei}^2)\) .

We note that one could fit a no-pooling fixed-effects model by placing a prior directly on each \(\theta _{2ji}\) , rather than assuming the Gaussian distribution as above.

For the partial pooling approach, we can incorporate between-subgroup distributions as an intermediate layer in the Bayesian analysis to induce information sharing across subgroups [ 7 , 12 ]. The terms controlling heterogeneity between subgroups are informed by the data. For example, if the data suggests a lack of between-subgroup heterogeneity for any given term, fitting this model should result in substantial information sharing and similar subgroup-specific parameter estimates. The partial pooling model may generate some amount of bias, but could counter-balance this bias with increased precision due to information sharing [ 12 ]. Among other reasons, because between-subgroup variation drives the data-adaptive information sharing, between-subgroup variance terms were of primary interest in our investigation of the influence of priors.

A partial-pooling random effects (PP-RE) model is displayed below. Several additional parameters are necessary to define this model. We let \(\mu _s\) and \(\sigma _s^2\) represent the between subgroup average and variance of true treatment effects on the surrogate; \(\alpha\) and \(\sigma _{\alpha }^2\) and \(\beta\) and \(\sigma _{\beta }^2\) represent the between subgroup average and variance of the meta-regression intercept and slope, respectively; \(\tau _s\) and \(\tau _e\) denote the between-subgroup mean log-transformed true surrogate effects standard deviation and meta-regression residual standard deviation, respectively; \(\gamma _s^2\) and \(\gamma _e^2\) denote the between subgroup variance of the log-transformed within-subgroup true surrogate treatment effects standard deviation and meta-regression residual standard deviation, respectively. The between-subgroup layer is thus \(\mu _{si} \sim \text {N}(\mu _s, \sigma _s^2)\) , \(\alpha _i \sim \text {N}(\alpha , \sigma _{\alpha }^2)\) , \(\beta _i \sim \text {N}(\beta , \sigma _{\beta }^2)\) , \(\log \sigma _{si} \sim \text {N}(\tau _s, \gamma _s^2)\) , \(\log \sigma _{ei} \sim \text {N}(\tau _e, \gamma _e^2)\) , with the within-subgroup layers as in the NP-RE model.
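The layered structure of the PP-RE model can be sketched as a data-generating simulation, with subgroup-specific parameters drawn from the between-subgroup layer and trial-level truths drawn within each subgroup. All numeric hyperparameter values below are illustrative assumptions, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

I = 3  # number of subgroups

# Between-subgroup hyperparameters (illustrative assumptions).
mu_s, sigma_s = -1.0, 0.3           # mean / SD of subgroup surrogate-effect means
alpha, sigma_alpha = 0.1, 0.1       # mean / SD of subgroup intercepts
beta, sigma_beta = -0.8, 0.2        # mean / SD of subgroup slopes
tau_s, gamma_s = np.log(0.5), 0.2   # mean / SD of log within-subgroup surrogate SDs
tau_e, gamma_e = np.log(0.2), 0.2   # mean / SD of log within-subgroup residual SDs

# Subgroup-specific parameters drawn from the between-subgroup layer.
mu_si = rng.normal(mu_s, sigma_s, size=I)
alpha_i = rng.normal(alpha, sigma_alpha, size=I)
beta_i = rng.normal(beta, sigma_beta, size=I)
sigma_si = np.exp(rng.normal(tau_s, gamma_s, size=I))  # log-normal, hence positive
sigma_ei = np.exp(rng.normal(tau_e, gamma_e, size=I))

# Trial-level true effects within each subgroup then follow the NP-RE structure.
theta_2 = [rng.normal(mu_si[i], sigma_si[i], size=15) for i in range(I)]
theta_1 = [rng.normal(alpha_i[i] + beta_i[i] * theta_2[i], sigma_ei[i])
           for i in range(I)]
```

Shrinking the between-subgroup SDs (e.g., sigma_beta) toward zero makes the subgroup regressions nearly identical, which is the mechanism behind the data adaptive information sharing described above.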

If fitting a partial-pooling fixed effects (PP-FE) model, a prior can be placed directly on each \(\theta _{2ji}\) , rather than assuming the hierarchical Gaussian distribution displayed above. We display an example of a PP-FE model here to contrast it with the PP-RE model more clearly. In this example, we place a N(0,10 \(^2\) ) prior on each trial’s true treatment effect on the surrogate.

To our knowledge, there has been just one other paper to evaluate partial-pooling strategies for the trial-level analysis of a surrogate. As discussed in the introduction, Papanikos et al. evaluated different fixed effects partial-pooling approaches [ 7 ]. An additional difference between the PP-FE model displayed above and those considered by Papanikos et al. is that there was not a between-subgroup distribution assumed for \(\sigma _{ei}\) in their models. One advantage of allowing a between-subgroup distribution for \(\sigma _{ei}\) is that it enables estimating posteriors for parameters defining between-subgroup distributions for all meta-regression parameters (intercept, slope, and residual variance). This subsequently facilitates prediction for a trial of a new subgroup, as is discussed in the Generating posterior predictive distributions section.

Analysis set 1: simulation study

We generated trial-level summary data (estimated treatment effects, standard errors, and the within-study correlations) based on four broad simulation setups, where within each we introduced two variants depending on the distribution used to simulate true treatment effects on the surrogate. The setups considered were motivated by applied data used to evaluate glomerular filtration rate (GFR) slope. We consider three subgroups of trials as in previous evaluations of GFR slope and to reflect the likely scenarios where the available data limits the number of subgroups, stressing the potential for benefit from data adaptive partial-pooling [ 4 ]. We simulated 15 medium-to-large trials per subgroup (standard errors on either endpoint reflect trials with roughly 300-2000 patients). Within-study correlations were drawn uniformly at random from the range of values present in our application data. Without loss of generality, we modeled a negative trial-level association. As discussed in the section titled Analysis set 2: application analysis of CKD trials , there is a negative association between treatment effects on the clinical endpoint and treatment effects on GFR slope. We also varied the sizes of subgroups and the degree of between-study variability in true effects on the surrogate. Broadly, we consider one setup (S1) where there is homogeneity in the quality of the surrogate across subgroups, another setup (S2) where the surrogate is weak in two subgroups and strong in another, another setup (S3) where the surrogate is weak in one subgroup and strong in the other two, and a final setup (S4) where surrogate quality is different in all three subgroups. The strength of the surrogate was defined by the true meta-regression \(R^2\) . Earlier work has proposed that \(R^2 \in (0,0.49)\) , \(R^2 \in (0.5,0.72)\) , and \(R^2 \in (0.73,1)\) suggest a weak, moderate, and strong surrogate, respectively [ 13 ].
For our purposes, we simulated data from true parameter values to obtain \(R^2 = 0.35,0.65,0.95\) to define the surrogate as weak, moderate or strong within subgroups, respectively.
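Under the random effects parameterization, one common expression for the trial-level \(R^2\) is \(\beta ^2 \sigma _s^2 / (\beta ^2 \sigma _s^2 + \sigma _e^2)\) , the share of variance in true clinical effects explained by the meta-regression. A small sketch computing it and classifying it against the thresholds above (the helper names and numeric values are our own):

```python
def trial_level_r2(beta: float, sigma_s: float, sigma_e: float) -> float:
    """Trial-level R^2 under a random effects parameterization: share of
    variance in true clinical effects explained by the meta-regression."""
    explained = beta**2 * sigma_s**2
    return explained / (explained + sigma_e**2)

def strength(r2: float) -> str:
    """Classify surrogate strength using the thresholds cited in the text."""
    if r2 < 0.5:
        return "weak"
    if r2 <= 0.72:
        return "moderate"
    return "strong"

# Illustrative values: slope -0.8, surrogate-effect SD 0.5, residual SD 0.2.
r2 = trial_level_r2(-0.8, 0.5, 0.2)   # 0.16 / (0.16 + 0.04) = 0.8
print(r2, strength(r2))
```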

Consider the data generating model below for the first variant (V1) of the four simulation setups. To simulate estimated clinical and surrogate effects for trial j ( \(j = 1,\dots , 15\) ) in subgroup i ( \(i = 1,2,3\) ) when true surrogate effects are Gaussian, we first drew true surrogate effects from (9), then drew conditional true clinical effects from (10), and finally drew a pair of estimated effects using (11). The standard errors and within-study correlations forming the matrices \(\Sigma _{ji}\) were drawn according to the rules described above using uniform distributions to reflect variation in trial sizes.

We also sought to contrast results under the different models when true treatment effects on the surrogate were distinctly non-Gaussian (V2). We used the following data generating model, where true effects on the surrogate for each trial were drawn from a bimodal distribution (12).
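One simple way to generate such distinctly non-Gaussian true effects is a two-component Gaussian mixture. The sketch below is illustrative only and does not reproduce the exact distribution in (12); the component means, SDs, and mixing weight are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_bimodal(n, m1=-1.8, m2=-0.2, sd=0.15, p=0.5, rng=rng):
    """Draw true surrogate effects from a two-component Gaussian mixture,
    a simple stand-in for a bimodal generating distribution (illustrative)."""
    comp = rng.random(n) < p                     # component membership
    return np.where(comp,
                    rng.normal(m1, sd, n),       # lower mode
                    rng.normal(m2, sd, n))       # upper mode

effects = draw_bimodal(1000)
```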

To summarize results, we provide simulation average posterior medians, \(2.5^{\text {th}}\) and \(97.5^{\text {th}}\) percentiles for models fit across 100 simulated datasets per setup. We also summarize posterior predictive distributions (PPDs - described further below).
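Given a vector of posterior draws, these summaries reduce to percentiles. A minimal helper (the function name and illustrative draws are our own):

```python
import numpy as np

def posterior_summary(draws):
    """Posterior median and central 95% interval (2.5th/97.5th percentiles)."""
    lo, med, hi = np.percentile(draws, [2.5, 50, 97.5])
    return {"2.5%": lo, "median": med, "97.5%": hi}

# Illustrative: 4000 draws standing in for a slope parameter's posterior.
rng = np.random.default_rng(3)
summary = posterior_summary(rng.normal(-0.8, 0.1, size=4000))
print(summary)
```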

Analysis set 2: application analysis of CKD trials

We compare analyses using the models discussed above on a set of 66 CKD studies. Data from these studies was collected by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI), an international research consortium [ 3 , 4 ]. Evaluations of GFR slope on this collection of studies have been described extensively [ 3 , 4 ]. For the purposes of this paper, we focus on the GFR “chronic slope” as the surrogate [ 4 ]. Time-to-doubling of serum creatinine or kidney failure is used as the clinical endpoint, which is accepted by regulatory agencies and is widely used as the primary endpoint in pivotal phase 3 clinical trials of CKD [ 3 ]. Treatment effects on the clinical endpoint were expressed as log transformed hazard ratios (HRs), estimated using proportional hazards regression. A shared parameter mixed effects model was used to jointly model longitudinal GFR trajectories and the time of termination of GFR follow-up due to kidney failure or death for each randomized patient. Treatment effects on the chronic GFR slope are expressed as the mean difference in the treatment arm slope minus the control arm slope, expressed in ml/min/1.73 m \(^2\) per year. Further detail on the methods used to estimate effects on GFR slope-based endpoints is described elsewhere in the literature [ 4 , 14 ]. Finally, we obtained robust sandwich estimates of the within-study correlations using a joint model as in previous work by CKD-EPI [ 4 ].

Heterogeneity across the CKD-EPI trials can be attributed to many study-level factors. We consider four disease-defined subgroups (CKD with unspecified cause (CKD-UC), diabetes (DM), glomerular diseases (GN), and cardiovascular diseases (CVD)) and 16 intervention-defined subgroups (listed in Additional file 1: Section 1). For the application analyses, we focus on fitting the FP-RE and PP-RE models, and use different sets of priors under the PP-RE model (we also contrast results under the PP-RE and PP-FE models where subgroups are defined by disease, to complement certain simulation analyses). To capture the scenario where there is interest in prediction for a future trial of a new subgroup, we first fit models leaving out the CVD studies and generated PPDs for those left-out studies. For intervention-defined subgroups, we fit the model to trials from the 7 subgroups with at least 3 studies each, and then generated PPDs for studies of the remaining left-out, smaller subgroups. We also summarize PPDs obtained for studies of the subgroups used for model fitting under these two subgroup schemes.

For the purposes of the simulation study, we utilized diffuse priors, which is common practice in surrogate endpoint evaluations [4, 6, 7, 8]. For the full-pooling and no-pooling models, we used the \(N(0,10^2)\) prior for the intercept (\(\alpha\) or \(\alpha_i\)) and slope (\(\beta\) or \(\beta_i\)), and for the mean true treatment effect on the surrogate (\(\mu_s\), \(\mu_{si}\) under random effects models) or for trial-specific true effects on the surrogate when fitting the fixed effects models (\(\theta_{2ji}\)). As in previous work in CKD, we used inverse-gamma priors on variance terms (\(\text{IG}(a,b)\) for shape \(a\) and scale \(b\)) [4, 5]. For the full-pooling and no-pooling models, we used \(\sigma_{ei}^2,\sigma_{e}^2 \sim \text{IG}(0.001,0.001)\). Where appropriate (random effects models), we also used \(\sigma_s^2,\sigma_{si}^2 \sim \text{IG}(0.001,0.001)\). The \(\text{IG}(0.001,0.001)\) prior is considered an approximation to the Jeffreys prior. For partial-pooling models, we let \(\tau_e^2 \sim \text{IG}(0.0025,0.001)\) and \(\gamma_e \sim \text{half-normal}(0,3^{2})\), and for the random effects variants \(\tau_s^2 \sim \text{IG}(0.0025,0.001)\) and \(\gamma_s \sim \text{half-normal}(0,3^2)\). This combination yields priors for the within-subgroup standard deviations in the partial-pooling models that match those of the no-pooling models, in that the \(25^{\text{th}}\), \(50^{\text{th}}\), and \(75^{\text{th}}\) prior percentiles differed by less than 0.05. For \(\sigma_{\alpha}\), \(\sigma_{\beta}\), \(\sigma_{s}\), we used \(\text{half-normal}(0,2^2)\). These half-normal priors should be considered highly diffuse for all of our analyses.
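To give a sense of how diffuse these settings are, the sketch below examines the \(\text{half-normal}(0,3^2)\) prior (used for \(\gamma_e\)) by Monte Carlo; the same approach applies to the \(\text{half-normal}(0,2^2)\) priors. This is our own illustration in Python rather than the Stan code in the additional file.

```python
import numpy as np

# Monte Carlo look at the half-normal(0, 3^2) prior:
# draw |N(0, 3)| and report the implied prior percentiles.
rng = np.random.default_rng(0)
draws = np.abs(rng.normal(0.0, 3.0, size=200_000))

q25, q50, q75, q975 = np.percentile(draws, [25, 50, 75, 97.5])
print(f"25th={q25:.2f}, median={q50:.2f}, 75th={q75:.2f}, 97.5th={q975:.2f}")
# The prior places substantial mass on standard deviations far larger than any
# plausible residual SD of a log hazard ratio, i.e., it is highly diffuse.
```

The analytic quantiles are \(3\Phi^{-1}(0.5+p/2)\); for instance the prior median is about 2.0 and the 97.5th percentile about 6.7, which is very wide on the log-HR scale.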

For our application analyses, we considered three variations on priors when employing the PP-RE model. We did so because we hypothesized that not only narrow priors but also highly diffuse priors could unduly influence certain results. A limited number of studies is often available for meta-analysis, which in turn limits the number of subgroups; categorizing studies by constructs such as disease subtype or treatment comparison class may likewise yield only a small number of subgroups. When there are just a few subgroups, the data provide very little information on subgroup-to-subgroup variation, and the posteriors for between-subgroup variance terms may exhibit minimal updating from the priors. As such, if priors are so diffuse that they represent a range of variability beyond practical reality, so too may the posteriors. As described below, this matters because between-subgroup variance parameters are used in generating posterior predictive distributions for a trial of a new subgroup. A practical degree of narrowing of certain priors can thus be seen as a middle ground between overly narrow and overly diffuse priors. While we narrowed all priors in the constrained “sets” considered, we focused on the priors for the between-subgroup standard deviations of the meta-regression parameters. We first used the fully diffuse priors displayed above.
We then employed an iterative procedure in which we narrowed priors (emphasizing between-subgroup standard deviation parameters such as \(\sigma_{\alpha},\sigma_{\beta},\gamma_e\)) as far as possible while keeping the posterior median, \(2.5^{\text{th}}\), and \(97.5^{\text{th}}\) percentiles of the within-subgroup meta-regression posteriors within 0.05 of their values under the diffuse priors, no matter how much narrower the posteriors on between-subgroup parameters became (referred to as “Constrained Priors Set 1”, which was ultimately the same for either subgroup classification). Finally, we chose what we will refer to as “domain-constrained” priors (“Constrained Priors Set 2”). It is reasonable to choose a prior that constrains between-subgroup variability to a range that is plausible based on subject matter expertise (e.g., through a prior elicitation process). For example, in our case the intercept is the expected true log-HR on the clinical endpoint when the true effect on the surrogate is null. When there is a null effect on the surrogate, we may suspect a low probability of an expected HR on the clinical endpoint that is very strong in either direction (e.g., below 0.5 or above 2.0), and this logic can be used to assign a moderate-to-low probability for subgroup-specific intercepts beyond these values. Domain-constrained priors were the narrowest among those considered for our analyses, and further detail on choosing these priors is provided in Section 2 of Additional file 1.
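The 0.05 stopping criterion in the iterative narrowing procedure can be expressed as a small helper comparing posterior summaries under two prior choices. This is our own sketch of the criterion, with simulated normal draws standing in for the actual within-subgroup posteriors.

```python
import numpy as np

def summaries_close(draws_a, draws_b, tol=0.05):
    """True if the posterior median, 2.5th, and 97.5th percentiles
    differ by no more than `tol` between two sets of posterior draws."""
    qa = np.percentile(draws_a, [2.5, 50, 97.5])
    qb = np.percentile(draws_b, [2.5, 50, 97.5])
    return bool(np.max(np.abs(qa - qb)) <= tol)

rng = np.random.default_rng(1)
diffuse = rng.normal(-0.30, 0.04, 50_000)   # e.g., slope posterior under diffuse priors
narrowed = rng.normal(-0.30, 0.04, 50_000)  # same posterior after narrowing priors
shifted = rng.normal(-0.20, 0.04, 50_000)   # a posterior that moved too much

print(summaries_close(diffuse, narrowed))  # narrowing acceptable
print(summaries_close(diffuse, shifted))   # narrowing went too far
```

In the actual procedure, priors would be narrowed step by step and the check applied to each within-subgroup meta-regression posterior until further narrowing fails the criterion.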

We wish to emphasize that there is an important distinction between narrowing priors for the terms that define variability in the treatment effects on the surrogate across studies and narrowing priors for the meta-regression parameters. The degree of variability of treatment effects on the surrogate influences the extent to which the data allow the quality of the surrogate to be inferred. Priors for the distribution(s) of true treatment effects on the surrogate should be left sufficiently diffuse so as not to restrict variation in effects across studies; in our case, these were narrowed only because the diffuse priors typically used are excessively wide relative to the range of treatment effects that is plausible. The priors of primary interest are again those governing the degree of variability between subgroups in the meta-regression terms (e.g., \(\sigma_{\beta}\)).

Generating posterior predictive distributions

There are a number of strategies that can be used to generate PPDs for the treatment effect on the clinical endpoint based on the treatment effect on the surrogate. In our simulation study, we compare summaries of PPDs for the true treatment effect on the clinical endpoint, which take into account only uncertainty in the estimated meta-regression parameters. This is possible in a simulation analysis because we actually know the true effect on the surrogate [7]. For each study left out of model fitting, let the true effect on the surrogate for that study be denoted \(\theta_{2}^N\). The PPD for the true effect on the clinical endpoint is then generated by taking one draw from \(N(\alpha^{*m} + \beta^{*m}\theta_{2}^N,\sigma_e^{*m2})\) for each of the \(m=1,\dots,M\) posterior draws obtained in model fitting, where \(\alpha^{*m}, \beta^{*m}, \sigma_e^{*m}\) represent draws from the posteriors of the full-pooling, no-pooling, or partial-pooling models. Under the no-pooling and partial-pooling models, subgroup-specific parameters were used when the left-out trial was simulated from one of the same subgroups.
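Concretely, this PPD can be formed with one normal draw per posterior draw. The sketch below uses made-up normal draws in place of the actual Stan posterior output; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4000  # number of posterior draws

# Hypothetical posterior draws for the meta-regression parameters
# (in practice these come from the fitted model).
alpha = rng.normal(0.02, 0.01, M)            # intercept draws
beta = rng.normal(-0.30, 0.05, M)            # slope draws
sigma_e = np.abs(rng.normal(0.05, 0.01, M))  # residual SD draws

theta2_new = -1.5  # known true effect on the surrogate for the left-out study

# One draw from N(alpha + beta * theta2_new, sigma_e^2) per posterior draw
ppd = rng.normal(alpha + beta * theta2_new, sigma_e)

print(np.percentile(ppd, [2.5, 50, 97.5]))  # PPD summary for the true clinical effect
```

The resulting vector of \(M\) draws is then summarized by its median and 2.5th/97.5th percentiles, as in Table 2.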

In application analyses, it is only possible to obtain the PPD for the estimated effect on the clinical endpoint, which involves a procedure that accounts not only for uncertainty in the meta-regression posteriors but also for sampling error in the treatment effect estimates. Section 3 of Additional file 1 provides further detail on the prediction procedures used in our application analyses; we provide an overview here. For one part of our application analyses, we generated PPDs for trials of existing subgroups. Under full-pooling models, we directly used the single set of estimated meta-regression posteriors to map the effect on the surrogate to a predicted effect on the clinical endpoint. Under no-pooling and partial-pooling models, we used the appropriate subgroup-specific meta-regression posteriors estimated directly in model fitting (e.g., to make a prediction for a trial of subgroup \(i \in \{1,\dots,I\}\) we directly use a draw from the posterior for \(\beta_i\) obtained through model fitting). In our second prediction exercise, we generated PPDs for trials of a new subgroup. Only the full-pooling and partial-pooling models were used, as no-pooling models do not provide parameters that allow the surrogate to be applied in a new subgroup. Again, under full-pooling models we used the single set of estimated meta-regression posteriors, which induces the assumption that the new study is fully exchangeable with those used for model fitting even though it pertains to a new subgroup.
Under partial-pooling models, we used draws from the population subgroup distributions (e.g., we draw a new \(\beta_{\text{new}}\) from \(N(\beta,\sigma_{\beta}^2)\)) to map the effect on the surrogate to the predicted clinical effect. That this process requires \(\sigma_{\beta}\), which may be influenced by the choice of priors in practical scenarios where the number of subgroups is small, is what motivated our interest in the careful choice of priors. In this way, all prediction exercises used subgroup-specific meta-regression posteriors for prediction, except that these were random draws from the population distribution when applying the surrogate to a new setting under the partial-pooling approach. When we extrapolate the trial-level association to a new subgroup, drawing from the population distribution for each meta-regression parameter induces an additional degree of uncertainty in the prediction. This can be seen as a reasonable compromise between applying the fitted full-pooling model, which ignores that the new study represents a new scenario, and not applying the surrogate at all (i.e., the no-pooling approach). As discussed when introducing the PP-RE approach, the reason we assume between-subgroup distributions for \(\sigma_e\) is to facilitate drawing the subgroup-specific residual standard deviations needed in prediction for a trial of a new subgroup.

For the simulation and applied analyses, we used the University of Utah Center for High Performance Computing Linux cluster. On the cluster, we used R version 4.0.3 for data preparation and for generating model summaries. The MCMC sampling algorithms for model fitting were implemented using RStan version 2.21.12 [15]. We used the Gelman-Rubin statistic to assess convergence of chains, and the effective sample size (along with visual summaries such as rank plots) to evaluate whether there were sufficient MCMC draws to support posterior summaries such as tail percentiles [16, 17]. We used 10,000-20,000 MCMC iterations and 3 independent chains across all analyses. Finally, for the application analyses, the SAS NLMIXED procedure was used to estimate treatment effects on the clinical and surrogate endpoints, standard errors, and within-study correlations within each study [18]. Example RStan code (PP-RE model) and R code (for simulating data) are provided in Section 4 of Additional file 1.
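For reference, the Gelman-Rubin statistic can be computed from a set of chains as below. This is the classic textbook formulation, not the exact RStan implementation (which additionally splits each chain in half), and the example chains are synthetic.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for `chains`
    of shape (n_chains, n_iterations), classic formulation."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(4)
mixed = rng.normal(size=(3, 2000))               # well-mixed chains
stuck = mixed + np.array([[0.0], [2.0], [4.0]])  # chains centered at different levels

print(gelman_rubin(mixed))  # close to 1
print(gelman_rubin(stuck))  # well above 1
```

Values near 1 indicate the chains are sampling the same distribution; values well above 1 flag non-convergence.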

Simulation study results

Contrasting different random effects approaches under Gaussian surrogate effects

Table 1 provides summaries of posterior distributions obtained from fitting models on simulation setups 1-4 (V1 and V2). When there was no heterogeneity in the true meta-regression parameters across subgroups (Setup 1), the PP-RE model resulted in limited additional uncertainty in posteriors relative to the FP-RE model, and negligible additional bias via the posterior medians. Across Setups 2-4, where the strength of the association between effects on the clinical and surrogate endpoints varied across subgroups, use of the FP-RE model naturally obscured such heterogeneity for any given meta-regression parameter summarized. The NP-RE and PP-RE models more adequately produced subgroup-specific meta-regression posteriors that suggested heterogeneity in the quality of the surrogate, but in every case the PP-RE model produced more precise posteriors than the NP-RE model. Benefits were especially evident for the posteriors of the meta-regression slope. While the PP-RE model typically resulted in a small degree of bias, between-subgroup heterogeneity was potentially more evident due to improved precision. Precision gains of the PP-RE over the NP-RE model were also observed in the sensitivity analyses considered (Tables 2 and 3 of Additional file 1), including where subgroup sizes were heterogeneous. There was a larger degree of pooling away from the parameter values true for smaller subgroups under partial-pooling, but the PP-RE model still allowed heterogeneity in posterior medians and 95% credible intervals to aid in understanding variation in surrogate quality across subgroups. One potential drawback of all approaches considered was that \(R^2\) posterior medians appeared biased in every scenario evaluated, reflecting the challenge of accurately estimating \(R^2\) with limited data.
The average posterior median \(R^2\) under partial-pooling was more biased than under no-pooling in certain scenarios, such as where the surrogate was weak, possibly due to information sharing. The challenges associated with estimating \(R^2\) emphasize the importance of reporting not only \(R^2\) point estimates but also credible intervals. The credible intervals under the PP-RE approach remained wide in subgroups where the surrogate was weak. Differences in model performance were also evident in evaluations of model-based prediction of treatment effects on the clinical endpoint (Table 2). Coverage of true clinical effects by 95% posterior prediction intervals was lower when using the FP-RE model, even where meta-regression parameters were truly the same across subgroups. The NP-RE model resulted in the highest coverage because of excessively wide prediction intervals, whereas prediction under the PP-RE model resulted in improved precision with adequate coverage.

Contrasting fixed vs. random effects partial-pooling models under non-Gaussian surrogate effects

Where the true treatment effects on the surrogate were non-Gaussian, the PP-FE model resulted in downward bias in the meta-regression intercept posteriors (e.g., via the posterior median), whereas the PP-RE model resulted in little or no bias. The PP-FE model also resulted in downward bias in the meta-regression slope posteriors (regression dilution bias) in subgroups where the surrogate was simulated to be moderate to strong. We hypothesize that this downward bias was due to the absence of shrinkage of the true treatment effects on the surrogate (the “x-axis” variable in the meta-regression) toward one another. Because no common distribution is assumed for true effects on the surrogate across studies, the true effects are likely to be more dispersed than under the random effects model, where the Gaussian distributional assumption can pool true treatment effects on the surrogate across studies. Although the random effects model resulted in a small degree of upward bias in the meta-regression slope in subgroups where the surrogate was weak, the \(R^2\) posteriors were wider and their medians lower than under the fixed effects model. The risk of concluding a stronger surrogate than was true in reality was therefore mitigated by the less optimistic \(R^2\) posteriors. The implications of these biases in the meta-regression posteriors are also evident in the prediction summaries in Table 2. Despite the use of fixed effects, coverage of the true treatment effect on the clinical endpoint by 95% posterior predictive intervals under the PP-FE model was poorer than under the PP-RE model, most markedly in subgroups where the surrogate was strongest, which is likely where prediction is of greatest interest.
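The classical regression-dilution mechanism underlying this hypothesis can be illustrated with a toy example entirely separate from the actual models: when the x-axis variable carries extra dispersion or noise rather than being shrunk, the fitted slope attenuates toward zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x_true = rng.normal(0.0, 1.0, n)                   # "true" effects on the surrogate
y = 0.1 - 0.30 * x_true + rng.normal(0, 0.05, n)   # effects on the clinical endpoint

x_noisy = x_true + rng.normal(0.0, 0.7, n)         # x observed with extra dispersion

slope_clean = np.polyfit(x_true, y, 1)[0]
slope_noisy = np.polyfit(x_noisy, y, 1)[0]
# Attenuation factor is roughly var(x) / (var(x) + var(noise)) = 1 / 1.49
print(slope_clean, slope_noisy)
```

The slope estimated against the noisier x-axis is pulled toward zero by the usual attenuation factor, mirroring the downward slope bias observed under the PP-FE model.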

Application analysis results

The primary goal of the application analysis was to compare meta-regression posteriors and PPDs obtained after fitting the PP-RE model with different priors. However, we also note that Fig. 7 in Additional file 1 indicates differences in the meta-regression slope estimates under the PP-RE and PP-FE models in the analysis where models were fit to disease-defined subgroups. The discrepancy in the posterior median between the two models grew larger for subgroups with a stronger meta-regression slope under the PP-RE model (under the PP-RE model, medians were -0.25, -0.30, and -0.35, whereas under the PP-FE model these were -0.27, -0.29, and -0.29).

Table 3 summarizes meta-regression slope posteriors from the application analyses (3 disease-defined subgroups with 59 studies used for model fitting in one analysis, and 7 intervention-defined subgroups with 51 studies used for model fitting in the other). Additional file 1: Tables 5 and 6 contain posterior summaries for the full set of meta-regression parameters from these analyses. When there were three disease-defined subgroups, using increasingly narrow priors resulted in narrower posteriors not only for the between-subgroup standard deviation parameters but also for the between-subgroup mean parameters (even when priors for the between-subgroup means were left unchanged). However, priors could be narrowed considerably before the within-subgroup posteriors narrowed. In most cases, even the narrowest priors used did not meaningfully change the inference for subgroup-specific posteriors. When there were 7 subgroups, narrower priors again resulted in equivalent or narrower posteriors for between-subgroup means and standard deviations, but to a lesser extent than in the analysis with fewer subgroups. Similarly, the use of narrower priors resulted in little, if any, change in the within-subgroup posteriors under the options considered for intervention-defined subgroups.

Figures 1 and 2 illustrate the implications of the choice of priors for prediction in trials of a new or an existing subgroup. For concision, a subset of trials is displayed in the figures; the remaining results are displayed in Additional file 1: Tables 7-12. First, consider the trials of novel subgroups. For every study, the PP-RE model resulted in wider PPDs than the FP-RE model. When there were fewer subgroups, predictive distributions for left-out studies were excessively and unrealistically wide when using completely diffuse priors under the PP-RE model. The use of constrained priors, especially those motivated by domain-specific reasoning (P3), resulted in the narrowest PPDs among those obtained, though still wider than those under the FP-RE model with diffuse priors. Increasingly constrained priors resulted in more realistic uncertainty in HRs relative to diffuse priors. When predicting for a trial of a novel intervention class (Fig. 2), where more subgroups were available for model fitting, PPDs were narrower under the PP-RE approach (contrast the PPDs in Fig. 1 with those in Fig. 2). This may be because of improved inferential precision for parameters associated with between-subgroup variability when more subgroups are present. These results indicate the PP-RE model may be more suitable for prediction, inducing an appropriate degree of added uncertainty when predicting a clinical effect in a trial meaningfully different from those used to evaluate the surrogate. However, these results also suggest that PPDs can be excessively wide due to overly diffuse and unrealistic priors rather than the true quality of the surrogate or its applicability to a new setting. Next, when trials were of a subgroup available for model fitting, the summaries of PPDs under the PP-RE model were more robust to the choice of priors than in prediction for studies of a new subgroup (even for subgroups with few trials).
In our setting, predictive distributions were also similar in width under the PP-RE and FP-RE models (evidenced by the \(2.5^{\text{th}}\) and \(97.5^{\text{th}}\) percentiles). The PP-RE model may thus increase accuracy and precision in prediction of clinical effects for future trials of existing subgroups over the FP-RE model by allowing subgroup-specific meta-regression parameters.

Figure 1

Posterior predictive median and 95% interval are summarized. FP-RE: full-pooling random effects. PP-RE: partial-pooling random effects. P1: diffuse priors used in fitting the PP-RE model. P2: Constrained Priors Set 1 in fitting the PP-RE model. P3: Constrained Priors Set 2 (narrowest) in fitting the PP-RE model. Studies listed are described further in Additional file 1. The “ESG” (existing subgroup) studies were used for model fitting; the “NSG” (new subgroup) studies were left out of model fitting

Figure 2

Discussion

Trial-level surrogate endpoint evaluations are often performed on collections of heterogeneous clinical trials. Standard methodology that yields estimates of a single set of meta-regression parameters may not be appropriate when trials meaningfully differ across pre-specified subgroups, and may also provide unrealistic precision in prediction of clinical effects in new studies that differ from those used to evaluate the surrogate. In this paper, we explored a class of models we refer to as “partial-pooling” models, where subgroup-specific meta-regressions are assumed, yet between-subgroup distributions facilitate data-adaptive information sharing across subgroups. Partial-pooling models provide a framework both for prediction of treatment effects on the clinical endpoint for a trial that meaningfully differs from those used for the surrogate evaluation itself (i.e., is of a new subgroup) and for prediction for future studies of an existing subgroup. There are various challenges in the implementation of a partial-pooling approach, such as the choice of priors and of the distribution for the true treatment effects on the surrogate. We conducted analyses to help guide such decision making.

Under the scenarios considered (e.g., unless a given subgroup contains a large number of large trials, exceeding roughly 30), our analyses indicated that fitting separate models for surrogate endpoint evaluation within subgroups (no-pooling) can result in excessive uncertainty in posteriors. We found that partial-pooling methods can be a practical solution with noteworthy benefits; in our analyses, information sharing improved the precision of posteriors with limited bias. If interest is in inference for subgroup-specific meta-regression posteriors, our results showed key differences in interpretation when using fixed versus random effects under the partial-pooling approach. In our analyses, the partial-pooling fixed effects variant produced downward bias in the meta-regression slope in subgroups of trials where the surrogate was strong, which translated to more biased prediction. The partial-pooling random effects approach did not produce such biases in subgroups where the surrogate was strong. We also did not see noteworthy biases under the partial-pooling random effects approach when the Gaussian distributional assumption on the true treatment effects on the surrogate was definitively violated.
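The precision gain from information sharing has a simple normal-normal analogue. The sketch below is a toy empirical-Bayes illustration with made-up numbers, not the PP-RE model itself: each subgroup's estimate is pulled toward the overall mean, with the amount of pooling governed by an assumed between-subgroup variance.

```python
import numpy as np

rng = np.random.default_rng(6)
true_slopes = np.array([-0.20, -0.30, -0.40])  # per-subgroup "truth" (illustrative)
n_per, s = 5, 0.30                             # few trials per subgroup, noisy estimates

est = np.array([rng.normal(mu, s, n_per).mean() for mu in true_slopes])  # no-pooling
grand = est.mean()                                                       # full-pooling

tau2 = 0.05 ** 2         # assumed between-subgroup variance
se2 = s ** 2 / n_per     # sampling variance of each subgroup estimate
w = tau2 / (tau2 + se2)  # weight on the subgroup's own data

partial = w * est + (1 - w) * grand  # partial-pooling compromise
print(est.round(3), partial.round(3), round(grand, 3))
```

With noisy subgroup estimates and modest between-subgroup variance, the partially pooled values sit between each subgroup's own estimate and the grand mean, and are less dispersed than the no-pooling estimates.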

A key theme of our results is that the posterior distributions of the meta-regression parameters within each subgroup under the partial-pooling random effects model were robust to a degree of narrowing of the priors on between-subgroup parameters. Similarly, inferences that apply the meta-regressions fit under the partial-pooling model to estimate the posterior predictive distribution for the treatment effect on the clinical endpoint in a new trial were robust to the prior distributions when the new trial belonged to one of the subgroups included when fitting the meta-regression. Conversely, inferences for a new trial that did not belong to one of the subgroups of the prior trials could be highly dependent on the prior distributions, especially the priors on the between-subgroup standard deviations of the meta-regression parameters. Notably, when highly diffuse priors were used, the posterior predictive distributions for the new trial exhibited very high dispersion, indicating poor ability to extend the relationship between the treatment effects on the surrogate and clinical endpoints from the previous trials to the new trial. The extent to which the choice of priors influenced the dispersion of posterior predictive distributions for a trial of a new subgroup was greater when fewer subgroups were used in model fitting (e.g., 3 as opposed to 7 subgroups, as in our analyses). This suggests that when fitting partial-pooling models, overly diffuse priors, and not only overly constrained ones, can unduly influence certain predictive analyses, and it is thus important to consider a strategy to identify more practical priors.

These quantitative findings are consistent with the general concept that the relationship between treatment effects on the surrogate and clinical endpoints observed in previously conducted trials can reasonably be applied to a new trial if at least one of the following three conditions holds: 1) there is strong evidence for a high-quality surrogate with a lack of heterogeneity in performance across a large number of subgroups representing an exhaustive array of intervention types and disease sub-classifications; 2) the new trial can be viewed as a member of the same subgroups used to evaluate the surrogate; 3) subject matter knowledge is sufficiently strong to support informative prior distributions, which mitigate heterogeneity in the meta-regression parameters between subgroups. This third condition appears related to the stress regulatory agencies place on the strength of evidence for a strong biological relationship between the surrogate and clinical endpoints. If the new trial is evaluating a novel treatment or disease subtype that is fundamentally distinct from any of the previous subgroups of trials, and subject matter knowledge cannot rule out heterogeneity in the meta-regression parameters between subgroups, application of the relationship between the surrogate and clinical endpoints observed in the prior trials to the new trial is tenuous. Of course, priors that drive the applicability of the meta-regression for prediction to a trial of a new subgroup can be tuned with multiple considerations in mind. In one regard, even without strong subject matter knowledge, basic logic can be used to narrow priors to some degree (such as for the meta-regression intercept, a log hazard ratio in our case, which is a commonly used metric and need not be expected to vary excessively). On the other hand, priors could be further constrained if strong subject matter knowledge, ideally from multiple stakeholders, indicates doing so.
The key point is that completely diffuse priors are likely to be highly impractical when employing partial-pooling models for surrogate evaluation, and the applicability of the surrogate should not depend on the excessive uncertainty imposed by such priors rather than on priors that are realistic according to sound subject matter reasoning.

A noteworthy implication of our findings is that use of a partial-pooling model on a diverse collection of studies may be more useful than highly targeted surrogate evaluations on small subsets of studies. For example, there have been many evaluations of surrogates such as tumor response or progression-free survival for highly specific tumor types in cancer [19, 20, 21, 22]. However, there may be insufficient data in such settings to truly infer the quality of the surrogate. Partial-pooling models (with appropriately defined priors) fit to datasets with more tumor types, for example, may yield more useful information than fitting separate models within the small subgroups.

There are potential limitations to our analyses and findings. The use of Bayesian methods for surrogate evaluation is computationally demanding, and we thus considered a limited number of scenarios in our application and simulation analyses. There may also be additional distributions that could provide further benefit over the Gaussian or fixed-effects approaches we considered; for example, Bujkiewicz et al. showed potential benefits of using a t-distribution for certain terms [8]. Other strategies to refine priors may also be appropriate in other disease settings. Our analyses and discussion are set in a context where we begin by assuming (through our priors) that there may be some heterogeneity in the meta-regression across subgroups, but that priors on terms related to between-subgroup heterogeneity can be narrowed to some degree so that inference is not unduly influenced by unrealistically wide priors. An alternative approach may be to use priors which, to some degree, induce the starting assumption that there is no between-subgroup heterogeneity in the quality of the surrogate, forcing the data to provide strong evidence for heterogeneity before the meta-regression posteriors differ at all across subgroups. For example, spike-and-slab priors could be considered in future work, if the use of such priors aligns with the analytical goals of a given surrogate evaluation.

It is also important to note that there are many approaches to trial-level surrogate endpoint evaluation. For example, Buyse et al. have proposed joint models that can be fit in a single-stage analysis to simultaneously estimate within- and between-study surrogacy metrics [23]. While joint modeling strategies have a number of advantages, their uptake appears less common in practice than two-stage approaches [9]. Other authors have used network meta-regression strategies for surrogate endpoint evaluations on collections of heterogeneous studies [24]. Finally, within the context of evaluating whether there is heterogeneity in trial-level associations, alternative model structures may be useful depending on the ultimate scientific question. For example, one might consider a single linear regression with interaction terms. One potential drawback of such an approach is that with increasing trial-level factors (e.g., subgroups), such models become increasingly complex, potentially over-parameterized, and may pose challenges for non-statisticians to interpret. On the other hand, an advantage of the partial-pooling approaches discussed is that they maintain the linear regression structure within subgroups, an approach already familiar to many investigators.


Conclusions

The methods discussed in this paper are applicable to the two-stage approach often used to establish the trial-level validity of a surrogate endpoint. Because establishing trial-level surrogacy requires a collection of clinical trials, analysts are often confronted with limited data. A strategy to overcome such data limitations is to incorporate a broad collection of studies with various disease and therapy sub-categories. However, analyses of such data in, for example, chronic kidney disease have encouraged regulatory agencies to question whether surrogate performance varies across pre-specified and clinically motivated subgroups of trials defined by disease or intervention classes. Analyses requiring the sub-division of available trials into subgroups will only exacerbate the issues associated with model fitting on small amounts of data. We performed analyses showing that partial-pooling modeling approaches may improve the potential to infer the quality of the surrogate within subgroups of trials even on limited datasets. However, our analyses also showed that even diffuse priors used for partial-pooling analyses can strongly influence the perceived quality of the surrogate as well as the ability to predict the treatment effect on the clinical endpoint. We discussed strategies that can be used to constrain the priors used for the analysis to obtain more realistic estimates of key parameters for surrogate endpoint evaluation. Ultimately, analyses of a surrogate endpoint could appropriately expand the feasibility of trials in an entire disease area, or could lead to the use of an endpoint that is not ultimately useful for patients. Partial-pooling models should be considered for surrogate endpoint evaluation on heterogeneous collections of trials, but the choice of model and priors should be handled rigorously.

Availability of data and materials

Data restrictions apply to the data used for the application analyses presented, to which we were given access under license for this manuscript. These data are not publicly available due to privacy and ethical restrictions. The programs used to generate the data for the simulation study are provided in the supplemental materials.


  • Chronic kidney disease
  • Glomerular filtration rate
  • Random effects
  • Posterior predictive distribution
  • Diabetes mellitus
  • Glomerular disease
  • Cardiovascular disease


References

1. Thompson A, Smith K, Lawrence J. Change in estimated GFR and albuminuria as end points in clinical trials: a viewpoint from the FDA. Am J Kidney Dis. 2020;75(1):4–5.

2. Food and Drug Administration US. Guidance for industry: expedited programs for serious conditions - drugs and biologics. 2014. Accessed 1 Jan 2022.

3. Levey AS, Gansevoort RT, Coresh J, Inker LA, Heerspink HL, Grams M, et al. Change in albuminuria and GFR as end points for clinical trials in early stages of CKD: a scientific workshop sponsored by the National Kidney Foundation in collaboration with the US Food and Drug Administration and European Medicines Agency. Am J Kidney Dis. 2020;75(1):84–104.

4. Inker LA, Heerspink HJL, Tighiouart H, Levey AS, Coresh J, Gansevoort RT, et al. GFR slope as a surrogate end point for kidney disease progression in clinical trials: a meta-analysis of treatment effects of randomized controlled trials. J Am Soc Nephrol. 2019;30(9):1735–45.

5. Heerspink HJL, Greene T, Tighiouart H, Gansevoort RT, Coresh J, Simon AL, et al. Change in albuminuria as a surrogate endpoint for progression of kidney disease: a meta-analysis of treatment effects in randomised clinical trials. Lancet Diabetes Endocrinol. 2019;7(2):128–39.

6. Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Stat Med. 1997;16(17):1965–82.

7. Papanikos T, Thompson JR, Abrams KR, Stadler N, Ciani O, Taylor R, et al. Bayesian hierarchical meta-analytic methods for modeling surrogate relationships that vary across treatment classes using aggregate data. Stat Med. 2020;39(8):1103–24.

8. Bujkiewicz S, Thompson JR, Spata E, Abrams KR. Uncertainty in the Bayesian meta-analysis of normally distributed surrogate endpoints. Stat Methods Med Res. 2017;26(5):2287–318.

9. Belin L, Tan A, De Rycke Y, Dechartres A. Progression-free survival as a surrogate for overall survival in oncology: a methodological systematic review. Br J Cancer. 2022;122(11):1707–14.

10. Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol. 2007;7:3.

11. Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. J R Stat Soc Series A Stat Soc. 2009;172(4):789–811.

12. Jones HE, Ohlssen DI, Neuenschwander B, Racine A, Branson M. Bayesian models for subgroup analysis in clinical trials. Clin Trials. 2011;8(2):129–43.

13. Prasad V, Kim C, Burotto M, Vandross A. The strength of association between surrogate end points and survival in oncology: a systematic review of trial-level meta-analyses. JAMA Intern Med. 2015;175(8):1389–98.

14. Vonesh E, Tighiouart H, Ying J, Heerspink HJL, Lewis J, Staplin N, et al. Mixed-effects models for slope-based endpoints in clinical trials of chronic kidney disease. Stat Med. 2019;38(22):4218–39.

15. RStan Development Team. RStan: the R interface to Stan. 2020. Accessed 1 Dec 2022.

16. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. New York: Chapman and Hall; 1995.

17. Vehtari A, Gelman A, Simpson D, Carpenter B, Burkner PC. Rank-normalization, folding, and localization: an improved Rhat for assessing convergence of MCMC (with discussion). Bayesian Anal. 2021;16(2):667–718.

18. The SAS Institute. The NLMIXED procedure. 2015. Accessed 1 Dec 2022.

19. Kataoka K, Nakamura K, Mizusawa J, Kato K, Eba J, Katayama H, et al. Surrogacy of progression-free survival (PFS) for overall survival (OS) in esophageal cancer trials with preoperative therapy: literature-based meta-analysis. Eur J Surg Oncol. 2017;43(10):1956–61.

20. Chen YP, Sun Y, Chen L, Mao YP, Tang LL, Li WF, et al. Surrogate endpoints for overall survival in combined chemotherapy and radiotherapy trials in nasopharyngeal carcinoma: meta-analysis of randomised controlled trials. Radiother Oncol. 2015;116(2):157–66.

21. Gharzai LA, Jiang R, Wallington D, Jones G, Birer S, Jairath N, et al. Intermediate clinical endpoints for surrogacy in localised prostate cancer: an aggregate meta-analysis. Lancet Oncol. 2021;22(3):402–10.

22. Michiels S, Pugliano L, Marguet S, Grun D, Barinoff J, Cameron D, et al. Progression-free survival as surrogate end point for overall survival in clinical trials of HER2-targeted agents in HER2-positive metastatic breast cancer. Ann Oncol. 2016;27(6):1029–34.

23. Buyse M, Molenberghs G, Paoletti X, Oba K, Alonso A, Elst WV, et al. Statistical evaluation of surrogate endpoints with examples from cancer clinical trials. Biom J. 2016;58(1):104–32.

24. Bujkiewicz S, Jackson D, Thompson JR, Turner RM, Stadler N, Abrams KR, et al. Bivariate network meta-analysis for surrogate endpoint evaluation. Stat Med. 2019;38(18):3322–41.

Acknowledgements


The support and resources from the Center for High Performance Computing at the University of Utah are gratefully acknowledged. We thank all investigators, study teams, and participants of the studies included in the Analysis set 2: application analysis of CKD trials and  Application analysis results sections. Specific details for the same studies used in our analyses have been detailed in previous work by CKD-EPI [ 4 , 5 ].

We also thank the following CKD-EPI investigators/collaborators representing their respective studies (study acronyms/abbreviations are listed in Table 13 of Additional file 1): AASK: Tom Greene; ABCD: Robert W. Schrier, Raymond O. Estacio; ADVANCE: Mark Woodward, John Chalmers, Min Jun; AIPRI (Maschio): Giuseppe Maschio, Francesco Locatelli; ALTITUDE: Hans-Henrik Parving, Hiddo JL Heerspink; Bari (Schena): Francesco Paolo Schena, Manno Carlo; Bologna (Zucchelli): Pietro Zucchelli, Tazeen H Jafar; Boston (Brenner): Barry M. Brenner; canPREVENT: Brendan Barrett; Copenhagen (Kamper): Anne-Lise Kamper, Svend Strandgaard; CSG (Lewis 1992, 1993): Julia B. Lewis, Edmund Lewis; EMPA-REG OUTCOME: Christoph Wanner, Maximilian von Eynatten; Fukuoka (Katafuchi): Ritsuko Katafuchi; Groningen (van Essen): Paul E. de Jong, GG van Essen, Dick de Zeeuw; Guangzhou (Hou): Fan Fan Hou, Di Xie; HALT-PKD: Arlene Chapman, Vicente Torres, Alan Yu, Godela Brosnahan; HKVIN: Philip KT Li, Kai-Ming Chow, Cheuk-Chun Szeto, Chi-Bon Leung; IDNT: Edmund Lewis, Lawrence G. Hunsicker, Julia B. Lewis; Lecco (Pozzi): Lucia Del Vecchio, Simeone Andrulli, Claudio Pozzi, Donatella Casartelli; Leuven (Maes): Bart Maes; Madrid (Goicoechea): Marian Goicoechea, Eduardo Verde, Ursula Verdalles, David Arroyo; Madrid (Praga): Fernando Caravaca-Fontán, Hernando Trujillo, Teresa Cavero, Angel Sevillano; MASTERPLAN: Jack FM Wetzels, Jan van den Brand, Peter J Blankestijn, Arjan van Zuilen; MDRD Study: Gerald Beck, Tom Greene, John Kusek, Garabed Eknoyan; Milan (Ponticelli): Claudio Ponticelli, Giuseppe Montagnino, Patrizia Passerini, Gabriella Moroni; ORIENT: Fumiaki Kobayashi, Hirofumi Makino, Sadayoshi Ito, Juliana CN Chan; Hong Kong Lupus Nephritis (Chan): Tak Mao Chan; REIN: Giuseppe Remuzzi, Piero Ruggenenti, Aneliya Parvanova, Norberto Perico; RENAAL: Dick De Zeeuw, Hiddo JL Heerspink, Barry M. Brenner, William Keane; ROAD: Fan Fan Hou, Di Xie; Rochester (Donadio): James Donadio, Fernando C. Fervenza; SHARP: Colin Baigent, Martin Landray, William Herrington, Natalie Staplin; STOP-IgAN: Jürgen Floege, Thomas Rauen, Claudia Seikrit, Stefanie Wied; Strasbourg (Hannedouche): Thierry P. Hannedouche; SUN-MACRO: Julia B. Lewis, Jamie Dwyer, Edmund Lewis; Texas (Toto): Robert D. Toto; Victoria (Ihle): Gavin J. Becker, Benno U. Ihle, Priscilla S. Kincaid-Smith.

The study was funded by the National Kidney Foundation (NKF). NKF has received consortium support from the following companies: AstraZeneca, Bayer, Cerium, Chinook, Boehringer Ingelheim, CSL Behring, Novartis and Travere. This work also received support from the Utah Study Design and Biostatistics Center, with funding in part from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002538.

Author information

Authors and affiliations

Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

Willem Collier

Department Population Health Sciences, University of Utah School of Medicine, Salt Lake City, UT, USA

Benjamin Haaland & Tom Greene

Pentara Corporation, Millcreek, UT, USA

Benjamin Haaland

Division of Nephrology, Tufts University Medical Center, Boston, MA, USA

Lesley A. Inker

Department of Clinical Pharmacy and Pharmacology, Department of Nephrology, University of Groningen, Groningen, Netherlands

Hiddo J.L. Heerspink



Contributions

Willem Collier was the primary author of all sections of the manuscript, worked on the design and implementation of all analyses, wrote the programs used for analyses and results reporting, and generated summaries. Tom Greene and Benjamin Haaland each contributed to writing and editing throughout the manuscript and helped design all analyses. Lesley Inker and Hiddo Heerspink each contributed to writing and editing of the introduction, application analysis, and discussion sections, and helped design the application analyses.

Corresponding author

Correspondence to Willem Collier.

Ethics declarations

Ethics approval and consent to participate

The analyses presented in this study were deemed exempt from review by the Tufts Medical Center Institutional Review Board. The research presented in this paper complies with all relevant ethical regulations (Declaration of Helsinki). Only aggregated data from previously conducted clinical trials are presented. The protocol and consent documents of the individual trials used were reviewed and approved by each trial’s participating centers’ institutional review board, and informed consent was provided by all participants of the studies for which results were aggregated for our analyses.

Consent for publication

Not applicable.

Competing interests

Willem Collier received funding from the National Kidney Foundation for his graduate studies while working on aspects of the submitted work.

Benjamin Haaland is a full time employee of Pentara Corporation and consults for the National Kidney Foundation.

Hiddo JL Heerspink received grant support from the National Kidney Foundation to his institute and is a consultant for AbbVie, AstraZeneca, Bayer, Boehringer Ingelheim, Chinook, CSL Behring, Dimerix, Eli Lilly, Gilead, GoldFinch, Janssen, Merck, Novo Nordisk and Travere Pharmaceuticals.

Lesley A Inker reports funding from National Institutes of Health, National Kidney Foundation, Omeros, Chinnocks, and Reata Pharmaceuticals for research and contracts to Tufts Medical Center; consulting agreements to Tufts Medical Center with Tricida; and consulting agreements with Diamerix.

Tom Greene reports grant support from the National Kidney Foundation, Janssen Pharmaceuticals, Durect Corporation and Pfizer, and statistical consulting for AstraZeneca, CSL and Boehringer Ingelheim.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit . The Creative Commons Public Domain Dedication waiver ( ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Collier, W., Haaland, B., Inker, L. et al. Comparing Bayesian hierarchical meta-regression methods and evaluating the influence of priors for evaluations of surrogate endpoints on heterogeneous collections of clinical trials. BMC Med Res Methodol 24, 39 (2024).


Received : 13 September 2023

Accepted : 04 February 2024

Published : 16 February 2024



  • Surrogate endpoint
  • Meta-regression
  • Bayesian hierarchical modeling

BMC Medical Research Methodology

ISSN: 1471-2288


February 20, 2024

This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility:

  • peer-reviewed publication
  • trusted source

Research team develops universal and accurate method to calculate how proteins interact with drugs

by Institute of Organic Chemistry and Biochemistry of the CAS

Can it take just a few minutes to calculate how proteins interact with drugs?

A research team from the Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences / IOCB Prague has developed a novel computational method that can accurately describe how proteins interact with molecules of potential drugs, and can do so in mere tens of minutes. This new quantum-mechanical scoring function can thus markedly expedite the search for new drugs. The research has been published in the journal Nature Communications.

The study demonstrates that this is the first universally applicable method of its kind. IOCB Prague computational experts tested it on 10 proteins of different levels of structural complexity, each binding a large variety of small molecules (usually referred to as ligands). They then compared their results not only with those of other corresponding methods, but also with findings of laboratory experiments, and both comparisons turned out very favorably.

"Of course, we are not the only ones working on this. There are several such methods. Usually, however, their speed is offset by low accuracy whereas more accurate calculations can take several days. Our methods are unique in that they can process information about large molecular systems within tens of minutes while retaining the benefits of much more demanding quantum-mechanical calculations," explains Jan Řezáč, corresponding author of the article from the Non-Covalent Interactions group led by Prof. Pavel Hobza.

Experts from this group have been studying intermolecular interactions for a long time. In this research they focus mainly on biomolecules, and the results of their work directly bear on the computer-aided design of drugs. The reason is that when scientists work toward a new drug, they often look for molecules that bind strongly to a particular protein.

Identifying them, however, is akin to finding needles in a haystack, as large numbers of molecules have to be tested to set apart those that show promise. This considerably slows the discovery of medicinal substances and makes it more expensive. By predicting the strength of protein–ligand binding, and thus singling out the molecules that best satisfy a defined set of criteria, computational chemists spare experimenters much of this work, which, in turn, significantly accelerates drug discovery.
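To illustrate how a scoring function's predictions are used downstream (the scoring itself is the hard part and is not reproduced here), the sketch below ranks hypothetical ligands by predicted binding energy and shortlists the strongest predicted binders; all names, scores, and thresholds are invented:

```python
# Hypothetical predicted binding free energies (kcal/mol) from a scoring
# function; more negative means stronger predicted binding.
predicted = {
    "ligand_A": -9.2,
    "ligand_B": -5.1,
    "ligand_C": -8.7,
    "ligand_D": -3.9,
    "ligand_E": -7.5,
}

def shortlist(scores, cutoff=-7.0, top_k=3):
    """Keep ligands predicted to bind at least as strongly as `cutoff`,
    ranked from strongest to weakest, returning at most `top_k` of them."""
    hits = sorted((score, name) for name, score in scores.items() if score <= cutoff)
    return [name for _, name in hits[:top_k]]

print(shortlist(predicted))  # → ['ligand_A', 'ligand_C', 'ligand_E']
```

In a real virtual-screening campaign the dictionary would hold scores for thousands to millions of candidates, and only the shortlist would proceed to laboratory testing.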

Journal information: Nature Communications

Provided by Institute of Organic Chemistry and Biochemistry of the CAS



Researchers optimize genetic tests for diverse populations to tackle health disparities


Improved genetic tests more accurately assess disease risk regardless of genetic ancestry.

To prevent an emerging genomic technology from contributing to health disparities, a scientific team funded by the National Institutes of Health has devised new ways to improve a genetic testing method called a polygenic risk score. Since polygenic risk scores have not been effective for all populations, the researchers recalibrated these genetic tests using ancestrally diverse genomic data. As reported in Nature Medicine, the optimized tests provide a more accurate assessment of disease risk across diverse populations.

Genetic tests look at the small differences between individuals’ genomes, known as genomic variants, and polygenic risk scores are tools for assessing many genomic variants across the genome to determine a person’s risk for disease. As the use of polygenic risk scores grows, one major concern is that the genomic datasets used to calculate the scores often heavily overrepresent people of European ancestry.

“Recently, more and more studies incorporate multi-ancestry genomic data into the development of polygenic risk scores,” said Niall Lennon, Ph.D., a scientist at the Broad Institute in Cambridge, Massachusetts, and first author of the publication. “However, there are still gaps in genetic ancestral representation in many scores that have been developed to date.”

These “gaps” or missing data can cause false results, where a person could be at high risk for a disease but not receive a high-risk score because their genomic variants are not represented. Although health disparities often stem from systemic discrimination, not genetics, these false results are a way that inequitable genetic tools can exacerbate existing health disparities.


In the new study, the researchers improved existing polygenic risk scores using health records and ancestrally diverse genomic data from the All of Us Research Program, an NIH-funded initiative to collect health data from over a million people from diverse backgrounds.

The All of Us dataset represented about three times as many individuals of non-European ancestry compared to other major datasets previously used for calculating polygenic risk scores. It also included eight times as many individuals with ancestry spanning two or more global populations. Strong representation of these individuals is key as they are more likely than other groups to receive misleading results from polygenic risk scores.

The researchers selected polygenic risk scores for 10 common health conditions, including breast cancer, prostate cancer, chronic kidney disease, coronary heart disease, asthma and diabetes. Polygenic risk scores are particularly useful for assessing risk for conditions that result from a combination of several genetic factors, as is the case for the 10 conditions selected. Many of these health conditions are also associated with health disparities.

The researchers assembled ancestrally diverse cohorts from the All of Us data, including individuals with and without each disease. The genomic variants represented in these cohorts allowed the researchers to recalibrate the polygenic risk scores for individuals of non-European ancestry.
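A toy sketch of the two ideas in play, using invented variants and simulated reference cohorts (not the study's actual pipeline): a polygenic risk score computed as a weighted sum of risk-allele dosages, and a deliberately simplified recalibration that standardizes raw scores against an ancestry-matched reference distribution so that "high risk" corresponds to the same percentile in every group:

```python
import numpy as np

rng = np.random.default_rng(0)

# A polygenic risk score is a weighted sum of a person's risk-allele
# dosages (0, 1, or 2 copies of each variant).
weights = np.array([0.12, -0.05, 0.30, 0.08])   # illustrative per-variant effects
def raw_prs(dosages):
    return float(np.dot(weights, dosages))

# Simplified recalibration: standardize raw scores against an
# ancestry-matched reference cohort (real methods are more sophisticated).
reference = {
    "group_1": rng.normal(0.3, 0.15, 1000),   # simulated reference raw scores
    "group_2": rng.normal(0.5, 0.10, 1000),
}
def calibrated_prs(dosages, group):
    ref = reference[group]
    return (raw_prs(dosages) - ref.mean()) / ref.std()

person = np.array([1, 0, 2, 1])   # dosages at the four variants
print(raw_prs(person))            # same raw score regardless of ancestry
print(calibrated_prs(person, "group_1"), calibrated_prs(person, "group_2"))
```

The point of the sketch is that the same raw score can sit at different percentiles in different reference populations, which is why uncalibrated scores built from unrepresentative data can mislabel risk.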

With the optimized scores, the researchers analyzed disease risk for an ancestrally diverse group of 2,500 individuals. About 1 in 5 participants were found to be at high risk for at least one of the 10 diseases.

Most importantly, these participants ranged widely in their ancestral backgrounds, showing that the recalibrated polygenic risk scores are not skewed towards people of European ancestry and are effective for all populations.

“Our model strongly increases the likelihood that a person in the high-risk end of the distribution should receive a high-risk result regardless of their genetic ancestry,” said Dr. Lennon. “The diversity of the All of Us dataset was critical for our ability to do this.”

However, these optimized scores cannot address health disparities alone. “Polygenic risk score results are only useful to patients who can take action to prevent disease or catch it early, and people with less access to healthcare will also struggle to get the recommended follow-up activities, such as more frequent screenings,” said Dr. Lennon.

Still, this work is an important step towards routine use of polygenic risk scores in the clinic to benefit all people. The 2,500 participants in this study represent just an initial look at the improved polygenic risk scores. NIH’s Electronic Medical Health Records and Genomics (eMERGE) Network will continue this research by enrolling a total of 25,000 participants from ancestrally diverse populations in the study’s next phase.

About NHGRI and NIH

About the National Human Genome Research Institute (NHGRI): At NHGRI, we are focused on advances in genomics research. Building on our leadership role in the initial sequencing of the human genome, we collaborate with the world's scientific and medical communities to enhance genomic technologies that accelerate breakthroughs and improve lives. By empowering and expanding the field of genomics, we can benefit all of humankind. For more information about NHGRI and its programs, visit .

About the National Institutes of Health (NIH): NIH, the nation's medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit .


Last updated: February 19, 2024

Register Now: Co-Clinical Imaging Research Resource Program Annual Meeting


The National Cancer Institute (NCI) will hold its Co-Clinical Imaging Research Resource Program (CIRP) meeting May 20–21. The meeting, which offers free registration, will focus on how quantitative imaging methods are optimized to improve the quality of imaging results for co-clinical trials of adult and pediatric cancers; what co-clinical quantitative imaging information is currently available at NCI co-clinical imaging research resources; and applications of co-clinical imaging to oncology precision medicine.

The CIRP network was formed in 2018, based on a trans-NCI initiative, with a joint effort of the Cancer Imaging Program at the Division of Cancer Treatment and Diagnosis, the Division of Cancer Biology, and the Division of Cancer Prevention. The network’s mission is to advance precision medicine by establishing consensus-based best practices for co-clinical quantitative imaging to enable disease detection, risk stratification and assessment/prediction of response to therapy.

The meeting is hybrid, with in-person attendance at NCI’s Shady Grove, MD campus or virtual attendance. A call for posters is open to all CIRP and non-CIRP affiliated investigators pursuing research within the scientific scope of the CIRP network. Poster abstracts are due April 8, and registration for the meeting closes May 6.

For more information, contact Katie Grady, American College of Radiology® Government Affairs Director.


Planning and Conducting Clinical Research: The Whole Process

Boon-How Chew

1 Family Medicine, Universiti Putra Malaysia, Serdang, MYS

The goal of this review was to present the essential steps in the entire process of clinical research. Research should begin with an educated idea arising from a clinical practice issue. A research topic rooted in a clinical problem provides the motivation for the completion of the research and relevancy for affecting medical practice changes and improvements. The research idea is further informed through a systematic literature review, clarified into a conceptual framework, and defined into an answerable research question. Engagement with clinical experts, experienced researchers, relevant stakeholders of the research topic, and even patients can enhance the research question’s relevance, feasibility, and efficiency. Clinical research can be completed in two major steps: study designing and study reporting. Three study designs should be planned in sequence and iterated until properly refined: theoretical design, data collection design, and statistical analysis design. The design of data collection can be further categorized into three facets: experimental or non-experimental, sampling or census, and time features of the variables to be studied. The ultimate aims of research reporting are to present findings succinctly and in a timely manner. Concise, explicit, and complete reporting are the guiding principles in clinical study reporting.

Introduction and background

Medical and clinical research can be classified in many different ways. Most people are probably familiar with basic (laboratory) research, clinical research, healthcare (services) research, health systems (policy) research, and educational research. Clinical research in this review refers to scientific research related to clinical practice. There are many ways a clinical study’s findings can become invalid or less impactful, including ignorance of previous similar studies, a paucity of similar studies, poor study design and implementation, low test agent efficacy, no predetermined statistical analysis, insufficient reporting, bias, and conflicts of interest [1-4]. Scientific, ethical, and moral decadence among researchers can be due to misaligned criteria in academic promotion and remuneration and too many studies forced on amateurs and students for the sake of research, without adequate training or guidance [2, 5-6]. This article will review the proper methods to conduct medical research from the planning stage to submission for publication (Table 1).

a Feasibility and efficiency are considered during the refinement of the research question and adhered to during data collection.

Epidemiologic studies in clinical and medical fields focus on the effect of a determinant on an outcome [7]. Measurement errors that happen systematically give rise to biases leading to invalid study results, whereas random measurement errors cause imprecise reporting of effects. Precision can usually be increased with an increased sample size provided biases are avoided or trivialized; otherwise, the increased precision will aggravate the biases. Because epidemiologic clinical research centers on measurement, measurement errors are addressed throughout the research process. Obtaining the most accurate estimate of a treatment effect constitutes the whole business of epidemiologic research in clinical practice. This is greatly facilitated by clinical expertise and current scientific knowledge of the research topic. Current scientific knowledge is acquired through literature reviews or in collaboration with an expert clinician. Collaboration and consultation with an expert clinician should also include input from the target population to confirm the relevance of the research question. The novelty of a research topic is less important than the clinical applicability of the topic. Researchers need to acquire appropriate writing and reporting skills from the beginning of their careers, and these skills should improve with persistent use and regular review of published journal articles. A published clinical research study stands on solid scientific ground to inform clinical practice given that the article has passed through proper peer review, revision, and content improvement.

Systematic literature reviews

Systematic literature reviews of published papers inform authors of the existing clinical evidence on a research topic. This is an important step to reduce wasted effort and to evaluate the planned study [8]. Conducting a systematic literature review is a well-known important step before embarking on a new study [9]. A rigorously performed and cautiously interpreted systematic review that includes in-process trials can inform researchers of several factors [10]. Reviewing the literature will inform the choice of recruitment methods, outcome measures, questionnaires, intervention details, and statistical strategies – useful information to increase the study’s relevance, value, and power. A good review of previous studies will also provide evidence of the effects of an intervention that may or may not be worthwhile; this would suggest either that no further studies are warranted or that further study of the intervention is needed. A review can also inform whether a larger and better study is preferable to an additional small study. Reviews of previously published work may yield few studies or low-quality evidence from small or poorly designed studies on a certain intervention or observation; this may encourage or discourage further research or prompt consideration of a first clinical trial.

Conceptual framework

The result of a literature review should include identifying a working conceptual framework to clarify the nature of the research problem, questions, and designs, and even to guide the later discussion of the findings and development of possible solutions. Conceptual frameworks represent ways of thinking about a problem or about how complex things work the way they do [11]. Different frameworks will emphasize different variables and outcomes, and their inter-relatedness. Each framework highlights or emphasizes different aspects of a problem or research question. Often, any single conceptual framework presents only a partial view of reality [11]. Furthermore, each framework magnifies certain elements of the problem. Therefore, a thorough literature search is warranted for authors to avoid repeating the same research endeavors or mistakes. It may also help them find relevant conceptual frameworks, including those outside one’s specialty or system.

Conceptual frameworks can come from theories with well-organized principles and propositions that have been confirmed by observations or experiments. Conceptual frameworks can also come from models derived from theories, observations, or sets of concepts, or even from evidence-based best practices derived from past studies [11].

Researchers convey their assumptions of the associations of the variables explicitly in the conceptual framework to connect the research to the literature. After selecting a single conceptual framework or a combination of a few frameworks, a clinical study can be completed in two fundamental steps: study design and study report. Three study designs should be planned in sequence and iterated until satisfaction: the theoretical design, data collection design, and statistical analysis design [7].

Study designs

Theoretical Design

Theoretical design is the next important step in the research process after a literature review and conceptual framework identification. While the theoretical design is a crucial step in research planning, it is often dealt with lightly because of the more alluring second step (data collection design). In the theoretical design phase, a research question is designed to address a clinical problem, which involves an informed understanding based on the literature review and effective collaboration with the right experts and clinicians. A well-developed research question will have an initial hypothesis of the possible relationship between the explanatory variable/exposure and the outcome. This will inform the nature of the study design, be it qualitative or quantitative, primary or secondary, and non-causal or causal (Figure 1).

[Figure 1: cureus-0011-00000004112-i01.jpg]

A study is qualitative if the research question aims to explore, understand, describe, discover, or generate reasons underlying certain phenomena. Qualitative studies usually focus on a process to determine how and why things happen [12]. Quantitative studies use deductive reasoning and numerical, statistical quantification of the association between groups, often on data gathered during experiments [13]. A primary clinical study is an original study gathering a new set of patient-level data. Secondary research draws on existing available data and pools them into a larger database to generate a wider perspective or a more powerful conclusion. Non-causal or descriptive research aims to identify the determinants or associated factors for the outcome or health condition, without regard for causal relationships. Causal research is an exploration of the determinants of an outcome while mitigating confounding variables. Table 2 shows examples of non-causal (e.g., diagnostic and prognostic) and causal (e.g., intervention and etiologic) clinical studies. Concordance between the research question, its aim, and the choice of theoretical design will provide a strong foundation and the right direction for the research process and path.

A problem in clinical epidemiology can be phrased in the mathematical relationships below, where the outcome is a function of the determinant (D) conditional on the extraneous determinants (ED), more commonly known as the confounding factors [7]:

For non-causal research: Outcome = f(D1, D2, …, Dn)
For causal research: Outcome = f(D | ED)

A fine research question is composed of at least three components: 1) an outcome or a health condition, 2) determinant(s) or factors associated with the outcome, and 3) the domain. The outcome and the determinants have to be clearly conceptualized and operationalized as measurable variables (Table 3; PICOT [14] and FINER [15]). The study domain is the theoretical source population from which the study population will be sampled, similar to the wording on a drug package insert that reads, “use this medication (study results) in people with this disease” [7].

The interpretation of study results as they apply to wider populations is known as generalization, and generalization can either be statistical or made using scientific inferences [16]. Generalization supported by statistical inferences is seen in studies on disease prevalence where the sample population is representative of the source population. By contrast, generalizations made using scientific inferences are not bound by the representativeness of the sample in the study; rather, the generalization should be plausible from the underlying scientific mechanisms as long as the study design is valid and nonbiased. Scientific inferences and generalizations are usually the aims of causal studies.

Confounding: Confounding is a situation where true effects are obscured or confused [7, 16]. Confounding variables or confounders affect the validity of a study’s outcomes and should be prevented or mitigated in the planning stages and further managed in the analytical stages. Confounders are also known as extraneous determinants in epidemiology due to their inherent and simultaneous relationships to both the determinant and outcome (Figure 2), which are usually one-determinant-to-one-outcome in causal clinical studies. Known confounders are also called observed confounders. These can be minimized using randomization, restriction, or a matching strategy. Residual confounding occurs in a causal relationship when identified confounders are not measured accurately. Unobserved confounding occurs when the confounding effect is present as a variable or factor not observed, or not yet defined, and thus not measured in the study. Age and gender are almost universal confounders, followed by ethnicity and socio-economic status.

[Figure 2: cureus-0011-00000004112-i02.jpg]

Confounders have three main characteristics. They are a potential risk factor for the disease, associated with the determinant of interest, and should not be an intermediate variable between the determinant and the outcome or a precursor to the determinant. For example, a sedentary lifestyle is a cause for acute coronary syndrome (ACS), and smoking could be a confounder but not cardiorespiratory unfitness (which is an intermediate factor between a sedentary lifestyle and ACS). For patients with ACS, not having a pair of sports shoes is not a confounder – it is a correlate for the sedentary lifestyle. Similarly, depression would be a precursor, not a confounder.
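The masking effect of a confounder described above can be illustrated with a small stratified analysis. The sketch below uses entirely invented 2x2 counts: within each stratum of a hypothetical confounder (age group), the exposure-outcome odds ratio is exactly 1.0, yet pooling the strata produces a spurious crude association.

```python
# Hypothetical 2x2 tables stratified by a confounder (e.g., age group).
# Counts are invented for illustration, ordered as:
# (cases_exposed, noncases_exposed, cases_unexposed, noncases_unexposed).
strata = {
    "older":   (40, 40, 10, 10),  # 80 exposed, 20 unexposed; outcome risk 0.5
    "younger": (2, 18, 8, 72),    # 20 exposed, 80 unexposed; outcome risk 0.1
}

def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c) for a 2x2 table."""
    return (a * d) / (b * c)

# Within each stratum, exposure and outcome are unrelated (OR = 1.0).
for name, (a, b, c, d) in strata.items():
    print(f"{name}: OR = {odds_ratio(a, b, c, d):.2f}")

# Collapsing over the confounder makes a spurious association appear,
# because the confounder drives both exposure prevalence and outcome risk.
a, b, c, d = (sum(t[i] for t in strata.values()) for i in range(4))
print(f"crude: OR = {odds_ratio(a, b, c, d):.2f}")  # about 3.30
```

Randomization, restriction, or matching at the design stage, and stratified or regression adjustment at the analysis stage, are the standard remedies for exactly this pattern.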

Sample size consideration: Sample size calculation provides the number of participants required to be recruited in a new study to detect true differences in the target population if they exist. Sample size calculation is based on three facets: the estimated difference between groups (the clinically important effect size), the probabilities of α (Type I) and β (Type II) errors chosen based on the nature of the treatment or intervention, and the estimated variability (interval data) or proportion of the outcome (nominal data) [17-18]. Clinically important effect sizes are determined based on expert consensus or patients’ perception of benefit. Value and economic considerations have increasingly been included in sample size estimations. Sample size and the degree to which the sample represents the target population affect the accuracy and generalization of a study’s reported effects.
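The three facets above can be combined in the familiar normal-approximation formula for comparing two means. The sketch below is a minimal illustration with invented numbers (a 5-unit difference, SD of 10), not a substitute for a statistician's calculation, which would also consider dropout and the exact test to be used.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means with a
    two-sided test, using the standard normal approximation:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # e.g., about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # e.g., about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# Hypothetical example: detect a 5-unit difference, SD = 10,
# alpha = 0.05 (two-sided), 80% power -> about 63 participants per group.
print(n_per_group(delta=5, sd=10))
```

Note how the required n grows with the chosen power and shrinks with a larger clinically important difference, which is why the effect size must be fixed before, not after, the calculation.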

Pilot study: Pilot studies assess the feasibility of the proposed research procedures on a small sample. Pilot studies test the efficiency of participant recruitment with minimal practice or service interruptions. Pilot studies should not be conducted to obtain a projected effect size for a larger study because, in a typical pilot study, the sample size is small, leading to a large standard error of that effect size; this leads to bias when projected onto a large population. Underestimation of the effect size could lead to inappropriately terminating the full-scale study, while overestimation, to which a small pilot study is equally prone, would lead to an underpowered study and a failed full-scale study [19].

The Design of Data Collection

The “perfect” study design in the theoretical phase now faces the practical and realistic challenges of feasibility. This is the step where different methods for data collection are considered, with one selected as the most appropriate based on the theoretical design along with feasibility and efficiency. The goal of this stage is to achieve the highest possible validity with the lowest risk of biases given available resources and existing constraints. 

In causal research, data on the outcome and determinants are collected with utmost accuracy via a strict protocol to maximize validity and precision. The validity of an instrument is the degree of fidelity with which the instrument measures what it is intended to measure, that is, the extent to which the results of the measurement correlate with the true state of an occurrence. Another widely used word for validity is accuracy. Internal validity refers to the degree of accuracy of a study’s results within its own study sample; it is influenced by the study design. External validity refers to the applicability of a study’s results to other populations; it is also known as generalizability and expresses the validity of assuming similarity and comparability between the study population and other populations. Reliability of an instrument denotes the extent of agreement among the results of repeated measurements of an occurrence by that instrument at different times, by different investigators, or in different settings. Other terms used for reliability include reproducibility and precision. Preventing confounding by identifying confounders and including them in data collection will allow statistical adjustment in the later analyses. In descriptive research, outcomes must be confirmed with a referent standard, and the determinants should be as valid as those found in real clinical practice.

Common designs for data collection include cross-sectional, case-control, cohort, and randomized controlled trials (RCTs). Many other modern epidemiologic study designs are based on these classical designs, such as nested case-control, case-crossover, case-control without control, and stepped wedge cluster RCTs. A cross-sectional study is typically a snapshot of the study population, and an RCT is almost always a prospective study. Case-control and cohort studies can be retrospective or prospective in data collection. The nested case-control design differs from the traditional case-control design in that it is “nested” in a well-defined cohort from which information on the cohort can be obtained. This design also satisfies the assumption that cases and controls represent random samples of the same study base. Table 4 provides examples of these data collection designs.

Additional aspects in data collection: No single design of data collection for any research question as stated in the theoretical design will be perfect in actual conduct. This is because of the myriad issues facing investigators, such as dynamic clinical practices, constraints of time and budget, the urgency for an answer to the research question, and the ethical integrity of the proposed experiment. Feasibility and efficiency, without sacrificing validity and precision, are therefore important considerations, and data collection design requires additional consideration in the following three aspects: experimental/non-experimental, sampling, and timing [7]:

Experimental or non-experimental: Non-experimental (i.e., “observational”) research, in contrast to experimental, involves data collection on the study participants in their natural or real-world environments. Non-experimental research usually comprises diagnostic and prognostic studies with cross-sectional data collection. The pinnacle of non-experimental research is the comparative effectiveness study, which is grouped with other non-experimental study designs such as cross-sectional, case-control, and cohort studies [20]. It is also known as a benchmarking controlled trial because of the element of peer comparison (using comparable groups) in interpreting the outcome effects [20]. Experimental study designs are characterized by an intervention on a selected group of the study population in a controlled environment, often in the presence of a similar group of the study population acting as a comparison group who receive no intervention (i.e., the control group). Thus, the widely known RCT is classified as an experimental design in data collection. An experimental study design without randomization is referred to as a quasi-experimental study. Experimental studies try to determine the efficacy of a new intervention on a specified population. Table 5 presents the advantages and disadvantages of experimental and non-experimental studies [21].

a May be an issue in cross-sectional studies that require a long recall to the past such as dietary patterns, antenatal events, and life experiences during childhood.

Once an intervention yields a proven effect in an experimental study, non-experimental and quasi-experimental studies can be used to determine the intervention’s effect in a wider population and within real-world settings and clinical practices. Pragmatic or comparative effectiveness designs are the usual choices for data collection in these situations [22].

Sampling/census: Census is a data collection on the whole source population (i.e., the study population is the source population). This is possible when the defined population is restricted to a given geographical area. A cohort study uses the census method in data collection. An ecologic study is a cohort study that collects summary measures of the study population instead of individual patient data. However, many studies sample from the source population and infer the results of the study to the source population for feasibility and efficiency because adequate sampling provides similar results to the census of the whole population. Important aspects of sampling in research planning are sample size and representation of the population. Sample size calculation accounts for the number of participants needed to be in the study to discover the actual association between the determinant and outcome. Sample size calculation relies on the primary objective or outcome of interest and is informed by the estimated possible differences or effect size from previous similar studies. Therefore, the sample size is a scientific estimation for the design of the planned study.

A sampling of participants or cases in a study can represent the study population and the larger population of patients in that disease space, but only in prevalence, diagnostic, and prognostic studies. Etiologic and interventional studies do not share this same level of representation. A cross-sectional study design is common for determining disease prevalence in the population. Cross-sectional studies can also determine the referent ranges of variables in the population and measure change over time (e.g., repeated cross-sectional studies). Besides being cost- and time-efficient, cross-sectional studies have no loss to follow-up; recall bias; learning effect on the participant; or variability over time in equipment, measurement, and technician. A cross-sectional design for an etiologic study is possible when the determinants do not change with time (e.g., gender, ethnicity, genetic traits, and blood groups). 

In etiologic research, comparability between the exposed and the non-exposed groups is more important than sample representation. Comparability between these two groups will provide an accurate estimate of the effect of the exposure (risk factor) on the outcome (disease) and enable valid inference of the causal relation to the domain (the theoretical population). In a case-control study, the control group should be sampled from the same study population (study base) and have profiles similar to the cases (matching) but not have the outcome seen in the cases. Matching on important factors minimizes their confounding and increases statistical efficiency by ensuring similar numbers of cases and controls in confounders’ strata [23-24]. Nonetheless, perfect matching is neither necessary nor achievable in a case-control study because a partial match can achieve most of the benefits of a perfect match regarding a more precise estimate of the odds ratio than statistical control of confounding in unmatched designs [25-26]. Moreover, perfect or full matching can lead to an underestimation of the point estimates [27-28].

Time feature: The timing of data collection for the determinant and outcome characterizes the type of study. A cross-sectional study has the axis of time zero (T = 0) for both the determinant and the outcome, which separates it from all other types of research, which have time for the outcome T > 0. Retrospective or prospective refers to the direction of data collection: in retrospective studies, information on the determinant and outcome has been collected or recorded before; in prospective studies, this information will be collected in the future. These terms should not be used to describe the relationship between the determinant and the outcome in etiologic studies. The time of exposure to the determinant, the time of induction, and the time at risk for the outcome are important aspects to understand. Time at risk is the period of time exposed to the determinant risk factors. Time of induction is the time from sufficient exposure to the risk or causal factors to the occurrence of the disease. The latent period is the interval during which a disease exists without manifesting itself, as in “silent” diseases such as cancers, hypertension, and type 2 diabetes mellitus, which are often detected through screening. Figure 3 illustrates the time features of a variable. Variable timing is important for accurate data capture.

[Figure 3: cureus-0011-00000004112-i03.jpg]

The Design of Statistical Analysis

Statistical analysis of epidemiologic data provides the estimate of effects after correcting for biases (e.g., confounding factors) and measures the variability in the data arising from random errors or chance [7, 16, 29]. An effect estimate gives the size of an association between the studied variables or the level of effectiveness of an intervention. This quantitative result allows for comparison and assessment of the usefulness and significance of the association or the intervention between studies. This significance must be interpreted with a statistical model and an appropriate study design. Random errors could arise in the study from, for example, unexplained personal choices by the participants. Random error is present when values or units of measurement between variables change in a non-concerted or non-directional manner; conversely, when these values or units of measurement change in a concerted or directional manner, we note a significant relationship as shown by statistical significance.

Variability: Researchers almost always collect the needed data through a sampling of subjects/participants from a population instead of a census. The process of sampling or multiple sampling in different geographical regions or over different periods contributes to varied information due to the random inclusion of different participants and chance occurrence. This sampling variation becomes the focus of statistics when communicating the degree and intensity of variation in the sampled data and the level of inference in the population. Sampling variation can be influenced profoundly by the total number of participants and the width of differences of the measured variable (standard deviation). Hence, the characteristics of the participants, measurements and sample size are all important factors in planning a study.
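The relationship between sample size and sampling variation described above follows directly from the standard error of the mean, SE = SD / sqrt(n). The short sketch below (with an invented population SD of 10) shows that quadrupling the sample size halves the standard error and, with it, the approximate 95% confidence interval half-width.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the sample mean: SE = sd / sqrt(n)."""
    return sd / sqrt(n)

# Hypothetical variable with SD = 10 in the population.
for n in (25, 100, 400):
    se = standard_error(10, n)
    # Approximate 95% CI half-width around the sample mean: 1.96 * SE.
    print(f"n = {n:4d}: SE = {se:.2f}, 95% CI half-width = {1.96 * se:.2f}")
# Each fourfold increase in n halves the standard error.
```

This square-root relationship is why precision gains become progressively more expensive as the sample grows.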

Statistical strategy: The statistical strategy is usually determined based on the theoretical and data collection designs. Use of a prespecified statistical strategy (including the decision to dichotomize any continuous data at certain cut-points, subgroup analyses, or sensitivity analyses) is recommended in the study proposal (i.e., protocol) to prevent data dredging and data-driven reports that predispose to bias. The nature of the study hypothesis also dictates whether directional (one-tailed) or non-directional (two-tailed) significance tests are conducted. In most studies, two-sided tests are used except in specific instances when unidirectional hypotheses may be appropriate (e.g., in superiority or non-inferiority trials). While data exploration is discouraged, epidemiological research is, by the nature of its objectives, statistical research. Hence, it is acceptable to report the presence of persistent associations between any variables with plausible underlying mechanisms during the exploration of the data. The statistical methods used to produce the results should be explicitly explained. Many different statistical tests are used to appropriately handle various kinds of data (e.g., interval vs. discrete) and various distributions of the data (e.g., normally distributed or skewed). For additional details on statistical explanations and the underlying concepts of statistical tests, readers are referred to the cited references [30-31].
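The one-tailed versus two-tailed distinction above is purely a matter of how much tail probability is counted. A minimal sketch for a standard-normal test statistic (illustrative z value only):

```python
from statistics import NormalDist

def p_values(z):
    """One- and two-sided p-values for a standard-normal test statistic."""
    upper_tail = 1 - NormalDist().cdf(abs(z))  # P(Z > |z|)
    return upper_tail, 2 * upper_tail          # (one-sided, two-sided)

one_sided, two_sided = p_values(1.96)
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
# z = 1.96 gives a two-sided p of about 0.05; the one-sided p is half that.
```

Because the one-sided p is always half the two-sided p for the same z, choosing the tail after seeing the data effectively doubles the chance of a "significant" result, which is why the choice must be prespecified.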

Steps in statistical analyses: Statistical analysis begins with checking for data entry errors. Duplicates are eliminated, and proper units should be confirmed. Extremely low, high, or suspicious values are confirmed against the source data again; if this is not possible, the value is better classified as missing. However, if unverified suspicious data are not obviously wrong, they should be further examined as outliers in the analysis. Data checking and cleaning enable the analyst to establish a connection with the raw data and to anticipate possible results from further analyses. This initial step involves descriptive statistics that analyze the central tendency (i.e., mode, median, and mean) and dispersion (i.e., minimum, maximum, range, quartiles, absolute deviation, variance, and standard deviation) of the data. Graphical plots such as a scatter plot, a box-and-whiskers plot, a histogram, or a normal Q-Q plot are helpful at this stage to verify the normality of the data distribution. See Figure 4 for the statistical tests available for analyses of different types of data.
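The descriptive first pass above can be sketched with the standard library alone. The values below are invented (loosely styled as fasting glucose in mmol/L) with one suspicious high value retained so the common 1.5 x IQR screening rule has something to flag; a real analysis would confirm flagged values against the source data before deciding their fate.

```python
import statistics as st

# Hypothetical measurements; one suspicious high value kept for inspection.
values = [4.8, 5.1, 5.3, 5.3, 5.6, 5.9, 6.2, 6.4, 7.0, 11.8]

# Central tendency
mean, median, mode = st.mean(values), st.median(values), st.mode(values)
# Dispersion
sd = st.stdev(values)
q1, q2, q3 = st.quantiles(values, n=4)  # quartiles (default exclusive method)
iqr = q3 - q1

# Common screening rule: flag points beyond 1.5 * IQR from the quartiles.
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
print(f"mean={mean:.2f} median={median:.2f} mode={mode} sd={sd:.2f}")
print(f"IQR={iqr:.2f}, flagged outliers: {outliers}")
```

Note that the mean (pulled up by the extreme value) sits well above the median, itself a quick numerical hint of the skew that a histogram or Q-Q plot would show visually.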

[Figure 4: cureus-0011-00000004112-i04.jpg]

Once data characteristics are ascertained, further statistical tests are selected. The analytical strategy sometimes involves transformation of the data distribution for the selected tests (e.g., log, natural log, exponential, quadratic) or for checking the robustness of the association between the determinants and their outcomes. This step is also referred to as inferential statistics, whereby the results concern hypothesis testing and generalization to the wider population that the study’s sampled participants represent. The last statistical step is checking whether the statistical analyses fulfill the assumptions of the particular statistical test and model to avoid violations and misleading results. These assumptions include evaluating normality, variance homogeneity, and the residuals included in the final statistical model. Other statistical values such as the Akaike information criterion, variance inflation factor/tolerance, and R2 are also considered when choosing the best-fitted models. Raw data can be transformed, or a higher level of statistical analysis can be used (e.g., generalized linear models and mixed-effects modeling). Successful statistical analysis allows the conclusions of the study to fit the data.

Bayesian and frequentist statistical frameworks: Most current clinical research reporting is based on the frequentist approach and hypothesis testing with p values and confidence intervals. The frequentist approach assumes the acquired data are random, attained by random sampling, through randomized experiments or influences, and with random errors. The distribution of the data (its point estimate and confidence interval) infers a true parameter in the real population. The major conceptual difference between Bayesian and frequentist statistics is that in Bayesian statistics, the parameter (i.e., the studied variable in the population) is random and the acquired data are real (true or fixed). Therefore, the Bayesian approach provides a probability interval for the parameter. The studied parameter is random because it could vary and be affected by prior beliefs, experience, or evidence of plausibility. In the Bayesian approach, this prior belief or available knowledge is quantified into a probability distribution and incorporated with the acquired data to produce the results (i.e., the posterior distribution). This uses Bayes’ theorem to “turn around” conditional probabilities.
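The prior-to-posterior update described above is easiest to see in the conjugate Beta-Binomial case. All numbers in this sketch are invented: a response rate is given a Beta prior, binomial data are observed, and the posterior is another Beta distribution obtained by simple addition of counts.

```python
# Minimal conjugate (Beta-Binomial) sketch of the Bayesian update:
# a Beta(a, b) prior on a response rate plus binomial data gives a
# Beta(a + successes, b + failures) posterior. Numbers are illustrative.
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

# Uninformative prior Beta(1, 1); observe 7 responders out of 10 patients.
post_a, post_b = beta_binomial_update(1, 1, successes=7, failures=3)
posterior_mean = post_a / (post_a + post_b)  # 8 / 12
print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")

# With a sceptical prior Beta(2, 8) (prior mean 0.2), the same data pull
# the posterior mean only to 9 / 20 = 0.45: prior knowledge tempers the
# estimate, which is the point of the Bayesian machinery.
sceptic_a, sceptic_b = beta_binomial_update(2, 8, 7, 3)
print(f"sceptical posterior mean = {sceptic_a / (sceptic_a + sceptic_b):.2f}")
```

The resulting posterior distribution directly yields the probability interval for the parameter mentioned above, in contrast to the frequentist confidence interval, which is a statement about the sampling procedure.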

The goal of research reporting is to present findings succinctly and in a timely manner via conference proceedings or journal publication. Concise and explicit language, with all the necessary details to enable replication and judgment of the study’s applicability, are the guiding principles in clinical study reporting.

Writing for Reporting

Medical writing is very much a technical chore that accommodates little artistic expression. Research reporting in medicine and the health sciences emphasizes clear and standardized reporting, eschewing the adjectives and adverbs used extensively in popular literature. Regularly reviewing published journal articles can familiarize authors with proper reporting styles and help enhance writing skills. Authors should familiarize themselves with standard, concise, and appropriate rhetoric for the intended audience, which includes consideration for journal reviewers, editors, and referees. However, proper language can be somewhat subjective. While each publication may have varying requirements for submission, the technical requirements for formatting an article are usually available via the author or submission guidelines provided by the target journal.

Research reports for publication often contain a title, abstract, introduction, methods, results, discussion, and conclusions section, and authors may want to write each section in sequence. However, best practice indicates that the abstract and title should be written last. Authors may find that when writing one section of the report, ideas come to mind that pertain to other sections, so careful note-taking is encouraged. One effective approach is to organize and write the results section first, followed by the discussion and conclusions sections. Once these are drafted, write the introduction, abstract, and title of the report. Regardless of the sequence of writing, the author should begin with a clear and relevant research question to guide the statistical analyses, result interpretation, and discussion. The study findings can be a motivator to propel the author through the writing process, and the conclusions can help the author draft a focused introduction.

Writing for Publication

Specific recommendations on effective medical writing and table generation are available [32]. One such resource is Effective Medical Writing: The Write Way to Get Published, an updated collection of medical writing articles previously published in the Singapore Medical Journal [33]. The British Medical Journal's Statistics Notes series also elucidates common and important statistical concepts and their usage in clinical studies. Writing guides are also available from individual professional societies, journals, and publishers, such as the Chest (American College of Physicians) medical writing tips, the PLoS Reporting guidelines collection, Springer's Journal Author Academy, and SAGE's Research Methods [34-37]. Standardized research reporting guidelines often come in the form of checklists and flow diagrams; Table 6 presents a list of reporting guidelines. A full compilation of these guidelines is available at the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network website [38], which aims to improve the reliability and value of the medical literature by promoting transparent and accurate reporting of research studies. Publication of the trial protocol in a publicly available database is almost compulsory for publication of the full report in many journals.

Graphics and Tables

Graphics and tables should emphasize salient features of the underlying data and should coherently summarize large quantities of information. Although graphics provide a break from dense prose, these illustrations should be scientifically informative, not decorative. Titles for graphics and tables should be clear and informative and should state the sample size; use minimal font weight and formatting, only to distinguish headings and data entries or to highlight certain results. Provide a consistent number of decimal places for numerical results, with no more than four for P values. Most journals prefer cell-delineated tables created using the table function in word-processing or spreadsheet programs. Some journals require specific table formatting, such as the absence or presence of intermediate horizontal lines between cells.
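The decimal-place conventions above can be sketched in two small helper functions. This is a hypothetical illustration only, not tied to any journal's house style; the function names, default precisions, and the "P < 0.0001" threshold convention are assumptions for the sake of the example.

```python
def format_estimate(value, decimals=2):
    """Format an estimate (mean, difference, CI bound) to a fixed
    number of decimal places, so every entry in a table column matches."""
    return f"{value:.{decimals}f}"

def format_p_value(p, max_decimals=4):
    """Format a P value with no more than four decimal places.
    Values smaller than the displayable precision are reported as a
    threshold (e.g. <0.0001) rather than as a misleading 0.0000."""
    threshold = 10 ** -max_decimals
    if p < threshold:
        return f"<{threshold:.{max_decimals}f}"
    return f"{p:.{max_decimals}f}"

print(format_estimate(3.14159))    # 3.14
print(format_p_value(0.04321987))  # 0.0432
print(format_p_value(0.0000032))   # <0.0001
```

Routing all table entries through helpers like these is one way to keep decimal places consistent across a manuscript, rather than formatting each cell by hand.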

Authorship

Decisions about authorship are both sensitive and important and should be made at an early stage by the study's stakeholders. Guidelines and journals' instructions to authors abound with authorship qualifications. The guideline on authorship by the International Committee of Medical Journal Editors is widely known and provides a standard used by many medical and clinical journals [39]. Generally, authors are those who have made major contributions to the design, conduct, and analysis of the study, and who provided critical readings of the manuscript (if not involved directly in manuscript writing).

Picking a Target Journal for Submission

Once a report has been written and revised, the authors should select a relevant target journal for submission. Authors should avoid predatory journals: publications that do not aim to advance science or disseminate quality research, but instead focus on commercial gain in medical and clinical publishing. Two good resources for authors during journal selection are Think-Check-Submit and the defunct Beall's List of Predatory Publishers and Journals (now archived and maintained by an anonymous third party) [40, 41]. Alternatively, reputable journal indexes such as Thomson Reuters Journal Citation Reports, SCOPUS, MEDLINE, PubMed, EMBASE, and EBSCO Publishing's Electronic Databases are good places to start the search for an appropriate target journal. Authors should review the journals' names, aims and scope, and recently published articles to determine the kind of research each journal accepts for publication. Open-access journals almost always charge article publication fees, while subscription-based journals tend to publish without author fees and instead rely on subscription or access fees for the full text of published articles.


Conclusions

Conducting valid clinical research requires consideration of the theoretical study design, the data collection design, and the statistical analysis design. Proper study design implementation and quality control during data collection ensure high-quality data analysis and can mitigate bias and confounding during statistical analysis and data interpretation. Clear, effective study reporting facilitates dissemination, appreciation, and adoption, and allows researchers to effect real-world change in clinical practices and care models. Neutral findings or an absence of findings in a clinical study are as important as positive or negative findings. Valid studies, even when they report an absence of expected results, still inform the scientific community about the nature of a given treatment or intervention, and this contributes to future research, systematic reviews, and meta-analyses. Reporting a study adequately and comprehensively is important for the accuracy, transparency, and reproducibility of the scientific work, as well as for informing readers.


The author would like to thank Universiti Putra Malaysia and the Ministry of Higher Education, Malaysia for their support in sponsoring the Ph.D. study and living allowances for Boon-How Chew.

The content published in Cureus is the result of clinical experience and/or research by independent individuals or organizations. Cureus is not responsible for the scientific accuracy or reliability of data or conclusions published herein. All content published within Cureus is intended only for educational, research and reference purposes. Additionally, articles published within Cureus should not be deemed a suitable substitute for the advice of a qualified health care professional. Do not disregard or avoid professional medical advice due to content published within Cureus.

The materials presented in this paper are being organized by the author into a book.

