
## How does stratified sampling work? Guide & examples

Last updated: 7 March 2023

Reviewed by: Cathy Heath

Stratified sampling, or stratified random sampling, is a way for researchers to choose sample members when the population contains defined subgroups, known as strata (singular: stratum). The number of members drawn from each stratum follows a defined formula.

This formula is:

Stratum sample size = (total sample size / entire population) × population of the stratum


- What is stratified sampling?

In stratified sampling, researchers divide the population into homogeneous subgroups based on specific characteristics or attributes.

After creating the strata, researchers select a random sample from each stratum proportionate to its size or importance in the population.

- Four simple steps to stratified sampling

Step one: Define your sampling population and your strata.

Step two: Assign each member of the sampling population to exactly one stratum.

Step three: Determine the sample size for each stratum.

Step four: Take a random sample from each stratum.
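The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population, the education-level labels, and the function name `stratified_sample` are hypothetical, and the stratum sample sizes use the proportionate formula given earlier.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, total_sample_size, seed=0):
    """Draw a proportionate stratified random sample.

    population: list of members.
    stratum_of: function mapping a member to its stratum label.
    """
    rng = random.Random(seed)

    # Step two: place every member in exactly one stratum.
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = {}
    for label, members in strata.items():
        # Step three: stratum sample size =
        # (total sample size / entire population) * population of the stratum.
        size = round(total_sample_size * len(members) / len(population))
        # Step four: simple random sample (without replacement) in the stratum.
        sample[label] = rng.sample(members, size)
    return sample


# Step one: a hypothetical population of 100 people, stratified by education.
population = (
    [(i, "school") for i in range(50)]
    + [(i, "undergraduate") for i in range(50, 80)]
    + [(i, "postgraduate") for i in range(80, 100)]
)

sample = stratified_sample(population, lambda person: person[1], total_sample_size=10)
sizes = {label: len(members) for label, members in sample.items()}
print(sizes)
```

With these made-up numbers the strata hold 50%, 30%, and 20% of the population, so a sample of 10 is split 5, 3, and 2.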

- Examples of stratified sampling

Examples of stratum subgroups for stratified sampling include:

Nationality

Education level

Any other well-defined group that participants belong to

- When to use stratified sampling

Stratified sampling is a good choice among probability sampling methods when the strata are expected to have different mean values for the variable being studied.

To use stratified sampling as a research technique, you must be able to put every population member of your study into one subgrouping or stratum. Each subgroup should be mutually exclusive.

If participants fit into multiple subgroups, don’t use stratified random sampling.

- Choosing characteristics for stratification

When doing stratified random sampling, choose the characteristics that will divide your population into subgroups, or strata.

Since you can only place each study participant into one subgroup, your chosen classification must be precise and obvious. Grouping according to gender, age, or education is one way to ensure that members can only be in one subgroup.

However, you can use multiple characteristics as a subgrouping if more than one needs to be part of the study. Just make sure your participants don’t fall into more than one.

- Stratifying by multiple characteristics

While each participant can belong to only one subgroup, you can still stratify by multiple characteristics. To do this, multiply together the number of categories for each characteristic to get the total number of strata.

For example, if you're designating both age and gender, using three groups for gender and ten for age groups, you need to multiply them together, making 30 subgroups.

This way, you designate your population sampling by age range and gender.

An example would be males 10–19, females 30–39, or nonbinary 20–29.

Each age range would have a subset for each gender, so each participant falls into only one subgroup while the subgroup covers two characteristics. Clever, huh?
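As a rough sketch of the arithmetic, the combined strata are the Cartesian product of the two characteristics. The three gender labels come from the example above; the ten specific age ranges and the helper `stratum_for` are assumptions made for illustration.

```python
from itertools import product

genders = ["male", "female", "nonbinary"]
# Ten hypothetical age ranges: 10-19, 20-29, ..., 100-109.
age_ranges = [f"{start}-{start + 9}" for start in range(10, 110, 10)]

# Every (gender, age range) pair is one stratum: 3 x 10 = 30 strata.
strata = list(product(genders, age_ranges))
print(len(strata))


def stratum_for(gender, age):
    """Return the single stratum a participant belongs to."""
    low = (age // 10) * 10
    return (gender, f"{low}-{low + 9}")


print(stratum_for("female", 34))
```

Because each participant maps to exactly one pair, the strata stay mutually exclusive even though they are built from two characteristics.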

- Proportionate and disproportionate stratification

Two forms of sampling exist within stratified sampling: proportionate and disproportionate.

In proportionate sampling, each stratum's share of the sample equals its share of the population. This means that a subgroup that makes up a smaller percentage of the population will also make up a smaller percentage of the sample.

In disproportionate sampling, a stratum's share of the sample differs from its share of the population. A researcher chooses this method when they want to highlight a minority or under-represented group. Oversampling that group keeps its sample from being too small to support statistical conclusions.

- Stratified sampling strategies

Researchers can refine a stratified sampling design by paying attention to two things: the mean and its standard error, and how the sample size is allocated across strata. Planning these in advance makes the study more rigorous, with lower errors and coverage of a wider cross-section of the population.

## Mean and standard error

The standard error of the mean, or simply the standard error, measures how far a sample mean is likely to be from the population mean. This lets the researcher know how much the result would vary if they redid the research study with new samples from that population.

The standard error is inversely proportional to the square root of the sample size, so a larger sample size gives a smaller standard error. The standard error of the mean is a part of inferential statistics. It is the standard deviation of the sampling distribution of the mean, and is usually estimated as the sample standard deviation divided by the square root of the sample size.

You can calculate confidence intervals and test your hypothesis by using the standard error. A smaller standard error indicates that the sample mean or sample proportion is more precise and is more likely to be a good estimate of the true population mean or population proportion.
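A minimal sketch of the calculation, using invented data: the standard error is estimated as the sample standard deviation divided by the square root of the sample size. The rough 95% interval uses the normal multiplier 1.96, which is an assumption that suits large samples better than a sample of eight.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


small_sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
large_sample = small_sample * 4           # same spread, four times the size

se_small = standard_error(small_sample)
se_large = standard_error(large_sample)
print(round(se_small, 4), round(se_large, 4))

# Rough 95% confidence interval for the mean (normal approximation).
mean = statistics.mean(small_sample)
interval = (mean - 1.96 * se_small, mean + 1.96 * se_small)
print(interval)
```

Quadrupling the sample size roughly halves the standard error, which is the square-root relationship in action.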

## Sample size allocation

Sample size allocation is either proportionate or disproportionate. Practicality, the scale of the population, and the accuracy of representation required also influence the choice.

Proportionate allocation means that the sample size of each stratum is proportional to the population size of that stratum.

The equation n_h = (N_h / N) × n applies to this sample size, where:

n_h is the sample size for stratum h

N_h is the population size of stratum h

N is the total size of the population

n is the total size of the sample
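The proportionate allocation formula is easy to check numerically. The sketch below uses hypothetical stratum sizes and labels; the function name `proportionate_allocation` is chosen here for illustration.

```python
def proportionate_allocation(stratum_sizes, n):
    """Apply n_h = (N_h / N) * n to each stratum."""
    N = sum(stratum_sizes.values())
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}


# Hypothetical population of 10,000 split into three strata.
stratum_sizes = {"A": 5000, "B": 3000, "C": 2000}
allocation = proportionate_allocation(stratum_sizes, n=100)
print(allocation)
```

Here each stratum contributes the same fraction (1%) of its members, so the allocations add back up to the total sample size. In practice the results are often not whole numbers and need rounding.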

Disproportionate sample size allocation means you still divide the population into exhaustive strata, but you deliberately sample some strata at a higher or lower rate than their share of the population.

- Advantages of stratified sampling

There are several advantages to using stratified random sampling as a research method.

The main benefit is that the sample captures key characteristics of the population, much like a weighted average. With proportional sampling, the study results are proportional to the total population.

Another benefit is that the study can cost less, because well-defined strata are easier to administer than varying, non-uniform subgroups. Variability within each stratum is lower, resulting in more efficient estimates.

There are smaller estimation errors than in a simple random method and greater precision for the estimations. The bigger the strata differences, the more precise the study will be.

When you divide the population into strata and take samples from each stratum, you drastically reduce the possibility of excluding a population group. This means you’re better representing a cross-section of the sample population.

Lastly, the survey can run more efficiently because data collection is easier. When you've chosen the subgroups effectively, putting members into their groups is simple and precise. This creates a quicker turnaround for the study.

- Disadvantages of stratified sampling

Like any research method, stratified random sampling also carries disadvantages.

You can't use it in every situation because certain conditions must be in place. The biggest of these conditions is the subgrouping: No study member should be in more than one group. If you can classify a population member into more than one group, you can't use the stratified random sampling method.

Another disadvantage of this research method is that, even with proper subgrouping, the members of each subgroup must be reasonably homogeneous with respect to the characteristic being studied. If the subgroup members aren't similar, the stratified sample will be much less useful to the researcher.

Lastly, the information used to define and size the strata needs to be accurate. You must ensure the groupings represent the population and that the size assigned to each stratum is correct. Otherwise, the results can be biased and will not fairly represent the overall population.

## What is stratified sampling (with example)?

Stratified sampling is a research technique that fairly represents subgroups in a study’s sample population. It is an appropriate research method when predefined and exclusive subgroups are already available.

An example would be age grouping, such as 10–19, 20–29, 30–39, etc. Using these subgroups, the researcher can collect data more quickly and easily than with other methods.

## What is stratified simple random sampling?

A variant of simple random sampling, stratified simple random sampling is where researchers draw a simple random sample from each stratum, with each stratum being a relatively homogeneous part of the population.

The results are used to infer the qualities of the population, stratum by stratum. Various factors, such as the accuracy of representation required, practicality, and scale, will determine the sample size selected for random sampling.


## Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

- State your research hypothesis as a null hypothesis (H₀) and an alternate hypothesis (Hₐ or H₁).
- Collect data in a way designed to test the hypothesis.
- Perform an appropriate statistical test.
- Decide whether to reject or fail to reject your null hypothesis.
- Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

## Table of contents

- Step 1: State your null and alternate hypothesis
- Step 2: Collect data
- Step 3: Perform a statistical test
- Step 4: Decide whether to reject or fail to reject your null hypothesis
- Step 5: Present your findings
- Other interesting articles
- Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H₀) and alternate (Hₐ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

- H₀: Men are, on average, not taller than women.
- Hₐ: Men are, on average, taller than women.


For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

For the height example, a statistical test comparing the two groups would give you:

- an estimate of the difference in average height between the two groups.
- a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis (Type I error).
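To make the decision rule concrete, here is a small sketch using only the Python standard library. The text does not prescribe a particular test at this point, so this example swaps in a one-sided permutation test; the height data (in cm) are invented, and the function name `permutation_p_value` is hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test of H0: mean(group_a) is not greater than mean(group_b).

    Returns the observed difference in means and an approximate p-value.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


men = [178, 182, 175, 180, 185, 177, 181, 179]      # invented heights in cm
women = [165, 170, 162, 168, 172, 166, 169, 164]    # invented heights in cm

observed, p_value = permutation_p_value(men, women)
alpha = 0.05  # predetermined level of significance
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(observed, p_value, decision)
```

The test returns the two quantities described above, an estimated difference and a p-value, and the comparison with `alpha` is the decision rule. Choosing `alpha = 0.01` instead would make the rule more conservative.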

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

- Normal distribution
- Descriptive statistics
- Measures of central tendency
- Correlation coefficient

Methodology

- Cluster sampling
- Stratified sampling
- Types of interviews
- Cohort study
- Thematic analysis

Research bias

- Implicit bias
- Cognitive bias
- Survivorship bias
- Availability heuristic
- Nonresponse bias
- Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

## Cite this Scribbr article


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved January 15, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/


The stratified sampling method is commonly used while studying a heterogeneous population—a population with large variations.

Here, the population is divided into two or more subgroups or strata with shared characteristics, in this case a common color. Each stratum represents a homogeneous group with respect to the shared characteristic.

Strata are mutually exclusive, meaning a subject can be present in only one stratum; for example, a red ball belongs only to stratum 1. They must also be exhaustive, meaning every subject is assigned to a stratum, with all subjects that share the characteristic (in this case, all the balls of the same color) placed in a single stratum.

Then, a few subjects are randomly drawn from each stratum and combined to form a sample.

For example, suppose one wants to know the average weight of students from classes 7 to 12. Since the population has students of different age groups, the weight varies greatly within the population.

So, students are divided into two strata. Then, students are randomly drawn from each stratum to form the sample, and the average weight is calculated.

## 1.14: Stratified Sampling Method

Sampling is a technique to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. The sampling method ensures that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.

To choose a stratified sample, divide the population into groups called strata and then take a proportionate number from each stratum. For example, you could stratify (group) your college population by department and then choose a proportionate simple random sample from each stratum (each department) to get a stratified random sample. To choose a simple random sample from each department, number each member of the first department, number each member of the second department, and do the same for the remaining departments. Then use simple random sampling to choose proportionate numbers from the first department and do the same for each of the remaining departments. Those numbers picked from the first department, picked from the second department, and so on represent the members who make up the stratified sample.

A survey of geographical regions can be done using stratified sampling, where regions with similar habitat, elevation, and soil type are grouped into strata. Stratified random sampling can also be used to study election polling, people who work overtime hours, life expectancy, the income of varying populations, and income for different jobs across a nation.

This text is adapted from Openstax, Introductory Statistics, Section 1.2 Data, Sampling, and Variation in Data and Sampling


## MA121: Introduction to Statistics

## Descriptive and Inferential Statistics

Read these sections and complete the questions at the end of each section. Here, we introduce descriptive statistics using examples and discuss the difference between descriptive and inferential statistics. We also talk about samples and populations, explain how you can identify biased samples, and define inferential statistics.

## Inferential Statistics

Stratified sampling.

Since simple random sampling often does not ensure a representative sample, a sampling method called stratified random sampling is sometimes used to make the sample more representative of the population. This method can be used if the population has a number of distinct "strata" or groups. In stratified sampling, you first identify members of your sample who belong to each group. Then you randomly sample from each of those subgroups in such a way that the sizes of the subgroups in the sample are proportional to their sizes in the population.

Let's take an example: Suppose you were interested in views of capital punishment at an urban university. You have the time and resources to interview 200 students. The student body is diverse with respect to age; many older people work during the day and enroll in night courses (average age is 39), while younger students generally enroll in day classes (average age of 19). It is possible that night students have different views about capital punishment than day students. If 70% of the students were day students, it makes sense to ensure that 70% of the sample consisted of day students. Thus, your sample of 200 students would consist of 140 day students and 60 night students. The proportion of day students in the sample and in the population (the entire university) would be the same. Inferences to the entire population of students at the university would therefore be more secure.

## Spatial sampling with R

Chapter 4 Stratified simple random sampling

In stratified random sampling the population is divided into subpopulations, for instance, soil mapping units, areas with the same land use or land cover, administrative units, etc. The subareas are mutually exclusive, i.e., they do not overlap, and are jointly exhaustive, i.e., their union equals the entire population (study area). Within each subpopulation, referred to as a stratum, a probability sample is selected by some sampling design. If these probability samples are selected by simple random sampling, as described in the previous chapter, the design is stratified simple random sampling, the topic of this chapter. If sampling units are selected by cluster random sampling, then the design is stratified cluster random sampling.

Stratified simple random sampling is illustrated with Voorst (Figure 4.1). Tibble `grdVoorst` with simulated data contains variable `stratum`. The strata are combinations of soil class and land use, obtained by overlaying a soil map and a land use map. To select a stratified simple random sample, we set the total sample size \(n\). The sampling units must be apportioned to the strata. I chose to apportion the units proportionally to the size (area, number of grid cells) of the strata (for details, see Section 4.3). The larger a stratum, the more units are selected from this stratum. The sizes of the strata, i.e., the total number of grid cells, are computed with function `tapply`.

The sum of the stratum sample sizes is 41; we want 40, so we reduce the largest stratum sample size by 1.
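The rounding adjustment described here can be illustrated with a short Python sketch. The stratum sizes and labels below are hypothetical (they are not the Voorst strata) and were picked so that rounding overshoots a target of 40 by one unit. The function name is an assumption, and it generalizes the step in the text slightly by letting the largest stratum sample absorb any surplus or shortfall.

```python
def allocate_with_rounding(stratum_sizes, n):
    """Proportional allocation rounded to integers, then corrected to sum to n."""
    N = sum(stratum_sizes.values())
    n_h = {h: round(n * N_h / N) for h, N_h in stratum_sizes.items()}
    rounded_total = sum(n_h.values())
    # Let the largest stratum sample absorb the rounding surplus (or shortfall).
    largest = max(n_h, key=n_h.get)
    n_h[largest] -= rounded_total - n
    return n_h, rounded_total


# Hypothetical stratum sizes (numbers of grid cells), total 4000.
stratum_sizes = {"s1": 1260, "s2": 1060, "s3": 880, "s4": 500, "s5": 300}
n_h, rounded_total = allocate_with_rounding(stratum_sizes, n=40)
print(rounded_total, n_h)
```

With these sizes the unrounded allocations are 12.6, 10.6, 8.8, 5.0, and 3.0, which round to a total of 41, so the largest stratum sample drops from 13 to 12.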

The stratified simple random sample is selected with function `strata` of package sampling (Tillé and Matei 2021). Argument `size` specifies the stratum sample sizes.

Within the strata, the grid cells are selected by simple random sampling with replacement (`method = "srswr"`), so that in principle more than one point can be selected within a grid cell, see Chapter 3 for a motivation of this. Function `getdata` extracts the observations of the selected units from the sampling frame, as well as the spatial coordinates and the stratum of these units. The coordinates of the centres of the selected grid cells are jittered by an amount equal to half the side of the grid cells. In the next code chunk, this is done with function `mutate` of package dplyr (Hadley Wickham et al. 2021) which is part of package tidyverse (Hadley Wickham et al. 2019). We have seen the pipe operator `%>%` of package magrittr (Bache and Wickham 2020) before in Subsection 3.1.2. If you are not familiar with tidyverse I recommend reading the excellent book (H. Wickham and Grolemund 2017).

Figure 4.1 shows the selected sample.

Figure 4.1: Stratified simple random sample of size 40 from Voorst. Strata are combinations of soil class and land use.

## 4.1 Estimation of population parameters

With simple random sampling within strata, the estimator of the population mean for simple random sampling (Equation (3.2)) is applied at the level of the strata. The estimated stratum means are then averaged, using the relative sizes or areas of the strata as weights:

\[\begin{equation} \hat{\bar{z}}= \sum\limits_{h=1}^{H} w_{h}\,\hat{\bar{z}}_{h} \;, \tag{4.1} \end{equation}\]

where \(H\) is the number of strata, \(w_{h}\) is the relative size (area) of stratum \(h\) (stratum weight): \(w_h = N_h/N\), and \(\hat{\bar{z}}_{h}\) is the estimated mean of stratum \(h\), estimated by the sample mean for stratum \(h\):

\[\begin{equation} \hat{\bar{z}}_{h}=\frac{1}{n_h}\sum_{k \in \mathcal{S}_h} z_k\;, \tag{4.2} \end{equation}\]

with \(\mathcal{S}_h\) the sample selected from stratum \(h\).

The same estimator is found when the \(\pi\) estimator is worked out for stratified simple random sampling. With stratified simple random sampling without replacement and different sampling fractions for the strata, the inclusion probabilities differ among the strata and equal \(\pi_{k} = n_h/N_h\) for all \(k\) in stratum \(h\), with \(n_h\) the sample size of stratum \(h\) and \(N_h\) the size of stratum \(h\). Inserting this in the \(\pi\) estimator of the population mean (Equation (2.4)) gives

\[\begin{equation} \hat{\bar{z}}= \frac{1}{N}\sum\limits_{h=1}^{H}\sum\limits_{k \in \mathcal{S}_h} \frac{z_{k}}{\pi_{k}} = \frac{1}{N}\sum\limits_{h=1}^{H} \frac{N_h}{n_h}\sum\limits_{k \in \mathcal{S}_h} z_{k} = \sum\limits_{h=1}^{H} w_{h}\,\hat{\bar{z}}_{h} \;. \tag{4.3} \end{equation}\]

The sampling variance of the estimator of the population mean is estimated by first estimating the sampling variances of the estimated stratum means, followed by computing the weighted average of the estimated sampling variances of the estimated stratum means. Note that we must square the stratum weights:

\[\begin{equation} \widehat{V}\!\left(\hat{\bar{z}}\right)=\sum\limits_{h=1}^{H}w_{h}^{2}\,\widehat{V}\!\left(\hat{\bar{z}}_{h}\right)\;, \tag{4.4} \end{equation}\]

where \(\widehat{V}\!\left(\hat{\bar{z}}_{h}\right)\) is the estimated sampling variance of \(\hat{\bar{z}}_{h}\):

\[\begin{equation} \widehat{V}\!\left(\hat{\bar{z}}_{h}\right)= \left(1-\frac{n_h}{N_h}\right) \frac{\widehat{S^2}_h(z)}{n_h}\;, \tag{4.5} \end{equation}\]

with \(\widehat{S^2}_h(z)\) the estimated variance of \(z\) within stratum \(h\):

\[\begin{equation} \widehat{S^2}_h(z)=\frac{1}{n_h-1}\sum\limits_{k \in \mathcal{S}_h}\left(z_{k}-\hat{\bar{z}}_{h}\right)^{2}\;. \tag{4.6} \end{equation}\]

For stratified simple random sampling with replacement of finite populations and stratified simple random sampling of infinite populations, the finite population corrections (fpcs) \(1-(n_h/N_h)\) can be dropped.
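Equations (4.1), (4.2), and (4.4) to (4.6) translate almost line by line into code. The following is a sketch in Python rather than the R used in this book, with made-up sample values and stratum sizes; the function name and the `with_fpc` switch are illustrative choices, not part of any package.

```python
import math
import statistics


def stratified_mean_and_se(samples, stratum_sizes, with_fpc=True):
    """Estimate the population mean and its standard error from a stratified sample.

    samples: dict mapping stratum label -> list of observed values z_k.
    stratum_sizes: dict mapping stratum label -> stratum size N_h.
    """
    N = sum(stratum_sizes.values())
    mean = 0.0
    variance = 0.0
    for h, z in samples.items():
        N_h, n_h = stratum_sizes[h], len(z)
        w_h = N_h / N                          # stratum weight
        mean_h = statistics.mean(z)            # Equation (4.2)
        s2_h = statistics.variance(z)          # Equation (4.6), divisor n_h - 1
        fpc = 1 - n_h / N_h if with_fpc else 1.0
        var_mean_h = fpc * s2_h / n_h          # Equation (4.5)
        mean += w_h * mean_h                   # Equation (4.1)
        variance += w_h ** 2 * var_mean_h      # Equation (4.4), squared weights
    return mean, math.sqrt(variance)


# Hypothetical data: two strata of 600 and 400 units.
stratum_sizes = {"A": 600, "B": 400}
samples = {"A": [10, 12, 14], "B": [20, 24]}

mean, se_fpc = stratified_mean_and_se(samples, stratum_sizes)
_, se_no_fpc = stratified_mean_and_se(samples, stratum_sizes, with_fpc=False)
print(mean, se_fpc, se_no_fpc)
```

Setting `with_fpc=False` corresponds to sampling with replacement or from an infinite population, where the correction factor is dropped and the standard error is slightly larger.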

Table 4.1 shows per stratum the estimated mean, variance, and sampling variance of the estimated mean of the SOM concentration. We can see large differences in the within-stratum variances. For the stratified sample of Figure 4.1, the estimated population mean equals 86.3 g kg\(^{-1}\), and the estimated standard error of this estimator equals 5.8 g kg\(^{-1}\).

The population mean can also be estimated directly using the basic \(\pi\) estimator (Equation (2.4)). The inclusion probabilities are included in data.frame `mysample`, obtained with function `getdata` (see code chunk above), as variable `Prob`.

The population total is estimated first, and by dividing this estimated total by the total number of population units \(N\) an estimate of the population mean is obtained.

The two estimates of the population mean are not exactly equal. This is due to rounding errors in the inclusion probabilities. This can be shown by computing the sum of the inclusion probabilities over all population units. This sum should be equal to the sample size \(n=40\), but as we can see below, this sum is slightly smaller.
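The effect of rounded inclusion probabilities can be reproduced in a small Python sketch. All numbers are hypothetical, and rounding the probabilities to four decimals is an assumption used only to mimic the kind of rounding error described; the direction and size of the discrepancy depend on the actual values.

```python
def pi_estimate_of_mean(samples, stratum_sizes, digits=None):
    """Pi estimator of the population mean, plus the sum of inclusion probabilities.

    If digits is given, the inclusion probabilities are rounded first.
    """
    N = sum(stratum_sizes.values())
    total = 0.0
    probability_sum = 0.0
    for h, z in samples.items():
        pi_k = len(z) / stratum_sizes[h]         # n_h / N_h for every unit in stratum h
        if digits is not None:
            pi_k = round(pi_k, digits)
        total += sum(z_k / pi_k for z_k in z)    # estimated population total
        probability_sum += stratum_sizes[h] * pi_k
    return total / N, probability_sum


# Hypothetical data: strata of 700 and 300 units, sample of 5 units in total.
stratum_sizes = {"A": 700, "B": 300}
samples = {"A": [10, 12, 14], "B": [20, 24]}

exact_mean, exact_sum = pi_estimate_of_mean(samples, stratum_sizes)
rounded_mean, rounded_sum = pi_estimate_of_mean(samples, stratum_sizes, digits=4)
print(exact_mean, exact_sum)
print(rounded_mean, rounded_sum)
```

With unrounded probabilities the estimate matches the weighted average of the stratum means (0.7 × 12 + 0.3 × 22 = 15) and the probabilities sum to the sample size. After rounding, both drift slightly.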

Now suppose we ignore that the sample data come from a stratified sampling design and we use the (unweighted) sample mean as an estimate of the population mean.

The sample mean differs slightly from the proper estimate of the population mean (86.334). The sample mean is a biased estimator, but the bias is small. The bias is small only because the stratum sample sizes are about proportional to the sizes of the strata, so that the inclusion probabilities (sampling intensities) are about equal for all strata: 0.0050494, 0.0055344, 0.0052509, 0.006056, 0.005189. The probabilities are not exactly equal because the stratum sample sizes are necessarily rounded to integers and because we reduced the largest sample size by one unit. The bias would have been substantially larger if an equal number of units had been selected from each stratum, leading to much larger differences in the inclusion probabilities among the strata. The sampling intensity in stratum BA, for instance, would then be much smaller than in the other strata, and so would the inclusion probabilities of the units in this stratum. Stratum BA would then be underrepresented in the sample. This is not a problem as long as we account for the differences in inclusion probabilities of the units in the estimation of the population mean. The estimated mean of stratum BA then gets the largest weight, equal to the inverse of the inclusion probability. If we do not account for these differences in inclusion probabilities, the estimator of the mean will be seriously biased.

The population mean and its standard error can also be estimated with package survey (Lumley 2021). Note that the sampling weights \(N_h/n_h\) (the inverses of the inclusion probabilities) must be passed to function svydesign using argument weights. These are first attached to data.frame mysample by creating a look-up table lut, which is then merged with function merge to data.frame mysample.

## 4.1.1 Population proportion, cumulative distribution function, and quantiles

The proportion of a population satisfying some condition can be estimated by Equations (4.1) and (4.2), substituting for the study variable \(z_k\) a 0/1 indicator \(y_k\) with value 1 if for unit \(k\) the condition is satisfied, and 0 otherwise (Subsection 3.1.1). In general, with stratified simple random sampling the inclusion probabilities are not exactly equal, so that the estimated population proportion is not equal to the sample proportion.

These unequal inclusion probabilities must also be accounted for when estimating the cumulative distribution function (CDF) and quantiles (Subsection 3.1.2).

Figure 4.2 shows the estimated CDF, estimated from the stratified simple random sample of 40 units from Voorst (Figure 4.1 ).

Figure 4.2: Estimated cumulative distribution function of the SOM concentration (g kg\(^{-1}\)) in Voorst, estimated from the stratified simple random sample of 40 units.

The estimated proportions, or cumulative frequencies, are used to estimate a quantile. These estimates are easily obtained with function svyquantile of package survey .
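To make the role of the unequal inclusion probabilities concrete, here is a minimal sketch in plain Python with hypothetical data. It is not the implementation of svyquantile, which may use a different interpolation rule; the function names `estimate_cdf` and `estimate_quantile` are assumptions. Each sampled unit contributes its weight \(1/\pi_k\) to the cumulative frequency, and a quantile is taken as the smallest observed value at which the estimated cumulative frequency reaches the requested proportion.

```python
def estimate_cdf(z, weights, threshold):
    """Estimated proportion of the population with z <= threshold."""
    total_w = sum(weights)
    below = sum(w for z_k, w in zip(z, weights) if z_k <= threshold)
    return below / total_w

def estimate_quantile(z, weights, p):
    """Smallest observed z whose estimated cumulative frequency is >= p."""
    total_w = sum(weights)
    cum = 0.0
    for z_k, w in sorted(zip(z, weights)):
        cum += w
        if cum / total_w >= p:
            return z_k
    return max(z)

# Hypothetical sample: weights are the inverses of the inclusion probabilities
z = [10.0, 12.0, 14.0, 20.0, 24.0]
weights = [10.0, 10.0, 10.0, 20.0, 20.0]

print(estimate_cdf(z, weights, 14.0))     # weighted cumulative frequency
print(estimate_quantile(z, weights, 0.5)) # weighted median
```

In this toy sample the unweighted proportion of values not exceeding 14 is 3/5, whereas the weighted estimate is 30/70, which illustrates why the sample proportion should not be used directly.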

## 4.1.2 Why should we stratify?

There can be two reasons for stratifying the population:

- we are interested in the mean or total per stratum; or
- we want to increase the precision of the estimated mean or total for the entire population.

Figure 4.3 shows boxplots of the approximated sampling distributions of the \(\pi\) estimator of the mean SOM concentration for stratified simple random sampling and simple random sampling, both of size 40, obtained by repeating the random sampling with each design and estimation 10,000 times.

Figure 4.3: Approximated sampling distribution of the \(\pi\) estimator of the mean SOM concentration (g kg\(^{-1}\)) in Voorst for stratified simple random sampling (STSI) and simple random sampling (SI) of size 40.

The approximated sampling distributions of the estimators of the population mean with the two designs are not very different. With stratified random sampling, the spread of the estimated means is somewhat smaller. The horizontal red line is the population mean (81.1 g kg\(^{-1}\)). The gain in precision due to the stratification, referred to as the stratification effect, can be quantified by the ratio of the variance with simple random sampling to the variance with stratified simple random sampling. So, when this variance ratio is larger than 1, stratified simple random sampling is more precise than simple random sampling. For Voorst the stratification effect with proportional allocation (Section 4.3) equals 1.310. This means that with simple random sampling we need 1.310 times as many sampling units as with stratified simple random sampling to obtain an estimate of the same precision.

The stratification effect can be computed from the population variance \(S^2(z)\) (Equation (3.8) ) and the variances within the strata \(S^2_h(z)\) . In the sampling experiment, these variances are known without error because we know the \(z\) -values for all units in the population. In practice, we only know the \(z\) -values for the sampled units. However, a design-unbiased estimator of the population variance is ( de Gruijter et al. 2006 )

\[\begin{equation} \widehat{S^{2}}(z)= \widehat{\overline{z^{2}}}-\left(\hat{\bar{z}}\right)^{2}+ \widehat{V}\!\left(\hat{\bar{z}}\right) \;, \tag{4.7} \end{equation}\]

where \(\widehat{\overline{z^{2}}}\) denotes the estimated population mean of the study variable squared ( \(z^2\) ), obtained in the same way as \(\hat{\bar{z}}\) (Equation (4.1) ), but using squared values, and \(\widehat{V}\!\left(\hat{\bar{z}}\right)\) denotes the estimated variance of the estimator of the population mean (Equation (4.4) ).

The estimated population variance is then divided by the sum of the stratum sample sizes to get an estimate of the sampling variance of the estimator of the mean with simple random sampling of an equal number of units:

\[\begin{equation} \widehat{V}(\hat{\bar{z}}_{\text{SI}}) = \frac{\widehat{S^2}(z)}{\sum_{h=1}^{H}n_h}\;. \tag{4.8} \end{equation}\]
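The chain from Equation (4.7) to Equation (4.8) can be sketched as follows. This is a plain Python illustration with hypothetical data, not the chapter's R code; `stratified_estimates` is an assumed helper name. The design effect is then the estimated variance with the stratified design divided by the estimated variance with simple random sampling of the same total size.

```python
from statistics import mean, variance

def stratified_estimates(samples, stratum_sizes):
    """Return estimated mean of z, estimated mean of z^2, and V-hat (Eq. 4.4)."""
    N = sum(stratum_sizes.values())
    m = m2 = v = 0.0
    for h, z in samples.items():
        n_h, N_h = len(z), stratum_sizes[h]
        w_h = N_h / N
        m += w_h * mean(z)
        m2 += w_h * mean([x * x for x in z])   # same estimator, squared values
        v += w_h ** 2 * (1 - n_h / N_h) * variance(z) / n_h
    return m, m2, v

# Hypothetical data (not the Voorst sample)
samples = {"A": [10.0, 12.0, 14.0], "B": [20.0, 24.0]}
stratum_sizes = {"A": 60, "B": 40}

m, m2, v_stsi = stratified_estimates(samples, stratum_sizes)
s2_hat = m2 - m ** 2 + v_stsi                  # Eq. 4.7
n = sum(len(z) for z in samples.values())
v_si = s2_hat / n                              # Eq. 4.8
design_effect = v_stsi / v_si
stratification_effect = v_si / v_stsi
print(s2_hat, v_si, design_effect)
```

A design effect below 1 (equivalently, a stratification effect above 1) indicates that, for these data, the stratified design is estimated to be more precise than simple random sampling.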

The population variance can be estimated with function s2 of package surveyplanning ( Breidaks, Liberts, and Jukams 2020 ) . However, this function is an implementation of an alternative, consistent estimator of the population variance ( Särndal, Swensson, and Wretman 1992 ) :

\[\begin{equation} \widehat{S^2}(z) = \frac{N-1}{N} \frac{n}{n-1} \frac{1}{N-1} \sum_{k \in \mathcal{S}} \frac{(z_k - \hat{\bar{z}}_{\pi})^2}{\pi_k} \;. \tag{4.9} \end{equation}\]

The design effect is defined as the variance of an estimator of the population mean with the sampling design under study divided by the variance of the \(\pi\) estimator of the mean with simple random sampling of an equal number of units (Section 12.4). So, the design effect of stratified random sampling is the reciprocal of the stratification effect. For the stratified simple random sample of Figure 4.1, the design effect can then be estimated as follows. Function SE extracts the estimated standard error of the estimator of the mean from the output of function svymean. The extracted standard error is then squared to obtain an estimate of the sampling variance of the estimator of the population mean with stratified simple random sampling. Finally, this variance is divided by the variance with simple random sampling of an equal number of units.

The same value is obtained with argument deff of function svymean .

So, when using package survey , estimation of the population variance is not needed to estimate the design effect. I only added this to make clear how the design effect is computed with functions in package survey . In following chapters I will skip the estimation of the population variance.

The design effect estimated from the stratified sample is smaller than 1, showing that stratified simple random sampling is more efficient than simple random sampling. The reciprocal of the estimated design effect (1.448) is somewhat larger than the stratification effect as computed in the sampling experiment, but this is an estimate of the design effect from one stratified sample only. The estimated population variance varies among stratified samples, and so does the estimated design effect.

Stratified simple random sampling with proportional allocation (Section 4.3 ) is more precise than simple random sampling when the sum of squares of the stratum means is larger than the sum of squares within strata ( Lohr 1999 ) :

\[\begin{equation} SSB > SSW\;, \tag{4.10} \end{equation}\]

with SSB the weighted sum-of-squares between the stratum means:

\[\begin{equation} SSB = \sum_{h=1}^H N_h (\bar{z}_h-\bar{z})^2 \;, \tag{4.11} \end{equation}\]

and SSW the sum over the strata of the weighted variances within strata (weights equal to \(1-N_h/N\) ):

\[\begin{equation} SSW = \sum_{h=1}^H (1-\frac{N_h}{N})S^2_h\;. \tag{4.12} \end{equation}\]

In other words, the smaller the differences in the stratum means and the larger the variances within the strata, the smaller the stratification effect will be. Figure 4.4 shows a boxplot of the SOM concentration per stratum (soil-land use combination). The stratum means are equal to 83.0, 49.0, 68.8, 92.7, 122.3 g kg\(^{-1}\). The stratum variances are 1799.2, 238.4, 1652.9, 1905.4, 2942.8 (g kg\(^{-1}\))\(^2\). The large stratum variances explain the modest gain in precision realised by stratified simple random sampling compared to simple random sampling in this case.
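The condition of Equation (4.10) is easy to check once the stratum sizes, means, and variances are known. The sketch below is plain Python with entirely hypothetical numbers (the stratum sizes of Voorst are not listed here, so the Voorst values are not used); the function name `ssb_ssw` is an assumption.

```python
def ssb_ssw(stratum_sizes, stratum_means, stratum_vars):
    """Return (SSB, SSW) as defined in Eqs. 4.11 and 4.12."""
    N = sum(stratum_sizes)
    pop_mean = sum(N_h * m for N_h, m in zip(stratum_sizes, stratum_means)) / N
    ssb = sum(N_h * (m - pop_mean) ** 2
              for N_h, m in zip(stratum_sizes, stratum_means))
    ssw = sum((1 - N_h / N) * s2
              for N_h, s2 in zip(stratum_sizes, stratum_vars))
    return ssb, ssw

# Hypothetical two-stratum population
ssb, ssw = ssb_ssw(stratum_sizes=[60, 40],
                   stratum_means=[12.0, 22.0],
                   stratum_vars=[4.0, 8.0])
print(ssb, ssw, ssb > ssw)
```

Here the stratum means differ strongly relative to the within-stratum variances, so the condition is met by a wide margin.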

Figure 4.4: Boxplots of the SOM concentration (g kg\(^{-1}\)) for the five strata (soil-land use combinations) in Voorst.

## 4.2 Confidence interval estimate

The \(100(1-\alpha)\) % confidence interval for \(\bar{z}\) is given by

\[\begin{equation} \hat{\bar{z}} \pm t_{\alpha /2, df}\cdot \sqrt{\widehat{V}\!\left(\hat{\bar{z}}\right)} \;, \tag{4.13} \end{equation}\]

where \(t_{\alpha /2,df}\) is the \((1-\alpha /2)\) quantile of a t distribution with \(df\) degrees of freedom. The degrees of freedom \(df\) can be approximated by \(n-H\), as proposed by Lohr (1999). This is the number of degrees of freedom if the variances within the strata are equal. With unequal variances within strata, \(df\) can be approximated by Satterthwaite’s method (Nanthakumar and Selvavel 2004):

\[\begin{equation} df \approx \frac {\left(\sum_{h=1}^H w_h^2 \frac{\widehat{S^2}_h(z)}{n_h}\right)^2} {\sum_{h=1}^H w_h^4 \left(\frac{\widehat{S^2}_h(z)}{n_h}\right)^2 \frac {1}{n_h-1}} \;. \tag{4.14} \end{equation}\]

A confidence interval estimate of the population mean can be extracted with method confint of package survey . It uses \(n-H\) degrees of freedom.
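The approximation of Equation (4.14) can be sketched as below, in plain Python with hypothetical inputs; `satterthwaite_df` is an assumed name and this is not the computation done by package survey. The example also illustrates the remark above: with equal stratum weights, sample sizes, and variances, the approximation reproduces \(n-H\).

```python
def satterthwaite_df(stratum_weights, sample_vars, sample_sizes):
    """Approximate degrees of freedom following Eq. 4.14."""
    # w_h^2 * S2_h / n_h for each stratum
    terms = [w ** 2 * s2 / n
             for w, s2, n in zip(stratum_weights, sample_vars, sample_sizes)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / (n - 1) for t, n in zip(terms, sample_sizes))
    return numerator / denominator

# Two hypothetical strata with five sampled units each (n = 10, H = 2)
df_equal = satterthwaite_df([0.5, 0.5], [4.0, 4.0], [5, 5])
df_unequal = satterthwaite_df([0.5, 0.5], [4.0, 400.0], [5, 5])
print(df_equal, df_unequal)
```

When one stratum dominates the variance, the approximated degrees of freedom drop towards \(n_h - 1\) of that stratum, which widens the confidence interval.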

## 4.3 Allocation of sample size to strata

After we have decided on the total sample size \(n\) , we must decide how to apportion the units to the strata. It is reasonable to allocate more sampling units to large strata and fewer to small strata. The simplest way to achieve this is proportional allocation:

\[\begin{equation} n_{h}=n \cdot \frac{N_{h}}{\sum N_{h}}\;, \tag{4.15} \end{equation}\]

with \(N_h\) the total number of population units (size) of stratum \(h\) . With infinite populations \(N_h\) is replaced by the area \(A_h\) . The sample sizes computed with this equation are rounded to the nearest integers.

If we have prior information on the variance of the study variable within the strata, then it makes sense to account for differences in variance. Heterogeneous strata should receive more sampling units than homogeneous strata, leading to Neyman allocation:

\[\begin{equation} n_{h}= n \cdot \frac{N_{h}\,S_{h}(z)}{\sum\limits_{h=1}^{H} N_{h}\,S_{h}(z)} \;, \tag{4.16} \end{equation}\]

with \(S_h(z)\) the standard deviation (square root of variance) of the study variable \(z\) in stratum \(h\) .

Finally, the costs of sampling may differ among the strata. It can be relatively expensive to sample nearly inaccessible strata, and we do not want to sample many units there. This leads to optimal allocation:

\[\begin{equation} n_{h}= n \cdot \frac{\frac{N_{h}\,S_{h}(z)}{\sqrt{c_{h}}}}{\sum\limits_{h=1}^{H} \frac{N_{h}\,S_{h}(z)}{\sqrt{c_{h}}}} \;, \tag{4.17} \end{equation}\]

with \(c_h\) the costs per sampling unit in stratum \(h\) . Optimal means that given the total costs this allocation type leads to minimum sampling variance, assuming a linear costs model:

\[\begin{equation} C = c_0 + \sum_{h=1}^H n_h c_h \;, \tag{4.18} \end{equation}\]

with \(c_0\) overhead costs. So, the more variable a stratum and the lower the costs, the more units will be selected from this stratum.

These optimal sample sizes can be computed with function optsize of package surveyplanning .
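The three allocation rules differ only in the score assigned to each stratum, so they can be sketched with one function. This is a minimal plain Python illustration with hypothetical stratum sizes, standard deviations, and costs; it is not the implementation of function optsize, and the name `allocate` is an assumption.

```python
import math

def allocate(n, sizes, sds=None, costs=None):
    """Unrounded stratum sample sizes.

    sizes only         -> proportional allocation (Eq. 4.15)
    sizes + sds        -> Neyman allocation (Eq. 4.16)
    sizes + sds + costs -> optimal allocation (Eq. 4.17)
    """
    H = len(sizes)
    sds = sds if sds is not None else [1.0] * H
    costs = costs if costs is not None else [1.0] * H
    scores = [N_h * S_h / math.sqrt(c_h)
              for N_h, S_h, c_h in zip(sizes, sds, costs)]
    total = sum(scores)
    return [n * s / total for s in scores]

# Hypothetical strata
sizes = [600, 300, 100]       # N_h
sds = [10.0, 20.0, 40.0]      # S_h(z)
costs = [1.0, 1.0, 4.0]       # c_h

prop = allocate(40, sizes)
neyman = allocate(40, sizes, sds)
optimal = allocate(40, sizes, sds, costs)
print([round(x) for x in prop])
print([round(x) for x in neyman])
print([round(x) for x in optimal])
```

Because the results are rounded to integers afterwards, the rounded stratum sample sizes need not sum exactly to \(n\) and may need a small manual adjustment.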

Table 4.2 shows the proportional and optimal sample sizes for the five strata of the study area Voorst, for a total sample size of 40. Stratum XF is the second-smallest stratum and therefore receives only seven sampling units with proportional allocation. However, the standard deviation in this stratum is the largest, and as a consequence, with optimal allocation the sample size in this stratum is increased by three points, at the expense of stratum EA, which is relatively homogeneous.

Figure 4.5 shows the standard error of the \(\pi\) estimator of the mean SOM concentration as a function of the total sample size, for simple random sampling and for stratified simple random sampling with proportional and Neyman allocation. A small extra gain in precision can be achieved using Neyman allocation instead of proportional allocation. However, in practice Neyman allocation is often not achievable, because we do not know the standard deviations of the study variable within the strata. If a quantitative covariate \(x\) is used for stratification (see Sections 4.4 and 13.2), the standard deviations \(S_h(z)\) are approximated by \(S_h(x)\), resulting in approximately optimal stratum sample sizes. The gain in precision compared to proportional allocation is then partly or entirely lost.

Figure 4.5: Standard error of the \(\pi\) estimator of the mean SOM concentration (g kg\(^{-1}\)) as a function of the total sample size, for simple random sampling (SI) and for stratified simple random sampling with proportional (STSI(prop)) and Neyman allocation (STSI(Neyman)) for Voorst.

Optimal allocation and Neyman allocation assume univariate stratification, i.e., the stratified simple random sample is used to estimate the mean of a single study variable. If we have multiple study variables, optimal allocation becomes more complicated. In Bethel allocation, the total sampling costs, assuming a linear costs model (Equation (4.18) ), are minimised given a constraint on the precision of the estimated mean for each study variable ( Bethel 1989 ) , see Section 4.8 . Bethel allocation can be computed with function bethel of package SamplingStrata ( G. Barcaroli et al. 2020 ) .

- Estimate the population mean and the standard error of the estimator.
- Compute the true standard error of the estimator. Hint: compute the population variances of the study variable \(z\) per stratum, and divide these by the stratum sample sizes.
- Compute a 95% confidence interval estimate of the population mean, using function confint of package survey .
- Looking at Figure 4.4 , which strata do you expect can be merged without losing much precision of the estimated population mean?
- Compute the true sampling variance of the estimator of the mean for this new stratification, for a total sample size of 40 and proportional allocation.
- Compare this true sampling variance with the true sampling variance using the original five strata (same sample size, proportional allocation). What is your conclusion about the new stratification?
- Prove that the sum of the inclusion probabilities over all population units with stratified simple random sampling equals the sample size \(n\).

## 4.4 Cum-root-f stratification

When we have a quantitative covariate \(x\) related to the study variable \(z\) and \(x\) is known for all units in the population, strata can be constructed with the cum-root-f method using this covariate as a stratification variable, see Dalenius and Hodges ( 1959 ) and Cochran ( 1977 ) . Population units with similar values for the covariate (stratification variable) are grouped into a stratum. Strata are computed as follows:

- Compute a frequency histogram of the stratification variable using a large number of bins.
- Compute the square root of the frequencies.
- Cumulate the square root of the frequencies, i.e., compute \(\sqrt{f_1}\) , \(\sqrt{f_1} + \sqrt{f_2}\) , \(\sqrt{f_1} + \sqrt{f_2} + \sqrt{f_3}\) , etc.
- Divide the cumulative sum of the last bin by the number of strata, multiply this value by \(1,2, \dots, H-1\), with \(H\) the number of strata, and select the boundaries of the histogram bins closest to these values.
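The four steps above can be sketched as follows. This is a simplified plain Python illustration, not the algorithm as implemented in any R package; the function name `cum_root_f_bounds`, the equal-width bins, and the toy covariate are assumptions. The sketch takes the upper edge of the selected bin as the stratum bound and does not guard against duplicate bounds or a constant covariate.

```python
import math

def cum_root_f_bounds(x, n_strata, n_bins):
    """Return the n_strata - 1 stratum bounds of covariate x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    # Step 1: frequency histogram with equal-width bins
    freq = [0] * n_bins
    for value in x:
        idx = min(int((value - lo) / width), n_bins - 1)
        freq[idx] += 1
    # Steps 2 and 3: cumulate the square roots of the frequencies
    cum = []
    running = 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)
    # Step 4: pick the bin whose cumulative value is closest to each target
    step = cum[-1] / n_strata
    bounds = []
    for h in range(1, n_strata):
        target = h * step
        best = min(range(n_bins), key=lambda i: abs(cum[i] - target))
        bounds.append(lo + (best + 1) * width)   # upper edge of that bin
    return bounds

# Toy covariate: the integers 0..99, so all bins have equal frequency
bounds = cum_root_f_bounds(list(range(100)), n_strata=5, n_bins=10)
print(bounds)
```

For a uniformly distributed covariate, as in this toy example, the bounds are equally spaced; for a skewed covariate, the strata become narrower where the histogram frequencies are high.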

In cum-root-f stratification, it is assumed that the covariate values are nearly perfect predictions of the study variable, so that the prediction errors do not affect the stratification. Under this assumption the stratification is optimal with Neyman allocation of sampling units to the strata (Section 4.3).

Cum-root-f stratification is illustrated with the data of Xuancheng in China. We wish to estimate the mean organic matter concentration in the topsoil (SOM, g kg\(^{-1}\)) of this area. Various covariates are available that are correlated with SOM, such as elevation, yearly average temperature, slope, and various other terrain attributes. Elevation (named dem in the tibble) is used as a single stratification variable, see Figure 4.6. The correlation coefficient of SOM and elevation in a sample of 183 observations is 0.59. The positive correlation can be explained as follows: temperature decreases with elevation, leading to a smaller decomposition rate of organic matter in the soil.

Figure 4.6: Elevation used as a stratification variable in cum-root-f stratification of Xuancheng.

The strata can be constructed with function strata.cumrootf of package stratification (Baillargeon and Rivest 2011). Argument n of this function is the total sample size, but this value has no effect on the stratification. Argument Ls is the number of strata. I arbitrarily chose to construct five strata. Argument nclass is the number of bins of the histogram. The output object of function strata.cumrootf is a list containing, amongst others, a numeric vector with the stratum bounds (bh) and a factor with the stratum levels of the grid cells (stratumID). Finally, note that the values of the stratification variable must be positive. The minimum elevation is -5 m, so I added the absolute value of this minimum to elevation.

Stratum bounds are threshold values of the stratification variable elevation; these stratum bounds are equal to 46.7, 108.3, 214.5, 384.4. Note that the number of stratum bounds is one less than the number of strata. The resulting stratification is shown in Figure 4.7 . Note that most strata are not single polygons, but are made up of many smaller polygons. This may be even more so if the stratification variable shows a noisy spatial pattern. This is not a problem at all, as a stratum is just a collection of population units (raster cells) and need not be spatially contiguous.

Figure 4.7: Stratification of Xuancheng obtained with the cum-root-f method, using elevation as a stratification variable.

- Compute ten cum-root-f strata, using function strata of package sampling .
- Select a stratified simple random sample of 100 units. First, compute the stratum sample sizes for proportional allocation.
- Estimate the population mean of AGB and its sampling variance.
- Compute the true sampling variance of the estimator of the mean for this sampling design (see Exercise 1 for a hint).
- Compute the stratification effect (gain in precision). Hint: compute the sampling variance for simple random sampling by computing the population variance of AGB, and divide this by the total sample size.

## 4.5 Stratification with multiple covariates

If we have multiple variables that are possibly related to the study variable, we may want to use them all or a subset of them as stratification variables. Using the quantitative variables one-by-one in cum-root-f stratification, followed by overlaying the maps with univariate strata, may lead to numerous cross-classification strata.

A simple solution is to construct homogeneous groups, referred to as clusters, of population units (raster cells). The units within a cluster are more similar to each other than to the units in other clusters. Various clustering techniques are available. Here, I use hard k-means.

This is illustrated again with the Xuancheng case study. Five quantitative covariates are used for constructing the strata. Besides elevation, which was used as a single stratification variable in the previous section, now also temperature, slope, topographic wetness index (twi), and profile curvature are used to construct clusters that are used as strata in stratified simple random sampling. To speed up the computations, a subgrid with a spacing of 0.4 km is selected, using function spsample of package sp , see Chapter 5 ( Bivand, Pebesma, and Gómez-Rubio 2013 ) .

Five clusters are computed with k-means using as clustering variables the five covariates mentioned above. The scales of these covariates differ greatly, and for this reason they must be scaled before being used in clustering. The k-means algorithm is deterministic, i.e., the same initial clustering will end in the same final, optimised clustering. This final clustering can be suboptimal, and therefore it is recommended to repeat the clustering as many times as feasible, with different initial clusterings. Argument nstart of function kmeans is the number of initial clusterings. The best clustering, i.e., the one with the smallest within-cluster sum-of-squares, is kept.
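The combination of scaling, repeated random starts, and keeping the clustering with the smallest within-cluster sum-of-squares can be sketched as below. This is a plain Python illustration using Lloyd's algorithm, which is an assumption on my part and not necessarily the algorithm used by the R function; the function names, the argument `nstart` of the sketch, and the six-unit toy data are likewise hypothetical.

```python
import random
from statistics import mean, pstdev

def scale(columns):
    """Centre each covariate and divide it by its standard deviation."""
    scaled = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        scaled.append([(v - m) / s for v in col])
    return list(zip(*scaled))        # one tuple of scaled covariates per unit

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initial set of centres."""
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:     # converged: assignments no longer change
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:              # keep the old centre if a cluster is empty
                centres[j] = tuple(mean(c) for c in zip(*members))
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return wss, labels

def kmeans(points, k, nstart=10, seed=0):
    """Repeat the clustering nstart times and keep the smallest WSS."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(nstart)),
               key=lambda result: result[0])

# Toy data: two covariates on very different scales
elevation = [10.0, 12.0, 11.0, 300.0, 310.0, 305.0]
slope = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95]
points = scale([elevation, slope])
wss, labels = kmeans(points, k=2, nstart=5)
print(labels, wss)
```

The cluster labels returned for the units can then serve as stratum identifiers.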

Figure 4.8 shows the five clusters obtained by k-means clustering of the raster cells. These clusters can be used as strata in random sampling.

Figure 4.8: Five clusters obtained by k-means clustering of the raster cells of Xuancheng, using five scaled covariates in clustering.

The sizes of the clusters used as strata differ greatly (Table 4.3). This table also shows means of the unscaled covariates used in clustering.

Categorical variables can be accommodated in clustering using the technique proposed by Huang ( 1998 ) , implemented in package clustMixType ( Szepannek 2018 ) .

In the situation that we already have some data of the study variable, an alternative solution is to calibrate a model for the study variable, for instance a multiple linear regression model, using the covariates as predictors, and to use the predictions of the study variable as a single stratification variable in cum-root-f stratification or in optimal spatial stratification, see Section 13.2 .

## 4.6 Geographical stratification

When no covariate is available, we may still decide to apply a geographical stratification . For instance, a square study area can be divided into 4 \(\times\) 4 equal-sized subsquares that are used as strata. When we select one or two points per subsquare, we avoid strong spatial clustering of the sampling points. Geographical stratification improves the spatial coverage . When the study variable is spatially structured, think for instance of a spatial trend, then geographical stratification will lead to more precisely estimated means (smaller sampling variances).

A simple method for constructing geographical strata is k-means clustering (Brus, Spätjens, and de Gruijter 1999). See Section 17.2 for a simple illustrative example of how geographical strata are computed with k-means clustering. In this approach, the study area is discretised by a large number of grid cells. These grid cells are the objects that are clustered. The clustering variables are simply the spatial coordinates of the centres of the grid cells. This method leads to compact geographical strata, briefly referred to as geostrata. Geostrata can be computed with function kmeans, as shown in Section 4.5. The two clustering variables have the same scale, so they should not be scaled because this would lead to an arbitrary distortion of geographical distances. The geostrata generally will not have the same number of grid cells. Geostrata of equal size can be attractive, as then, with an equal number of points selected per stratum, the sample becomes self-weighting, i.e., the sample mean is an unbiased estimator of the population mean.

Geostrata of the same size can be computed with function stratify of the package spcosa (D. Walvoort, Brus, and de Gruijter 2020; D. J. J. Walvoort, Brus, and Gruijter 2010), with argument equalArea = TRUE.

If the total number of grid cells divided by the number of strata is an integer, the stratum sizes are exactly equal; otherwise, the difference is one grid cell. D. J. J. Walvoort, Brus, and Gruijter ( 2010 ) describe the k-means algorithms implemented in this package in detail. Argument object of function stratify specifies a spatial object of the population units. In the R code below the subgrid of grdXuancheng generated in Section 4.5 is converted to a SpatialPixelsDataFrame with function gridded of the package sp . The spatial object can also be of class SpatialPolygons . In that case, either argument nGridCells or argument cellSize must be set, so that the vector map in object can be discretised by a finite number of grid cells. Argument nTry specifies the number of initial stratifications in k-means clustering, and therefore is comparable with argument nstart of function kmeans . For more details on spatial stratification using k-means clustering, see Section 17.2 . The k-means algorithm used with equalArea = TRUE takes much more computing time than the one used with equalArea = FALSE .

Function spsample of package spcosa is used to select from each geostratum a simple random sample of two points.

Figure 4.9 shows fifty compact geostrata of equal size for Xuancheng with the selected sampling points. Note that the sampling points are reasonably well spread throughout the study area.

Figure 4.9: Compact geostrata of equal size for Xuancheng and stratified simple random sample of two points per stratum.

Once the observations are done, the population mean can be estimated with function estimate . For Xuancheng I simulated data from a normal distribution, just to illustrate estimation with function estimate . Various statistics can be estimated, among which the population mean (spatial mean), the standard error, and the CDF. The CDF is estimated by transforming the data into indicators (Subsection 3.1.2 ).

The estimated population mean equals 9.8 with an estimated standard error of 0.2.

- Why is it attractive to select at least two points per geostratum?
- The alternative to 50 geostrata and two points per geostratum is 100 geostrata and one point per geostratum. Which sampling strategy will be more precise?
- The geostrata in Figure 4.9 have equal size (area), which can be enforced by argument equalArea = TRUE . Why are equal sizes attractive? Work out the estimator of the population mean for strata of equal size.
- If only one point per stratum is selected, the sampling variance can be approximated by the collapsed strata estimator. In this method, pairs of strata are formed, and the two strata of a pair are joined. In each new stratum we now have two points. With an odd number of strata there will be one group of three strata and three points. The sample is then analysed as if it were a random sample from the new collapsed strata. Suppose we group the strata on the basis of the measurements of the study variable. Do you think this is a proper way of grouping?
- In case you think this is not a proper way of grouping the strata, how would you group the strata?
- Is the sampling variance estimator unbiased? If not, is the sampling variance overestimated or underestimated?
- Can the sampling variance of the estimator of the mean be estimated for bulking within the strata?
- The alternative to analysing the concentration of four composite samples obtained by bulking across strata is to analyse all 20 \(\times\) 4 aliquots separately. The strata have equal size, so the inclusion probabilities are equal. As a consequence, the sample mean is an unbiased estimator of the population mean. Is the precision of this estimated population mean equal to that of the estimated population mean with composite sampling? If not, is it smaller or larger, and why?
- If you use argument equalArea = FALSE in combination with argument type = "composite" , you get an error message. Why does this combination of arguments not work?

## 4.7 Multiway stratification

In Section 4.5 multiple continuous covariates are used to construct clusters of raster cells using k-means. These clusters are then used as strata. This section considers the case where we have multiple categorical and/or continuous variables that we would like to use as stratification variables. The continuous stratification variables are first used to compute strata based on that stratification variable, e.g., using the cum-root-f method. What could be done then is to compute the cross-classification of each unit and use these cross-classifications as strata in random sampling. However, this may lead to numerous strata, maybe even more than the intended sample size. To reduce the total number of strata, we may aggregate cross-classification strata with similar means of the study variable, based on our prior knowledge.

An alternative to aggregation of cross-classification strata is to use the separate strata, i.e., the strata based on an individual stratification variable, as marginal strata in random sampling. How this works is explained in Subsection 9.1.4 .

## 4.8 Multivariate stratification

Another situation is where we have multiple study variables and would like to optimise the stratification and allocation for estimating the population means of all study variables. Optimal stratification for multiple study variables is only relevant if we would like to use different stratification variables for the study variables. In many cases, we do not have reliable prior information about the different study variables justifying the use of multiple stratification variables. We are already happy to have one stratification variable that may serve to increase the precision of the estimated means of all study variables.

However, in case we do have multiple stratification variables tailored to different study variables, the objective is to partition the population into strata, so that for a given allocation, the total sampling costs, assuming a linear costs model (Equation (4.18)), are minimised given a constraint on the precision of the estimated mean for each study variable.

Package SamplingStrata ( G. Barcaroli et al. 2020 ) can be used to optimise multivariate strata. Giulio Barcaroli ( 2014 ) gives details about the objective function and the algorithm used for optimising the strata. Sampling units are allocated to the strata by Bethel allocation ( Bethel 1989 ) . The required precision is specified in terms of a coefficient of variation, one per study variable.

Multivariate stratification is illustrated with the Meuse data set of package gstat ( E. J. Pebesma 2004 ) . The prior data of heavy metal concentrations of Cd and Zn are used in spatial prediction to create maps of these two study variables.

The maps of natural logs of the two metal concentrations are created by kriging with an external drift, using the square root of the distance to the Meuse river as a predictor for the mean, see Section 21.3 for how this spatial prediction method works.

Figure 4.10 shows the map with the predicted log Cd and log Zn concentrations.

Figure 4.10: Kriging predictions of natural logs of Cd and Zn concentrations in the study area Meuse, used as stratification variables in bivariate stratification.

The predicted log concentrations of the two heavy metals are used as stratification variables in designing a new sample for design-based estimation of the population means of Cd and Zn. Some of the predicted log Cd concentrations are negative (Figure 4.10), which leads to an error when running function optimStrata. The minimum predicted log Cd concentration is -1.7, so I added 2 to the predictions. A variable indicating the domains of interest is added to the data frame. The value of this variable is 1 for all grid cells, so that a sample is designed for estimating the mean of the entire population. As a first step, function buildFrameDF is used to create a data frame that can be handled by function optimStrata. Argument X specifies the stratification variables, and argument Y the study variables. In our case, the stratification variables and the study variables are the same. This is typical for the situation where the stratification variables are obtained by mapping the study variables.

Next, a data frame with the precision requirements for the estimated means is created. The precision requirement is given as a coefficient of variation, i.e., the standard error of the estimated population mean, divided by the estimated mean. The study variables as specified in Y are used to compute the estimated means and the standard errors for a given stratification and allocation.
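To make the precision requirement concrete, the following Python sketch computes the coefficient of variation of an estimated population mean for a given stratification and allocation. It is not the SamplingStrata implementation: the function name `stratified_cv`, the tuple layout, and the stratum values are assumptions for illustration, and the finite population correction is ignored.

```python
import math


def stratified_cv(strata):
    """Return (mean, standard error, CV) of the estimated population mean.

    strata: list of (N_h, n_h, mean_h, var_h) tuples, i.e. stratum size,
    stratum sample size, stratum mean, and stratum variance.
    The finite population correction is ignored in this sketch.
    """
    total = sum(N_h for N_h, _, _, _ in strata)
    # Estimated population mean: stratum means weighted by relative size.
    mean = sum(N_h / total * m_h for N_h, _, m_h, _ in strata)
    # Sampling variance of the estimated mean under stratified random sampling.
    var = sum((N_h / total) ** 2 * v_h / n_h for N_h, n_h, _, v_h in strata)
    se = math.sqrt(var)
    return mean, se, se / mean


# Hypothetical strata, not the Meuse results.
strata = [(600, 6, 2.0, 0.04), (400, 4, 5.0, 0.09)]
mean, se, cv = stratified_cv(strata)
print(f"mean={mean:.3f}, se={se:.4f}, cv={cv:.4f}")
```

A design meets the requirement when the computed CV does not exceed the target CV specified for that study variable.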

Finally, the multivariate stratification is optimised by searching for the optimal stratum bounds using a genetic algorithm (Gershenfeld 1999).

A summary of the strata can be obtained with function summaryStrata.

Column Population contains the sizes of the strata, i.e., the number of grid cells. The total sample size equals 26. The sample size per stratum is computed with Bethel allocation, see Section 4.3. The last four columns contain the lower and upper bounds of the orthogonal intervals.

Figure 4.11 shows a 2D-plot of the bivariate strata. The strata can be plotted as a series of nested rectangles. All population units in the smallest rectangle belong to stratum 1; all units in the one-but-smallest rectangle that are not in the smallest rectangle belong to stratum 2, etc. If we have more than two stratification variables, the strata form a series of nested hyperrectangles or boxes. The strata are obtained as the Cartesian product of orthogonal intervals.
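The nested-rectangle rule can be sketched in Python as follows. This is an illustration, not code from SamplingStrata: the function `assign_stratum` and the upper bounds are hypothetical, and the sketch assumes that all rectangles share the same lower corner, so only upper bounds need to be checked.

```python
def assign_stratum(x, upper_bounds):
    """Return the 1-based stratum of a unit with stratification values x.

    upper_bounds: one tuple of upper bounds per stratum, ordered from the
    smallest rectangle to the largest. A unit belongs to the first
    (smallest) rectangle that contains it.
    """
    for h, bounds in enumerate(upper_bounds, start=1):
        if all(value <= bound for value, bound in zip(x, bounds)):
            return h
    raise ValueError("unit lies outside the largest rectangle")


# Hypothetical upper bounds for two stratification variables.
bounds = [(1.0, 5.5), (2.0, 6.0), (3.5, 7.5)]
print(assign_stratum((0.5, 5.0), bounds))  # inside the smallest rectangle
print(assign_stratum((0.5, 5.8), bounds))  # exceeds the first bound on variable 2
print(assign_stratum((3.0, 5.0), bounds))  # exceeds the first two bounds on variable 1
```

The same function works unchanged for more than two stratification variables, where the rectangles become boxes.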

Figure 4.11: 2D-plot of optimised bivariate strata of the study area Meuse.

Figure 4.12 shows a map of the optimised strata.

Figure 4.12: Map of optimised bivariate strata of the study area Meuse.

The expected coefficient of variation can be extracted with function expected_CV.

The coefficient of variation of Cd is indeed equal to the desired level of 0.02, for Zn it is smaller. So, in this case Cd is the study variable that determines the total sample size of 26 units.

Note that these coefficients of variation are computed from the stratification variables, which are predictions of the study variable. Errors in these predictions are not accounted for. It is well known that kriging is a smoother, so that the variance of the predicted values within a stratum is smaller than the variance of the true values. As a consequence, the coefficients of variation of the predictions underestimate the coefficients of variation of the study variables. See Section 13.2 for how prediction errors and spatial correlation of prediction errors can be accounted for in optimal stratification. An additional problem is that I added a value of 2 to the log Cd concentrations. This does not affect the standard error of the estimated mean, but does affect the estimated mean, so that also for this reason the coefficient of variation of the study variable Cd is underestimated.
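The effect of adding a constant can be checked with a small Python sketch. The values below are hypothetical, not the Meuse predictions, and the sketch uses a simple random sample rather than a stratified one to keep the arithmetic short.

```python
import math
import statistics


def cv_of_mean(sample):
    """Coefficient of variation of the sample mean: standard error / mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return se / statistics.mean(sample)


# Hypothetical log concentrations, some of them negative.
values = [-1.2, -0.4, 0.3, 0.9, 1.4]
shifted = [v + 2 for v in values]

# The shift leaves the spread, and hence the standard error, unchanged ...
print(statistics.stdev(values), statistics.stdev(shifted))
# ... but it inflates the mean, so the coefficient of variation shrinks.
print(cv_of_mean(values), cv_of_mean(shifted))
```

Because the mean moves from 0.2 to 2.2 while the standard error stays the same, the coefficient of variation of the shifted values is smaller than that of the original values.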

The compact geostrata and the sample are plotted with package ggplot2. A simple alternative is to use method plot of spcosa: plot(mygeostrata, mysample).

## S.3.3 Hypothesis Testing Examples

- Example: Right-Tailed Test
- Example: Left-Tailed Test
- Example: Two-Tailed Test

## Brinell Hardness Scores

An engineer measured the Brinell hardness of 25 pieces of ductile iron that were subcritically annealed. The resulting data were:

The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is greater than 170. Therefore, he was interested in testing the hypotheses:

$H_0: \mu = 170$ versus $H_A: \mu > 170$

The engineer entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

## Descriptive Statistics

$\mu$: mean of Brinelli

Null hypothesis H₀: $\mu$ = 170

Alternative hypothesis H₁: $\mu$ > 170

The output tells us that the average Brinell hardness of the n = 25 pieces of ductile iron was 172.52 with a standard deviation of 10.31. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 10.31 by the square root of n = 25, is 2.06). The test statistic t * is 1.22, and the P -value is 0.117.
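The standard error and test statistic reported in the output can be reproduced from the summary statistics alone. The following Python sketch is an illustration, not Minitab's procedure, and the function name `one_sample_t` is an assumption.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (standard error of the mean, t statistic) for a one-sample t-test."""
    se = sample_sd / math.sqrt(n)
    t_star = (sample_mean - mu0) / se
    return se, t_star


# Summary statistics quoted in the text for the Brinell hardness data.
se, t_star = one_sample_t(sample_mean=172.52, sample_sd=10.31, n=25, mu0=170)
print(round(se, 2), round(t_star, 2))
```

The rounded values agree with the "SE Mean" of 2.06 and the test statistic of 1.22 in the output.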

If the engineer set his significance level α at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were greater than 1.7109 (determined using statistical software or a t -table):

Since the engineer's test statistic, t * = 1.22, is not greater than 1.7109, the engineer fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

If the engineer used the P-value approach to conduct his hypothesis test, he would determine the area under a $t_{n-1} = t_{24}$ curve and to the right of the test statistic t* = 1.22:

In the output above, Minitab reports that the P -value is 0.117. Since the P -value, 0.117, is greater than \(\alpha\) = 0.05, the engineer fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.
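The area to the right of the test statistic can be approximated without statistical software. The following Python sketch integrates the t density numerically with Simpson's rule; it is not how Minitab computes P-values, and the function names `t_pdf` and `right_tail_p` are assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def right_tail_p(t_star, df, steps=2000):
    """Area under the t curve to the right of t_star.

    Uses Simpson's rule on [0, |t_star|] (steps must be even) and the
    symmetry of the t distribution around zero.
    """
    b = abs(t_star)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area between 0 and |t_star|
    return 0.5 - central if t_star >= 0 else 0.5 + central


p_value = right_tail_p(1.22, df=24)
print(round(p_value, 3))
```

For a left-tailed test, the P-value is `1 - right_tail_p(t_star, df)`; for a two-tailed test, it is `2 * right_tail_p(abs(t_star), df)`.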

Note that the engineer obtains the same scientific conclusion regardless of the approach used. This will always be the case.

## Height of Sunflowers

A biologist was interested in determining whether sunflower seedlings treated with an extract from Vinca minor roots resulted in a lower average height of sunflower seedlings than the standard height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and subsequently obtained the following heights:

The biologist's hypotheses are:

$H_0: \mu = 15.7$ versus $H_A: \mu < 15.7$

The biologist entered her data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. She obtained the following output:

$\mu$: mean of Height

Null hypothesis H₀: $\mu$ = 15.7

Alternative hypothesis H₁: $\mu$ < 15.7

The output tells us that the average height of the n = 33 sunflower seedlings was 13.664 with a standard deviation of 2.544. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 2.544 by the square root of n = 33, is 0.443). The test statistic t* is -4.60, and the P-value is 0.000 to three decimal places.

Minitab Note. Minitab will always report P -values to only 3 decimal places. If Minitab reports the P -value as 0.000, it really means that the P -value is 0.000....something. Throughout this course (and your future research!), when you see that Minitab reports the P -value as 0.000, you should report the P -value as being "< 0.001."

If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t* were less than -1.6939 (determined using statistical software or a t-table):

Since the biologist's test statistic, t * = -4.60, is less than -1.6939, the biologist rejects the null hypothesis. That is, the test statistic falls in the "critical region." There is sufficient evidence, at the α = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

If the biologist used the P-value approach to conduct her hypothesis test, she would determine the area under a $t_{n-1} = t_{32}$ curve and to the left of the test statistic t* = -4.60:

In the output above, Minitab reports that the P -value is 0.000, which we take to mean < 0.001. Since the P -value is less than 0.001, it is clearly less than \(\alpha\) = 0.05, and the biologist rejects the null hypothesis. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

Note again that the biologist obtains the same scientific conclusion regardless of the approach used. This will always be the case.

## Gum Thickness

A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an inch. A quality control specialist regularly checks this claim. On one production run, he took a random sample of n = 10 pieces of gum and measured their thickness. He obtained:

The quality control specialist's hypotheses are:

$H_0: \mu = 7.5$ versus $H_A: \mu \ne 7.5$

The quality control specialist entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

$\mu$: mean of Thickness

Null hypothesis H₀: $\mu$ = 7.5

Alternative hypothesis H₁: $\mu \ne$ 7.5

The output tells us that the average thickness of the n = 10 pieces of gum was 7.55 one-hundredths of an inch with a standard deviation of 0.1027. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 0.1027 by the square root of n = 10, is 0.0325). The test statistic t* is 1.54, and the P-value is 0.158.

If the quality control specialist set his significance level \(\alpha\) at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t* were less than -2.2616 or greater than 2.2616 (determined using statistical software or a t-table):

Since the quality control specialist's test statistic, t * = 1.54, is not less than -2.2616 nor greater than 2.2616, the quality control specialist fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all of the manufacturer's spearmint gum differs from 7.5 one-hundredths of an inch.
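The critical value approach used in the three examples reduces to one comparison per tail type. The following Python sketch encodes that rule; the function name `rejects_null` and the `tail` labels are assumptions, and the critical values are those quoted in the text.

```python
def rejects_null(t_star, critical, tail):
    """Critical value approach: True if t_star falls in the critical region.

    critical: positive critical value for the chosen significance level.
    tail: "right", "left", or "two".
    """
    if tail == "right":
        return t_star > critical
    if tail == "left":
        return t_star < -critical
    if tail == "two":
        return abs(t_star) > critical
    raise ValueError(f"unknown tail: {tail!r}")


print(rejects_null(1.22, 1.7109, "right"))  # Brinell hardness
print(rejects_null(-4.60, 1.6939, "left"))  # sunflower heights
print(rejects_null(1.54, 2.2616, "two"))    # gum thickness
```

Only the sunflower example lands in the critical region, which matches the conclusions drawn in the text.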

If the quality control specialist used the P-value approach to conduct his hypothesis test, he would determine the area under a $t_{n-1} = t_{9}$ curve, to the right of 1.54 and to the left of -1.54:

In the output above, Minitab reports that the P -value is 0.158. Since the P -value, 0.158, is greater than \(\alpha\) = 0.05, the quality control specialist fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all pieces of spearmint gum differs from 7.5 one-hundredths of an inch.

Note that the quality control specialist obtains the same scientific conclusion regardless of the approach used. This will always be the case.

In our review of hypothesis tests, we have focused on just one particular hypothesis test, namely that concerning the population mean \(\mu\). The important thing to recognize is that the topics discussed here — the general idea of hypothesis tests, errors in hypothesis testing, the critical value approach, and the P -value approach — generally extend to all of the hypothesis tests you will encounter.

v.15(1); 2023 Jan

## Clinical Research: A Review of Study Designs, Hypotheses, Errors, Sampling Types, Ethics, and Informed Consent

Addanki Purna Singh 1, Sabitha Vadakedath 2, Venkataramana Kandi 3

1 Physiology, Department of Biomedical Sciences, Saint James School of Medicine, The Quarter, AIA

2 Biochemistry, Prathima Institute of Medical Sciences, Karimnagar, IND

3 Clinical Microbiology, Prathima Institute of Medical Sciences, Karimnagar, IND

Recently, we have been noticing an increase in the emergence and re-emergence of microbial infectious diseases. In the previous 100 years, there were several pandemics caused by different microbial species like the influenza virus, human immunodeficiency virus (HIV), dengue virus, severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS-CoV-2 that were responsible for severe morbidity and mortality among humans. Moreover, non-communicable diseases, including malignancies, diabetes, heart, liver, kidney, and lung diseases, have been on the rise. The medical fraternity, people, and governments all need to improve their preparedness to effectively tackle health emergencies. Clinical research, therefore, assumes increased significance in the current world and may potentially be applied to manage human health-related problems. In the current review, we describe the critical aspects of clinical research that include research designs, types of study hypotheses, errors, types of sampling, ethical concerns, and informed consent.

## Introduction and background

To conduct successful and credible research, scientists/researchers should understand the key elements of clinical research like neutrality (freedom from bias), reliability, validity, and generalizability. Moreover, results from clinical studies are applied in the real world to benefit human health. As a result, researchers must understand the various types of research designs [ 1 ]. Before choosing a research design, the researchers must work out the aims and objectives of the study, identify the study population, and address the ethical concerns associated with the clinical study. Another significant aspect of clinical studies is the research methodology and the statistical applications that are employed to process the data and draw conclusions. There are primarily two types of research designs: observational studies and experimental studies [ 2 ]. Observational studies do not involve any interventions and are therefore considered inferior to experimental designs. The experimental studies include the clinical trials that are carried out among a selected group of participants who are given a drug to assess its safety and efficacy in treating and managing the disease. However, in the absence of a study group, a single-case experimental design (SCED) was suggested as an alternative methodology that is as reliable as a randomized study [ 3 ]. The single case study designs are called N-of-1 type clinical trials [ 4 , 5 ]. The N-of-1 study design is being increasingly applied in healthcare-related research. Experimental studies are complex and are generally performed by pharmaceutical industries as a part of research and development activities during the discovery of a therapeutic drug/device. Also, clinical trials are undertaken by individual researchers or a consortium. In a recent study, the researchers were cautioned about the consequences of a faulty research design [ 6 ]. It was noted that clinical studies on the effect of the gut microbiome and its relationship with the feed could potentially be influenced by the choice of the experimental design, controls, and comparison groups included in the study. Moreover, clinical studies can be affected by sampling errors and biases [ 7 ]. In the present review, we briefly discuss the types of clinical study designs, study hypotheses, sampling errors, and the ethical issues associated with clinical research.

Research design

A research design is a systematic elucidation of the whole research process that includes methods and techniques, starting from the planning of research, execution (data collection), analysis, and drawing a logical conclusion based on the results obtained. A research design is a framework developed by a research team to find an answer/solution to a problem. The research designs are of several types that include descriptive research, surveys, correlation type, experimental, review (systematic/literature), and meta-analysis. The choice of research design is determined by the type of research question that is posed, so the research design and the research question are interdependent. For every research question, a complementary/appropriate research design must be chosen. The choice of research design influences the research credibility, reliability, and accuracy of the data collected. A well-defined research design would contain certain elements that include a specific purpose of the research, methods to be applied while collecting and analyzing the data, the research methodology used to interpret the collected data, research infrastructure, limitations, and most importantly, the time required to complete the research. The research design can broadly be categorized into two types: qualitative and quantitative designs. In quantitative research, a larger sample size is selected, the collected data are measured and evaluated using mathematical and statistical applications, and the results derived from statistics can benefit society. Qualitative research, by contrast, works with non-numerical data such as interviews and observations. The various types of research designs are shown in Figure 1 [ 8 ].

Types of research studies

There are various types of research study designs. The researcher who aims to take up the study determines the type of study design to choose among the available ones. The choice of study design depends on many factors that include but are not limited to the research question, the aim of the study, the available funds, manpower, and infrastructure, among others. The research study designs include systematic reviews, meta-analyses, randomized controlled trials, cross-sectional studies, case-control studies, cohort studies, case reports/studies, animal experiments, and other in vitro studies, as shown in Figure 2 [ 9 ].

Systematic Reviews

In these studies, the researcher makes an elaborate and up-to-date search of the available literature. By doing a systematic review of a selected topic, the researcher collects the data, analyses it, and critically evaluates it to arrive at impactful conclusions. Systematic reviews could equip healthcare professionals with more than adequate evidence for the decisions to be taken to improve patient management, which may include diagnosis, interventions, prognosis, and others [ 10 ]. A recent systematic review evaluated the role of socioeconomic conditions on the knowledge of risk factors for stroke in the World Health Organization (WHO) European region. This study collected data from PubMed, Embase, Web of Science (WoS), and other sources and finally included 20 studies and 67,309 subjects. This study concluded that the high socioeconomic group had better knowledge of risk factors and warning signs of stroke and suggested improved public awareness programs to better address the issue [ 11 ].

Meta-Analysis

Meta-analysis is like a systematic review, but this type of research design uses quantitative tools that include statistical methods to draw conclusions. Such a research method is therefore considered equal or even superior to the original research studies. Both the systematic review and the meta-analyses follow a similar research process that includes the research question, preparation of a protocol, registration of the study, devising study methods using inclusion and exclusion criteria, an extensive literature survey, selection of studies, assessing the quality of the evidence, data collection, analysis, assessment of the evidence, and finally the interpretation/drawing the conclusions [ 12 ]. A recent research study, using a meta-analytical study design, evaluated the quality of life (QoL) among patients suffering from chronic obstructive pulmonary disease (COPD). This study used WoS to collect the studies, and STATA to analyze and interpret the data. The study concluded that non-therapeutic mental health and multidisciplinary approaches were used to improve QoL along with increased support from high-income countries to low and middle-income countries [ 13 ].

Cross-Sectional Studies

These studies undertake the observation of a select population group at a single point in time, wherein the subjects included in the studies are evaluated for exposure and outcome simultaneously. These are probably the most common types of studies undertaken by students pursuing postgraduation. A recent study evaluated the activities of thyroid hormones among the pre- and post-menopausal women attending a tertiary care teaching hospital. The results of this study demonstrated that there was no significant difference in the activities of thyroid hormones in the study groups [ 14 ].

Cohort Studies

Cohort studies use participant groups called cohorts, which are followed up for a certain period to assess the relationship between exposure and outcome. They are used for epidemiological observations to improve public health. Although cohort studies are laborious, financially burdensome, and difficult to undertake as they require a large population group, such study designs are frequently used to conduct clinical studies and are second only to randomized controlled trials in terms of their significance [ 15 ]. Also, cohort studies can be undertaken both retrospectively and prospectively. A retrospective study assessed the effect of alcohol intake among human immunodeficiency virus (HIV)-infected persons under the national program of the United States of America (USA) for HIV care. This study, which included more than 30,000 HIV patients under the HIV care continuum program, revealed that excessive alcohol use among the participants affected HIV care, including treatment [ 16 ].

Case-Control Study

The case-control studies use a single point of observation among two population groups that are categorized based on the outcome. Those who had an outcome are termed as cases, and the ones who did not develop the disease are called control groups. This type of study design is easy to perform and is extensively undertaken as a part of medical research. Such studies are frequently used to assess the efficacy of vaccines among the population [ 17 ]. A previous study evaluated the activities of zinc among patients suffering from beta-thalassemia and compared it with the control group. This study concluded that the patients with beta-thalassemia are prone to hypozincaemia and had low concentrations of zinc as compared to the control group [ 18 ].

Case Studies

Such types of studies are especially important from the perspective of patient management. Although these studies are just observations of single or multiple cases, they may prove to be particularly important in the management of patients suffering from unusual diseases or patients presenting with unusual presentations of a common disease. Listeria is a bacterium that generally affects humans in the form of food poisoning and neonatal meningitis. Such an organism was reported to cause breast abscesses [ 19 ].

Randomized Control Trial

This is probably the most trusted research design that is frequently used to evaluate the efficacy of a novel pharmacological drug or a medical device. This type of study has a negligible bias, and the results obtained from such studies are considered accurate. The randomized controlled studies use two groups, wherein the treatment group receives the trial drug and the other group, called the placebo group, receives a blank drug that appears remarkably like the trial drug but without the pharmacological element. This can be a single-blind study (only the investigator knows who gets the trial drug and who is given a placebo) or a double-blind study (neither the investigator nor the study participant knows what is being given). A recent study (clinical trial registration number: NCT04308668) concluded that post-exposure prophylaxis with hydroxychloroquine does not protect against coronavirus disease 2019 (COVID-19) after a high- or moderate-risk exposure when the treatment was initiated within four days of potential exposure [ 20 ].

Factors that affect study designs

Among the different factors that affect a study's design is the recruitment of study participants. It is not yet clear as to what is the optimal method to increase participant participation in clinical studies. A previous study had identified that the language barrier and the long study intervals could potentially hamper the recruitment of subjects for clinical trials [ 21 ]. It was noted that patient recruitment for a new drug trial is more difficult than for a novel diagnostic study [ 22 ].

Reproducibility is an important factor that affects a research design. The study designs must be developed in such a way that they are replicable by others. Only those studies that can be re-done by others to generate the same/similar results are considered credible [ 23 ]. Choosing an appropriate study design to answer a research question is probably the most important factor that could affect the research result [ 24 ]. This can be addressed by clearly understanding various study designs and their applications before selecting a more relevant design.

Retention is another significant aspect of the study design. It is hard to hold the participants of a study until it is completed. Loss of follow-up among the study participants will influence the study results and the credibility of the study. Other factors that considerably influence the research design are the availability of a source of funding, the necessary infrastructure, and the skills of the investigators and clinical trial personnel.

Synthesizing a research question or a hypothesis

A research question is at the core of research and is the point from which a clinical study is initiated. It should be well-thought-out, clear, and concise, with an arguable element that requires the conduction of well-designed research to answer it. A research question should generally be a topic of curiosity in the researcher's mind, and he/she must be passionate enough about it to do all that is possible to answer it [ 25 ].

A research question must be generated/framed only after a preliminary literature search, choosing an appropriate topic, identifying the audience, self-questioning, and brainstorming for its clarity, feasibility, and reproducibility.

A recent study suggested a stepwise process to frame the research question. The research question is developed to address a phenomenon, describe a case, establish a relationship for comparison, and identify causality, among others. A better research question is one that describes the statement of the problem, points out the study area, puts focus on the study aspects, and guides data collection, analysis, and interpretation. The aspects of a good research question are shown in Figure 3 [ 26 ].

Research questions may be framed to prove the existence of a phenomenon, describe and classify a condition, elaborate the composition of a disease condition, evaluate the relationship between variables, describe and compare disease conditions, establish causality, and compare the variables resulting in causality. Some examples of research questions include: (i) Does the coronavirus mutate when it jumps from one organism to another?; (ii) What is the therapeutic efficacy of vitamin C and dexamethasone among patients infected with COVID-19?; (iii) Is there any relationship between COPD and the complications of COVID-19?; (iv) Does remdesivir, alone or in combination with vitamin supplements, improve the outcome of COVID-19?; (v) Are males more prone to complications from COVID-19 than females?

The research hypothesis is remarkably like a research question except for the fact that in a hypothesis the researcher assumes either positively or negatively about a causality, relation, correlation, and association. An example of a research hypothesis: overweight and obesity are risk factors for cardiovascular disease.

Types of errors in hypothesis testing

An assumption or a preliminary observation made by the researcher about the potential outcome of research that is being envisaged may be called a hypothesis. There are different types of hypotheses, including simple hypotheses, complex hypotheses, empirical hypotheses, statistical hypotheses, null hypotheses, and alternative hypotheses. However, the null hypothesis (H0) and the alternative hypothesis (HA) are commonly practiced. The H0 is where the researcher assumes that there is no relation/causality/effect, and the HA is when the researcher believes/assumes that there is a relationship/effect [ 27 , 28 ].

Hypothesis testing is affected by two types of errors: the type I error (α) and the type II error (β). The type I error (α) occurs when the investigator rejects the null hypothesis despite it being true, which is considered a false positive error. The type II error (β) happens when the researcher fails to reject the null hypothesis despite it being false, which is termed a false negative error [ 28 , 29 ].

The reasons for errors in the hypothesis testing may be due to bias and other causes. Therefore, the researchers set the standards for studies to rule out errors. A 5% deviation (α=0.05; range: 0.01-0.10) in the case of a type I error and up to a 20% probability (β=0.20; range: 0.05-0.20) for type II errors are generally accepted [ 28 , 29 ]. The features of a reasonable hypothesis include simplicity and specificity, and the hypothesis is generally determined by the researcher before the initiation of the study and during the preparation of the study proposal/protocol [ 28 , 29 ].
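The meaning of α and β can be illustrated with a small simulation. The following Python sketch is not taken from the cited studies: it swaps in a one-sided z-test with a known standard deviation for simplicity, and the population values, sample size, and function name `rejection_rate` are all hypothetical.

```python
import math
import random


def rejection_rate(true_mean, mu0=100.0, sigma=10.0, n=25,
                   z_crit=1.645, trials=4000, seed=1):
    """Share of simulated studies that reject H0: mu = mu0 in favour of mu > mu0.

    Uses a one-sided z-test with known sigma; z_crit = 1.645 corresponds
    to a significance level of 0.05.
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if (sample_mean - mu0) / se > z_crit:
            rejections += 1
    return rejections / trials


# H0 is true: every rejection is a type I error (false positive).
type_1_rate = rejection_rate(true_mean=100.0)
# H0 is false: every failure to reject is a type II error (false negative).
type_2_rate = 1 - rejection_rate(true_mean=105.0)
print(type_1_rate, type_2_rate)
```

With these hypothetical settings, the simulated type I error rate should lie close to 0.05 and the type II error rate close to 0.20, the conventional limits mentioned above.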

The applications of hypothesis testing

A hypothesis is tested by assessing the samples, where appropriate statistics are applied to the collected data and an inference is drawn from it. It was noted that a hypothesis can be made based on the observations of physicians using anatomical characteristics and other physiological attributes [ 28 , 30 ]. The hypothesis may also be tested by employing proper statistical techniques. Hypothesis testing is carried out on the sample data to affirm the null hypothesis or otherwise.

An investigator needs to believe the null hypothesis or accept that the alternate hypothesis is true based on the data collected from the samples. Interestingly, most of the time, a study that is carried out has only a 50% chance of either the null hypothesis or the alternative hypothesis coming true [ 28 , 31 ].

Hypothesis testing is a step-by-step strategy that is initiated by the assumption and followed by the measures applied to interpret the results, analysis, and conclusion. The margin of error and the level of significance (95% free of type I error and 80% free of type II error) are initially fixed. This enables the chance for the study results to be reproduced by other researchers [ 32 ].

Ethics in health research

Ethical concerns are an important aspect of civilized societies. Moreover, ethics in medical research and practice assumes increased significance as most health-related research is undertaken to find a cure or discover a medical device/diagnostic tool that can either diagnose or cure the disease. Because such research involves human participants, and due to the fact that people approach doctors to find cures for their diseased condition, ethics, and ethical concerns take center stage in public health-related clinical/medical practice and research.

The local and international authorities like the Drugs Controller General of India (DCGI), and the Food and Drug Administration (FDA) make sure that health-related research is carried out following all ethical concerns and good clinical practice (GCP) guidelines. The ethics guidelines are prescribed by both national and international bodies like the Indian Council of Medical Research (ICMR) and the World Medical Association (WMA) Declaration of Helsinki guidelines for ethical principles for medical research involving human subjects [ 33 ].

Ethical conduct is more significant during clinical practice, medical education, and research. It is recommended that medical practitioners embark on self-regulation of the medical profession. Becoming proactive in terms of ethical practices will enhance the social image of a medical practitioner/researcher. Moreover, such behavior will allow people to comprehend that this profession is not for trade/money but for the benefit of the patients and the public at large. Administrations should promote ethical practitioners and penalize unethical practitioners and clinical research organizations. It is suggested that the medical curriculum should incorporate ethics as a module and ethics-related training must be delivered to all medical personnel. It should be noted that a tiny seed grows into an exceptionally gigantic tree if adequately watered and taken care of [ 33 ]. It is therefore essential to address the ethical concerns in medical education, research and practice to make more promising medical practitioners and acceptable medical educators and researchers, as shown in Figure 4.

Sampling in health research

Sampling is the procedure of picking a precise number of individuals from a defined group to accomplish a research study. This sample is a true representative subset of individuals who potentially share the same characteristics as a large population, and the results of the research can be generalized [ 34 , 35 ]. Sampling is necessary because it is almost impossible to include all the individuals who want to partake in a research investigation. A sample identified from a representative population is depicted in Figure 5.

Sampling methods are of different types and are broadly classified into probability sampling and non-probability sampling. In a probability sampling method, which is routinely employed in quantitative research, each individual in the representative population is provided with an equivalent likelihood of being included in the study [ 35 ]. Probability sampling can be separated into four types that include simple random sampling, systematic sampling, stratified sampling, and cluster sampling, as shown in Figure 6.

Simple Random Sample

In the simple random sampling method, every person in the representative population is given an equal chance of being selected. It may use a random number generator for selecting the study participants. To study the employees’ perceptions of government policies, a researcher initially assigns a number to each employee [ 35 ]. After this, the researcher randomly chooses the required number of samples. In this type of sampling method, each one has an equal chance of being selected.
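As an illustration, the employee example can be sketched with Python's standard `random` module. The employee numbering (1 to 100), the sample size of 10, and the helper name `simple_random_sample` are assumptions made for the sketch.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members so that every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)  # sampling without replacement

# Hypothetical sampling frame: each employee is assigned a number from 1 to 100.
employees = list(range(1, 101))
chosen = simple_random_sample(employees, 10, seed=42)
print(sorted(chosen))
```

Passing a seed only makes the draw repeatable for checking; a real study would normally leave it unset or record it in the protocol.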

Systematic Sample

In this sampling method, the researcher selects the study participants depending on a pre-defined order (1, 3, 5, 7, 9…), wherein the researcher assigns a serial number (1-100 (n)) to volunteers [ 35 ]. The researcher in this type of sample selects a number from 1 to 10 and later applies a systematic pattern to select the sample like 2, 12, 22, 32, etc.
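A short sketch of this procedure, assuming 100 volunteers numbered 1 to 100 and a sampling interval of 10; the function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(ordered_units, interval, seed=None):
    """Choose a random start within the first interval, then take every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # 0-based position within the first interval
    return ordered_units[start::interval]

volunteers = list(range(1, 101))  # serial numbers 1..100
picked = systematic_sample(volunteers, interval=10, seed=1)
print(picked)

# Fixing the start at serial number 2 reproduces the pattern 2, 12, 22, 32, ...
fixed_start = volunteers[1::10]
print(fixed_start)
```

Only the starting point is random; every later selection follows from it, which is why the ordering of the list matters for this method.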

Stratified Sample

The stratified sampling method is applied when the people from whom the sample must be taken have mixed features. In this type of sampling, the representative population is divided into strata based on attributes like age, sex, and other factors. Subsequently, a simple random or systematic sampling method is applied to select the samples from each stratum. Initially, different age groups, sexes, and other characteristics are defined as groups [ 35 ]. The investigator then draws his/her sample from each group using simple or systematic random sampling methods.
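The two steps (group by stratum, then sample within each group) can be sketched as follows. The sketch assumes proportional allocation, i.e. stratum sample size = total sample size / population size x stratum size; the age groups, their sizes, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def proportional_stratified_sample(units, stratum_of, total_n, seed=None):
    """Draw a simple random sample from each stratum, sized by the stratum's population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    population_size = len(units)
    sample = {}
    for name, members in strata.items():
        # Rounding can make the overall total differ slightly from total_n.
        n_h = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population: (person_id, age_group) pairs.
people = (
    [(i, "18-39") for i in range(600)]
    + [(i, "40-59") for i in range(600, 900)]
    + [(i, "60+") for i in range(900, 1000)]
)
sample = proportional_stratified_sample(people, lambda p: p[1], total_n=50, seed=3)
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)
```

With a population split 600/300/100 and a total sample of 50, proportional allocation gives 30, 15, and 5 participants per stratum.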

Cluster Sample

This sampling method divides the representative population into clusters, each with mixed qualities. Because such groups have mixed features, each one can be regarded as a sample in its own right. Alternatively, a sample can be developed within the clusters by using simple random/systematic sampling approaches. The cluster sampling method is similar to stratified sampling but differs in the group characteristics, wherein each group has representatives of varied ages, different sexes, and other mixed characteristics [ 35 ]. Although each group appears as a sample, the researcher again applies a simple or systematic random sampling method to choose the sample.
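A minimal sketch of the idea, assuming the clusters are already defined (here, hypothetical clinics with 20 patients each). When `per_cluster` is omitted, every member of a chosen cluster is included; when it is given, a simple random sample is drawn inside each chosen cluster.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly choose whole clusters, then optionally subsample within each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen_names:
        members = clusters[name]
        if per_cluster is None:
            sample[name] = list(members)  # keep the entire cluster
        else:
            k = min(per_cluster, len(members))
            sample[name] = rng.sample(members, k)  # random subsample inside the cluster
    return sample

# Hypothetical clusters: 8 clinics with 20 patients each.
clinics = {
    f"clinic_{i}": [f"clinic_{i}_patient_{j}" for j in range(20)]
    for i in range(8)
}
one_stage = cluster_sample(clinics, n_clusters=3, seed=7)
two_stage = cluster_sample(clinics, n_clusters=3, per_cluster=5, seed=7)
print({name: len(members) for name, members in two_stage.items()})
```

The contrast with stratified sampling is visible in the code: here only some groups are selected at all, whereas a stratified design samples from every group.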

Non-probability Sample

In this type of sampling method, the participants are chosen based on non-random criteria. In a non-probability sampling method, the volunteers do not have an identical opportunity to get selected. This method, although it appears to be reasonable and effortless to do, is plagued by selection bias. The non-probability sampling method is routinely used in experimental and qualitative research. It is suitable to perform a pilot study that is carried out to comprehend the qualities of a representative population [ 35 ]. The non-probability sampling is of four types, including convenience sampling, voluntary response sampling, purposive sampling, and snowball sampling, as shown in Figure 7.

Convenience Sample

In the convenience sampling method, there are no pre-defined criteria, and only those volunteers who are readily obtainable to the investigator are included. Despite it being an inexpensive method, the results yielded from studies that apply convenience sampling may not reflect the qualities of the population, and therefore, the results cannot be generalized [ 35 ]. The best example of this type of sampling is when the researcher invites people from his/her own work area (company, school, city, etc.).

Voluntary Response Sample

In the voluntary response sampling method, the participants volunteer to partake in the study. This sampling method is similar to convenience sampling and therefore leaves sufficient room for bias [ 35 ]. The researcher waits for the participants who volunteer in the study in a voluntary response sampling method.

Purposive Sample/Judgment Sample

In the purposive or judgment sampling method, the investigator chooses the participants based on his/her judgment/discretion. In this type of sampling method, the attributes (opinions/experiences) of a specific population group can be captured [ 35 ]. An example of such a sampling method is gathering the opinions of people with disabilities on the facilities at an educational institute.

Snowball Sample

In the snowball sampling method, suitable study participants are found based on the recommendations and suggestions made by the participating subjects [ 36 ]. In this type, the individual/sample recruited by the investigator in turn invites/recruits other participants.

Significance of informed consent and confidentiality in health research

Informed consent is a document that confirms the fact that the study participants are recruited only after being thoroughly informed about the research process, risks, and benefits, along with other important details of the study like the time of research. The informed consent is generally drafted in the language known to the participants. The essential contents of informed consent include the aim of research in a way that is easily understood even by a layman. It must also brief the person as to what is expected from participation in the study. The informed consent contains information such as that the participant must be willing to share demographic characteristics, participate in the clinical and diagnostic procedures, and have the liberty to withdraw from the study at any time during the research. The informed consent must also have a statement that confirms the confidentiality of the participant and the protection of privacy of information and identity [ 37 ].

Health research is so complex that there may be several occasions when a researcher wants to re-visit a medical record to investigate a specific clinical condition, which also requires informed consent [ 38 ]. Awareness of biomedical research and the importance of human participation in research studies is a key element in the individual’s knowledge that may contribute to participation or otherwise in the research study [ 39 ]. In the era of information technology, the patient’s medical data are stored as electronic health records. Research that attempts to use such records is associated with ethical, legal, and social concerns [ 40 , 41 ]. Improved technological advances and the availability of medical devices to treat, diagnose, and prevent diseases have thrown a new challenge at healthcare professionals. Medical devices are used for interventions only after being sure of the potential benefit to the patients, and at any cost, they must never affect the health of the patient and only improve the outcome [ 42 ]. Even in such cases, the medical persons must ensure informed consent from the patients.

## Conclusions

Clinical research is an essential component of healthcare that enables physicians, patients, and governments to tackle health-related problems. Increased incidences of both communicable and non-communicable diseases warrant improved therapeutic interventions to treat, control, and manage diseases. Several illnesses do not have a treatment, and for many others, the treatment, although available, is plagued by drug-related adverse effects. For many other infections, like dengue, we require preventive vaccines. Therefore, clinical research studies must be carried out to find solutions to the existing problems. Moreover, the knowledge of clinical research, as discussed briefly in this review, is required to carry out research and enhance preparedness to counter conceivable public health emergencies in the future.

The content published in Cureus is the result of clinical experience and/or research by independent individuals or organizations. Cureus is not responsible for the scientific accuracy or reliability of data or conclusions published herein. All content published within Cureus is intended only for educational, research and reference purposes. Additionally, articles published within Cureus should not be deemed a suitable substitute for the advice of a qualified health care professional. Do not disregard or avoid professional medical advice due to content published within Cureus.

The authors have declared that no competing interests exist.

Teach yourself statistics

## Sample Size Calculator

The Sample Size Calculator guides you step-by-step to find the right sample design for your research. Use the calculator to create powerful, cost-effective survey sampling plans.

- Find the optimum design (most precision, least cost).
- See how sample size affects cost and precision.
- Compare different survey sampling methods.
- Assess statistical power and Type II errors.

In each section, provide the data requested. The calculator does the rest. It crunches numbers and generates an easy-to-understand report that summarizes key findings, describes the analysis, and documents calculator inputs.

## Describe the Research

The first step in using this calculator is to describe the research you are conducting.

Sampling method: This calculator can work with three sampling methods: simple random sampling, stratified sampling, and cluster sampling. With stratified sampling, you have the option to choose proportional stratification. And with cluster sampling, you can choose between one-stage sampling and two-stage sampling.

Purpose of research: Most surveys are designed to estimate a population parameter. Some surveys also test a hypothesis about that parameter. In the dropdown box below, indicate whether your survey is focused only on estimation or whether it also includes a hypothesis test.

Sample statistic: Most surveys use a sample statistic to estimate a population parameter. This calculator can work with two types of statistics: a mean score and a proportion. From the dropdown box below, select the sample statistic that you will use.

Population parameter: Most surveys use a sample statistic to estimate a population parameter. From the dropdown box below, identify the population parameter that you want to estimate.

## Define the Output

In this section, specify the output that you desire from the Sample Size Calculator - your main goal plus any optional analyses.

Optional analyses (select all that apply)

## Define Statistical Constraints

Please provide the following additional information:

- Significance level: The significance level is the probability of rejecting the null hypothesis when it is, in fact, true. Researchers often use significance levels of 0.01 or 0.05.


- Confidence level: A confidence level refers to the percentage of all possible samples that can be expected to include the true population parameter. For example, suppose all possible samples were selected from the same population, and a confidence interval were computed for each sample. A 95% confidence level implies that 95% of the confidence intervals would include the true population parameter.


Margin of error: The margin of error is a measure of sampling error in a survey result. The bigger the margin of error, the less confidence a researcher can place in the survey result. To be meaningful, a confidence level should be reported along with the margin of error. For example, a margin of error of 3 percent based on a 95% confidence level is more impressive than a margin of error of 3 percent based on a 90% confidence level.
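As a hedged illustration of why the confidence level must accompany the margin of error, the sketch below uses the usual normal approximation for a sample proportion under simple random sampling, margin = z x sqrt(p(1 - p) / n). The function name and the example values (p = 0.5, n = 1000) are assumptions, not calculator outputs.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Normal-approximation margin of error for a proportion from a simple random sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z * sqrt(p * (1 - p) / n)

moe_95 = margin_of_error(0.5, 1000, confidence=0.95)
moe_90 = margin_of_error(0.5, 1000, confidence=0.90)
print(f"95% confidence: +/- {moe_95:.3f}")
print(f"90% confidence: +/- {moe_90:.3f}")
```

For the same sample, the 95% margin is wider than the 90% margin, so reaching a 3 percent margin at 95% confidence requires more data than reaching it at 90%.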

Null hypothesis: The null hypothesis is the hypothesis that sample observations result purely from chance. For example, if a coin were fair and balanced, we would expect that half the flips would result in Heads and half, in Tails. Thus, the null hypothesis would state that the proportion of coin flips resulting in heads would equal 0.5.

Alternative hypothesis: The alternative hypothesis, denoted by Hₐ, describes the region of rejection for a null hypothesis. Typically, an alternative hypothesis states that the true value of a population parameter (μ) is not equal to the value in the null hypothesis, less than the value in the null hypothesis, or greater than the value in the null hypothesis. Which option describes the alternative hypothesis for your study?

True value: The power of the hypothesis test measures the probability of rejecting the null hypothesis when the true value does not equal the hypothesized value. This calculator will compute power based on the true value that you provide.
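The calculator's internal method is not shown on the page, so the following is only a sketch of one common way to compute power: a two-sided z-test for a proportion under the normal approximation and simple random sampling. The coin example values (hypothesized 0.5, true value 0.6, 100 flips) and the function name are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_proportion(p0, p_true, n, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: p = p0 when the true proportion is p_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = sqrt(p0 * (1 - p0) / n)
    lower = p0 - z_crit * se_null  # lower limit of the region of acceptance
    upper = p0 + z_crit * se_null  # upper limit of the region of acceptance
    # Sampling distribution of the sample proportion under the true value.
    sampling_dist = NormalDist(mu=p_true, sigma=sqrt(p_true * (1 - p_true) / n))
    return sampling_dist.cdf(lower) + (1 - sampling_dist.cdf(upper))

power = power_two_sided_proportion(p0=0.5, p_true=0.6, n=100)
type_ii_error = 1 - power
print(f"power = {power:.3f}, type II error = {type_ii_error:.3f}")
```

Power and the type II error rate sum to one, so a design with low power is one that often fails to detect a real difference.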

## Describe the Population and Sample

Specify population and sample properties. If you don't know the exact values for these inputs, estimate. Base your estimate on the best information that you have - personal experience with similar studies, previous research by others, or even subjective judgment.

In the table below, enter the following information for each stratum: population size, sample size, sample mean, sample standard deviation, and sample proportion.

## Describe Survey Costs

Effective sampling plans provide maximum precision for minimum cost. The following cost factors are critical:

- Overhead: Overhead refers to total indirect cost. Indirect costs (consultant fees, office supplies, etc.) are largely unaffected by the number of surveys completed.
- Unit cost: Unit cost is the average cost per completed survey, not counting overhead. Unit cost may vary from one stratum to another.
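The two cost factors combine in a simple way, sketched below under the assumption that total cost is overhead plus the sum of unit cost times sample size over the strata. The stratum names and figures are hypothetical.

```python
def total_survey_cost(overhead, unit_costs, sample_sizes):
    """Total cost = fixed overhead + per-stratum unit cost times completed surveys."""
    variable_cost = sum(unit_costs[s] * sample_sizes[s] for s in sample_sizes)
    return overhead + variable_cost

# Hypothetical plan with a different unit cost in each stratum.
unit_costs = {"urban": 10, "rural": 25}
sample_sizes = {"urban": 300, "rural": 100}
cost = total_survey_cost(5000, unit_costs, sample_sizes)
print(cost)  # 5000 + 300*10 + 100*25 = 10500
```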

Is unit cost the same for each stratum?

## Write Report

To analyze data you've entered and write a report summarizing key findings, click the Calculate button.


## Weighted Data

When a researcher is interested in examining distinct subgroups within a population, it is often best to use a stratified random sample to better represent the entire population. A stratified random sample involves dividing the population of interest into several smaller groups, called "strata" and then taking a simple random sample from each of these smaller groups. This method is commonly used when we want to guarantee a large enough sample from each subgroup. When this type of sampling method is used, it is important to use weights to take the relative size of each subgroup into account. This "Weighted Data" site introduces basic techniques used in estimating and testing population parameters using weights. Note that these labs can be used at various levels:

- Introductory courses with no statistics background: Should it Pass? and Political Preferences 1
- Courses that require some statistics background: Political Preferences 2, CAM and NHANES (Health)
- Advanced course supplements: Mathematical Details of the Rao-Scott Method and Types of Weights, Subsetting, Strata and Clustering
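To show why weights matter, here is a small sketch that is not taken from the labs themselves. It assumes each stratum's weight is its share of the population, N_h / N, and uses made-up strata "A" and "B" that were sampled equally even though their population sizes differ.

```python
def weighted_mean(stratum_means, population_sizes):
    """Population estimate: weight each stratum's sample mean by its population share."""
    total = sum(population_sizes.values())
    return sum(population_sizes[s] / total * stratum_means[s] for s in stratum_means)

# Hypothetical strata sampled equally despite very different population sizes.
population_sizes = {"A": 8000, "B": 2000}
stratum_means = {"A": 0.40, "B": 0.70}  # e.g. proportion in favor within each stratum

unweighted = sum(stratum_means.values()) / len(stratum_means)
weighted = weighted_mean(stratum_means, population_sizes)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

The unweighted average (0.55) overstates support because the small stratum counts as much as the large one; the weighted estimate (0.46) restores each stratum's true share.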

## Introductory Activities: Calculating population estimates with weighted data

Student Handout

Online Visualizations

Variable Descriptions

This activity is designed to help introductory statistics students understand how survey data (stratified samples) are collected and how weights are needed to create population estimates.

Online Visualization

Data provided by the Inter-university Consortium for Political and Social Research

This activity is designed to help students understand how to create population estimates with weighted data. An online app allows students to visualize how estimates vary based upon appropriate use of weights. Additional information on the dataset may be found here.

## Intermediate Activities: Hypothesis testing with categorical weighted data

These activities describe the use of hypothesis tests with weighted data. Online apps allow students to visualize how estimates vary based upon appropriate use of weights.

Data provided by the Inter-university Consortium for Political and Social Research. Additional information on this specific dataset may be found here.

Data provided by the National Center for Health Statistics.

Data provided by the National Center for Health Statistics and available within the MOSAIC package in R.

## Advanced Supplements

- Mathematical Details of the Rao-Scott Method
- Types of Weights, Subsetting, Strata and Clustering
- The following link connects you to All Grinnell Data Visualization Apps.

Contact Pam Fellers or Shonda Kuiper for R Markdown files.


- $s_h^2 = [\,n_h / (n_h - 1)\,] \cdot p_h (1 - p_h)$, where $s_h^2$ is a sample estimate of the variance within stratum $h$, $n_h$ is the number of observations from stratum $h$ in the sample, and $p_h$ is a sample estimate of the proportion in stratum $h$. Why do we care about the variance within each stratum? Stratum variance is needed to compute the standard error. And why do we care about the standard error? Read on.
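The link from stratum variance to standard error can be sketched as follows. The formulas are the standard textbook ones and are an assumption here, since the excerpt above is truncated: the stratum variance estimate is n_h / (n_h - 1) x p_h x (1 - p_h), and the standard error of the overall proportion is (1/N) x sqrt(sum of N_h^2 x (1 - n_h / N_h) x s_h^2 / n_h). The strata and their values are hypothetical.

```python
from math import sqrt

def stratum_variance(p_h, n_h):
    """Sample estimate of the within-stratum variance for a proportion."""
    return n_h / (n_h - 1) * p_h * (1 - p_h)

def stratified_proportion_se(strata):
    """Standard error of a stratified proportion estimate, with finite population correction.

    strata: list of (N_h, n_h, p_h) tuples = population size, sample size, sample proportion.
    """
    total_population = sum(N_h for N_h, _, _ in strata)
    variance_sum = sum(
        N_h ** 2 * (1 - n_h / N_h) * stratum_variance(p_h, n_h) / n_h
        for N_h, n_h, p_h in strata
    )
    return sqrt(variance_sum) / total_population

# Hypothetical strata: (population size, sample size, sample proportion).
se = stratified_proportion_se([(8000, 100, 0.40), (2000, 100, 0.70)])
print(f"standard error: {se:.4f}")
```

Larger strata contribute more to the standard error, which is why precise estimates in the big strata matter most for the overall result.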


First, divide the population into strata. Then, draw a random sample from each stratum. Dan Kernler, CC BY-SA 4.0, via Wikimedia Commons

Stratified Random Sampling: Definition Stratified random sampling is used when your population is divided into strata (characteristics like male and female or education level), and you want to include the stratum when taking your sample.

Jason Sadowski · Published in Towards Data Science · Dec 12, 2019 Rock stratification is a topic for a different day. Credit: https://en.wikipedia.org/wiki/Stratigraphy When constructing an experiment, one of the most important questions to ask is: How should I sample from my population?


Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023. Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics.

A systematic random sample is one in which every k th item is selected. k is determined by dividing the number of items in the sampling frame by sample size. A stratified random sample is one in which the population is first divided into relevant strata or subgroups and then, using the simple random sample method, a sample is drawn from each ...

To choose a stratified sample, divide the population into groups called strata and then take a proportionate number from each stratum. For example, you could stratify (group) your college population by department and then choose a proportionate simple random sample from each stratum (each department) to get a stratified random sample.

In stratified sampling, you first identify members of your sample who belong to each group. Then you randomly sample from each of those subgroups in such a way that the sizes of the subgroups in the sample are proportional to their sizes in the population. Let's take an example: Suppose you were interested in views of capital punishment at an ...

Hypothesis Tests With Survey Data: Find the critical value (often a z-score or a t-score). Define the upper limit of the region of acceptance. Define the lower limit of the region of acceptance.

Large-Sample Hypothesis Tests for Stratified Group-Testing Data. Joshua M. Tebbs and Melinda H. McCann. Insect-vectored plant diseases impact the agricultural community each year by affecting the economic value, the quantity, and the quality of crops. Controlling the spread

First let's deal with your first problem (sample size changes). I will think about the second problem later. I would probably assume that all the observations of a given procedure have the same distribution, no matter what the year is. So for example for procedure A I would make the assumption that you draw:

12.3 Statistical testing of hypothesis. 12.3.1 Sample size for testing a proportion; 12.4 Accounting for design effect; ... Chapter 4 Stratified simple random sampling. In stratified random sampling the population is divided into subpopulations, for instance, soil mapping units, areas with the same land use or land cover, administrative units ...

If the engineer used the P-value approach to conduct his hypothesis test, he would determine the area under a $t_{n-1} = t_{24}$ curve and to the right of the test statistic $t^* = 1.22$: In the output above, Minitab reports that the P-value is 0.117. Since the P-value, 0.117, is greater than α = 0.05, the engineer fails to reject the null hypothesis.
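The reported P-value can be checked without Minitab. The sketch below is an independent illustration, not the source's method: it integrates the Student t density with 24 degrees of freedom using Simpson's rule and subtracts the area between 0 and 1.22 from one half. The function names are illustrative.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tail_p_value(t_stat, df, steps=2000):
    """Area to the right of t_stat (for t_stat >= 0), via Simpson's rule on [0, t_stat]."""
    h = t_stat / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 0.5 - area_zero_to_t  # the density is symmetric about zero

p_value = right_tail_p_value(1.22, df=24)
alpha = 0.05
print(f"P-value = {p_value:.3f}; reject H0: {p_value < alpha}")
```

The result agrees with the 0.117 quoted above to three decimal places, so the decision not to reject the null hypothesis at α = 0.05 follows.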

PMC9898800. Cureus. 2023 Jan; 15 (1): e33374. Published online 2023 Jan 4. doi: 10.7759/cureus.33374


Sampling the population Simple random sampling - sometimes known as random selection - and stratified random sampling are both statistical measuring tools. Using random selection will minimize bias, as each member of the population is treated equally with an equal likelihood of being sampled.



How to calculate sampling error for proportionate sampling? (Cross Validated) I have done sampling using Proportionate Stratified Random Sampling.

Beaumont and Bocci (2009) investigated a general weighted bootstrap method for hypothesis testing under complex sampling designs in the context of Wald tests for linear ... (1992) under stratified random sampling, but their method is only applicable when the parameter of interest is a smooth function of population totals ...

Bias from stratified sampling. Due to a lack of significance and the large size of the dataset (which had binomial responses with 20,000 responses out of a sample of 15,000,000) my peer has used random sampling to reduce the amount of data and import into our modelling software. This adjusted dataset has the full 20,000 responses but under ...