When you plan to conduct an experiment, there are some factors that are under direct control of the researcher:
Unlike α and n, which are specified by the researcher, the magnitude of β depends on the actual value of the population parameter. In addition, β is influenced by the effect size (e.g., Cohen’s d), which can be used to determine a standardized measure of the magnitude of an observed effect. The following parameters are affected more indirectly:
Although β is unknown, it is related to α. For example, if we would like to be absolutely sure that we do not falsely identify an effect which does not exist (i.e., make a type I error), this means that the probability of identifying an effect that does exist (i.e., 1-β) decreases and vice versa. Thus, an extremely low value of α (e.g., α = 0.0001) will result in intolerably high β errors. A common approach is to set α=0.05 and 1-β=0.80.
Unlike the t-value of our test, the effect size (d) is unaffected by the sample size and can be categorized as follows (see Cohen, J. 1988):
In order to test more subtle effects (smaller effect sizes), you need a larger sample size compared to the test of more obvious effects. In this paper, you can find a list of examples for different effect sizes and the number of observations you need to reliably find an effect of that magnitude. Although the exact effect size is unknown before the experiment, you might be able to make a guess about the effect size (e.g., based on previous studies).
If you wish to obtain a standardized measure of the effect, you may compute the effect size (Cohen’s d) using the cohensD() function from the lsr package. Using the examples from the independent-means t-test above, we would use:
According to the thresholds defined above, this effect would be judged to be a small-medium effect.
For the dependent-means t-test, we would use:
According to the thresholds defined above, this effect would also be judged to be a small-medium effect.
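To make the definition of Cohen’s d more tangible, here is a minimal sketch that computes it by hand for two independent groups. The data are simulated purely for illustration (the variable names `group_a` and `group_b` are hypothetical and not part of the tutorial data), and the pooled standard deviation is used in the denominator, which is an assumption about the variant of d you want:

```r
# Simulated (hypothetical) listening times for two independent groups
set.seed(1)
group_a <- rnorm(50, mean = 18, sd = 5)
group_b <- rnorm(50, mean = 20, sd = 5)

# Pooled standard deviation of the two samples
n_a <- length(group_a)
n_b <- length(group_b)
sd_pooled <- sqrt(((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
                    (n_a + n_b - 2))

# Cohen's d: absolute mean difference in units of the pooled SD
d <- abs(mean(group_a) - mean(group_b)) / sd_pooled
d
```

Because d is expressed in standard deviation units, it can be compared across studies that measure the outcome on different scales.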
When constructing an experimental design, your goal should be to maximize the power of the test while maintaining an acceptable significance level and keeping the sample as small as possible. To achieve this goal, you may use the pwr package, which lets you compute n, d, alpha, and power. You only need to specify three of the four input variables to get the fourth.
For example, what sample size do we need (per group) to identify an effect with d = 0.6, α = 0.05, and power = 0.8:
Or we could ask, what is the power of our test with 51 observations in each group, d = 0.6, and α = 0.05:
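As a rough sketch of both questions, the following uses `power.t.test()` from base R instead of the pwr package (a swapped-in alternative, so the numbers may differ marginally). Setting `sd = 1` means that `delta` can be read as Cohen’s d:

```r
# Required sample size per group for d = 0.6, alpha = 0.05, power = 0.8
size_calc <- power.t.test(delta = 0.6, sd = 1, sig.level = 0.05, power = 0.8,
                          type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(size_calc$n)  # round up to whole observations
n_per_group

# Power of the test with 51 observations per group, d = 0.6, alpha = 0.05
power_calc <- power.t.test(n = 51, delta = 0.6, sd = 1, sig.level = 0.05,
                           type = "two.sample", alternative = "two.sided")
power_calc$power
```

In both calls, exactly one of `n`, `delta`, `sig.level`, and `power` is left unspecified, and the function solves for it.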
From my experience, students tend to place a lot of weight on p-values when interpreting their research findings. It is therefore important to note some points that hopefully help to put the meaning of a “significant” vs. “insignificant” test result into perspective.
Significant result
Insignificant result
Thus, you should not base your research conclusion on p-values alone!
It is also crucial to determine the sample size before you run the experiment or before you start your analysis. Why? Consider the following example:
This is called p-hacking and should be avoided at all costs. Assuming that both groups come from the same population (i.e., there is no difference in the means): What is the likelihood that the result will be significant at some point? In other words, what is the likelihood that you will draw the wrong conclusion from your data that there is an effect, while there is none? This is shown in the following graph using simulated data - the color red indicates significant test results that arise although there is no effect (i.e., false positives).
Figure 5.1: p-hacking (red indicates false positives)
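The inflation of false positives through repeated testing can be illustrated with a small simulation. This is only a sketch under assumed settings (200 simulated experiments, a t-test after every 5 additional observations per group, both groups drawn from the same population), not the code behind the figure above:

```r
set.seed(123)
n_sims <- 200                   # number of simulated experiments
looks  <- seq(10, 100, by = 5)  # sample sizes per group at which we "peek"

one_experiment <- function() {
  # Both groups come from the same population, so H0 is true
  a <- rnorm(max(looks))
  b <- rnorm(max(looks))
  p <- sapply(looks, function(n) t.test(a[1:n], b[1:n])$p.value)
  c(any_look   = any(p < 0.05),          # significant at some point
    final_only = p[length(p)] < 0.05)    # significant at the planned n
}

results   <- replicate(n_sims, one_experiment())
false_pos <- rowMeans(results)  # share of false positives per strategy
false_pos
```

The share of experiments that are significant at some look is necessarily at least as large as the share that is significant at the final sample size only, which is why the sample size should be fixed in advance.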
This chapter is primarily based on Field, A., Miles, J., & Field, Z. (2012): Discovering Statistics Using R. Sage Publications, chapters 10 & 12.
In the previous section we learned how to compare means using a t-test. The t-test has some limitations since it only lets you compare 2 means and you can only use it with one independent variable. However, often we would like to compare means from 3 or more groups. In addition, there may be instances in which you manipulate more than one independent variable. For these applications, ANOVA (ANalysis Of VAriance) can be used. Hence, to conduct ANOVA you need:
A treatment is a particular combination of factor levels, or categories. One-way ANOVA is used when there is only one categorical variable (factor). In this case, a treatment is the same as a factor level. N-way ANOVA is used with two or more factors. Note that we are only going to talk about a single independent variable in the context of ANOVA. If you have multiple independent variables, please refer to the chapter on Regression.
Let’s use an example to see how ANOVA works. Similar to the previous example, imagine that the music streaming service experiments with a recommendation system for user-created playlists. We now have three groups: the control group “A” with the current system, treatment group “B”, who have access to playlists created by other users but are not shown recommendations, and treatment group “C”, who are shown recommendations for user-created playlists. As always, we load and inspect the data first:
The null hypothesis, typically, is that all means are equal (non-directional hypothesis). Hence, in our case:
\[H_0: \mu_1 = \mu_2 = \mu_3\]
The alternative hypothesis is simply that the means are not all equal, i.e.,
\[H_1: \textrm{Means are not all equal}\]
If you wanted to put this in mathematical notation, you could also write:
\[H_1: \exists {i,j}: {\mu_i \ne \mu_j} \]
To get a first impression if there are any differences in listening times across the experimental groups, we use the describeBy(...) function from the psych package:
In addition, you should visualize the data using appropriate plots:
Figure 5.2: Plot of means
Note that ANOVA is an omnibus test, which means that we test for an overall difference between groups. Hence, the test will only tell you if the group means are different, but it won’t tell you exactly which groups differ from one another.
So why don’t we then just conduct a series of t-tests for all combinations of groups (i.e., A vs. B, A vs. C, B vs. C)? The reason is that if we assume each test to be independent, then there is a 5% probability of falsely rejecting the null hypothesis (Type I error) for each test. In our case:
This means that the overall probability of making a Type I error is \(1-0.95^3 = 0.143\), since the probability of no Type I error is 0.95 for each of the three tests. Consequently, the Type I error probability would be 14.3%, which is above the conventional standard of 5%. This is also known as the family-wise or experiment-wise error rate.
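The arithmetic behind the family-wise error rate can be checked in a few lines. The sketch assumes, as in the text, that the tests are independent:

```r
alpha <- 0.05
k     <- choose(3, 2)           # number of pairwise comparisons among 3 groups

# Probability of at least one Type I error across k independent tests
fwer <- 1 - (1 - alpha)^k
round(fwer, 3)
```

The same formula shows how quickly the problem grows: with more groups, `choose()` returns more comparisons and the family-wise error rate rises accordingly.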
The basic concept underlying ANOVA is the decomposition of the variance in the data. There are three variance components which we need to consider:
The following figure shows the different variance components using a generalized data matrix:
Decomposing variance
The total variation is determined by the variation between the categories (due to our experimental manipulation) and the within-category variation that is due to extraneous factors (e.g., promotion of artists on a social network):
\[SS_T= SS_M+SS_R\]
To get a better feeling how this relates to our data set, we can look at the data in a slightly different way. Specifically, we can use the dcast(...) function from the reshape2 package to convert the data to wide format:
In this example, \(X_1\) from the generalized data matrix above would refer to the factor level “A”, \(X_2\) to the level “B”, and \(X_3\) to the level “C”. \(Y_{11}\) refers to the first data point in the first row (i.e., “13”), \(Y_{12}\) to the second data point in the first row (i.e., “21”), etc. The grand mean (\(\overline{Y}\)) and the category means (\(\overline{Y}_c\)) can be easily computed:
To see how each variance component can be derived, let’s look at the data again. The following graph shows the individual observations by experimental group:
Figure 5.3: Sum of Squares
To compute the total variation in the data, we consider the difference between each observation and the grand mean. The grand mean is the mean over all observations in the data set. The vertical lines in the following plot measure how far each observation is away from the grand mean:
Figure 5.4: Total Sum of Squares
The formal representation of the total sum of squares (\(SS_T\)) is:
\[ SS_T= \sum_{i=1}^{N} (Y_i-\bar{Y})^2 \]
This means that we need to subtract the grand mean from each individual data point, square the difference, and sum up over all the squared differences. Thus, in our example, the total sum of squares can be calculated as:
\[ \begin{align} SS_T =&(13−24.67)^2 + (14−24.67)^2 + … + (2−24.67)^2\\ &+(21−24.67)^2 + (18-24.67)^2 + … + (17−24.67)^2\\ &+(30−24.67)^2 + (37−24.67)^2 + … + (28−24.67)^2\\ &=30855.64 \end{align} \]
You could also compute this in R using:
For the subsequent analyses, it is important to understand the concept behind the degrees of freedom. Remember that in order to estimate a population value from a sample, we need to hold something in the population constant. In ANOVA, the df are generally one less than the number of values used to calculate the SS. For example, when we estimate the population mean from a sample, we assume that the sample mean is equal to the population mean. Then, in order to estimate the population mean from the sample, all but one of the scores are free to vary and the remaining score needs to be the value that keeps the population mean constant. In our example, we used all 300 observations to calculate the sum of squares, so the total degrees of freedom (\(df_T\)) are:
\[\begin{equation} \begin{split} df_T = N-1=300-1=299 \end{split} \tag{5.1} \end{equation}\]
Now we know that there are 30855.64 units of total variation in our data. Next, we compute how much of the total variation can be explained by the differences between groups (i.e., our experimental manipulation). To compute the explained variation in the data, we consider the difference between the values predicted by our model for each observation (i.e., the group mean) and the grand mean. The group mean refers to the mean value within the experimental group. The vertical lines in the following plot measure how far the predicted value for each observation (i.e., the group mean) is away from the grand mean:
Figure 5.5: Model Sum of Squares
The formal representation of the model sum of squares (\(SS_M\)) is:
\[ SS_M= \sum_{j=1}^{c} n_j(\bar{Y}_j-\bar{Y})^2 \]
where c denotes the number of categories (experimental groups) and \(n_j\) the number of observations in group j. This means that we need to subtract the grand mean from each group mean, square the difference, multiply it by the number of observations in that group, and sum up over all groups. Thus, in our example, the model sum of squares can be calculated as:
\[ \begin{align} SS_M &= 100*(14.34−24.67)^2 + 100*(24.7−24.67)^2 + 100*(34.99−24.67)^2 \\ &= 21321.21 \end{align} \]
You could also compute this manually in R using:
In this case, we used the three group means to calculate the sum of squares, so the model degrees of freedom (\(df_M\)) are:
\[ df_M= c-1=3-1=2 \]
Lastly, we calculate the amount of variation that cannot be explained by our model. In ANOVA, this is the sum of squared distances between what the model predicts for each data point (i.e., the group means) and the observed values. In other words, this refers to the amount of variation that is caused by extraneous factors, such as differences between product characteristics of the products in the different experimental groups. The vertical lines in the following plot measure how far each observation is away from the group mean:
Figure 5.6: Residual Sum of Squares
The formal representation of the residual sum of squares (\(SS_R\)) is:
\[ SS_R= \sum_{j=1}^{c} \sum_{i=1}^{n} ({Y}_{ij}-\bar{Y}_{j})^2 \]
This means that we need to subtract the group mean from each individual observation, square the difference, and sum up over all the squared differences. Thus, in our example, the residual sum of squares can be calculated as:
\[ \begin{align} SS_R =& (13−14.34)^2 + (14−14.34)^2 + … + (2−14.34)^2 \\ +&(21−24.7)^2 + (18−24.7)^2 + … + (17−24.7)^2 \\ +& (30−34.99)^2 + (37−34.99)^2 + … + (28−34.99)^2 \\ =& 9534.43 \end{align} \]
In this case, we used the 100 values in each group to calculate the SS for that group, so the residual degrees of freedom (\(df_R\)) are:
\[ \begin{align} df_R=& (n_1-1)+(n_2-1)+(n_3-1) \\ =&(100-1)+(100-1)+(100-1)=297 \end{align} \]
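The three sums of squares and their degrees of freedom can be reproduced in a few lines of base R. The sketch below uses simulated data with three groups of 100 observations each (the column names `group` and `time` and the group means are assumptions made for illustration, so the numbers will not match the tutorial data):

```r
# Simulated (hypothetical) listening times for three groups of 100 users
set.seed(42)
group <- factor(rep(c("A", "B", "C"), each = 100))
time  <- rnorm(300, mean = rep(c(14, 25, 35), each = 100), sd = 6)

grand_mean  <- mean(time)
group_means <- tapply(time, group, mean)    # one mean per group
n_j         <- tapply(time, group, length)  # group sizes

ss_t <- sum((time - grand_mean)^2)               # total variation
ss_m <- sum(n_j * (group_means - grand_mean)^2)  # explained by the groups
ss_r <- sum((time - ave(time, group))^2)         # left within the groups

df_t <- length(time) - 1
df_m <- nlevels(group) - 1
df_r <- sum(n_j - 1)

c(SS_T = ss_t, SS_M = ss_m, SS_R = ss_r)
all.equal(ss_t, ss_m + ss_r)  # the decomposition holds
```

Here `ave(time, group)` replaces every observation by its group mean, which is exactly the value the model predicts for that observation.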
Once you have computed the different sums of squares, you can investigate the effect strength. \(\eta^2\) is a measure of the variation in Y that is explained by X:
\[ \eta^2= \frac{SS_M}{SS_T}=\frac{21321.21}{30855.64}=0.69 \]
To compute this in R:
The statistic can only take values between 0 and 1. It is equal to 0 when all the category means are equal, indicating that X has no effect on Y. In contrast, it has a value of 1 when there is no variability within each category of X but there is some variability between categories.
How can we determine whether the effect of X on Y is significant?
The F-statistic uses the ratio of mean square related to X (explained variation) and the mean square related to the error (unexplained variation):
\(\frac{SS_M}{SS_R}\)
However, since these are summed values, their magnitude is influenced by the number of scores that were summed. For example, to calculate \(SS_M\) we only used the sum of 3 values (the group means), while \(SS_T\) and \(SS_R\) are based on all 300 observations (with 299 and 297 degrees of freedom, respectively). Thus, we calculate the average sum of squares (“mean square”) to compare the average amount of systematic vs. unsystematic variation by dividing the SS values by the degrees of freedom associated with the respective statistic.
Mean square due to X:
\[ MS_M= \frac{SS_M}{df_M}=\frac{SS_M}{c-1}=\frac{21321.21}{(3-1)} \]
Mean square due to error:
\[ MS_R= \frac{SS_R}{df_R}=\frac{SS_R}{N-c}=\frac{9534.43}{(300-3)} \]
Now, we compare the amount of variability explained by the model (experiment) to the error in the model (variation due to extraneous variables). If the model explains substantially more variability than it leaves unexplained, then the experimental manipulation has had a significant effect on the outcome (DV). The F-ratio can be derived as follows:
\[ F= \frac{MS_M}{MS_R}=\frac{\frac{SS_M}{c-1}}{\frac{SS_R}{N-c}}=\frac{\frac{21321.21}{(3-1)}}{\frac{9534.43}{(300-3)}}=332.08 \]
You can easily compute this in R:
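A minimal sketch of this computation, plugging in the sums of squares reported above rather than recomputing them from the data (the object names are chosen for illustration):

```r
ss_m     <- 21321.21  # model sum of squares from above
ss_r     <- 9534.43   # residual sum of squares from above
n_total  <- 300       # total number of observations
n_groups <- 3         # number of experimental groups

ms_m <- ss_m / (n_groups - 1)        # mean square due to X
ms_r <- ss_r / (n_total - n_groups)  # mean square due to error

f_ratio <- ms_m / ms_r
round(f_ratio, 2)
```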
This statistic follows the F distribution with (m = c – 1) and (n = N – c) degrees of freedom. This means that, like the \(\chi^2\) distribution, the shape of the F-distribution depends on the degrees of freedom. In this case, the shape depends on the degrees of freedom associated with the numerator and denominator used to compute the F-ratio. The following figure shows the shape of the F-distribution for different degrees of freedom:
The F distribution
The outcome of the test is one of the following:
For 2 and 297 degrees of freedom, the critical value of F is 3.026 for α=0.05. As usual, you can either look up these values in a table or use the appropriate function in R:
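For instance, the critical value and the p-value for the calculated test statistic can be obtained with the quantile and distribution functions of the F-distribution, `qf()` and `pf()`. This sketch plugs in the values from the text:

```r
# Critical value of F for alpha = 0.05 with 2 and 297 degrees of freedom
f_crit <- qf(0.95, df1 = 2, df2 = 297)
round(f_crit, 3)

# p-value: probability of an F at least as large as the calculated one under H0
p_value <- pf(332.08, df1 = 2, df2 = 297, lower.tail = FALSE)
p_value

332.08 > f_crit  # TRUE means that we reject H0
```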
The output tells us that the calculated test statistic exceeds the critical value. We can also show the test result visually:
Visual depiction of the test result
Thus, we conclude that because \(F_{CAL} = 332.08 > F_{CR} = 3.03\), \(H_0\) is rejected!
Interpretation: one or more of the differences between means are statistically significant.
Reporting: There was a significant effect of the playlist recommendation system on listening times, F(2,297) = 332.08, p < 0.05, \(\eta^2\) = 0.69.
Remember: This doesn’t tell us where the differences between groups lie. To find out which group means exactly differ, we need to use post-hoc procedures (see below).
You don’t have to compute these statistics manually! Luckily, there is a function for ANOVA in R, which does the above calculations for you as we will see in the next section.
5.4.3.1 Basic ANOVA
As already indicated, one-way ANOVA is used when there is only one categorical variable (factor). Before conducting ANOVA, you need to check if the assumptions of the test are fulfilled. The assumptions of ANOVA are discussed in the following sections.
The observations in the groups should be independent. Because we randomly assigned the listeners to the experimental conditions, this assumption can be assumed to be met.
ANOVA is relatively immune to violations of the normality assumption when sample sizes are large due to the Central Limit Theorem. However, if your sample is small (i.e., n < 30 per group), you may nevertheless want to check the normality of your data, e.g., by using the Shapiro-Wilk test or a Q-Q plot. In our example, we have 100 observations in each group, which is plenty, but let’s create another example with only 10 observations in each group. In the latter case we cannot rely on the Central Limit Theorem and we should test the normality of our data. This can be done using the Shapiro-Wilk test, which has the null hypothesis that the data is normally distributed. Hence, an insignificant test result means that the data can be assumed to be approximately normally distributed:
Since the test result is insignificant for all groups, we can conclude that the data approximately follow a normal distribution.
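A group-wise Shapiro-Wilk test could look roughly like the following sketch. The small data set is simulated here (10 hypothetical observations per group in a data frame called `small`), so it only illustrates the mechanics:

```r
# Simulated (hypothetical) small sample: 10 observations per group
set.seed(11)
small <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  time  = rnorm(30, mean = rep(c(14, 25, 35), each = 10), sd = 6)
)

# Run the Shapiro-Wilk test separately within each group and keep the p-values
p_values <- sapply(split(small$time, small$group),
                   function(x) shapiro.test(x)$p.value)
p_values  # p > 0.05: no evidence against normality in that group
```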
We could also test the distributional assumptions visually using a Q-Q plot (i.e., quantile-quantile plot). This plot can be used to assess if a set of data plausibly came from some theoretical distribution such as the Normal distribution. Since this is just a visual check, it is somewhat subjective. But it may help us to judge if our assumption is plausible, and if not, which data points contribute to the violation. A Q-Q plot is a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, we should see the points forming a line that’s roughly straight. In other words, Q-Q plots take your sample data, sort it in ascending order, and then plot them versus quantiles calculated from a theoretical distribution. Quantiles are often referred to as “percentiles” and refer to the points in your data below which a certain proportion of your data fall. Recall, for example, the standard Normal distribution with a mean of 0 and a standard deviation of 1. Since the 50th percentile (or 0.5 quantile) is 0, half the data lie below 0. The 95th percentile (or 0.95 quantile) is about 1.64, which means that 95% of the data lie below 1.64. The 97.5th percentile (or 0.975 quantile) is about 1.96, which means that 97.5% of the data lie below 1.96. In the Q-Q plot, the number of quantiles is selected to match the size of your sample data.
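To see what a Q-Q plot does under the hood, the following sketch builds the two sets of quantiles by hand for a simulated (hypothetical) sample. The built-in plotting functions do essentially this before drawing the scatterplot:

```r
# Quantiles of the standard Normal distribution mentioned above
qnorm(c(0.5, 0.95, 0.975))

# Simulated (hypothetical) sample of 30 listening times
set.seed(7)
x <- rnorm(30, mean = 25, sd = 6)

sample_q <- sort(x)                   # sample data in ascending order
theo_q   <- qnorm(ppoints(length(x))) # one theoretical quantile per data point

# If the data are roughly Normal, the points (theo_q, sample_q) fall close to
# a straight line, so their correlation is close to 1
cor(theo_q, sample_q)
```

Calling `plot(theo_q, sample_q)` on these two vectors would give the scatterplot itself.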
To create the Q-Q plot for the normal distribution, you may use the qqnorm() function, which takes the data to be tested as an argument. Using the qqline() function subsequently on the data creates the line on which the data points should fall based on the theoretical quantiles. If the individual data points deviate a lot from this line, it means that the data is not likely to follow a normal distribution.
Figure 5.7: Q-Q plot 1
Figure 5.8: Q-Q plot 2
Figure 5.9: Q-Q plot 3
The Q-Q plots suggest an approximately Normal distribution. If the assumption had been violated, you might consider transforming your data or resort to a non-parametric test.
Let’s return to our original dataset with 100 observations in each group for the rest of the analysis.
You can test the homogeneity of variances in R using Levene’s test:
The null hypothesis of the test is that the group variances are equal. Thus, if the test result is significant it means that the variances are not equal. If we cannot reject the null hypothesis (i.e., the group variances are not significantly different), we can proceed with the ANOVA as follows:
You can see that the p-value is smaller than 0.05. This means that, if there really was no difference between the population means (i.e., the Null hypothesis was true), the probability of the observed differences (or larger differences) is less than 5%.
To compute \(\eta^2\) from the output, we can extract the relevant sums of squares as follows:
You can see that the results match the results from our manual computation above ( \(\eta^2 =\) 0.69).
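The steps above, fitting the model and extracting \(\eta^2\) from the output, can be sketched as follows. The data frame `dat` is simulated for illustration (its column names and group means are assumptions), so the results differ from the values reported in the text:

```r
# Simulated (hypothetical) data: three groups with 100 listeners each
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 100)),
  time  = rnorm(300, mean = rep(c(14, 25, 35), each = 100), sd = 6)
)

fit <- aov(time ~ group, data = dat)
tab <- summary(fit)[[1]]  # ANOVA table with one row per source of variation
tab

# Eta squared: model sum of squares divided by total sum of squares
ss     <- tab[["Sum Sq"]]  # first entry: group, second entry: residuals
eta_sq <- ss[1] / sum(ss)
eta_sq
```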
The aov() function also automatically generates some plots that you can use to judge if the model assumptions are met. We will inspect two of the plots here.
We will use the first plot to inspect if the residual variances are equal across the experimental groups:
Generally, the residual variance (i.e., the range of values on the y-axis) should be the same for different levels of our independent variable. The plot shows that there are some slight differences. Notably, the range of residuals is higher in group “B” than in group “C”. However, the differences are not that large, and since Levene’s test could not reject the null hypothesis of equal variances, we conclude that the variances are similar enough in this case.
The second plot can be used to test the assumption that the residuals are approximately normally distributed. We use a Q-Q plot to test this assumption:
The plot suggests that the residuals are approximately normally distributed. We could also test this by extracting the residuals from the ANOVA output using the resid() function and using the Shapiro-Wilk test:
Confirming the impression from the Q-Q plot, we cannot reject the Null that the residuals are approximately normally distributed.
Note that if Levene’s test had been significant (i.e., variances are not equal), we would have needed to either resort to non-parametric tests (see below), or compute Welch’s F-ratio instead:
You can see that the results are fairly similar, since the variances turned out to be fairly equal across groups.
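As a sketch, Welch’s F-ratio is available in base R through `oneway.test()` with `var.equal = FALSE`. The example again relies on simulated data (a hypothetical data frame `dat`), this time with deliberately unequal group variances:

```r
# Simulated (hypothetical) data with unequal variances across groups
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 100)),
  time  = rnorm(300, mean = rep(c(14, 25, 35), each = 100),
                sd = rep(c(4, 8, 6), each = 100))
)

# Welch's F-ratio does not assume equal variances
welch <- oneway.test(time ~ group, data = dat, var.equal = FALSE)
welch

# For comparison: the classical F-test that assumes equal variances
classic <- oneway.test(time ~ group, data = dat, var.equal = TRUE)
classic$statistic
```

The numerator degrees of freedom are the same in both tests, while Welch’s version adjusts the denominator degrees of freedom downward when the variances differ.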
Provided that significant differences were detected by the overall ANOVA you can find out which group means are different using post hoc procedures. Post hoc procedures are designed to conduct pairwise comparisons of all different combinations of the treatment groups by correcting the level of significance for each test such that the overall Type I error rate (α) across all comparisons remains at 0.05.
In other words, we rejected \(H_0: \mu_1 = \mu_2 = \mu_3\), and now we would like to test:
\[H_0: \mu_1 = \mu_2\]
\[H_0: \mu_1 = \mu_3\]
\[H_0: \mu_2 = \mu_3\]
There are several post hoc procedures available to choose from. In this tutorial, we will cover Bonferroni and Tukey’s HSD (“honest significant differences”). Both tests control for the family-wise error. Bonferroni tends to have more power when the number of comparisons is small, whereas Tukey’s HSD is better when testing large numbers of means.
One of the most popular (and easiest) methods to correct for the family-wise error rate is to conduct the individual t-tests and divide α by the number of comparisons (\(k\)):
\[ p_{CR}= \frac{\alpha}{k} \]
In our example with three groups:
\[p_{CR}= \frac{0.05}{3}=0.017\]
Thus, the “corrected” critical p-value is now 0.017 instead of 0.05 (i.e., the critical t value is higher). You can implement the Bonferroni procedure in R using:
In the output, you will get the corrected p-values for the individual tests. In our example, we can reject \(H_0\) of equal means for all three tests, since p < 0.05 for all combinations of groups.
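One way to run Bonferroni-corrected pairwise comparisons in base R is `pairwise.t.test()`. The sketch below uses simulated data (the data frame `dat` and its columns are hypothetical) and also shows that the correction simply multiplies each raw p-value by the number of comparisons, capped at 1:

```r
# Simulated (hypothetical) data: three groups with 100 listeners each
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 100)),
  time  = rnorm(300, mean = rep(c(14, 25, 35), each = 100), sd = 6)
)

# Pairwise t-tests with Bonferroni-corrected p-values
bonf <- pairwise.t.test(dat$time, dat$group, p.adjust.method = "bonferroni")
bonf$p.value

# The same tests without correction, adjusted by hand
raw      <- pairwise.t.test(dat$time, dat$group, p.adjust.method = "none")
raw_p    <- na.omit(as.vector(raw$p.value))
manual_p <- pmin(1, raw_p * length(raw_p))  # multiply by k = 3, cap at 1
manual_p
```

Comparing the corrected p-values to α = 0.05 is equivalent to comparing the raw p-values to the corrected critical value α/k.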
Note the difference between the results from the post-hoc test compared to individual t-tests. For example, when we test the “B” vs. “C” groups, the result from a t-test would be:
Usually the p-value is lower in the t-test, reflecting the fact that the family-wise error is not corrected (i.e., the test is less conservative). In this case the p-value is extremely small in both cases and thus indistinguishable.
Tukey’s HSD also compares all possible pairs of means (two-by-two combinations; i.e., like a t-test, except that it corrects for family-wise error rate).
Test statistic:
\[\begin{equation} \begin{split} HSD= q\sqrt{\frac{MS_R}{n_c}} \end{split} \tag{5.2} \end{equation}\]
\[|\bar{Y}_i-\bar{Y}_j | > HSD\]
The value from the studentized range table can be obtained using the qtukey() function.
\[HSD= 3.33\sqrt{\frac{32.10}{100}}=1.89\]
Since all mean differences between groups are larger than 1.89, we can reject the null hypothesis for all individual tests, confirming the results from the Bonferroni test. To compute Tukey’s HSD, we can use the appropriate function from the multcomp package.
We may also plot the result for the mean differences incl. their confidence intervals:
Figure 5.10: Tukey’s HSD
You can see that the CIs do not cross zero, which means that the true difference between group means is unlikely to be zero.
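As an alternative to the multcomp package, base R offers `TukeyHSD()`, which is used in the following sketch together with `qtukey()` to compute the HSD by hand. The data are simulated (a hypothetical data frame `dat`), so the HSD value will differ from the one in the text:

```r
# Simulated (hypothetical) data: three groups with 100 listeners each
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 100)),
  time  = rnorm(300, mean = rep(c(14, 25, 35), each = 100), sd = 6)
)
fit <- aov(time ~ group, data = dat)

# HSD by hand: critical value of the studentized range times the standard error
ms_r   <- deviance(fit) / df.residual(fit)  # residual mean square
q_crit <- qtukey(0.95, nmeans = 3, df = df.residual(fit))
hsd    <- q_crit * sqrt(ms_r / 100)         # 100 observations per group
hsd

# Mean differences with family-wise 95% confidence intervals
tuk <- TukeyHSD(fit, "group", conf.level = 0.95)
tuk$group
```

With equal group sizes, the half-width of each confidence interval equals the HSD, so an interval excludes zero exactly when the absolute mean difference exceeds the HSD.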
Reporting of post hoc results:
The post hoc tests based on Bonferroni and Tukey’s HSD revealed that people listened to music significantly more when:
Non-Parametric tests do not require the sampling distribution to be normally distributed (a.k.a. “assumption free tests”). These tests may be used when the variable of interest is measured on an ordinal scale or when the parametric assumptions do not hold. They often rely on ranking the data instead of analyzing the actual scores. By ranking the data, information on the magnitude of differences is lost. Thus, parametric tests are more powerful if the sampling distribution is normally distributed.
When should you use non-parametric tests?
The Mann-Whitney U test is a non-parametric test of differences between groups, similar to the two sample t-test. In contrast to the two sample t-test it only requires ordinally scaled data and relies on weaker assumptions. Thus it is often useful if the assumptions of the t-test are violated, especially if the data is not on a ratio scale. The following assumptions must be fulfilled for the test to be applicable:
Intuitively, the test compares the frequency of low and high ranks between groups. Under the null hypothesis, the amount of high and low ranks should be roughly equal in the two groups. This is achieved through comparing the expected sum of ranks to the actual sum of ranks.
As an example, we will be using data obtained from a field experiment with random assignment. In a music download store, new releases were randomly assigned to an experimental group and sold at a reduced price (i.e., 7.95€), or a control group and sold at the standard price (9.95€). A representative sample of 102 new releases were sampled and these albums were randomly assigned to the experimental groups (i.e., 51 albums per group). The sales were tracked over one day.
Let’s load and investigate the data first:
Inspect descriptives (overall and by group).
Create boxplot and plot of means.
Figure 5.11: Boxplot
Let’s assume that one of the parametric assumptions has been violated and we needed to conduct a non-parametric test. Then, the Mann-Whitney U test is implemented in R using the function wilcox.test() . Using the ranking data as an independent variable and the listening time as a dependent variable, the test could be executed as follows:
The p-value is smaller than 0.05, which leads us to reject the null hypothesis, i.e. the test yields evidence that the new service feature leads to higher music listening times.
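A minimal sketch of the call, using simulated data in place of the data set (the data frame `sales`, its columns, and the assumed sales levels are all hypothetical):

```r
# Simulated (hypothetical) unit sales for 51 albums per price condition
set.seed(5)
sales <- data.frame(
  group = factor(rep(c("low_price", "high_price"), each = 51)),
  units = c(rpois(51, lambda = 8), rpois(51, lambda = 5))
)

# Mann-Whitney U test (called Wilcoxon rank-sum test in R); exact = FALSE
# requests the Normal approximation, which is needed when there are ties
mw <- wilcox.test(units ~ group, data = sales, exact = FALSE)
mw$statistic  # reported as W
mw$p.value
```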
The Wilcoxon signed-rank test is a non-parametric test used to analyze the difference between paired observations, analogously to the paired t-test. It can be used when measurements come from the same observational units but the distributional assumptions of the paired t-test do not hold, because it does not require any assumptions about the distribution of the measurements. Since we subtract two values, however, the test requires that the dependent variable is at least interval scaled, meaning that intervals have the same meaning for different points on our measurement scale.
Under the null hypothesis \(H_0\), the differences of the measurements should follow a symmetric distribution around 0, meaning that, on average, there is no difference between the two matched samples. \(H_1\) states that the distribution’s mean is non-zero.
As an example, let’s consider a slightly different experimental setup for the music download store. Imagine that new releases were either sold at a reduced price (i.e., 7.95€), or at the standard price (9.95€). Every time a customer came to the store, the prices were randomly determined for every new release. This means that the same 51 albums were either sold at the standard price or at the reduced price and this price was determined randomly. The sales were then recorded over one day. Note the difference to the previous case, where we randomly split the sample and assigned 50% of products to each condition. Now, we randomly vary prices for all albums between high and low prices.
Again, let’s assume that one of the parametric assumptions has been violated and we needed to conduct a non-parametric test. Then the Wilcoxon signed-rank test can be performed with the same command as the Mann-Whitney U test, provided that the argument paired is set to TRUE.
Using the 95% confidence level, the result would suggest a significant effect of price on sales (i.e., p < 0.05).
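For the paired case, the call could look like the following sketch. The two vectors hold simulated sales of the same 51 albums under both prices (all names and numbers are hypothetical):

```r
# Simulated (hypothetical) sales of the same 51 albums at both price levels
set.seed(9)
n_albums   <- 51
sales_high <- rpois(n_albums, lambda = 5)               # standard price
sales_low  <- sales_high + rpois(n_albums, lambda = 2)  # reduced price

# Wilcoxon signed-rank test on the paired differences
wsr <- wilcox.test(sales_low, sales_high, paired = TRUE, exact = FALSE)
wsr$statistic  # reported as V
wsr$p.value
```

Note that the order of the two vectors matters for pairing: the i-th element of each vector must refer to the same album.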
The Kruskal–Wallis test is the non-parametric counterpart of the one-way independent ANOVA. It is designed to test for significant differences in population medians when you have more than two samples (otherwise you would use the Mann-Whitney U-test). The theory is very similar to that of the Mann–Whitney U-test since it is also based on ranked data. The Kruskal-Wallis test is carried out using the kruskal.test() function. Using the same data as before, we type:
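A sketch of the call with simulated data for three groups (the data frame `dat` and its columns are hypothetical stand-ins for the data set used in the text):

```r
# Simulated (hypothetical) data: three groups with 100 observations each
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 100)),
  time  = rnorm(300, mean = rep(c(14, 25, 35), each = 100), sd = 6)
)

kw <- kruskal.test(time ~ group, data = dat)
kw$statistic  # Kruskal-Wallis chi-squared
kw$parameter  # degrees of freedom: number of groups minus 1
kw$p.value
```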
The test-statistic follows a chi-square distribution and since the test is significant (p < 0.05), we can conclude that there are significant differences in population medians. Provided that the overall effect is significant, you may perform a post hoc test to find out which groups are different. To get a first impression, we can plot the data using a boxplot:
Figure 5.12: Boxplot
To test for differences between groups, we can, for example, apply post hoc tests according to Nemenyi for pairwise multiple comparisons of the ranked data using the appropriate function from the PMCMR package.
The results reveal that there is a significant difference between the “low” and “high” promotion groups. Note that the results are different compared to the results from the parametric test above. This difference occurs because non-parametric tests have less power to detect differences between groups, since we lose information by ranking the data. Thus, you should rely on parametric tests if the assumptions are met.
In some instances, you will be confronted with differences between proportions, rather than differences between means. For example, you may conduct an A/B-Test and wish to compare the conversion rates between two advertising campaigns. In this case, your data is binary (0 = no conversion, 1 = conversion) and the sampling distribution for such data is binomial. While binomial probabilities are difficult to calculate, we can use a Normal approximation to the binomial when n is large (>100) and the true likelihood of a 1 is not too close to 0 or 1.
Let’s use an example: assume a call center where service agents call potential customers to sell a product. We consider two call center agents:
As always, we load the data first:
Next, we create a table to check the relative frequencies:
We could also plot the data to visualize the frequencies using ggplot:
Figure 5.13: proportion of conversions per agent (stacked bar chart)
… or using the mosaicplot() function:
Figure 5.14: proportion of conversions per agent (mosaic plot)
Recall that we can use confidence intervals to determine the range of values that the true population parameter will take with a certain level of confidence based on the sample. Similar to the confidence interval for means, we can compute a confidence interval for proportions. The \((1-\alpha)\cdot 100\)% confidence interval for proportions is approximately
\[ CI = p\pm z_{1-\frac{\alpha}{2}}*\sqrt{\frac{p*(1-p)}{N}} \]
where \(\sqrt{p(1-p)}\) is the equivalent of the standard deviation in the formula for the confidence interval for means. Based on the equation, it is easy to compute the confidence intervals for the conversion rates of the call center agents:
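As a rough illustration of the formula, the sketch below computes the Wald interval for each agent. The counts are invented for this example (they are not the chapter's data), and the variable names are assumptions.

```r
# Hypothetical counts (not the chapter's data set)
conversions <- c(agent1 = 100, agent2 = 50)
calls <- c(agent1 = 500, agent2 = 500)

p <- conversions / calls            # sample proportions
z <- qnorm(1 - 0.05 / 2)            # z-value for a 95% interval
se <- sqrt(p * (1 - p) / calls)     # standard error of each proportion

ci <- cbind(lower = p - z * se, estimate = p, upper = p + z * se)
print(round(ci, 3))
```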
Similar to testing for differences in means, we could also ask: Does the conversion rate of agent 1 differ from the conversion rate of agent 2? Or, to state it formally:
\[H_0: \pi_1=\pi_2 \\ H_1: \pi_1\ne \pi_2\]
where \(\pi\) denotes the population parameter associated with the proportion in the respective population. One approach to test this is based on confidence intervals to estimate the difference between two populations. We can compute an approximate confidence interval for the difference between the proportion of successes in group 1 and group 2, as:
\[ CI = p_1-p_2\pm z_{1-\frac{\alpha}{2}}*\sqrt{\frac{p_1*(1-p_1)}{n_1}+\frac{p_2*(1-p_2)}{n_2}} \]
If the confidence interval includes zero, then the data does not suggest a difference between the groups. Let’s compute the confidence interval for differences in the proportions by hand first:
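A minimal sketch of the manual computation, assuming invented counts (the chapter's data are not reproduced here, so the interval will differ from the one discussed in the text):

```r
# Hypothetical counts (not the chapter's data set)
x1 <- 100; n1 <- 500   # conversions and calls, agent 1
x2 <- 50;  n2 <- 500   # conversions and calls, agent 2

p1 <- x1 / n1
p2 <- x2 / n2
z <- qnorm(1 - 0.05 / 2)

# Standard error of the difference between two independent proportions
se_diff <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

ci_diff <- c(lower = (p1 - p2) - z * se_diff,
             upper = (p1 - p2) + z * se_diff)
print(round(ci_diff, 3))

# Does the interval exclude zero?
excludes_zero <- ci_diff["lower"] > 0 || ci_diff["upper"] < 0
```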
Now we can see that the 95% confidence interval estimate of the difference between the proportion of conversions for agent 1 and the proportion of conversions for agent 2 is between 26% and 41%. This interval tells us the range of plausible values for the difference between the two population proportions. According to this interval, zero is not a plausible value for the difference (i.e., interval does not cross zero), so we reject the null hypothesis that the population proportions are the same.
Instead of computing the intervals by hand, we could also use the prop.test() function:
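The call could look roughly like this. The counts are hypothetical stand-ins for the call center data, so the output will not match the figures quoted in the text.

```r
# Hypothetical counts (not the chapter's data set)
conversions <- c(100, 50)   # successes for agent 1 and agent 2
calls <- c(500, 500)        # number of calls per agent

# Two-sample test for equality of proportions (default settings)
res <- prop.test(x = conversions, n = calls, conf.level = 0.95)
print(res)

# Interval for the difference p1 - p2 and the p-value of the test
res$conf.int
res$p.value
```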
Note that the prop.test() function computes the confidence interval in a slightly different way: by default it applies a continuity correction (and, for a single proportion, it uses Wilson’s score method), which tends to be a better approximation for smaller N. That’s why the confidence interval in the output deviates slightly from the manual computation above, which uses the uncorrected Wald interval.
You can also see that the output from prop.test() includes the results from a \(\chi^2\) test for the equality of proportions (which will be discussed below) and the associated p-value. Since the p-value is less than 0.05, we reject the null hypothesis of equal proportions. Thus, the reporting would be:
The test showed that the conversion rate for agent 1 was higher by 33 percentage points. This difference is significant, \(\chi^2\)(1) = 70, p < .05 (95% CI = [0.25, 0.41]).
In the previous section, we saw how we can compute the confidence interval for the difference between proportions to decide on whether or not to reject the null hypothesis. Whenever you would like to investigate the relationship between two categorical variables, the \(\chi^2\) test may be used to test whether the variables are independent of each other. It achieves this by comparing the expected number of observations in a group to the actual values. Let’s continue with the example from the previous section. Under the null hypothesis, the two variables agent and conversion in our contingency table are independent (i.e., there is no relationship). This means that the frequency in each field will be roughly proportional to the probability of an observation being in that category, calculated under the assumption that they are independent. The difference between that expected quantity and the actual quantity can be used to construct the test statistic. The test statistic is computed as follows:
\[ \chi^2=\sum_{i=1}^{J}\frac{(f_o-f_e)^2}{f_e} \]
where \(J\) is the number of cells in the contingency table, \(f_o\) are the observed cell frequencies and \(f_e\) are the expected cell frequencies. The larger the differences, the larger the test statistic and the smaller the p-value.
The observed cell frequencies can easily be seen from the contingency table:
The expected cell frequencies can be calculated as follows:
\[ f_e=\frac{(n_r*n_c)}{n} \]
where \(n_r\) are the total observed frequencies per row, \(n_c\) are the total observed frequencies per column, and \(n\) is the total number of observations. Thus, the expected cell frequencies under the assumption of independence can be calculated as:
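One way to apply this formula to every cell at once is an outer product of the row and column totals. The contingency table below is invented for illustration and is not the chapter's data.

```r
# Hypothetical 2 x 2 contingency table (not the chapter's data set)
observed <- matrix(c(100, 400,
                     50, 450),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(agent = c("agent1", "agent2"),
                                   conversion = c("yes", "no")))

n <- sum(observed)

# f_e = (row total * column total) / n for each cell
expected <- outer(rowSums(observed), colSums(observed)) / n
print(expected)
```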
To sum up, these are the expected cell frequencies
… and these are the observed cell frequencies
To obtain the test statistic, we simply plug the values into the formula:
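As a sketch under the same caveat (invented counts, not the chapter's data), the statistic can be computed directly from the observed and expected frequencies:

```r
# Hypothetical 2 x 2 contingency table (not the chapter's data set)
observed <- matrix(c(100, 400,
                     50, 450),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(agent = c("agent1", "agent2"),
                                   conversion = c("yes", "no")))

# Expected frequencies under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of squared deviations, each scaled by its expected frequency
chi_sq <- sum((observed - expected)^2 / expected)
print(chi_sq)
```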
The test statistic is \(\chi^2\) distributed. The chi-square distribution is a non-symmetric distribution. Actually, there are many different chi-square distributions, one for each degree of freedom, as shown in the following figure.
Figure 5.15: The chi-square distribution
You can see that as the degrees of freedom increase, the chi-square curve approaches a normal distribution. To find the critical value, we need to specify the corresponding degrees of freedom, given by:
\[ df=(r-1)*(c-1) \]
where \(r\) is the number of rows and \(c\) is the number of columns in the contingency table. Recall that degrees of freedom are generally the number of values that can vary freely when calculating a statistic. In a 2 by 2 table as in our case, we have 2 variables (or two samples) with 2 levels each, and for each variable only 1 value can vary freely once the totals are fixed. Hence, in our example the degrees of freedom can be calculated as:
Now, we can derive the critical value given the degrees of freedom and the level of confidence using the qchisq() function and test if the calculated test statistic is larger than the critical value:
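A short sketch of this comparison, assuming a 2 x 2 table with invented counts and \(\alpha = 0.05\):

```r
# Hypothetical 2 x 2 contingency table (not the chapter's data set)
observed <- matrix(c(100, 400, 50, 450), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Critical value for a 95% level of confidence
critical_value <- qchisq(0.95, df = df)

# TRUE means the null hypothesis of independence is rejected
reject_h0 <- chi_sq > critical_value
print(c(chi_sq = chi_sq, critical_value = critical_value))
```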
Figure 5.16: Visual depiction of the test result
We could also compute the p-value using the pchisq() function, which tells us the probability of obtaining a test statistic at least as large as the observed one if the null hypothesis was true (i.e., there was no association):
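For instance, with a test statistic computed from invented counts (not the chapter's data), the upper-tail probability is obtained as follows:

```r
# Hypothetical 2 x 2 contingency table (not the chapter's data set)
observed <- matrix(c(100, 400, 50, 450), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
chi_sq <- sum((observed - expected)^2 / expected)

# Upper-tail probability of the chi-square distribution with 1 df
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
print(p_value)
```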
The test statistic can also be calculated in R directly on the contingency table with the function chisq.test() .
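A minimal sketch of such a call, again using an invented table; `correct = FALSE` switches off the continuity correction so that the statistic matches a by-hand calculation.

```r
# Hypothetical 2 x 2 contingency table (not the chapter's data set)
observed <- matrix(c(100, 400,
                     50, 450),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(agent = c("agent1", "agent2"),
                                   conversion = c("yes", "no")))

res <- chisq.test(observed, correct = FALSE)
print(res)

res$expected    # expected cell frequencies under independence
res$statistic   # chi-square test statistic
res$p.value     # associated p-value
```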
Since the p-value is smaller than 0.05 (i.e., the calculated test statistic is larger than the critical value), we reject \(H_0\) that the two variables are independent.
Note that the test statistic is sensitive to the sample size. To see this, let’s assume that we have a sample of 100 observations instead of 1000 observations:
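The effect of the sample size can be sketched by dividing every cell of an invented table by 10, which keeps the proportions fixed. These counts are hypothetical and not the chapter's data.

```r
# Hypothetical tables with identical proportions but different n
observed_large <- matrix(c(100, 400, 50, 450), nrow = 2, byrow = TRUE)  # n = 1000
observed_small <- observed_large / 10                                   # n = 100

res_large <- chisq.test(observed_large, correct = FALSE)
res_small <- chisq.test(observed_small, correct = FALSE)

# The statistic scales with n, so the p-value grows as n shrinks
c(large = unname(res_large$statistic), small = unname(res_small$statistic))
c(large = res_large$p.value, small = res_small$p.value)
```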
You can see that even though the proportions haven’t changed, the test is no longer significant. The following equation lets you compute a measure of the effect size, which is insensitive to sample size:
\[ \phi=\sqrt{\frac{\chi^2}{n}} \]
The following guidelines are used to determine the magnitude of the effect size (Cohen, 1988):
In our example, we can compute the effect sizes for the large and small samples as follows:
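A sketch of this computation, assuming two invented tables that share the same proportions (n = 1000 and n = 100); the helper name `phi_coefficient` is hypothetical.

```r
# Hypothetical tables with identical proportions but different n
observed_large <- matrix(c(100, 400, 50, 450), nrow = 2, byrow = TRUE)
observed_small <- observed_large / 10

# phi = sqrt(chi^2 / n), using the uncorrected test statistic
phi_coefficient <- function(tab) {
  chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)
  sqrt(chi_sq / sum(tab))
}

phi_large <- phi_coefficient(observed_large)
phi_small <- phi_coefficient(observed_small)
print(c(large = phi_large, small = phi_small))
```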
You can see that the statistic is insensitive to the sample size.
Note that the \(\phi\) coefficient is appropriate for two dichotomous variables (resulting from a 2 x 2 table as above). If any of your nominal variables has more than two categories, Cramér’s V should be used instead:
\[ V=\sqrt{\frac{\chi^2}{n*df_{min}}} \]
where \(df_{min}\) refers to the degrees of freedom associated with the variable that has fewer categories (e.g., if we have two nominal variables with 3 and 4 categories, \(df_{min}\) would be 3 - 1 = 2). The degrees of freedom need to be taken into account when judging the magnitude of the effect sizes (see e.g., here ).
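The formula can be wrapped in a small helper, sketched below. The function name `cramers_v` and the 3 x 4 table are hypothetical and serve only to illustrate the calculation.

```r
# Cramér's V = sqrt(chi^2 / (n * df_min)),
# where df_min = (number of categories of the smaller variable) - 1
cramers_v <- function(tab) {
  chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)
  df_min <- min(nrow(tab), ncol(tab)) - 1
  sqrt(chi_sq / (sum(tab) * df_min))
}

# Hypothetical 3 x 4 contingency table, so df_min = 3 - 1 = 2
tab <- matrix(c(30, 20, 25, 25,
                20, 30, 25, 25,
                25, 25, 20, 30),
              nrow = 3, byrow = TRUE)

v <- cramers_v(tab)
print(v)
```

For a 2 x 2 table, \(df_{min} = 1\), so this helper reduces to the \(\phi\) coefficient.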
Note that the correct = FALSE argument above ensures that the test statistic is computed in the same way as we have done by hand above. By default, chisq.test() applies a correction for 2 x 2 tables to prevent overestimation of statistical significance for small data (called the Yates’ correction). The correction is implemented by subtracting the value 0.5 from the absolute difference between the observed and expected cell counts in the numerator of the test statistic. This means that the calculated test statistic will be smaller (i.e., more conservative). Although the adjustment may go too far in some instances, you should generally rely on the adjusted results, which can be computed as follows:
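As a sketch with invented counts (not the chapter's data), the corrected and uncorrected statistics can be compared side by side:

```r
# Hypothetical 2 x 2 contingency table (not the chapter's data set)
observed <- matrix(c(100, 400, 50, 450), nrow = 2, byrow = TRUE)

res_uncorrected <- chisq.test(observed, correct = FALSE)
res_corrected <- chisq.test(observed)   # Yates' correction is the default

# The corrected statistic is smaller, i.e., more conservative
c(uncorrected = unname(res_uncorrected$statistic),
  corrected = unname(res_corrected$statistic))
```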
As you can see, the results don’t change much in our example, since the differences between the observed and expected cell frequencies are fairly large relative to the correction.
Caution is warranted when the cell counts in the contingency table are small. The usual rule of thumb is that all cell counts should be at least 5 (this may be a little too stringent though). When some cell counts are too small, you can use Fisher’s exact test using the fisher.test() function.
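A minimal sketch of the call on an invented table (the counts here are large and are used only to show the interface, not the chapter's data):

```r
# Hypothetical 2 x 2 contingency table (not the chapter's data set)
observed <- matrix(c(100, 400,
                     50, 450),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(agent = c("agent1", "agent2"),
                                   conversion = c("yes", "no")))

res <- fisher.test(observed)
print(res)

res$p.value    # exact p-value
res$estimate   # conditional estimate of the odds ratio
```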
The Fisher test, while more conservative, also shows a significant difference between the proportions (p < 0.05). This is not surprising since the cell counts in our example are fairly large.
To calculate the required sample size when comparing proportions, the power.prop.test() function can be used. For example, we could ask how large our sample needs to be if we would like to compare two groups with conversion rates of 2% and 2.5%, respectively using the conventional settings for \(\alpha\) and \(\beta\) :
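Assuming the conventional settings mean a significance level of 0.05 and a power of 0.80, the call could be sketched as:

```r
# Required sample size per group for detecting 2% vs. 2.5%
res <- power.prop.test(p1 = 0.02, p2 = 0.025,
                       sig.level = 0.05, power = 0.8)
print(res)

# Round up, since a fraction of an observation cannot be collected
n_per_group <- ceiling(res$n)
```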
The output tells us that we need 13809 observations per group to detect a difference of the desired size.
Want to know the secret to always running successful tests?
The answer is to formulate a hypothesis .
Now when I say it’s always successful, I’m not talking about always increasing your Key Performance Indicator (KPI). You can “lose” a test, but still be successful.
That sounds like an oxymoron, but it’s not. If you set up your test strategically, even if the test decreases your KPI, you gain a learning , which is a success! And, if you win, you simultaneously achieve a lift and a learning. Double win!
The way you ensure you have a strategic test that will produce a learning is by centering it around a strong hypothesis.
So, what is a hypothesis?
By definition, a hypothesis is a proposed statement made on the basis of limited evidence that can be proved or disproved and is used as a starting point for further investigation.
Let’s break that down:
It is a proposed statement.
It is made on the basis of limited (but hopefully some ) evidence.
It can be proved or disproved.
It is used as a starting point for further investigation.
How do I write a hypothesis?
The structure of your basic hypothesis follows a CHANGE: EFFECT framework.
While this is a truly scientific and testable template, it is very open-ended. Even though this hypothesis, “Changing an English headline into a Spanish headline will increase clickthrough rate,” is perfectly valid and testable, if your visitors are English-speaking, it probably doesn’t make much sense.
So now the question is …
How do I write a GOOD hypothesis?
To quote my boss Tony Doty , “This isn’t Mad Libs.”
We can’t just start plugging in nouns and verbs and conclude that we have a good hypothesis. Your hypothesis needs to be backed by a strategy. And, your strategy needs to be rooted in a solution to a problem .
So, a more complete version of the above template would be something like this:
In order to have a good hypothesis, you don’t necessarily have to follow this exact sentence structure, as long as it is centered around three main things:
Presumed problem
Proposed solution
Anticipated result
After you’ve completed your analysis and research, identify the problem that you will address. While we need to be very clear about what we think the problem is, you should leave it out of the hypothesis since it is harder to prove or disprove. You may want to come up with both a problem statement and a hypothesis .
For example:
Problem Statement: “The lead generation form is too long, causing unnecessary friction .”
Hypothesis: “By changing the number of form fields from 20 to 10, we will increase the number of leads.”
When you are thinking about the solution you want to implement, you need to think about the psychology of the customer. What psychological impact is your proposed problem causing in the mind of the customer?
For example, if your proposed problem is “There is a lack of clarity in the sign-up process,” the psychological impact may be that the user is confused.
Now think about what solution is going to address the problem in the customer’s mind. If they are confused, we need to explain something better, or provide them with more information. For this example, we will say our proposed solution is to “Add a progress bar to the sign-up process.” This leads straight into the anticipated result.
If we reduce the confusion in the visitor’s mind (psychological impact) by adding the progress bar, what do we foresee to be the result? We are anticipating that it would be more people completing the sign-up process. Your proposed solution and your KPI need to be directly correlated.
Note: Some people will include the psychological impact in their hypothesis. This isn’t necessarily wrong, but we do have to be careful with assumptions. If we say that the effect will be “Reduced confusion and therefore increase in conversion rate,” we are assuming the reduced confusion is what made the impact. While this may be correct, it is not measurable and it is hard to prove or disprove.
To summarize, your hypothesis should follow a structure of: “If I change this, it will have this effect,” but should always be informed by an analysis of the problems and rooted in the solution you deemed appropriate.
Thanks for the article. I’ve been trying to wrap my head around this type of testing because I’d like to use it to see the effectiveness on some ads. This article really helped. Thanks Again!
Hey Lauren, I am just getting to the point that I have something to perform A-B testing on. This post led me to this site which will and already has become a help in what to test and how to test .
Again, thanks for getting me here .
Good article. I have been researching different approaches to writing testing hypotheses and this has been a help. The only thing I would add is that it can be useful to capture the insight/justification within the hypothesis statement. IF i do this, THEN I expect this result BECAUSE I have this insight.
@Kaya Great!
Good article – but technically you can never prove an hypothesis, according to the principle of falsification (Popper), only fail to disprove the null hypothesis.
Edward Barroga
1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.
2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.
The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.
Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6
It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked; if not overlooked, they are framed without the forethought and meticulous attention they need. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4
There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought of, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article then aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.
A research question is what a study aims to answer after data analysis and interpretation. The answer is written at length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5
On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4
Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8
Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12
Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13
There are several characteristics of well-developed hypotheses. Good hypotheses 1) are empirically testable 7 , 10 , 11 , 13 ; 2) are backed by preliminary evidence 9 ; 3) are testable by ethical research 7 , 9 ; 4) are based on original ideas 9 ; 5) have evidence-based logical reasoning 10 ; and 6) can be predicted. 11 Good hypotheses can infer ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory on which to base the hypotheses, inductive reasoning based on specific observations or findings forms more general hypotheses. 10
Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .
| Quantitative research questions | Quantitative research hypotheses |
|---|---|
| Descriptive research questions | Simple hypothesis |
| Comparative research questions | Complex hypothesis |
| Relationship research questions | Directional hypothesis |
| | Non-directional hypothesis |
| | Associative hypothesis |
| | Causal hypothesis |
| | Null hypothesis |
| | Alternative hypothesis |
| | Working hypothesis |
| | Statistical hypothesis |
| | Logical hypothesis |
| | Hypothesis-testing |
| Qualitative research questions | Qualitative research hypotheses |
| Contextual research questions | Hypothesis-generating |
| Descriptive research questions | |
| Evaluation research questions | |
| Explanatory research questions | |
| Exploratory research questions | |
| Generative research questions | |
| Ideological research questions | |
| Ethnographic research questions | |
| Phenomenological research questions | |
| Grounded theory questions | |
| Qualitative case study questions | |
In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured ( descriptive research questions ). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable ( comparative research questions ), 1 , 5 , 14 or elucidate trends and interactions among variables ( relationship research questions ). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2 .
Quantitative research questions | |
---|---|
Descriptive research question | |
- Measures responses of subjects to variables | |
- Presents variables to measure, analyze, or assess | |
What is the proportion of resident doctors in the hospital who have mastered ultrasonography (response of subjects to a variable) as a diagnostic technique in their clinical training? | |
Comparative research question | |
- Clarifies difference between one group with outcome variable and another group without outcome variable | |
Is there a difference in the reduction of lung metastasis in osteosarcoma patients who received the vitamin D adjunctive therapy (group with outcome variable) compared with osteosarcoma patients who did not receive the vitamin D adjunctive therapy (group without outcome variable)? | |
- Compares the effects of variables | |
How does the vitamin D analogue 22-Oxacalcitriol (variable 1) mimic the antiproliferative activity of 1,25-Dihydroxyvitamin D (variable 2) in osteosarcoma cells? | |
Relationship research question | |
- Defines trends, association, relationships, or interactions between dependent variable and independent variable | |
Is there a relationship between the number of medical student suicides (dependent variable) and the level of medical student stress (independent variable) in Japan during the first wave of the COVID-19 pandemic? | |
In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships among variables that can be predicted include 1) between a single dependent variable and a single independent variable (simple hypothesis) or 2) between two or more independent and dependent variables (complex hypothesis). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome (directional hypothesis). 4 On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies (non-directional hypothesis). 4 In addition, hypotheses can 1) define interdependency between variables (associative hypothesis), 4 2) propose an effect on the dependent variable from manipulation of the independent variable (causal hypothesis), 4 3) state that there is no relationship or difference between two variables (null hypothesis), 4 , 11 , 15 4) replace the working hypothesis if rejected (alternative hypothesis), 15 5) explain the relationship of phenomena to possibly generate a theory (working hypothesis), 11 6) involve quantifiable variables that can be tested statistically (statistical hypothesis), 11 or 7) express a relationship whose interlinks can be verified logically (logical hypothesis). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research in Table 3 .
Quantitative research hypotheses | |
---|---|
Simple hypothesis | |
- Predicts relationship between single dependent variable and single independent variable | |
If the dose of the new medication (single independent variable) is high, blood pressure (single dependent variable) is lowered. | |
Complex hypothesis | |
- Foretells relationship between two or more independent and dependent variables | |
The higher the use of anticancer drugs, radiation therapy, and adjunctive agents (3 independent variables), the higher would be the survival rate (1 dependent variable). | |
Directional hypothesis | |
- Identifies study direction based on theory towards particular outcome to clarify relationship between variables | |
Privately funded research projects will have a larger international scope (study direction) than publicly funded research projects. | |
Non-directional hypothesis | |
- Nature of relationship between two variables or exact study direction is not identified | |
- Does not involve a theory | |
Women and men are different in terms of helpfulness. (Exact study direction is not identified) | |
Associative hypothesis | |
- Describes variable interdependency | |
- Change in one variable causes change in another variable | |
A larger number of people vaccinated against COVID-19 in the region (change in independent variable) will reduce the region’s incidence of COVID-19 infection (change in dependent variable). | |
Causal hypothesis | |
- An effect on dependent variable is predicted from manipulation of independent variable | |
A change into a high-fiber diet (independent variable) will reduce the blood sugar level (dependent variable) of the patient. | |
Null hypothesis | |
- A negative statement indicating no relationship or difference between 2 variables | |
There is no significant difference in the severity of pulmonary metastases between the new drug (variable 1) and the current drug (variable 2). | |
Alternative hypothesis | |
- Following a null hypothesis, an alternative hypothesis predicts a relationship between 2 study variables | |
The new drug (variable 1) is better on average in reducing the level of pain from pulmonary metastasis than the current drug (variable 2). | |
Working hypothesis | |
- A hypothesis that is initially accepted for further research to produce a feasible theory | |
Dairy cows fed with concentrates of different formulations will produce different amounts of milk. | |
Statistical hypothesis | |
- Assumption about the value of population parameter or relationship among several population characteristics | |
- Validity tested by a statistical experiment or analysis | |
The mean recovery rate from COVID-19 infection (value of population parameter) is not significantly different between population 1 and population 2. | |
There is a positive correlation between the level of stress at the workplace and the number of suicides (population characteristics) among working people in Japan. | |
Logical hypothesis | |
- Offers or proposes an explanation with limited or no extensive evidence | |
If healthcare workers provide more educational programs about contraception methods, the number of adolescent pregnancies will be less. | |
Hypothesis-testing (Quantitative hypothesis-testing research) | |
- Quantitative research uses deductive reasoning. | |
- This involves the formation of a hypothesis, collection of data in the investigation of the problem, analysis and use of the data from the investigation, and drawing of conclusions to validate or nullify the hypotheses. |
Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more than the hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15
There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions (contextual research questions); 2) describe a phenomenon (descriptive research questions); 3) assess the effectiveness of existing methods, protocols, theories, or procedures (evaluation research questions); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena (explanatory research questions); or 5) focus on unknown aspects of a particular topic (exploratory research questions). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions (generative research questions) or advance specific ideologies of a position (ideological research questions). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines (ethnographic research questions). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions (phenomenological research questions), may be directed towards generating a theory of some process (grounded theory questions), or may address a description of the case and the emerging themes (qualitative case study questions). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4 , and the definition of qualitative hypothesis-generating research in Table 5 .
Qualitative research questions | |
---|---|
Contextual research question | |
- Asks about the nature of what already exists | |
- Individuals or groups function to further clarify and understand the natural context of real-world problems | |
What are the experiences of nurses working night shifts in healthcare during the COVID-19 pandemic? (natural context of real-world problems) | |
Descriptive research question | |
- Aims to describe a phenomenon | |
What are the different forms of disrespect and abuse (phenomenon) experienced by Tanzanian women when giving birth in healthcare facilities? | |
Evaluation research question | |
- Examines the effectiveness of existing practice or accepted frameworks | |
How effective are decision aids (effectiveness of existing practice) in helping decide whether to give birth at home or in a healthcare facility? | |
Explanatory research question | |
- Clarifies a previously studied phenomenon and explains why it occurs | |
Why is there an increase in teenage pregnancy (phenomenon) in Tanzania? | |
Exploratory research question | |
- Explores areas that have not been fully investigated to have a deeper understanding of the research problem | |
What factors affect the mental health of medical students (areas that have not yet been fully investigated) during the COVID-19 pandemic? | |
Generative research question | |
- Develops an in-depth understanding of people’s behavior by asking ‘how would’ or ‘what if’ to identify problems and find solutions | |
How would the extensive research experience of the behavior of new staff impact the success of the novel drug initiative? | |
Ideological research question | |
- Aims to advance specific ideas or ideologies of a position | |
Are Japanese nurses who volunteer in remote African hospitals able to promote humanized care of patients (specific ideas or ideologies) in the areas of safe patient environment, respect of patient privacy, and provision of accurate information related to health and care? | |
Ethnographic research question | |
- Clarifies people’s nature, activities, their interactions, and the outcomes of their actions in specific settings | |
What are the demographic characteristics, rehabilitative treatments, community interactions, and disease outcomes (nature, activities, their interactions, and the outcomes) of people in China who are suffering from pneumoconiosis? | |
Phenomenological research question | |
- Seeks to know more about the phenomena that have impacted an individual | |
What are the lived experiences of parents who have been living with and caring for children with a diagnosis of autism? (phenomena that have impacted an individual) | |
Grounded theory question | |
- Focuses on social processes asking about what happens and how people interact, or uncovering social relationships and behaviors of groups | |
What are the problems that pregnant adolescents face in terms of social and cultural norms (social processes), and how can these be addressed? | |
Qualitative case study question | |
- Assesses a phenomenon using different sources of data to answer “why” and “how” questions | |
- Considers how the phenomenon is influenced by its contextual situation. | |
How does quitting work and assuming the role of a full-time mother (phenomenon assessed) change the lives of women in Japan? |
Qualitative research hypotheses | |
---|---|
Hypothesis-generating (Qualitative hypothesis-generating research) | |
- Qualitative research uses inductive reasoning. | |
- This involves data collection from study participants or the literature regarding a phenomenon of interest, using the collected data to develop a formal hypothesis, and using the formal hypothesis as a framework for testing the hypothesis. | |
- Qualitative exploratory studies explore areas deeper, clarifying subjective experience and allowing formulation of a formal hypothesis potentially testable in a future quantitative approach. |
Qualitative studies usually pose at least one central research question and several subquestions starting with How or What. These research questions use exploratory verbs such as explore or describe. These also focus on one central phenomenon of interest, and may mention the participants and research site. 15
Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1
Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 These aspects constitute the FINER criteria (i.e., Feasible, Interesting, Novel, Ethical, and Relevant). 1 Clarity and effectiveness are achieved if research questions meet the FINER criteria. In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14
The PICOT and PEO frameworks are also used when developing research questions. 1 The following elements are addressed in these frameworks, PICOT: P-population/patients/problem, I-intervention or indicator being studied, C-comparison group, O-outcome of interest, and T-timeframe of the study; PEO: P-population being studied, E-exposure to preexisting conditions, and O-outcome of interest. 1 Research questions are also considered good if these meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14
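As a rough illustration of how the PICOT elements listed above can serve as a checklist while drafting a research question, the following minimal Python sketch records each element and reports which ones are still blank. The class name `PICOT`, the method `missing_elements`, and the draft values (loosely based on the moxibustion example in Table 6) are assumptions made for illustration only and are not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass
class PICOT:
    """Hypothetical container for the five PICOT elements of a draft question."""
    population: str    # P: population/patients/problem
    intervention: str  # I: intervention or indicator being studied
    comparison: str    # C: comparison group
    outcome: str       # O: outcome of interest
    timeframe: str     # T: timeframe of the study

    def missing_elements(self) -> list:
        # Return the names of elements that are still empty, in PICOT order.
        return [name for name, value in vars(self).items() if not value.strip()]


# Illustrative draft only; the timeframe has not been decided yet.
draft = PICOT(
    population="pregnant women with breech presentation",
    intervention="smoke moxibustion",
    comparison="smokeless moxibustion",
    outcome="cephalic presentation at birth",
    timeframe="",
)
print(draft.missing_elements())
```

A draft that still reports missing elements is probably not yet specific enough to be turned into a hypothesis.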
As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research questions and hypotheses that result in unclear and weak research objectives in quantitative research (Table 6) 16 and qualitative research (Table 7) 17, and how to transform these ambiguous research questions and hypotheses into clear and good statements.
Variables | Unclear and weak statement (Statement 1) | Clear and good statement (Statement 2) | Points to avoid |
---|---|---|---|
Research question | Which is more effective between smoke moxibustion and smokeless moxibustion? | “Moreover, regarding smoke moxibustion versus smokeless moxibustion, it remains unclear which is more effective, safe, and acceptable to pregnant women, and whether there is any difference in the amount of heat generated.” | 1) Vague and unfocused questions; 2) Closed questions simply answerable by yes or no; 3) Questions requiring a simple choice |
Hypothesis | The smoke moxibustion group will have higher cephalic presentation. | “Hypothesis 1. The smoke moxibustion stick group (SM group) and smokeless moxibustion stick group (SLM group) will have higher rates of cephalic presentation after treatment than the control group. Hypothesis 2. The SM group and SLM group will have higher rates of cephalic presentation at birth than the control group. Hypothesis 3. There will be no significant differences in the well-being of the mother and child among the three groups in terms of the following outcomes: premature birth, premature rupture of membranes (PROM) at < 37 weeks, Apgar score < 7 at 5 min, umbilical cord blood pH < 7.1, admission to neonatal intensive care unit (NICU), and intrauterine fetal death.” | 1) Unverifiable hypotheses; 2) Incompletely stated groups of comparison; 3) Insufficiently described variables or outcomes |
Research objective | To determine which is more effective between smoke moxibustion and smokeless moxibustion. | “The specific aims of this pilot study were (a) to compare the effects of smoke moxibustion and smokeless moxibustion treatments with the control group as a possible supplement to ECV for converting breech presentation to cephalic presentation and increasing adherence to the newly obtained cephalic position, and (b) to assess the effects of these treatments on the well-being of the mother and child.” | 1) Poor understanding of the research question and hypotheses; 2) Insufficient description of population, variables, or study outcomes |
a These statements were composed for comparison and illustrative purposes only.
b These statements are direct quotes from Higashihara and Horiuchi. 16
Variables | Unclear and weak statement (Statement 1) | Clear and good statement (Statement 2) | Points to avoid |
---|---|---|---|
Research question | Does disrespect and abuse (D&A) occur in childbirth in Tanzania? | How does disrespect and abuse (D&A) occur and what are the types of physical and psychological abuses observed in midwives’ actual care during facility-based childbirth in urban Tanzania? | 1) Ambiguous or oversimplistic questions; 2) Questions unverifiable by data collection and analysis |
Hypothesis | Disrespect and abuse (D&A) occur in childbirth in Tanzania. | Hypothesis 1: Several types of physical and psychological abuse by midwives in actual care occur during facility-based childbirth in urban Tanzania. Hypothesis 2: Weak nursing and midwifery management contribute to the D&A of women during facility-based childbirth in urban Tanzania. | 1) Statements simply expressing facts; 2) Insufficiently described concepts or variables |
Research objective | To describe disrespect and abuse (D&A) in childbirth in Tanzania. | “This study aimed to describe from actual observations the respectful and disrespectful care received by women from midwives during their labor period in two hospitals in urban Tanzania.” | 1) Statements unrelated to the research question and hypotheses; 2) Unattainable or unexplorable objectives |
a This statement is a direct quote from Shimoda et al. 17
The other statements were composed for comparison and illustrative purposes only.
To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify variables to be accessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims. This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1.
Research questions are used more frequently in qualitative research than objectives or hypotheses. 3 These questions seek to discover, understand, explore or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups. The questions are continually reviewed, reformulated, and changed during the qualitative study. 3 In quantitative research, research questions are used more frequently in survey projects, whereas hypotheses are used more frequently in experiments, to compare variables and their relationships.
Hypotheses are constructed based on the variables identified and as an if-then statement, following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypothesis construction involves deducing a testable proposition from theory and separating the independent and dependent variables so that each can be measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12
In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.
Research questions and hypotheses are crucial components of any type of research, whether quantitative or qualitative. These questions should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research, and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. The development of research questions and hypotheses is an iterative process based on extensive knowledge of the literature and insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses which serve as formal predictions about the research outcomes. Research questions and hypotheses are crucial elements of research that should not be overlooked. They should be carefully thought out and constructed when planning research. This avoids unethical studies and poor outcomes by defining well-founded objectives that determine the design, course, and outcome of the study.
Disclosure: The authors have no potential conflicts of interest to disclose.
Every product owner knows that it takes effort to build something that'll cater to user needs. You'll have to make many tough calls if you wish to grow the company and evolve the product so it delivers more value. But how do you decide what to change in the product, your marketing strategy, or the overall direction to succeed? And how do you make a product that truly resonates with your target audience?
There are many unknowns in business, so many fundamental decisions start from a simple "what if?". But they can't be based on guesses, as you need some proof to fill in the blanks reasonably.
Because there's no universal recipe for successfully building a product, teams collect data, do research, study the dynamics, and generate hypotheses according to the given facts. They then take corresponding actions to find out whether they were right or wrong, make conclusions, and most likely restart the process again.
On this page, we thoroughly inspect product hypotheses. We'll go over what they are, how to create hypothesis statements and validate them, and what goes after this step.
A hypothesis in product development and product management is a statement or assumption about the product, planned feature, market, or customer (e.g., their needs, behavior, or expectations) that you can put to the test, evaluate, and base your further decisions on. This may, for instance, regard the upcoming product changes as well as the impact they can result in.
A hypothesis implies that there is limited knowledge. Hence, the teams need to undergo testing activities to validate their ideas and confirm whether they are true or false.
Hypotheses guide the product development process and may point at important findings to help build a better product that'll serve user needs. In essence, teams create hypothesis statements in an attempt to improve the offering, boost engagement, increase revenue, find product-market fit quicker, or for other business-related reasons.
It's sort of like an experiment with trial and error, yet it is data-driven and should be unbiased. This means that teams don't make assumptions out of the blue. Instead, they turn to the collected data, conducted market research, and factual information, which helps avoid completely missing the mark. The obtained results are then carefully analyzed and may influence decision-making.
Such experiments backed by data and analysis are an integral aspect of successful product development and allow startups or businesses to dodge costly startup mistakes.
When do teams create hypothesis statements and validate them? To some extent, hypothesis testing is an ongoing process to work on constantly. It may occur during various product development life cycle stages, from early phases like initiation to late ones like scaling.
In any event, the key here is learning how to generate hypothesis statements and validate them effectively. We'll go over this in more detail later on.
You might be wondering whether ideas and hypotheses are the same thing. Well, there are a few distinctions.
An idea is simply a suggested proposal. Say, a teammate comes up with something you can bring to life during a brainstorming session or pitches in a suggestion like "How about we shorten the checkout process?". You can jot down such ideas and then consider working on them if they'll truly make a difference and improve the product, strategy, or result in other business benefits. Ideas may thus be used as the hypothesis foundation when you decide to prove a concept.
A hypothesis is the next step, when an idea gets wrapped with specifics to become an assumption that may be tested. As such, you can refine the idea by adding details to it. The previously mentioned idea can be worded into a product hypothesis statement like: "The cart abandonment rate is high, and many users flee at checkout. But if we shorten the checkout process by cutting down the number of steps to only two and get rid of four excessive fields, we'll simplify the user journey, boost satisfaction, and may get up to 15% more completed orders".
A hypothesis is something you can test in an attempt to reach a certain goal. Testing isn't obligatory in this scenario, of course, but the idea may be tested if you weigh the pros and cons and decide that the required effort is worth a try. We'll explain how to create hypothesis statements next.
The last thing those developing a product want is to invest time and effort into something that won't bring any visible results, will fall short of customer expectations, or won't live up to customers' needs. Therefore, to increase the chances of achieving a successful outcome and product-led growth, teams may need to revisit their product development approach by optimizing one of the starting points of the process: learning to make reasonable product hypotheses.
If the entire procedure is structured, this may assist you during such stages as the discovery phase and raise the odds of reaching your product goals and setting your business up for success. Yet, what's the entire process like?
Such processes imply sharing ideas when a problem is spotted by digging deep into facts and studying the possible risks, goals, benefits, and outcomes. You may apply various MVP tools like FigJam, Notion, or Miro that were designed to simplify brainstorming sessions, systemize pitched suggestions, and keep everyone organized without losing any ideas.
Predictive product analysis can also be integrated into this process, leveraging data and insights to anticipate market trends and consumer preferences, thus enhancing decision-making and product development strategies. This approach fosters a more proactive and informed approach to innovation, ensuring products are not only relevant but also resonate with the target audience, ultimately increasing their chances of success in the market.
Besides, you can settle on one of the many frameworks that facilitate decision-making processes, ideation phases, or feature prioritization. Such frameworks are best applicable if you need to test your assumptions and structure the validation process. These are a few common ones if you're looking toward a systematic approach:
Once you've indicated the addressable problem or opportunity and broken down the issue in focus, you need to work on formulating the hypotheses and associated tasks. By the way, it works the same way if you want to prove that something will be false (a.k.a. the null hypothesis).
If you're unsure how to write a hypothesis statement, let's explore the essential steps that'll set you on the right track.
Product hypotheses are generally different for each case, so begin by pinpointing the major variables, i.e., the cause and effect. You'll need to outline what you think is supposed to happen if a change or action gets implemented.
Put simply, the "cause" is what you're planning to change, and the "effect" is what will indicate whether the change is bringing in the expected results. Falling back on the example we brought up earlier, the ineffective checkout process can be the cause, while the increased percentage of completed orders is the metric that'll show the effect.
Make sure to also note such vital points as:
Mind that generic connections that lack specifics will get you nowhere. So if you're thinking about how to word a hypothesis statement, make sure that the cause and effect include clear reasons and a logical dependency.
Think about the precise link that shows why A affects B. In our checkout example, it could be: fewer steps in the checkout and the removed excessive fields will speed up the process, help avoid confusion, irritate users less, and lead to more completed orders. That's much more explicit than just stating the fact that the checkout needs to be changed to get more completed orders.
Certainly, multiple things can be used to measure the effect. Therefore, you need to choose the optimal metrics and validation criteria that'll best show whether you're moving in the right direction.
If you need a tip on how to create hypothesis statements that won't result in a waste of time, try to avoid vagueness and be as specific as you can when selecting what can best measure and assess the results of your hypothesis test. The criteria must be measurable and tied to the hypotheses. This can be a realistic percentage or number (say, you expect a 15% increase in completed orders or half as many cart abandonment cases during the checkout phase).
Once again, if you're not realistic, then you might end up misinterpreting the results. Remember that sometimes an increase that's even as little as 2% can make a huge difference, so why make 50% the target if it's not achievable in the first place?
It's quite common that you'll end up with multiple product hypotheses. Some are more important than others, of course, and some will require more effort and input.
Therefore, just as with the features on your product development roadmap, prioritize your hypotheses according to their impact and importance. Then, group and order them, especially if the results of some hypotheses influence others on your list.
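If you'd like to see what such prioritization could look like in practice, here's a minimal Python sketch. Mind that the scoring formula (impact times importance, divided by effort), the 1 to 5 scales, and the example hypotheses are all assumptions we made up for illustration. Your team may weigh things differently.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A backlog entry with the team's own rough estimates (1 = low, 5 = high)."""
    name: str
    impact: int
    importance: int
    effort: int

    @property
    def score(self) -> float:
        # Assumed formula: reward impact and importance, penalize effort.
        return self.impact * self.importance / self.effort


# Made-up backlog for illustration only.
backlog = [
    Hypothesis("Add a new onboarding tour", impact=4, importance=3, effort=5),
    Hypothesis("Shorten checkout to two steps", impact=5, importance=4, effort=3),
    Hypothesis("Reword the call-to-action button", impact=2, importance=2, effort=1),
]

# Highest score first, so the top of the list is what you test next.
ranked = sorted(backlog, key=lambda h: h.score, reverse=True)
for h in ranked:
    print(f"{h.score:.2f}  {h.name}")
```

A simple score like this won't capture dependencies between hypotheses, so you'd still need to reorder manually when one result influences another.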
To demonstrate how to formulate your assumptions clearly, here are several more apart from the example of a hypothesis statement given above:
There are multiple options when it comes to validating hypothesis statements. To get appropriate results, you have to come up with the right experiment that'll help you test the hypothesis. You'll need a control group or people who represent your target audience segments or groups to participate (otherwise, your results might not be accurate).
What can serve as the experiment you may run? Experiments may take tons of different forms, and you'll need to choose the one that clicks best with your hypothesis goals (and your available resources, of course). The same goes for how long you'll have to carry out the test (say, a time period of two months or as little as two weeks). Here are several to get you started.
Talking to users, potential customers, or members of your own online startup community can be another way to test your hypotheses. You may use surveys, questionnaires, or opt for more extensive interviews to validate hypothesis statements and find out what people think. This assumption validation approach involves your existing or potential users and might require some additional time, but can bring you many insights.
One of the experiments you may develop involves making more than one version of an element or page to see which option resonates with the users more. As such, you can have a call to action block with different wording or play around with the colors, imagery, visuals, and other things.
To run such split experiments, you can apply tools like VWO that allow you to easily construct alternative designs and split what your users see (e.g., one half of the users will see version one, while the other half will see version two). You can track various metrics and apply heatmaps, click maps, and screen recordings to learn more about user response and behavior. Mind, though, that the key to such tests is to get as many users as you can and give the tests time. Don't jump to conclusions too soon or if very few people participated in your experiment.
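To give you a feel for how a split like this can work under the hood, here's a minimal Python sketch that assigns each user to a version by hashing their ID. This is only one possible approach and not a description of how VWO or any other tool does it. The function name `assign_variant`, the experiment name, and the user IDs are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to one variant of an experiment."""
    # Hash the experiment name together with the user ID so the same user
    # always sees the same version within one experiment.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# Hypothetical user IDs, for illustration only.
users = [f"user-{i}" for i in range(10_000)]
counts = {"A": 0, "B": 0}
for uid in users:
    counts[assign_variant(uid, "checkout-cta-wording")] += 1

print(counts)  # the two groups should come out roughly equal in size
```

Because the assignment depends only on the user ID and the experiment name, a returning user keeps seeing the same version, which helps keep the collected metrics consistent.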
Demos and clickable prototypes can be a great way to save time and money on costly feature or product development. A prototype also allows you to refine the design. However, they can also serve as experiments for validating hypotheses, collecting data, and getting feedback.
For instance, if you have a new feature in mind and want to ensure there is interest, you can utilize such MVP types as fake doors. Make a short demo recording of the feature and place it on your landing page to track interest or test how many people sign up.
Similarly, you can run experiments to observe how users interact with the feature, page, product, etc. Usually, such experiments are held on prototype testing platforms with a focus group representing your target visitors. By showing a prototype or early version of the design to users, you can view how people use the solution, where they face problems, or what they don't understand. This may be very helpful if you have hypotheses regarding redesigns and user experience improvements before you move on from prototype to MVP development.
You can even take it a few steps further and build a barebone feature version that people can really interact with, yet you'll be the one behind the curtain to make it happen. There were many MVP examples when companies applied Wizard of Oz or concierge MVPs to validate their hypotheses.
Or you can actually develop some functionality but release it for only a limited number of people to see. This is referred to as a feature flag, which can show really specific results but is effort-intensive.
Analysis is what you move on to once you've run the experiment. This is the time to review the collected data, metrics, and feedback to validate (or invalidate) the hypothesis.
You have to evaluate the experiment's results to determine whether your product hypotheses were valid or not. For example, if you were testing two versions of an element design, color scheme, or copy, look into which one performed best.
It is crucial to be certain that you have enough data to draw conclusions, though, and that it's accurate and unbiased. Because if you don't, this may be a sign that your experiment needs to be run for some additional time, be altered, or held once again. You won't want to make a solid decision based on uncertain or misleading results, right?
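One common way to check whether you have enough data is a significance test on the conversion rates of the two versions. Below is a minimal Python sketch of a two-proportion z-test. The function name and all the numbers are made up for illustration, and the test relies on a normal approximation, so treat it as an assumption that only holds when each group is reasonably large.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both versions convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Same conversion rates (10% vs. 12.5%), but very different sample sizes.
z_small, p_small = two_proportion_z_test(20, 200, 25, 200)
z_large, p_large = two_proportion_z_test(200, 2000, 250, 2000)

print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.3f}")
```

With these made-up numbers, the small sample doesn't reach the conventional 0.05 threshold, while the larger one does, even though the observed rates are identical. That's why running the experiment longer can change the conclusion.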
On another note, make sure to record your hypotheses and experiment results. Some companies use CRMs to jot down the key findings, while others use something as simple as Google Docs. Either way, this can be your single source of truth that can help you avoid running the same experiments or allow you to compare results over time.
The hypothesis-driven approach in product development is a great way to avoid uncalled-for risks and pricey mistakes. You can back up your assumptions with facts, observe your target audience's reactions, and be more certain that this move will deliver value.
However, this only makes sense if the validation of hypothesis statements is backed by relevant data that'll allow you to determine whether the hypothesis is valid or not. By doing so, you can be certain that you're developing and testing hypotheses to accelerate your product management and avoiding decisions based on guesswork.
Certainly, a failed experiment may bring you just as much knowledge and findings as one that succeeds. Teams have to learn from their mistakes, boost their hypothesis generation and testing knowledge, and make improvements according to the results of their experiments. This is an ongoing process, of course, as no product can grow if it isn't iterated and improved.
If you're only planning to or are currently building a product, Upsilon can lend you a helping hand. Our team has years of experience providing product development services for growth-stage startups and building MVPs for early-stage businesses, so you can use our expertise and knowledge to dodge many mistakes. Don't be shy to contact us to discuss your needs!
Published: August 08, 2024
One of the most underrated skills you can have as a marketer is marketing research — which is great news for this unapologetic cyber sleuth.
From brand design and product development to buyer personas and competitive analysis, I’ve researched a number of initiatives in my decade-long marketing career.
And let me tell you: having the right marketing research methods in your toolbox is a must.
Market research is the secret to crafting a strategy that will truly help you accomplish your goals. The good news is there is no shortage of options.
Thanks to the Internet, we have more marketing research (or market research) methods at our fingertips than ever, but they’re not all created equal. Let’s quickly go over how to choose the right one.
What are you researching? Do you need to understand your audience better? How about your competition? Or maybe you want to know more about your customer’s feelings about a specific product.
Before starting your research, take some time to identify precisely what you’re looking for. This could be a goal you want to reach, a problem you need to solve, or a question you need to answer.
For example, an objective may be as foundational as understanding your ideal customer better to create new buyer personas for your marketing agency (pause for flashbacks to my former life).
Or if you’re an organic soda company, it could be trying to learn what flavors people are craving.
Next, determine what data type will best answer the problems or questions you identified. There are primarily two types: qualitative and quantitative. (Sounds familiar, right?)
Understanding the differences between qualitative and quantitative data will help you pinpoint which research methods will yield the desired results.
For instance, thinking of our earlier examples, qualitative data would usually be best suited for buyer personas, while quantitative data is more useful for the soda flavors.
However, truth be told, the two really work together.
Qualitative conclusions are usually drawn from quantitative, numerical data. So, you’ll likely need both to get the complete picture of your subject.
For example, if your quantitative data says 70% of people are Team Black and only 30% are Team Green (shout-out to my fellow House of the Dragon fans), your qualitative data will say people support Black more than Green.
(As they should.)
You’ll also want to understand the difference between primary and secondary research.
Primary research involves collecting new, original data directly from the source (say, your target market). In other words, it’s information gathered first-hand that wasn’t found elsewhere.
Some examples include conducting experiments, surveys, interviews, observations, or focus groups.
Meanwhile, secondary research is the analysis and interpretation of existing data collected from others. Think of this like what we used to do for school projects: We would read a book, scour the internet, or pull insights from others to work from.
So, which is better?
Personally, I say any research is good research, but if you have the time and resources, primary research is hard to top. With it, you don’t have to worry about your source's credibility or how relevant it is to your specific objective.
You are in full control and best equipped to get the reliable information you need.
Once you know your objective and what kind of data you want, you’re ready to select your marketing research method.
For instance, let’s say you’re a restaurant trying to see how attendees felt about the Speed Dating event you hosted last week.
You shouldn’t run a field experiment or download a third-party report on speed dating events; those would be useless to you. You need to conduct a survey that allows you to ask pointed questions about the event.
This would yield both qualitative and quantitative data you can use to improve and bring together more love birds next time around.
Now that you know what you’re looking for in a marketing research method, let’s dive into the best options.
Note: According to HubSpot’s 2024 State of Marketing report, understanding customers and their needs is one of the biggest challenges facing marketers today. The options we discuss are great consumer research methodologies, but they can also be used for other areas.
1. Interviews
Interviews are a form of primary research where you ask people specific questions about a topic or theme. They typically deliver qualitative information.
I’ve conducted many interviews for marketing purposes, but I’ve also done many for journalistic purposes, like this profile on comedian Zarna Garg. There’s no better way to gather candid, open-ended insights in my book, but that doesn’t mean they’re a cure-all.
What I like: Real-time conversations allow you to ask different questions if you’re not getting the information you need. They also push interviewees to respond quickly, which can result in more authentic answers.
What I dislike: They can be time-consuming and harder to measure (read: get quantitative data) unless you ask pointed yes or no questions.
Best for: Creating buyer personas or getting feedback on customer experience, a product, or content.
Focus groups are similar to conducting interviews but on a larger scale.
In marketing and business, this typically means getting a small group together in a room (or on Zoom) and asking them questions about the topics you are researching. You then record or observe their responses and act on what you learn.
They are ideal for collecting long-form, open-ended feedback, and subjective opinions.
One well-known focus group you may remember was run by Domino’s Pizza in 2009.
After poor ratings and dropping over $100 million in revenue, the brand conducted focus groups with real customers to learn where they could have done better.
It was met with comments like “worst excuse for pizza I’ve ever had” and “the crust tastes like cardboard.” But rather than running from the tough love, it took the hit and completely overhauled its recipes.
The team admitted their missteps and returned to the market with better food and a campaign detailing their “Pizza Turn Around.”
The result? The brand won a ton of praise for its willingness to take feedback, efforts to do right by its consumers, and clever campaign. But, most importantly, revenue for Domino’s rose by 14.3% over the previous year.
The brand continues to conduct focus groups and share real footage from them in its promotions.
What I like: Similar to interviewing, you can dig deeper and pivot as needed due to the real-time nature. They’re personal and detailed.
What I dislike: Once again, they can be time-consuming and make it difficult to get quantitative data. There is also a chance some participants may overshadow others.
Best for: Product research or development
Pro tip: Need help planning your focus group? Our free Market Research Kit includes a handy template to start organizing your thoughts in addition to a SWOT Analysis Template, Survey Template, Focus Group Template, Presentation Template, Five Forces Industry Analysis Template, and an instructional guide for all of them. Download yours here now.
Surveys are a form of primary research where individuals are asked a collection of questions. They can take many different forms.
They could be in person, over the phone or video call, by email, via an online form, or even on social media. Questions can also be open-ended or closed-ended to deliver qualitative or quantitative information.
A great example of a closed-ended survey is HubSpot’s annual State of Marketing.
In the State of Marketing, HubSpot asks marketing professionals from around the world a series of multiple-choice questions to gather data on the state of the marketing industry and to identify trends.
The survey covers various topics related to marketing strategies, tactics, tools, and challenges that marketers face. It aims to provide benchmarks to help you make informed decisions about your marketing.
It also helps us understand where our customers’ heads are so we can better evolve our products to meet their needs.
Apple is no stranger to surveys, either.
In 2011, the tech giant launched Apple Customer Pulse, which it described as “an online community of Apple product users who provide input on a variety of subjects and issues concerning Apple.”
“For example, we did a large voluntary survey of email subscribers and top readers a few years back.
“While these readers gave us a long list of topics, formats, or content types they wanted to see, they sometimes engaged more with content types they didn’t select or favor as much on the surveys when we ran follow-up ‘in the wild’ tests, like A/B testing.”
Pepsi saw similar results when it ran its iconic field experiment, “The Pepsi Challenge” for the first time in 1975.
The beverage brand set up tables at malls, beaches, and other public locations and ran a blindfolded taste test. Shoppers were given two cups of soda, one containing Pepsi, the other Coca-Cola (Pepsi’s biggest competitor). They were then asked to taste both and report which they preferred.
People overwhelmingly preferred Pepsi, and the brand has repeated the experiment multiple times over the years to the same results.
What I like: It yields qualitative and quantitative data and can make for engaging marketing content, especially in the digital age.
What I dislike: It can be very time-consuming. And, if you’re not careful, there is a high risk for scientific error.
Best for: Product testing and competitive analysis
Pro tip: "Don’t make critical business decisions off of just one data set," advises Pamela Bump. "Use the survey, competitive intelligence, external data, or even a focus group to give you one layer of ideas or a short-list for improvements or solutions to test. Then gather your own fresh data to test in an experiment or trial and better refine your data-backed strategy."
8. Public domain or third-party research
While original data is always a plus, there are plenty of external resources you can access online and even at a library when you’re limited on time or resources.
Some reputable resources you can use include:
It’s also smart to turn to reputable organizations that are specific to your industry or field. For instance, if you’re a gardening or landscaping company, you may want to pull statistics from the Environmental Protection Agency (EPA).
If you’re a digital marketing agency, you could look to Google Research or HubSpot Research. (Hey, I know them!)
What I like: You can save time on gathering data and spend more time on analyzing. You can also rest assured the data is from a source you trust.
What I dislike: You may not find data specific to your needs.
Best for: Companies under a time or resource crunch, adding factual support to content
Pro tip: Fellow HubSpotter Iskiev suggests using third-party data to inspire your original research. “Sometimes, I use public third-party data for ideas and inspiration. Once I have written my survey and gotten all my ideas out, I read similar reports from other sources and usually end up with useful additions for my own research.”
If the data you need isn’t available publicly and you can’t do your own market research, you can also buy some. There are many reputable analytics companies that offer subscriptions to access their data. Statista is one of my favorites, but there’s also Euromonitor, Mintel, and BCC Research.
What I like: Same as public domain research
What I dislike: You may not find data specific to your needs. It also adds to your expenses.
Best for: Companies under a time or resource crunch or adding factual support to content
You’re not going to like my answer, but “it depends.” The best marketing research method for you will depend on your objective and data needs, but also your budget and timeline.
My advice? Aim for a mix of quantitative and qualitative data. If you can do your own original research, awesome. But if not, don’t beat yourself up. Lean into free or low-cost tools. You could do primary research for qualitative data, then tap public sources for quantitative data. Or perhaps the reverse is best for you.
Whatever your marketing research method mix, take the time to think it through and ensure you’re left with information that will truly help you achieve your goals.
A title page is required for all APA Style papers. There are both student and professional versions of the title page. Students should use the student version of the title page unless their instructor or institution has requested they use the professional version. APA provides a student title page guide (PDF, 199KB) to assist students in creating their title pages.
The student title page includes the paper title, author names (the byline), author affiliation, course number and name for which the paper is being submitted, instructor name, assignment due date, and page number, as shown in this example.
Title page setup is covered in the seventh edition APA Style manuals in the Publication Manual Section 2.3 and the Concise Guide Section 1.6.
Student papers do not include a running head unless requested by the instructor or institution.
Follow the guidelines described next to format each element of the student title page.
Element | Format | Example |
---|---|---|
Paper title | Place the title three to four lines down from the top of the title page. Center it and type it in bold font. Capitalize major words of the title. Place the main title and any subtitle on separate double-spaced lines if desired. There is no maximum length for titles; however, keep titles focused and include key terms. | |
Author names | Place one double-spaced blank line between the paper title and the author names. Center author names on their own line. If there are two authors, use the word “and” between authors; if there are three or more authors, place a comma between author names and use the word “and” before the final author name. | Cecily J. Sinclair and Adam Gonzaga |
Author affiliation | For a student paper, the affiliation is the institution where the student attends school. Include both the name of any department and the name of the college, university, or other institution, separated by a comma. Center the affiliation on the next double-spaced line after the author name(s). | Department of Psychology, University of Georgia |
Course number and name | Provide the course number as shown on instructional materials, followed by a colon and the course name. Center the course number and name on the next double-spaced line after the author affiliation. | PSY 201: Introduction to Psychology |
Instructor name | Provide the name of the instructor for the course using the format shown on instructional materials. Center the instructor name on the next double-spaced line after the course number and name. | Dr. Rowan J. Estes |
Assignment due date | Provide the due date for the assignment. Center the due date on the next double-spaced line after the instructor name. Use the date format commonly used in your country. | October 18, 2020 |
Page number | Use the page number 1 on the title page. Use the automatic page-numbering function of your word processing program to insert page numbers in the top right corner of the page header. | 1 |
The professional title page includes the paper title, author names (the byline), author affiliation(s), author note, running head, and page number, as shown in the following example.
Follow the guidelines described next to format each element of the professional title page.
Element | Format | Example |
---|---|---|
Paper title | Place the title three to four lines down from the top of the title page. Center it and type it in bold font. Capitalize major words of the title. Place the main title and any subtitle on separate double-spaced lines if desired. There is no maximum length for titles; however, keep titles focused and include key terms. | |
Author names | Place one double-spaced blank line between the paper title and the author names. Center author names on their own line. If there are two authors, use the word “and” between authors; if there are three or more authors, place a comma between author names and use the word “and” before the final author name. | Francesca Humboldt |
 | When different authors have different affiliations, use superscript numerals after author names to connect the names to the appropriate affiliation(s). If all authors have the same affiliation, superscript numerals are not used (see Section 2.3 of the Publication Manual for more on how to set up bylines and affiliations). | Tracy Reuter, Arielle Borovsky, and Casey Lew-Williams |
Author affiliation | For a professional paper, the affiliation is the institution at which the research was conducted. Include both the name of any department and the name of the college, university, or other institution, separated by a comma. Center the affiliation on the next double-spaced line after the author names; when there are multiple affiliations, center each affiliation on its own line. | Department of Nursing, Morrigan University |
 | When different authors have different affiliations, use superscript numerals before affiliations to connect the affiliations to the appropriate author(s). Do not use superscript numerals if all authors share the same affiliations (see Section 2.3 of the Publication Manual for more). | Department of Psychology, Princeton University |
Author note | Place the author note in the bottom half of the title page. Center and bold the label “Author Note.” Align the paragraphs of the author note to the left. For further information on the contents of the author note, see Section 2.7 of the Publication Manual. | n/a |
Running head | The running head appears in all-capital letters in the page header of all pages, including the title page. Align the running head to the left margin. Do not use the label “Running head:” before the running head. | PREDICTION ERRORS SUPPORT CHILDREN’S WORD LEARNING |
Page number | Use the page number 1 on the title page. Use the automatic page-numbering function of your word processing program to insert page numbers in the top right corner of the page header. | 1 |
Marketers have begun experimenting with AI to improve their brand-management efforts. But unlike other marketing tasks, brand management involves more than just repeatedly executing one specialized function. Long considered the exclusive domain of creative talent, it encompasses multiple activities designed to build the reputation and image of a business—such as crafting and communicating the brand story, ensuring that the product or service and its price reflect the brand’s competitive positioning, and managing customer relationships to forge loyalty to the brand.
A brand is a promise to customers about the quality, style, reliability, and aspiration of a purchase. AI can’t fulfill that promise on its own (at least not anytime soon). But it can shape customers’ impressions of a brand at every interaction. And it can automate expensive creative tasks—including product design. To succeed with it, you must understand how it is perceived by stakeholders and what can be done not only to mitigate their concerns but to make them avid supporters. Using examples from Intuit, Caterpillar, and LOOP, along with in-depth scholarly research, the authors propose a framework for thinking about the key roles that AI plays when it comes to managing brands effectively.
It can automate creative tasks and improve the customer experience.
The opportunity.
Brand management, long considered the exclusive domain of creative talent, has become faster and better informed than ever because of AI.
AI has the potential to adversely affect a brand, so successfully implementing it in this context often involves confronting resistance and backlash from both customers and employees.
The most successful brand management blends the best of human and machine intelligence to augment rather than replace human creativity. Nike, Intuit, Caterpillar, and others have used AI to the great benefit of their brands.
Few brands are more iconic than Nike. From its swoosh logo to its slogan “Just Do It,” the company has mastered the artistry necessary to build a renowned brand. So when Nike asked Obvious, a trio of Parisian artists who make AI-inspired designs, to develop new iterations of the Air Max sneaker in 2020, it wanted to be sure the designs wouldn’t deviate too dramatically from Nike’s signature style. Obvious trained its generative AI model by feeding it pictures of the Air Max 1, the Air Max 90, and the Air Max 97 and used the model to create a vast array of design ideas. Then, drawing on their own knowledge and perception of broader fashion trends along with Nike’s marketing objectives, the trio iteratively tweaked the model until it produced a design that struck the right balance between novelty and staying on brand. The design incorporated many of the stylistic elements of the classic Air Max but blended them with new colors, shapes, and patterns to achieve a fresh, cool feel. The limited edition shoes sold out in less than 10 days.
Following the hypothesis structure: "A new CTA on my page will increase [conversion goal]." The first test implied a problem with clarity, which provides a potential theme: "Improving the clarity of the page will reduce confusion and improve [conversion goal]." The potential clarity theme leads to a new hypothesis: "Changing the wording of ...
Simple hypothesis. A simple hypothesis is a statement made to reflect the relation between exactly two variables. One independent and one dependent. Consider the example, "Smoking is a prominent cause of lung cancer." The dependent variable, lung cancer, is dependent on the independent variable, smoking. 4.
For example, a hypothesis might be that a new product will sell well in a particular market, and this hypothesis can be tested through market research. Characteristics of Hypothesis. Here are some common characteristics of a hypothesis: Testable: A hypothesis must be able to be tested through observation or experimentation. This means that it ...
It seeks to explore and understand a particular aspect of the research subject. In contrast, a research hypothesis is a specific statement or prediction that suggests an expected relationship between variables. It is formulated based on existing knowledge or theories and guides the research design and data analysis. 7.
Developing a hypothesis (with example) Step 1. Ask a question. Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project. Example: Research question.
The Basics: Marketing Experimentation Hypothesis. A hypothesis is a research-based statement that aims to explain an observed trend and create a solution that will improve the result. This statement is an educated, testable prediction about what will happen. It has to be stated in declarative form and not as a question.
Make a hypothesis. Collect research. Select your metrics. Execute the experiment. Analyze the results. Performing a marketing experiment involves doing research, structuring the experiment, and analyzing the results. Let's go through the seven steps necessary to conduct a marketing experiment. 1.
Developing a hypothesis is an essential part of marketing experimentation. Qualitative-based research should inform hypotheses that you test with real-world behavior. The hypotheses help you discover how accurate those insights from qualitative research are. If you engage in hypothesis-driven testing, then you ensure your tests are strategic ...
Step 5: Phrase your hypothesis in three ways. To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable. If a first-year student starts attending more lectures, then their exam scores will improve.
A research hypothesis (also called a scientific hypothesis) is a statement about the expected outcome of a study (for example, a dissertation or thesis). To constitute a quality hypothesis, the statement needs to have three attributes - specificity, clarity and testability. Let's take a look at these more closely.
Hypothesis. The first step in any marketing research experiment is to develop a hypothesis. A hypothesis is a statement of what the researcher believes to be true. ... Marketing Research Example 1 ...
Step 2: Design the Research. The next step in the marketing research process is to do a research design. The research design is your "plan of attack." It outlines what data you are going to gather and from whom, how and when you will collect the data, and how you will analyze it once it's been obtained.
3. One-Sided vs. Two-Sided Testing. When it's time to test your hypothesis, it's important to leverage the correct testing method. The two most common hypothesis testing methods are one-sided and two-sided tests, or one-tailed and two-tailed tests, respectively. Typically, you'd leverage a one-sided test when you have a strong conviction ...
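To make the distinction concrete, here is a minimal Python sketch of how the same z-score leads to different p-values under a one-sided and a two-sided test. It assumes a standard normal test statistic; the function name `p_value` and the `alternative` labels are illustrative choices, not part of any source quoted here.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_value(z, alternative="two-sided"):
    """p-value of a z-score; `alternative` names are hypothetical labels for this sketch."""
    if alternative == "two-sided":
        # Extreme values in either direction count as evidence against the null.
        return 2 * upper_tail(abs(z))
    if alternative == "greater":
        # One-sided: only large positive values count as evidence.
        return upper_tail(z)
    if alternative == "less":
        # One-sided: only large negative values count as evidence.
        return upper_tail(-z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


z = 1.96
print(round(p_value(z, "two-sided"), 3))  # about 0.05
print(round(p_value(z, "greater"), 3))    # about 0.025
```

For a positive z-score, the one-sided p-value in the predicted direction is half the two-sided one, which is why the direction has to be chosen before looking at the data.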
The marketing research process - an overview. A typical marketing research process is as follows: Identify an issue, discuss alternatives and set out research objectives. Develop a research program. Choose a sample. Gather information. Gather data. Organize and analyze information and data. Present findings.
With your marketing objectives in mind, the next step is formulating a hypothesis for your experiment. A hypothesis is a testable prediction that outlines the expected outcome of your experiment. It should be based on existing knowledge, data, or observations and provide a clear direction for your experimental design.
Hypothesis testing is a critical component of marketing research that allows marketers to draw conclusions about the effectiveness of their strategies. In essence, hypothesis testing involves making an educated guess about a population parameter and then using data to determine if the hypothesis is supported or rejected.
A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process. Consider a study designed to examine the relationship between sleep deprivation and test ...
A research hypothesis can be defined as a clear, specific and predictive statement that states the possible outcome of a scientific study. The result of the research study is based on previous research studies and can be tested by scientific research. The research hypothesis is written before the beginning of any scientific research or data ...
This can be formally expressed as follows: $\bar{x} - \mu_0 = z\,\sigma_{\bar{x}}$. In this equation, $z$ will tell us how many standard deviations the sample mean $\bar{x}$ is away from the null hypothesis $\mu_0$. Solving for $z$ gives us: $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$. This standardized value (or "z-score") is also referred to as a test statistic.
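As a small worked example of the test statistic above, the following Python sketch computes z from a sample mean, the null value, the population standard deviation, and the sample size. The numbers are made up for illustration, and the function name `z_statistic` is an assumption of this sketch.

```python
import math


def z_statistic(sample_mean, mu_0, sigma, n):
    """Return z = (sample_mean - mu_0) / (sigma / sqrt(n))."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu_0) / standard_error


# Hypothetical data: sample mean 52, null value 50, sigma 10, n = 100.
z = z_statistic(sample_mean=52.0, mu_0=50.0, sigma=10.0, n=100)
print(z)  # 2.0
```

With these values the standard error is 10 / 10 = 1, so the sample mean lies two standard errors above the value stated in the null hypothesis.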
For example: Problem Statement: "The lead generation form is too long, causing unnecessary friction." Hypothesis: "By changing the amount of form fields from 20 to 10, we will increase number of leads." Proposed solution. When you are thinking about the solution you want to implement, you need to think about the psychology of the ...
INTRODUCTION. Scientific research is usually initiated by posing evidenced-based research questions which are then explicitly restated as hypotheses.1,2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results.3,4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the ...
A hypothesis in product development and product management is a statement or assumption about the product, planned feature, market, or customer (e.g., their needs, behavior, or expectations) that you can put to the test, evaluate, and base your further decisions on. This may, for instance, regard the upcoming product changes as well as the ...