Nonparametric Tests

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health

Introduction


The three modules on hypothesis testing presented a number of tests of hypothesis for continuous, dichotomous and discrete outcomes. Tests for continuous outcomes focused on comparing means, while tests for dichotomous and discrete outcomes focused on comparing proportions. All of the tests presented in the modules on hypothesis testing are called parametric tests and are based on certain assumptions. For example, when running tests of hypothesis for means of continuous outcomes, all parametric tests assume that the outcome is approximately normally distributed in the population. This does not mean that the data in the observed sample follow a normal distribution, but rather that the outcome follows a normal distribution in the full population, which is not observed. For many outcomes, investigators are comfortable with the normality assumption (i.e., most of the observations are in the center of the distribution while fewer are at either extreme). It also turns out that many statistical tests are robust, which means that they maintain their statistical properties even when assumptions are not entirely met. Tests are robust in the presence of violations of the normality assumption when the sample size is large based on the Central Limit Theorem (see page 11 in the module on Probability). When the sample size is small and the distribution of the outcome is not known and cannot be assumed to be approximately normally distributed, then alternative tests called nonparametric tests are appropriate.

Learning Objectives

After completing this module, the student will be able to:

  • Compare and contrast parametric and nonparametric tests
  • Identify multiple applications where nonparametric approaches are appropriate
  • Perform and interpret the Mann Whitney U Test
  • Perform and interpret the Sign test and Wilcoxon Signed Rank Test
  • Compare and contrast the Sign test and Wilcoxon Signed Rank Test
  • Perform and interpret the Kruskal Wallis test
  • Identify the appropriate nonparametric hypothesis testing procedure based on type of outcome variable and number of samples

When to Use a Nonparametric Test

Nonparametric tests are sometimes called distribution-free tests because they are based on fewer assumptions (e.g., they do not assume that the outcome is approximately normally distributed). Parametric tests involve specific probability distributions (e.g., the normal distribution) and the tests involve estimation of the key parameters of that distribution (e.g., the mean or difference in means) from the sample data. The cost of fewer assumptions is that nonparametric tests are generally less powerful than their parametric counterparts (i.e., when the alternative is true, they may be less likely to reject H 0 ).

It can sometimes be difficult to assess whether a continuous outcome follows a normal distribution and, thus, whether a parametric or nonparametric test is appropriate. There are several statistical tests that can be used to assess whether data are likely from a normal distribution. The most popular are the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Shapiro-Wilk test 1 . Each test is essentially a goodness of fit test and compares observed data to quantiles of the normal (or other specified) distribution. The null hypothesis for each test is H 0 : Data follow a normal distribution versus H 1 : Data do not follow a normal distribution. If the test is statistically significant (e.g., p<0.05), then the data do not follow a normal distribution, and a nonparametric test is warranted. It should be noted that these tests for normality can be subject to low power. Specifically, the tests may fail to reject H 0 : Data follow a normal distribution when in fact the data do not follow a normal distribution. Low power is a major issue when the sample size is small, which unfortunately is often the situation in which we wish to employ these tests. The most practical approach to assessing normality involves investigating the distributional form of the outcome in the sample using a histogram and augmenting that with data from other studies, if available, that may indicate the likely distribution of the outcome in the population.
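For readers who check normality in software, the tests named above are available in common statistical packages. The following is a minimal sketch in Python, assuming the scipy and numpy libraries are available; the sample itself is simulated purely for illustration.

```python
# A minimal sketch (assuming scipy/numpy are installed): formal tests of normality.
# These tests have low power in small samples, so a histogram should be examined as well.
import numpy as np
from scipy.stats import shapiro, kstest, anderson

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=40)   # hypothetical sample of a continuous outcome

stat, p = shapiro(x)                        # Shapiro-Wilk: H0 is that the data are normal
print("Shapiro-Wilk p =", round(p, 3))      # p < 0.05 suggests non-normality

# Kolmogorov-Smirnov against a normal distribution with the sample mean and SD
# (estimating the parameters from the data is a common, though approximate, practice).
stat, p = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
print("Kolmogorov-Smirnov p =", round(p, 3))

result = anderson(x, dist="norm")           # Anderson-Darling reports a statistic and critical values
print("Anderson-Darling statistic =", round(result.statistic, 3))
```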

There are some situations when it is clear that the outcome does not follow a normal distribution. These include situations:

  • when the outcome is an ordinal variable or a rank,
  • when there are definite outliers or
  • when the outcome has clear limits of detection.

Using an Ordinal Scale

Consider a clinical trial where study participants are asked to rate their symptom severity following 6 weeks on the assigned treatment. Symptom severity might be measured on a 5 point ordinal scale with response options: Symptoms got much worse, slightly worse, no change, slightly improved, or much improved. Suppose there are a total of n=20 participants in the trial, randomized to an experimental treatment or placebo, and the outcome data are distributed as shown in the figure below.

Distribution of Symptom Severity in Total Sample

Histogram showing the number of participants in each category of symptom severity. The distribution is skewed, with most patients in the Slightly Improved or Much Improved categories.

The distribution of the outcome (symptom severity) does not appear to be normal as more participants report improvement in symptoms as opposed to worsening of symptoms.

When the Outcome is a Rank

In some studies, the outcome is a rank. For example, in obstetrical studies an APGAR score is often used to assess the health of a newborn. The score, which ranges from 0 to 10, is the sum of five component scores based on the infant's condition at birth. APGAR scores generally do not follow a normal distribution, since most newborns have scores of 7 or higher (normal range).

When There Are Outliers

In some studies, the outcome is continuous but subject to outliers or extreme values. For example, days in the hospital following a particular surgical procedure is an outcome that is often subject to outliers. Suppose in an observational study investigators wish to assess whether there is a difference in the days patients spend in the hospital following liver transplant in for-profit versus nonprofit hospitals. Suppose we measure days in the hospital following transplant in n=100 participants, 50 from for-profit and 50 from non-profit hospitals. The number of days in the hospital are summarized by the box-whisker plot below.

  Distribution of Days in the Hospital Following Transplant

Box and whisker plot of the number of days in the hospital following transplant. The plot suggests a skewed distribution, with most patients having shorter stays but a small number having long stays.

Note that 75% of the participants stay at most 16 days in the hospital following transplant, while at least one stays 35 days, which would be considered an outlier. Recall from page 8 in the module on Summarizing Data that we used Q1 - 1.5(Q3 - Q1) as a lower limit and Q3 + 1.5(Q3 - Q1) as an upper limit to detect outliers. In the box-whisker plot above, Q1=12 and Q3=16, thus outliers are values below 12 - 1.5(16-12) = 6 or above 16 + 1.5(16-12) = 22.
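The outlier limits above follow directly from the quartiles. A minimal sketch of the same arithmetic, using the quartile values quoted above:

```python
# Tukey fences for flagging outliers, using Q1 = 12 and Q3 = 16 from the example above.
q1, q3 = 12, 16
iqr = q3 - q1                                  # interquartile range
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(lower, upper)                            # 6.0 and 22.0
```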

Limits of Detection 

In some studies, the outcome is a continuous variable that is measured with some imprecision (e.g., with clear limits of detection). For example, some instruments or assays cannot measure presence of specific quantities above or below certain limits. HIV viral load is a measure of the amount of virus in the body and is measured as the amount of virus per a certain volume of blood. It can range from "not detected" or "below the limit of detection" to hundreds of millions of copies. Thus, in a sample some participants may have measures like 1,254,000 or 874,050 copies and others are measured as "not detected." If a substantial number of participants have undetectable levels, the distribution of viral load is not normally distributed.

Advantages of Nonparametric Tests

Nonparametric tests have some distinct advantages. With outcomes such as those described above, nonparametric tests may be the only way to analyze these data. Outcomes that are ordinal, ranked, subject to outliers or measured imprecisely are difficult to analyze with parametric methods without making major assumptions about their distributions as well as decisions about coding some values (e.g., "not detected"). As described here, nonparametric tests can also be relatively simple to conduct.

Introduction to Nonparametric Testing

This module will describe some popular nonparametric tests for continuous outcomes. Interested readers should see Conover 3 for a more comprehensive coverage of nonparametric tests.      

The techniques described here apply to outcomes that are ordinal, ranked, or continuous outcome variables that are not normally distributed. Recall that continuous outcomes are quantitative measures based on a specific measurement scale (e.g., weight in pounds, height in inches). Some investigators make the distinction between continuous, interval and ordinal scaled data. Interval data are like continuous data in that they are measured on a constant scale (i.e., there exists the same difference between adjacent scale scores across the entire spectrum of scores). Differences between interval scores are interpretable, but ratios are not. Temperature in Celsius or Fahrenheit is an example of an interval scale outcome. The difference between 30º and 40º is the same as the difference between 70º and 80º, yet 80º is not twice as warm as 40º. Ordinal outcomes can be less specific as the ordered categories need not be equally spaced. Symptom severity is an example of an ordinal outcome and it is not clear whether the difference between much worse and slightly worse is the same as the difference between no change and slightly improved. Some studies use visual scales to assess participants' self-reported signs and symptoms. Pain is often measured in this way, from 0 to 10 with 0 representing no pain and 10 representing agonizing pain. Participants are sometimes shown a visual scale such as that shown in the upper portion of the figure below and asked to choose the number that best represents their pain state. Sometimes pain scales use visual anchors as shown in the lower portion of the figure below.

 Visual Pain Scale

Horizontal pain scale ranging from 0 (no pain) to 10 (the most intense pain)

In the upper portion of the figure, certainly 10 is worse than 9, which is worse than 8; however, the difference between adjacent scores may not necessarily be the same. It is important to understand how outcomes are measured to make appropriate inferences based on statistical analysis and, in particular, not to overstate precision.

Assigning Ranks

The nonparametric procedures that we describe here follow the same general procedure. The outcome variable (ordinal, interval or continuous) is ranked from lowest to highest and the analysis focuses on the ranks as opposed to the measured or raw values. For example, suppose we measure self-reported pain using a visual analog scale with anchors at 0 (no pain) and 10 (agonizing pain) and record the following in a sample of n=6 participants:

Observed Data:       7         5         9         3         0         2

The ranks, which are used to perform a nonparametric test, are assigned as follows: First, the data are ordered from smallest to largest. The lowest value is then assigned a rank of 1, the next lowest a rank of 2 and so on. The largest value is assigned a rank of n (in this example, n=6). The observed data and corresponding ranks are shown below:

Ordered Data:        0         2         3         5         7         9
Ranks:               1         2         3         4         5         6

A complicating issue that arises when assigning ranks occurs when there are ties in the sample (i.e., the same values are measured in two or more participants). For example, suppose that the following data are observed in our sample of n=6:

Observed Data:       7         7           9            3           0           2                  

The 4th and 5th ordered values are both equal to 7. When assigning ranks, the recommended procedure is to assign the mean rank of 4.5 to each (i.e., the mean of 4 and 5), as follows:

Ordered Data:        0         2         3         7         7         9
Ranks:               1         2         3         4.5       4.5       6

Suppose instead that there are three values of 7. In this case, we assign a rank of 5 (the mean of 4, 5 and 6) to the 4th, 5th and 6th ordered values, as follows:

Ordered Data:        0         2         3         7         7         7
Ranks:               1         2         3         5         5         5

Using this approach of assigning the mean rank when there are ties ensures that the sum of the ranks is the same in each sample (for example, 1+2+3+4+5+6=21, 1+2+3+4.5+4.5+6=21 and 1+2+3+5+5+5=21). Using this approach, the sum of the ranks will always equal n(n+1)/2. When conducting nonparametric tests, it is useful to check the sum of the ranks before proceeding with the analysis.
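As a quick check of the hand calculations above, the same mean-rank handling of ties can be reproduced in software. A minimal sketch, assuming the scipy library is available:

```python
# scipy's rankdata assigns ranks from smallest to largest and, by default,
# gives tied values the mean of the ranks they would otherwise occupy.
from scipy.stats import rankdata

print(rankdata([7, 5, 9, 3, 0, 2]))        # [5. 4. 6. 3. 1. 2.]
print(rankdata([7, 7, 9, 3, 0, 2]))        # [4.5 4.5 6.  3.  1.  2.]
print(rankdata([7, 7, 7, 3, 0, 2]))        # [5. 5. 5. 3. 1. 2.]
print(rankdata([7, 7, 7, 3, 0, 2]).sum())  # 21.0 = n(n+1)/2 for n = 6
```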

To conduct nonparametric tests, we again follow the five-step approach outlined in the modules on hypothesis testing.  

  • Set up hypotheses and select the level of significance α. Analogous to parametric testing, the research hypothesis can be one- or two- sided (one- or two-tailed), depending on the research question of interest.
  • Select the appropriate test statistic. The test statistic is a single number that summarizes the sample information. In nonparametric tests, the observed data is converted into ranks and then the ranks are summarized into a test statistic.
  • Set up decision rule. The decision rule is a statement that tells under what circumstances to reject the null hypothesis. Note that in some nonparametric tests we reject H 0 if the test statistic is large, while in others we reject H 0 if the test statistic is small. We make the distinction as we describe the different tests.
  • Compute the test statistic. Here we compute the test statistic by summarizing the ranks into the test statistic identified in Step 2.
  • Conclusion. The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule.   The final conclusion is either to reject the null hypothesis (because it is very unlikely to observe the sample data if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely if the null hypothesis is true).  

Mann Whitney U Test (Wilcoxon Rank Sum Test)

The modules on hypothesis testing presented techniques for testing the equality of means in two independent samples. An underlying assumption for appropriate use of the tests described was that the continuous outcome was approximately normally distributed or that the samples were sufficiently large (usually n 1 > 30 and n 2 > 30) to justify their use based on the Central Limit Theorem. When comparing two independent samples when the outcome is not normally distributed and the samples are small, a nonparametric test is appropriate.

A popular nonparametric test to compare outcomes between two independent groups is the Mann Whitney U test. The Mann Whitney U test, sometimes called the Mann Whitney Wilcoxon Test or the Wilcoxon Rank Sum Test, is used to test whether two samples are likely to derive from the same population (i.e., that the two populations have the same shape). Some investigators interpret this test as comparing the medians between the two populations. Recall that the parametric test compares the means (H 0 : μ 1 =μ 2 ) between independent groups.

In contrast, the null and two-sided research hypotheses for the nonparametric test are stated as follows:

H 0 : The two populations are equal versus

H 1 : The two populations are not equal.

This test is often performed as a two-sided test and, thus, the research hypothesis indicates that the populations are not equal as opposed to specifying directionality. A one-sided research hypothesis is used if interest lies in detecting a positive or negative shift in one population as compared to the other. The procedure for the test involves pooling the observations from the two samples into one combined sample, keeping track of which sample each observation comes from, and then ranking the pooled observations from lowest to highest, from 1 to n1+n2.

Consider a Phase II clinical trial designed to investigate the effectiveness of a new drug to reduce symptoms of asthma in children. A total of n=10 participants are randomized to receive either the new drug or a placebo. Participants are asked to record the number of episodes of shortness of breath over a 1 week period following receipt of the assigned treatment. The data are shown below.

Is there a difference in the number of episodes of shortness of breath over a 1 week period in participants receiving the new drug as compared to those receiving the placebo? By inspection, it appears that participants receiving the placebo have more episodes of shortness of breath, but is this statistically significant?

In this example, the outcome is a count and in this sample the data do not follow a normal distribution.  

Frequency Histogram of Number of Episodes of Shortness of Breath


In addition, the sample size is small (n 1 =n 2 =5), so a nonparametric test is appropriate. The hypothesis is given below, and we run the test at the 5% level of significance (i.e., α=0.05).

Note that if the null hypothesis is true (i.e., the two populations are equal), we expect to see similar numbers of episodes of shortness of breath in each of the two treatment groups, and we would expect to see some participants reporting few episodes and some reporting more episodes in each group. This does not appear to be the case with the observed data. A test of hypothesis is needed to determine whether the observed data is evidence of a statistically significant difference in populations.

The first step is to assign ranks and to do so we order the data from smallest to largest. This is done on the combined or total sample (i.e., pooling the data from the two treatment groups (n=10)), and assigning ranks from 1 to 10, as follows. We also need to keep track of the group assignments in the total sample.

Note that the lower ranks (e.g., 1, 2 and 3) are assigned to responses in the new drug group while the higher ranks (e.g., 9, 10) are assigned to responses in the placebo group. Again, the goal of the test is to determine whether the observed data support a difference in the populations of responses. Recall that in parametric tests (discussed in the modules on hypothesis testing), when comparing means between two groups, we analyzed the difference in the sample means relative to their variability and summarized the sample information in a test statistic. A similar approach is employed here. Specifically, we produce a test statistic based on the ranks.

First, we sum the ranks in each group. In the placebo group, the sum of the ranks is 37; in the new drug group, the sum of the ranks is 18. Recall that the sum of the ranks will always equal n(n+1)/2. As a check on our assignment of ranks, we have n(n+1)/2 = 10(11)/2=55 which is equal to 37+18 = 55.

For the test, we call the placebo group 1 and the new drug group 2 (assignment of groups 1 and 2 is arbitrary). We let R 1 denote the sum of the ranks in group 1 (i.e., R 1 =37), and R 2 denote the sum of the ranks in group 2 (i.e., R 2 =18). If the null hypothesis is true (i.e., if the two populations are equal), we expect R 1 and R 2 to be similar. In this example, the lower values (lower ranks) are clustered in the new drug group (group 2), while the higher values (higher ranks) are clustered in the placebo group (group 1). This is suggestive, but is the observed difference in the sums of the ranks simply due to chance? To answer this we will compute a test statistic to summarize the sample information and look up the corresponding value in a probability distribution.

Test Statistic for the Mann Whitney U Test

The test statistic for the Mann Whitney U Test is denoted U and is the smaller of U1 and U2, defined below:

U1 = n1n2 + n1(n1+1)/2 - R1 and U2 = n1n2 + n2(n2+1)/2 - R2

where R1 = sum of the ranks for group 1 and R2 = sum of the ranks for group 2.

For this example,

U1 = 5(5) + 5(6)/2 - 37 = 25 + 15 - 37 = 3 and U2 = 5(5) + 5(6)/2 - 18 = 25 + 15 - 18 = 22

In our example, U=3. Is this evidence in support of the null or research hypothesis? Before we address this question, we consider the range of the test statistic U in two different situations.

Situation #1

Consider the situation where there is complete separation of the groups, supporting the research hypothesis that the two populations are not equal. If all of the higher numbers of episodes of shortness of breath (and thus all of the higher ranks) are in the placebo group, all of the lower numbers of episodes (and ranks) are in the new drug group, and there are no ties, then R1 = 6+7+8+9+10 = 40 and R2 = 1+2+3+4+5 = 15, so that

U1 = 25 + 15 - 40 = 0 and U2 = 25 + 15 - 15 = 25

Therefore, when there is clearly a difference in the populations, U=0.

Situation #2

Consider a second situation where low and high scores are approximately evenly distributed in the two groups, supporting the null hypothesis that the groups are equal. If ranks of 2, 4, 6, 8 and 10 are assigned to the numbers of episodes of shortness of breath reported in the placebo group and ranks of 1, 3, 5, 7 and 9 are assigned to the numbers of episodes of shortness of breath reported in the new drug group, then R1 = 2+4+6+8+10 = 30 and R2 = 1+3+5+7+9 = 25, so that

U1 = 25 + 15 - 30 = 10 and U2 = 25 + 15 - 25 = 15

When there is clearly no difference between populations, then U=10.  

Thus, smaller values of U support the research hypothesis, and larger values of U support the null hypothesis.

In every test, we must determine whether the observed U supports the null or research hypothesis. This is done following the same approach used in parametric testing. Specifically, we determine a critical value of U such that if the observed value of U is less than or equal to the critical value, we reject H 0 in favor of H 1 and if the observed value of U exceeds the critical value we do not reject H 0 .

The critical value of U can be found in the table below. To determine the appropriate critical value we need the sample sizes (here, n1=n2=5) and our two-sided level of significance (α=0.05). For this example the critical value is 2, and the decision rule is to reject H 0 if U ≤ 2. We do not reject H 0 because 3 > 2. We do not have statistically significant evidence at α=0.05 to show that the two populations of numbers of episodes of shortness of breath are not equal. However, in this example, the failure to reach statistical significance may be due to low power. The sample data suggest a difference, but the sample sizes are too small to conclude that there is a statistically significant difference.

Table of Critical Values for U

A new approach to prenatal care is proposed for pregnant women living in a rural community. The new program involves in-home visits during the course of pregnancy in addition to the usual or regularly scheduled visits. A pilot randomized trial with 15 pregnant women is designed to evaluate whether women who participate in the program deliver healthier babies than women receiving usual care. The outcome is the APGAR score measured 5 minutes after birth. Recall that APGAR scores range from 0 to 10 with scores of 7 or higher considered normal (healthy), 4-6 low and 0-3 critically low. The data are shown below.

Is there statistical evidence of a difference in APGAR scores in women receiving the new and enhanced versus usual prenatal care? We run the test using the five-step approach.

  •   Step 1. Set up hypotheses and determine level of significance.

H 0 : The two populations are equal versus

H 1 : The two populations are not equal.  α=0.05

  • Step 2.  Select the appropriate test statistic.  

Because APGAR scores are not normally distributed and the samples are small (n1=8 and n2=7), we use the Mann Whitney U test. The test statistic is U, the smaller of

U1 = n1n2 + n1(n1+1)/2 - R1 and U2 = n1n2 + n2(n2+1)/2 - R2

where R1 and R2 are the sums of the ranks in groups 1 and 2, respectively.

  • Step 3. Set up decision rule.

The appropriate critical value can be found in the table above. To determine the appropriate critical value we need the sample sizes (n1=8 and n2=7) and our two-sided level of significance (α=0.05). The critical value for this test with n1=8, n2=7 and α=0.05 is 10, and the decision rule is as follows: Reject H 0 if U ≤ 10.

  • Step 4. Compute the test statistic.  

The first step is to assign ranks of 1 through 15 to the smallest through largest values in the total sample, as follows:

Next, we sum the ranks in each group. In the usual care group, the sum of the ranks is R 1 =45.5 and in the new program group, the sum of the ranks is R 2 =74.5. Recall that the sum of the ranks will always equal n(n+1)/2.   As a check on our assignment of ranks, we have n(n+1)/2 = 15(16)/2=120 which is equal to 45.5+74.5 = 120.  

We now compute U1 and U2, as follows:

U1 = 8(7) + 8(9)/2 - 45.5 = 56 + 36 - 45.5 = 46.5 and U2 = 8(7) + 7(8)/2 - 74.5 = 56 + 28 - 74.5 = 9.5

Thus, the test statistic is U=9.5.  

  • Step 5.  Conclusion:

We reject H 0 because 9.5 < 10. We have statistically significant evidence at α =0.05 to show that the populations of APGAR scores are not equal in women receiving usual prenatal care as compared to the new program of prenatal care.

Example:  

A clinical trial is run to assess the effectiveness of a new anti-retroviral therapy for patients with HIV. Patients are randomized to receive a standard anti-retroviral therapy (usual care) or the new anti-retroviral therapy and are monitored for 3 months. The primary outcome is viral load which represents the number of HIV copies per milliliter of blood. A total of 30 participants are randomized and the data are shown below.

Is there statistical evidence of a difference in viral load in patients receiving the standard versus the new anti-retroviral therapy?  

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : The two populations are equal versus

H 1 : The two populations are not equal. α=0.05

  • Step 2. Select the appropriate test statistic.  

Because viral load measures are not normally distributed (with outliers as well as limits of detection (e.g., "undetectable")), we use the Mann-Whitney U test. The test statistic is U, the smaller of

U1 = n1n2 + n1(n1+1)/2 - R1 and U2 = n1n2 + n2(n2+1)/2 - R2

where R1 and R2 are the sums of the ranks in groups 1 and 2, respectively.

  • Step 3. Set up the decision rule.  

The critical value can be found in the table of critical values based on the sample sizes (n1=n2=15) and a two-sided level of significance (α=0.05). The critical value is 64, and the decision rule is as follows: Reject H 0 if U ≤ 64.

  • Step 4 . Compute the test statistic.  

The first step is to assign ranks of 1 through 30 to the smallest through largest values in the total sample. Note in the table below, that the "undetectable" measurement is listed first in the ordered values (smallest) and assigned a rank of 1.  

Next, we sum the ranks in each group. In the standard anti-retroviral therapy group, the sum of the ranks is R1=245; in the new anti-retroviral therapy group, the sum of the ranks is R2=220. Recall that the sum of the ranks will always equal n(n+1)/2. As a check on our assignment of ranks, we have n(n+1)/2 = 30(31)/2 = 465, which is equal to 245+220 = 465. We now compute U1 and U2, as follows:

U1 = 15(15) + 15(16)/2 - 245 = 225 + 120 - 245 = 100 and U2 = 15(15) + 15(16)/2 - 220 = 225 + 120 - 220 = 125

Thus, the test statistic is U=100.  

  • Step 5.  Conclusion.  

We do not reject H 0 because 100 > 64. We do not have sufficient evidence to conclude that the treatment groups differ in viral load.
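Before turning to matched samples, it may help to see how a Mann Whitney U calculation can be reproduced in software. The sketch below is a minimal illustration in Python, assuming the scipy library is available; the episode counts are hypothetical values chosen only to mirror the rank sums quoted in the asthma example above (R1=37, R2=18), not data reported in this module.

```python
# A minimal sketch: Mann Whitney U test for two small independent samples.
from scipy.stats import rankdata, mannwhitneyu

placebo = [7, 5, 6, 4, 12]    # hypothetical episode counts, group 1
new_drug = [3, 6, 4, 2, 1]    # hypothetical episode counts, group 2
n1, n2 = len(placebo), len(new_drug)

# Rank the pooled sample; tied values receive the mean rank.
ranks = rankdata(placebo + new_drug)
R1, R2 = ranks[:n1].sum(), ranks[n1:].sum()

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2
print("R1 =", R1, "R2 =", R2, "U =", min(U1, U2))

# scipy reports the U statistic for the first sample; the smaller of U1 and U2
# can be recovered as min(stat, n1*n2 - stat).
stat, p = mannwhitneyu(placebo, new_drug, alternative="two-sided")
print("U =", min(stat, n1 * n2 - stat), "two-sided p =", round(p, 3))
```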

Tests with Matched Samples

This section describes nonparametric tests to compare two groups with respect to a continuous outcome when the data are collected on matched or paired samples. The parametric procedure for doing this was presented in the modules on hypothesis testing for the situation in which the continuous outcome was normally distributed. This section describes procedures that should be used when the outcome cannot be assumed to follow a normal distribution. There are two popular nonparametric tests to compare outcomes between two matched or paired groups. The first is called the Sign Test and the second the Wilcoxon Signed Rank Test .  

Recall that when data are matched or paired, we compute difference scores for each individual and analyze difference scores. The same approach is followed in nonparametric tests. In parametric tests, the null hypothesis is that the mean difference (μ d ) is zero. In nonparametric tests, the null hypothesis is that the median difference is zero.  

Consider a clinical investigation to assess the effectiveness of a new drug designed to reduce repetitive behaviors in children affected with autism. If the drug is effective, children will exhibit fewer repetitive behaviors on treatment as compared to when they are untreated. A total of 8 children with autism enroll in the study. Each child is observed by the study psychologist for a period of 3 hours both before treatment and then again after taking the new drug for 1 week. The time that each child is engaged in repetitive behavior during each 3 hour observation period is measured. Repetitive behavior is scored on a scale of 0 to 100 and scores represent the percent of the observation time in which the child is engaged in repetitive behavior. For example, a score of 0 indicates that during the entire observation period the child did not engage in repetitive behavior while a score of 100 indicates that the child was constantly engaged in repetitive behavior. The data are shown below. 

Looking at the data, it appears that some children improved (e.g., Child 5 scored 80 before treatment and 20 after treatment), but some got worse (e.g., Child 3 scored 40 before treatment and 50 after treatment). Is there statistically significant improvement in repetitive behavior after 1 week of treatment?

Because the before and after treatment measures are paired, we compute difference scores for each child. In this example, we subtract the assessment of repetitive behaviors after treatment from that measured before treatment so that difference scores represent improvement in repetitive behavior. The question of interest is whether there is significant improvement after treatment.

In this small sample, the observed difference (or improvement) scores vary widely and are subject to extremes (e.g., the observed difference of 60 is an outlier). Thus, a nonparametric test is appropriate to test whether there is significant improvement in repetitive behavior before versus after treatment. The hypotheses are given below.

H 0 : The median difference is zero  versus

H 1 : The median difference is positive α=0.05

In this example, the null hypothesis is that there is no difference in scores before versus after treatment. If the null hypothesis is true, we expect to see some positive differences (improvement) and some negative differences (worsening). If the research hypothesis is true, we expect to see more positive differences after treatment as compared to before.

The Sign Test

The Sign Test is the simplest nonparametric test for matched or paired data. The approach is to analyze only the signs of the difference scores, as shown below:

If the null hypothesis is true (i.e., if the median difference is zero) then we expect to see approximately half of the differences as positive and half of the differences as negative. If the research hypothesis is true, we expect to see more positive differences.  

Test Statistic for the Sign Test

The test statistic for the Sign Test is the number of positive signs or number of negative signs, whichever is smaller. In this example, we observe 2 negative and 6 positive signs. Is this evidence of significant improvement or simply due to chance?

Determining whether the observed test statistic supports the null or research hypothesis is done following the same approach used in parametric testing. Specifically, we determine a critical value such that if the smaller of the number of positive or negative signs is less than or equal to that critical value, then we reject H 0 in favor of H 1 and if the smaller of the number of positive or negative signs is greater than the critical value, then we do not reject H 0 . Notice that this is a one-sided decision rule corresponding to our one-sided research hypothesis (the two-sided situation is discussed in the next example).  

Table of Critical Values for the Sign Test

The critical values for the Sign Test are in the table below.

To determine the appropriate critical value we need the sample size, which is equal to the number of matched pairs (n=8), and our one-sided level of significance α=0.05. For this example, the critical value is 1, and the decision rule is to reject H 0 if the smaller of the number of positive or negative signs ≤ 1. We do not reject H 0 because 2 > 1. We do not have sufficient evidence at α=0.05 to show that there is improvement in repetitive behavior after taking the drug as compared to before. In essence, we could use the critical value to decide whether to reject the null hypothesis. Another alternative would be to calculate the p-value, as described below.

Computing P-values for the Sign Test 

With the Sign test we can readily compute a p-value based on our observed test statistic. The test statistic for the Sign Test is the smaller of the number of positive or negative signs and it follows a binomial distribution with n = the number of subjects in the study and p=0.5 (See the module on Probability for details on the binomial distribution). In the example above, n=8 and p=0.5 (the probability of success under H 0 ).

By using the binomial distribution formula:

P(x successes) = [n! / (x!(n-x)!)] p^x (1-p)^(n-x)

we can compute the probability of observing different numbers of successes during 8 trials. These are shown in the table below.

x (Number of Successes)     P(x successes)
0                           0.0039
1                           0.0313
2                           0.1094
3                           0.2188
4                           0.2734
5                           0.2188
6                           0.1094
7                           0.0313
8                           0.0039

Recall that a p-value is the probability of observing a test statistic as or more extreme than that observed. We observed 2 negative signs. Thus, the p-value for the test is p-value = P(x ≤ 2). Using the table above,

p-value = P(x ≤ 2) = 0.0039 + 0.0313 + 0.1094 = 0.1446

Because the p-value = 0.1446 exceeds the level of significance α=0.05, we do not have statistically significant evidence that there is improvement in repetitive behaviors after taking the drug as compared to before.  Notice in the table of binomial probabilities above, that we would have had to observe at most 1 negative sign to declare statistical significance using a 5% level of significance. Recall the critical value for our test was 1 based on the table of critical values for the Sign Test (above).

One-Sided versus Two-Sided Test

In the example looking for differences in repetitive behaviors in autistic children, we used a one-sided test (i.e., we hypothesize improvement after taking the drug). A two-sided test can be used if we hypothesize a difference in repetitive behavior after taking the drug as compared to before. From the table of critical values for the Sign Test, we can determine a two-sided critical value and again reject H 0 if the smaller of the number of positive or negative signs is less than or equal to that two-sided critical value. Alternatively, we can compute a two-sided p-value. With a two-sided test, the p-value is the probability of observing many or few positive or negative signs. If the research hypothesis is a two-sided alternative (i.e., H 1 : The median difference is not zero), then the p-value is computed as: p-value = 2*P(x ≤ 2). Notice that this is equivalent to p-value = P(x ≤ 2) + P(x ≥ 6), representing the situation of few or many successes. Recall that in two-sided tests, we reject the null hypothesis if the test statistic is extreme in either direction. Thus, in the Sign Test, a two-sided p-value is the probability of observing few or many positive or negative signs. Here we observe 2 negative signs (and thus 6 positive signs). The opposite situation would be 6 negative signs (and thus 2 positive signs as n=8). The two-sided p-value is the probability of observing a test statistic as or more extreme in either direction (i.e., P(x ≤ 2) + P(x ≥ 6) = 0.1446 + 0.1446 = 0.2892).
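The binomial probabilities above can also be computed directly in software rather than from the table. A minimal sketch, assuming the scipy library is available:

```python
# Exact Sign Test p-values from the binomial distribution with n = 8 pairs and p = 0.5 under H0.
from scipy.stats import binom

n, p = 8, 0.5
k = 2   # the smaller of the number of positive or negative signs observed

# One-sided p-value: probability of k or fewer negative signs, P(x <= 2).
p_one_sided = binom.cdf(k, n, p)

# Two-sided p-value: few or many signs, P(x <= 2) + P(x >= 6).
p_two_sided = binom.cdf(k, n, p) + binom.sf(n - k - 1, n, p)

print(round(p_one_sided, 4))   # about 0.145
print(round(p_two_sided, 4))   # about 0.289
```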

When Difference Scores are Zero

There is a special circumstance that needs attention when implementing the Sign Test which arises when one or more participants have difference scores of zero (i.e., their paired measurements are identical). If there is just one difference score of zero, some investigators drop that observation and reduce the sample size by 1 (i.e., the sample size for the binomial distribution would be n-1). This is a reasonable approach if there is just one zero. However, if there are two or more zeros, an alternative approach is preferred.

  • If there is an even number of zeros, we randomly assign them positive or negative signs.
  • If there is an odd number of zeros, we randomly drop one and reduce the sample size by 1, and then randomly assign the remaining observations positive or negative signs. The following example illustrates the approach.

A new chemotherapy treatment is proposed for patients with breast cancer. Investigators are concerned with patients' ability to tolerate the treatment and assess their quality of life both before and after receiving the new chemotherapy treatment. Quality of life (QOL) is measured on an ordinal scale and, for analysis purposes, numbers are assigned to each response category as follows: 1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent. The data are shown below.

The question of interest is whether there is a difference in QOL after chemotherapy treatment as compared to before.  

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : The median difference is zero versus

H 1 : The median difference is not zero α=0.05

  • Step 2.  Select the appropriate test statistic.

The test statistic for the Sign Test is the smaller of the number of positive or negative signs.

  • Step 3. Set up the decision rule.

The appropriate critical value for the Sign Test can be found in the table of critical values for the Sign Test. To determine the appropriate critical value we need the sample size (or number of matched pairs, n=12), and our two-sided level of significance α=0.05.  

The critical value for this two-sided test with n=12 and α=0.05 is 2, and the decision rule is as follows: Reject H 0 if the smaller of the number of positive or negative signs ≤ 2.

  • Step 4. Compute the test statistic.

Because the before and after treatment measures are paired, we compute difference scores for each patient. In this example, we subtract the QOL measured before treatment from that measured after.

We now capture the signs of the difference scores and because there are two zeros, we randomly assign one negative sign (i.e., "-" to patient 5)   and one positive sign (i.e., "+" to patient 8), as follows:

 The test statistic is the number of negative signs which is equal to 3.

  • Step 5. Conclusion.

We do not reject H 0 because 3 > 2. We do not have statistically significant evidence at α=0.05 to show that there is a difference in QOL after chemotherapy treatment as compared to before.  

We can also compute the p-value directly using the binomial distribution with n=12 and p=0.5. The two-sided p-value for the test is p-value = 2*P(x ≤ 3) (which is equivalent to p-value = P(x ≤ 3) + P(x ≥ 9)). Again, the two-sided p-value is the probability of observing few or many positive or negative signs. Here we observe 3 negative signs (and thus 9 positive signs). The opposite situation would be 9 negative signs (and thus 3 positive signs as n=12). The two-sided p-value is the probability of observing a test statistic as or more extreme in either direction (i.e., P(x ≤ 3) + P(x ≥ 9)). We can compute the p-value using the binomial formula or a statistical computing package, as follows:

p-value = 2*P(x ≤ 3) = 2(0.0730) = 0.1460

Because the p-value = 0.1460 exceeds the level of significance (α=0.05) we do not have statistically significant evidence at α =0.05 to show that there is a difference in QOL after chemotherapy treatment as compared to before.  

Wilcoxon Signed Rank Test

Another popular nonparametric test for matched or paired data is called the Wilcoxon Signed Rank Test. Like the Sign Test, it is based on difference scores, but in addition to analyzing the signs of the differences, it also takes into account the magnitude of the observed differences.

Let's use the Wilcoxon Signed Rank Test to re-analyze the data from the autism study described above. Recall that this study assessed the effectiveness of a new drug designed to reduce repetitive behaviors in children affected with autism. A total of 8 children with autism enroll in the study, and the amount of time that each child is engaged in repetitive behavior during a 3 hour observation period is measured both before treatment and then again after taking the new medication for a period of 1 week. The data are shown below.

First, we compute difference scores for each child.  

The next step is to rank the difference scores. We first order the absolute values of the difference scores and assign ranks from 1 through n to the smallest through largest absolute values of the difference scores, assigning the mean rank when there are ties in the absolute values of the difference scores.

The final step is to attach the signs ("+" or "-") of the observed differences to each rank as shown below.

Similar to the Sign Test, hypotheses for the Wilcoxon Signed Rank Test concern the population median of the difference scores. The research hypothesis can be one- or two-sided. Here we consider a one-sided test.

Test Statistic for the Wilcoxon Signed Rank Test

The test statistic for the Wilcoxon Signed Rank Test is W, defined as the smaller of W+ (sum of the positive ranks) and W- (sum of the negative ranks). If the null hypothesis is true, we expect to see similar numbers of lower and higher ranks that are both positive and negative (i.e., W+ and W- would be similar). If the research hypothesis is true we expect to see more higher and positive ranks (in this example, more children with substantial improvement in repetitive behavior after treatment as compared to before, i.e., W+ much larger than W-).

In this example, W+ = 32 and W- = 4. Recall that the sum of the ranks (ignoring the signs) will always equal n(n+1)/2. As a check on our assignment of ranks, we have n(n+1)/2 = 8(9)/2 = 36 which is equal to 32+4. The test statistic is W = 4.

Next we must determine whether the observed test statistic W supports the null or research hypothesis. This is done following the same approach used in parametric testing. Specifically, we determine a critical value of W such that if the observed value of W is less than or equal to the critical value, we reject H 0 in favor of H 1 , and if the observed value of W exceeds the critical value, we do not reject H 0 .

Table of Critical Values of W

The critical value of W can be found in the table below:

To determine the appropriate one-sided critical value we need the sample size (n=8) and our one-sided level of significance (α=0.05). For this example, the critical value of W is 6 and the decision rule is to reject H 0 if W ≤ 6. Thus, we reject H 0 because 4 < 6. We have statistically significant evidence at α=0.05 to show that the median difference is positive (i.e., that repetitive behavior improves).

Note that when we analyzed the data previously using the Sign Test, we failed to find statistical significance. However, when we use the Wilcoxon Signed Rank Test, we conclude that the treatment results in a statistically significant improvement at α=0.05. The discrepant results are due to the fact that the Sign Test uses very little of the information in the data and is a less powerful test.
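In software, the ranking, signing, and summing steps are handled automatically. The sketch below is a minimal illustration in Python, assuming the scipy library is available; the before/after scores are hypothetical values chosen to be consistent with the summary statistics described above (W+ = 32, W- = 4), not data reported in this module.

```python
# A minimal sketch: Wilcoxon Signed Rank Test for paired scores.
from scipy.stats import wilcoxon

before = [85, 70, 40, 65, 80, 75, 55, 20]   # hypothetical pre-treatment scores
after  = [75, 50, 50, 40, 20, 65, 40, 25]   # hypothetical post-treatment scores

# scipy ranks the absolute differences (before - after), attaches the signs, and,
# for the default two-sided alternative, reports the smaller of W+ and W-.
stat, p = wilcoxon(before, after)
print("W =", stat, "two-sided p =", round(p, 4))
```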

A study is run to evaluate the effectiveness of an exercise program in reducing systolic blood pressure in patients with pre-hypertension (defined as a systolic blood pressure between 120-139 mmHg or a diastolic blood pressure between 80-89 mmHg). A total of 15 patients with pre-hypertension enroll in the study, and their systolic blood pressures are measured. Each patient then participates in an exercise training program where they learn proper techniques and execution of a series of exercises. Patients are instructed to do the exercise program 3 times per week for 6 weeks. After 6 weeks, systolic blood pressures are again measured. The data are shown below. 

Is there is a difference in systolic blood pressures after participating in the exercise program as compared to before?

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : The median difference is zero versus

H 1 : The median difference is not zero. α=0.05

  • Step 2. Select the appropriate test statistic.

The test statistic for the Wilcoxon Signed Rank Test is W, defined as the smaller of W+ and W- which are the sums of the positive and negative ranks, respectively.  

  • Step 3. Set up the decision rule.

The critical value of W can be found in the table of critical values. To determine the appropriate critical value we need the sample size (n=15) and our two-sided level of significance (α=0.05). The critical value for this two-sided test with n=15 and α=0.05 is 25, and the decision rule is as follows: Reject H 0 if W ≤ 25.

  • Step 4. Compute the test statistic.

Because the before and after systolic blood pressure measures are paired, we compute difference scores for each patient.

The next step is to rank the ordered absolute values of the difference scores using the approach outlined above. Specifically, we assign ranks from 1 through n to the smallest through largest absolute values of the difference scores, respectively, and assign the mean rank when there are ties in the absolute values of the difference scores.

The final step is to attach the signs ("+" or "-") of the observed differences to each rank as shown below. 

In this example, W+ = 89 and W- = 31. Recall that the sum of the ranks (ignoring the signs) will always equal n(n+1)/2. As a check on our assignment of ranks, we have n(n+1)/2 = 15(16)/2 = 120 which is equal to 89 + 31. The test statistic is W = 31.

  • Step 5. Conclusion.

We do not reject H 0 because 31 > 25. Therefore, we do not have statistically significant evidence at α=0.05 to show that the median difference in systolic blood pressures is not zero (i.e., that there is a significant difference in systolic blood pressures after the exercise program as compared to before).

Tests with More than Two Independent Samples

In the modules on hypothesis testing we presented techniques for testing the equality of means in more than two independent samples using analysis of variance (ANOVA). An underlying assumption for appropriate use of ANOVA was that the continuous outcome was approximately normally distributed or that the samples were sufficiently large (usually n j > 30, where j=1, 2, ..., k and k denotes the number of independent comparison groups). An additional assumption for appropriate use of ANOVA is equality of variances in the k comparison groups. ANOVA is generally robust when the sample sizes are small but equal. When the outcome is not normally distributed and the samples are small, a nonparametric test is appropriate.

The Kruskal-Wallis Test

A popular nonparametric test to compare outcomes among more than two independent groups is the Kruskal Wallis test.   The Kruskal Wallis test is used to compare medians among k comparison groups (k > 2) and is sometimes described as an ANOVA with the data replaced by their ranks.   The null and research hypotheses for the Kruskal Wallis nonparametric test are stated as follows: 

H 0 : The k population medians are equal versus

H 1 : The k population medians are not all equal

The procedure for the test involves pooling the observations from the k samples into one combined sample, keeping track of which sample each observation comes from, and then ranking lowest to highest from 1 to N, where N = n 1 +n 2 + ...+ n k .  To illustrate the procedure, consider the following example.

A clinical study is designed to assess differences in albumin levels in adults following diets with different amounts of protein. Low protein diets are often prescribed for patients with kidney failure. Albumin is the most abundant protein in blood, and its concentration in the serum is measured in grams per deciliter (g/dL). Clinically, serum albumin concentrations are also used to assess whether patients get sufficient protein in their diets. Three diets are compared, ranging from 5% to 15% protein, and the 15% protein diet represents a typical American diet. The albumin levels of participants following each diet are shown below.

Is there a difference in serum albumin levels among subjects on the three different diets? For reference, normal albumin levels are generally between 3.4 and 5.4 g/dL. By inspection, it appears that participants following the 15% protein diet have higher albumin levels than those following the 5% protein diet. The issue is whether this observed difference is statistically significant.

In this example, the outcome is continuous, but the sample sizes are small and not equal across comparison groups (n1=3, n2=5, n3=4). Thus, a nonparametric test is appropriate. The hypotheses to be tested are given below, and we will use a 5% level of significance.

H 0 : The three population medians are equal versus

H 1 : The three population medians are not all equal

To conduct the test we first order the data in the combined total sample of 12 subjects from smallest to largest. We also need to keep track of the group assignments in the total sample.

Notice that the lower ranks (e.g., 1, 2.5, 4) are assigned to the 5% protein diet group while the higher ranks (e.g., 10, 11 and 12) are assigned to the 15% protein diet group. Again, the goal of the test is to determine whether the observed data support a difference in the three population medians. Recall in the parametric tests, discussed in the modules on hypothesis testing, when comparing means among more than two groups we analyzed the difference among the sample means (mean square between groups) relative to their within group variability and summarized the sample information in a test statistic (F statistic). In the Kruskal Wallis test we again summarize the sample information in a test statistic based on the ranks.

Test Statistic for the Kruskal Wallis Test 

The test statistic for the Kruskal Wallis test is denoted H and is defined as follows:

H = (12/(N(N+1))) Σ(Rj²/nj) - 3(N+1)

where k = the number of comparison groups, N = the total sample size, nj is the sample size in the jth group and Rj is the sum of the ranks in the jth group.

In this example, R1 = 7.5, R2 = 30.5, and R3 = 40. Recall that the sum of the ranks will always equal N(N+1)/2. As a check on our assignment of ranks, we have N(N+1)/2 = 12(13)/2 = 78, which is equal to 7.5+30.5+40 = 78. The H statistic for this example is computed as follows:

H = (12/(12(13))) × (7.5²/3 + 30.5²/5 + 40²/4) - 3(13) = (12/156)(604.80) - 39 = 46.52 - 39 = 7.52

We must now determine whether the observed test statistic H supports the null or research hypothesis. Once again, this is done by establishing a critical value of H. If the observed value of H is greater than or equal to the critical value, we reject H 0 in favor of H 1 ; if the observed value of H is less than the critical value we do not reject H 0 . The critical value of H can be found in the table below.

Critical Values of H for the Kruskal Wallis Test

To determine the appropriate critical value we need sample sizes (n 1 =3, n 2 =5 and n 3 =4) and our level of significance (α=0.05). For this example the critical value is 5.656, thus we reject H 0 because 7.52 > 5.656, and we conclude that there is a difference in median albumin levels among the three different diets.  
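The H statistic above can be verified directly from the rank sums. A minimal sketch of the same calculation, using the rank sums and group sizes quoted in this example:

```python
# Kruskal-Wallis H computed from the rank sums above (R1 = 7.5, R2 = 30.5, R3 = 40).
R = [7.5, 30.5, 40.0]          # sums of ranks in the three diet groups
n = [3, 5, 4]                  # group sample sizes
N = sum(n)                     # total sample size
H = 12 / (N * (N + 1)) * sum(r**2 / m for r, m in zip(R, n)) - 3 * (N + 1)
print(round(H, 2))             # 7.52
```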

Notice that the table above contains critical values for the Kruskal Wallis test for tests comparing 3, 4 or 5 groups with small sample sizes. If there are 3 or more comparison groups and 5 or more observations in each of the comparison groups, it can be shown that the test statistic H approximates a chi-square distribution with df=k-1. 4 Thus, in a Kruskal Wallis test with 3 or more comparison groups and 5 or more observations in each group, the critical value for the test can be found in the table of Critical Values of the χ 2 Distribution below.

Critical Values of the χ 2 Distribution

The following example illustrates this situation.

A personal trainer is interested in comparing the anaerobic thresholds of elite athletes. Anaerobic threshold is defined as the point at which the muscles cannot get more oxygen to sustain activity or the upper limit of aerobic exercise. It is a measure also related to maximum heart rate. The following data are anaerobic thresholds for distance runners, distance cyclists, distance swimmers and cross-country skiers.  

Is there a difference in anaerobic thresholds among the different groups of elite athletes?

H 0 : The four population medians are equal versus

H 1 : The four population medians are not all equal α=0.05

The test statistic for the Kruskal Wallis test is denoted H and is defined as follows:

H = (12/(N(N+1))) Σ(Rj²/nj) - 3(N+1)

where k = the number of comparison groups, N = the total sample size, nj is the sample size in the jth group and Rj is the sum of the ranks in the jth group.

Because there are 4 comparison groups and 5 observations in each of the comparison groups, we find the critical value in the table of critical values for the chi-square distribution for df=k-1=4-1=3 and α=0.05. The critical value is 7.81, and the decision rule is to reject H 0 if H > 7.81.  

To conduct the test we assign ranks using the procedures outlined above. The first step in assigning ranks is to order the data from smallest to largest. This is done on the combined or total sample (i.e., pooling the data from the four comparison groups (n=20)), and assigning ranks from 1 to 20, as follows. We also need to keep track of the group assignments in the total sample. The table below shows the ordered data.

 We now assign the ranks to the ordered values and sum the ranks in each group. 

Recall that the sum of the ranks will always equal N(N+1)/2. As a check on our assignment of ranks, we have N(N+1)/2 = 20(21)/2 = 210, which is equal to 46+62+24+78 = 210. In this example,

H = (12/(20(21))) × (46²/5 + 62²/5 + 24²/5 + 78²/5) - 3(21) = (12/420)(2524.0) - 63 = 72.11 - 63 = 9.11

Reject H 0 because 9.11 > 7.81. We have statistically significant evidence at α =0.05, to show that there is a difference in median anaerobic thresholds among the four different groups of elite athletes.  

Notice that in this example, the anaerobic thresholds of the distance runners, cyclists and cross-country skiers are comparable (looking only at the raw data).  The distance swimmers appear to be the athletes that differ from the others in terms of anaerobic thresholds.   Recall, similar to analysis of variance tests, we reject the null hypothesis in favor of the alternative hypothesis if any two of the medians are not equal.
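For completeness, the Kruskal Wallis test is also available in standard statistical software. The sketch below is a minimal illustration in Python, assuming the scipy library is available; the anaerobic threshold values are hypothetical and are not the data from the example above.

```python
# A minimal sketch: Kruskal-Wallis test comparing more than two independent groups.
from scipy.stats import kruskal, chi2

runners  = [185, 179, 192, 184, 188]   # hypothetical anaerobic thresholds
cyclists = [190, 189, 182, 178, 186]
swimmers = [166, 159, 170, 183, 160]
skiers   = [201, 195, 180, 187, 215]

# kruskal computes H (with a correction for ties) and a chi-square based p-value with df = k - 1.
H, p = kruskal(runners, cyclists, swimmers, skiers)
print("H =", round(H, 2), "p =", round(p, 4))

# The chi-square critical value used in the decision rule (alpha = 0.05, df = 3).
print("critical value =", round(chi2.ppf(0.95, df=3), 2))   # about 7.81
```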

This module presents hypothesis testing techniques for situations with small sample sizes and outcomes that are ordinal, ranked or continuous and cannot be assumed to be normally distributed. Nonparametric tests are based on ranks which are assigned to the ordered data. The tests involve the same five steps as parametric tests, specifying the null and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion. The tests are summarized below.

Mann Whitney U Test

Use: To compare a continuous outcome in two independent samples.

Null Hypothesis: H 0 : Two populations are equal

Test Statistic: The test statistic is U, the smaller of

U1 = n1n2 + n1(n1+1)/2 - R1 and U2 = n1n2 + n2(n2+1)/2 - R2

Decision Rule: Reject H 0 if U ≤ critical value from table

Sign Test

Use: To compare a continuous outcome in two matched or paired samples.

Null Hypothesis: H 0 : Median difference is zero

Test Statistic: The test statistic is the smaller of the number of positive or negative signs.

Decision Rule: Reject H 0 if the smaller of the number of positive or negative signs ≤ critical value from table.

Wilcoxon Signed Rank Test

Use: To compare a continuous outcome in two matched or paired samples.

Null Hypothesis: H 0 : Median difference is zero

Test Statistic: The test statistic is W, defined as the smaller of W+ and W- which are the sums of the positive and negative ranks of the difference scores, respectively.  

Decision Rule: Reject H 0 if W ≤ critical value from table.

Kruskal Wallis Test

Use: To compare a continuous outcome in more than two independent samples.

Null Hypothesis: H 0 : k population medians are equal

Test Statistic: The test statistic is H,

H = (12/(N(N+1))) Σ(Rj²/nj) - 3(N+1)

Decision Rule: Reject H 0 if H ≥ critical value

  • D'Agostino RB and Stephens MA. Goodness-of-Fit Techniques.
  • Apgar V (1953). A proposal for a new method of evaluation of the newborn infant. Curr. Res. Anesth. Analg. 32(4): 260-267.
  • Conover WJ. Practical Nonparametric Statistics, 2nd edition. New York: John Wiley and Sons.
  • Siegel S and Castellan NJ (1988). Nonparametric Statistics for the Behavioral Sciences, 2nd edition. New York: McGraw-Hill.

Non-Parametric Test

A non-parametric test is a statistical analysis method that does not assume that the population data belong to some prescribed distribution determined by a set of parameters. Due to this, a non-parametric test is also known as a distribution-free test. These tests are usually based on distributions that have unspecified parameters.

A non-parametric test acts as an alternative to a parametric test for mathematical models where the nature of parameters is flexible. Usually, when the assumptions of parametric tests are violated then non-parametric tests are used. In this article, we will learn more about a non-parametric test, the types, examples, advantages, and disadvantages.

What is Non-Parametric Test in Statistics?

A non-parametric test in statistics does not assume that the data has been taken from a normal distribution . A normal distribution belongs to a parametrized family of probability distributions and includes parameters such as mean, variance, standard deviation, etc. Thus, a non-parametric test does not make assumptions about the probability distribution's parameters.

Non-Parametric Test Definition

A non-parametric test can be defined as a test that is used in statistical analysis when the data under consideration does not belong to a parametrized family of distributions. When the data does not meet the requirements to perform a parametric test, a non-parametric test is used to analyze it.

Reasons to Use Non-Parametric Tests

It is important to assess when to apply parametric and non-parametric tests in order to arrive at the correct statistical inference. The reasons to use a non-parametric test are given below:

  • When the distribution is skewed, a non-parametric test is used. For skewed distributions, the mean is not the best measure of central tendency, hence, parametric tests cannot be used.
  • If the size of the data is too small then validating the distribution of the data becomes difficult. Thus, in such cases, a non-parametric test is used to analyze the data.
  • If the data is nominal or ordinal, a non-parametric test is used. This is because a parametric test can only be used for continuous data.

Types of Non-Parametric Tests

Parametric tests are those that assume the data follow a normal distribution; examples include ANOVA and the t-test. Many different non-parametric tests are available, and they can likewise be used in hypothesis testing. Some common non-parametric tests are given as follows:

Mann-Whitney U Test

This non-parametric test is analogous to the t-test for two independent samples. To conduct the test, the data must be at least ordinal. It is also known as the Wilcoxon rank sum test.

Null Hypothesis: \(H_{0}\): The two populations under consideration are equal.

Test Statistic: U, the smaller of

\(U_{1} = n_{1}n_{2}+\frac{n_{1}(n_{1}+1)}{2}-R_{1}\) or \(U_{2} = n_{1}n_{2}+\frac{n_{2}(n_{2}+1)}{2}-R_{2}\)

where, \(R_{1}\) is the sum of ranks in group 1 and \(R_{2}\) is the sum of ranks in group 2.

Decision Criteria: Reject the null hypothesis if U < critical value.

Wilcoxon Signed Rank Test

This is the non-parametric test whose counterpart is the parametric paired t-test . It is used to compare two samples that contain ordinal data and are dependent. The Wilcoxon signed rank test assumes that the data comes from a symmetric distribution.

Null Hypothesis: \(H_{0}\): The difference in the median is 0.

Test Statistic: W. W is defined as the smaller of the sums of the negative and positive ranks.

Decision Criteria: Reject the null hypothesis if W < critical value.

Sign Test

This non-parametric test is another counterpart of the parametric paired samples t-test, and it is closely related to the Wilcoxon signed rank test.

Null Hypothesis: \(H_{0}\): The median difference is 0.

Test Statistic: The smaller value among the number of positive and negative signs.

Decision Criteria: Reject the null hypothesis if the test statistic < critical value.

Kruskal Wallis Test

The parametric one-way ANOVA test is analogous to the non-parametric Kruskal Wallis test. It is used for comparing more than two groups of data that are independent and ordinal.

Null Hypothesis: \(H_{0}\): m population medians are equal

Test Statistic: H = \(\left ( \frac{12}{N(N+1)}\sum_{1}^{m} \frac{R_{j}^{2}}{n_{j}}\right ) - 3(N+1)\)

where N = total sample size, and \(n_{j}\) and \(R_{j}\) are the sample size and the sum of ranks of the jth group

Decision Criteria: Reject the null hypothesis if H > critical value

Non-Parametric Test Example

The best way to understand how to set up and solve a hypothesis test involving a non-parametric method is to work through an example.

Suppose a group of cancer patients is divided into three groups, and a different drug is administered to each group. The platelet counts for the patients are given in the table below. It needs to be checked whether the population medians are equal. The significance level is 0.05.

As the sizes of the three groups are not the same, the Kruskal Wallis test is used.

\(H_{0}\): Population medians are same

\(H_{1}\): Population medians are different

\(n_{1}\) = 5, \(n_{2}\) = 3, \(n_{3}\) = 4

N = 5 + 3 + 4 = 12

Now ordering the groups and assigning ranks

\(R_{1}\) = 18.5, \(R_{2}\) = 21, \(R_{3}\) = 38.5

Substituting these values in the test statistic formula, \(\left ( \frac{12}{N(N+1)}\sum_{1}^{m} \frac{R_{j}^{2}}{n_{j}}\right ) - 3(N+1)\)

H = 6.0778.

From the critical value table, the critical value is 5.656.

As H > critical value, the null hypothesis is rejected, and it is concluded that there is significant evidence that the population medians are not all equal.
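
As a quick arithmetic check, the test statistic can be reproduced with a plain R calculation of the formula above, using the rank sums already given:

    # Plain R check of the Kruskal Wallis statistic using the rank sums above
    n <- c(5, 3, 4)           # group sample sizes
    R <- c(18.5, 21, 38.5)    # rank sums of the three groups
    N <- sum(n)               # total sample size, 12

    H <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)
    H                         # approximately 6.08, matching the value above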

Difference between Parametric and Non-Parametric Test

Depending upon the type of distribution that the data have been obtained from, either a parametric test or a non-parametric test can be used in hypothesis testing. The table given below outlines the main differences between parametric and non-parametric tests.

Advantages and Disadvantages of Non-Parametric Test

Non-parametric tests are used when the conditions for a parametric test are not satisfied. In some cases, when the data do not meet the required assumptions but the sample size is large, a parametric test can still be used. Some of the advantages and disadvantages of a non-parametric test are listed as follows:

Advantages of Non-Parametric Test

The advantages of a non-parametric test are listed as follows:

  • Knowledge of the population distribution is not required.
  • The calculations involved in such a test are shorter.
  • A non-parametric test is easy to understand.
  • These tests are applicable to all data types.

Disadvantages of Non-Parametric Test

The disadvantages of a non-parametric test are given below:

  • They are not as efficient as their parametric counterparts.
  • Because they are distribution-free, they make less use of the information in the data, so the level of accuracy is reduced.


Important Notes on Non-Parametric Test

  • A non-parametric test is a statistical test that is performed on data belonging to a distribution whose parameters are unknown.
  • It is used on skewed distributions and the measure of central tendency used is the median.
  • Kruskal Wallis test, sign test, Wilcoxon signed test and the Mann Whitney u test are some important non-parametric tests used in hypothesis testing.

Examples on Non-Parametric Test

Example 1: A surprise quiz was taken and the scores of 6 students are given as follows:

After giving a month's time to practice, the same quiz was taken again and the following scores were obtained.

Assigning signed ranks to the differences

\(H_{0}\): Median difference is 0. \(H_{1}\): Median difference is positive.

W+: Sum of positive ranks = 17.5

W-: Sum of negative ranks = 3.5

As W- < W+, W- is the test statistic. From the table, the critical value is 2. Since W- > 2, the null hypothesis cannot be rejected, and it is concluded that there is no significant difference between the scores of the two tests.

Answer: Fail to reject the null hypothesis

Example 2: \(H_{0}\): The two groups report the same number of cases. \(H_{1}\): The two groups report different numbers of cases.

\(R_{1}\) = 15.5, \(R_{2}\) = 39.5, \(n_{1}\) = \(n_{2}\) = 5

Using the formulas \(U_{1} = n_{1}n_{2}+\frac{n_{1}(n_{1}+1)}{2}-R_{1}\) and \(U_{2} = n_{1}n_{2}+\frac{n_{2}(n_{2}+1)}{2}-R_{2}\), we get \(U_{1}\) = 24.5 and \(U_{2}\) = 0.5. As \(U_{2}\) < \(U_{1}\), \(U_{2}\) is the test statistic. From the table, the critical value is 2. As \(U_{2}\) < 2, the null hypothesis is rejected, and it is concluded that there is significant evidence that the two groups differ in the number of sleepwalking cases reported.

Answer: The null hypothesis is rejected
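
As a quick check, the two U values can be reproduced with a plain R calculation from the rank sums given above:

    # Plain R check of U1 and U2 from the rank sums above
    n1 <- 5; n2 <- 5
    R1 <- 15.5; R2 <- 39.5

    U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - R1   # 24.5
    U2 <- n1 * n2 + n2 * (n2 + 1) / 2 - R2   # 0.5
    min(U1, U2)                              # test statistic U = 0.5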


FAQs on Non-Parametric Test

What is a Non-Parametric Test?

A non-parametric test in statistics is a test that does not assume that the data come from a distribution described by a fixed set of parameters. Thus, non-parametric tests are also known as distribution-free tests.

When Should a Non-Parametric Test be Used?

A non-parametric test should be used under the following conditions.

  • The distribution is skewed.
  • The size of the distribution is small.
  • The data is nominal or ordinal.

What is the Test Statistic Used for the Mann-Whitney U Non-Parametric Test?

The Mann Whitney U test is the non-parametric version of the independent samples t-test. The test statistic used for hypothesis testing is U, the smaller of \(U_{1} = n_{1}n_{2}+\frac{n_{1}(n_{1}+1)}{2}-R_{1}\) and \(U_{2} = n_{1}n_{2}+\frac{n_{2}(n_{2}+1)}{2}-R_{2}\).

What is the Test Statistic Used for the Kruskal Wallis Non-Parametric Test?

The parametric counterpart of the Kruskal Wallis non parametric test is the one way ANOVA test. The test statistic used is H = \(\left ( \frac{12}{N(N+1)}\sum_{1}^{m} \frac{R_{j}^{2}}{n_{j}}\right ) - 3(N+1)\).

What is the Test Statistic Used for the Sign Non-Parametric Test?

The smaller value among the number of positive and negative signs is the test statistic that is used for the sign non-parametric test.

What is the Difference Between a Parametric and Non-Parametric Test?

A parametric test is conducted on data that is obtained from a parameterized distribution such as a normal distribution. On the other hand, a non-parametric test is conducted on a skewed distribution or when the parameters of the population distribution are not known.

What are the Advantages of a Non-Parametric Test?

A non-parametric test does not rely on the assumed parameters of a distribution and is applicable to all data types. Furthermore, they are easy to understand.


Lesson 11: Introduction to Nonparametric Tests and Bootstrap

Overview

What are Nonparametric Methods?

Nonparametric methods require very few assumptions about the underlying distribution and can be used when the underlying distribution is unspecified.

In the next section, we will focus on inference for one parameter. There are many one-sample methods we will not cover, as well as many methods for more than one parameter. We present the Sign Test in some detail because it uses many of the concepts we learned in the course, and we leave out the details of the other tests. Upon completing this lesson, you should be able to:

  • Determine when to use nonparametric methods.
  • Explain how to conduct the Sign test.
  • Generate a bootstrap sample.
  • Find a confidence interval for any statistic from the bootstrap sample.

Nonparametric Hypothesis Tests in R

Nathaniel E. Helwig, Department of Psychology & School of Statistics, University of Minnesota. January 04, 2021.

Copyright January 04, 2021 by Nathaniel E. Helwig

1 Introduction

Nonparametric randomization and permutation tests offer robust alternatives to classic (parametric) hypothesis tests. Unlike classic hypothesis tests, which depend on parametric assumptions and/or large sample approximations for valid inference, nonparametric tests use computationally intensive methods to provide valid inferential results under a wide collection of data generating conditions. As will be elaborated in upcoming sections, nonparametric tests are a form of conditional inference, where inference is conducted after conditioning on properties of the data.

The stats package in R implements some basic nonparametric tests, but lacks general functionality for robust nonparametric inference. As a result, this document focuses on the nptest R package, which implements robust nonparametric tests for location, correlation, and regression problems. As I demonstrate, the nptest package provides a user-friendly unification of many parametric and nonparametric tests, as well as novel implementations of several robust nonparametric tests.

The word permutation refers to the arrangement of a set of objects into some specified order.

Given a vector \(\mathbf{x} = (x_1, \ldots, x_n)^\top\) of length \(n\), there are \[ n! = n \times (n - 1) \times \cdots \times 2 \times 1 \] possible permutations of the \(n\) values.

To generate all possible permutations of the integers \(1,\ldots,n\) , use the permn function in the nptest R package:
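
A minimal sketch, assuming the nptest package is installed and that permn takes the number of elements n as its argument (as the usage here suggests; as described later in this document, each column of the returned matrix is one permutation):

    library(nptest)

    # All 3! = 6 permutations of the integers 1, 2, 3; one permutation per column
    permn(3)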

Warning: For large n this function will consume a lot of memory and may even crash R.

To obtain a random permutation of the integers \(1,\ldots,n\), use the sample.int function in the base R package:

Note that the above ideas can be used to permute the elements of any vector, given that the integers \(1,\ldots,n\) can be used to index the vector’s elements. Or, for a random permutation of a generic vector, the sample function in R’s base package can be used:
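
A minimal sketch of both approaches using base R functions:

    # Random permutation of the integers 1, ..., n
    n <- 5
    sample.int(n)

    # Random permutation of the elements of a generic vector
    x <- c("a", "b", "c", "d", "e")
    sample(x)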

Null hypothesis significance testing involves the following steps:

Form a null hypothesis \(H_0\) and an alternative hypothesis \(H_1\) about some parameter \(\theta = t(F)\) of a distribution \(F\) .

Calculate some test statistic \(T = s(\mathbf{x})\) from the observed data \(\mathbf{x} = (x_1, \ldots, x_n)^\top\) where \(x_i \stackrel{\mathrm{iid}}{\sim}F\) for \(i = 1, \ldots, n\) .

Calculate the p-value, which is the probability of observing a test statistic as or more extreme than \(T\) under the assumption \(H_0\) is true.

Reject \(H_0\) if the p-value is below some user-determined threshold, e.g., \(\alpha = 0.05\) or some other small value.

The notation \(\theta = t(F)\) denotes that the parameter is a function of the distribution. Similarly, the notation \(T = s(\mathbf{x})\) denotes that the statistic is a function of the data.

The sampling distribution of the test statistic \(T\) refers to the distribution of \(T\) that would be obtained by applying the test statistic function \(s(\cdot)\) to many different random samples of size \(n\) drawn from the population \(F\) . Note that the sampling distribution of \(T\) (under \(H_0\) ) is needed for the p-value calculation in step 3.

In parametric applications of hypothesis tests, the sampling distribution of \(T\) (under \(H_0\) ) is derived from either (i) parametric assumptions about the data generating process, e.g., \(F\) is a normal distribution, or (ii) large sample assumptions about the test statistic, e.g., \(T\) is asymptotically normal.

Permutation Distribution

Nonparametric tests derive the sampling distribution of \(T\) (under \(H_0\) ) by (i) enumerating all data arrangements (or permutations) that are equally likely under \(H_0\) , and then (ii) calculating the test statistic \(T\) for each possible data permutation.

The exact null distribution of the test statistic \(T\) refers to the (discrete) distribution formed by calculating \(T\) for all possible data permutations that are equally likely under \(H_0\) . This distribution will also be referred to as the exact permutation distribution of the test statistic.

Let \(M\) denote the number of data permutations that are equally likely under \(H_0\) :

\(M = 2^n\) for one-sample problems

\(M = {m + n \choose m}\) for two-sample problems

\(M = n!\) for correlation and regression problems

The exact null distribution will be denoted by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{x}_j)\) with \(\mathbf{x}_j\) denoting the \(j\) -th permutation of the data (for \(j = 1,\ldots, M\) ).

Permutation Inference

In nonparametric tests, exact p-values are defined as \[ \mbox{p-value} = \frac{1}{M} \sum_{j = 1}^M I(|T_j| \geq |T|) \] where \(I(\cdot)\) is an indicator function and \(T\) is the observed test statistic. Note that the p-value can range from \(1/M\) to 1 given that \(T\) is one of the \(M\) values in the exact null distribution.

The above p-value calculation assumes a two-sided alternative hypothesis. For directional alternatives, (i) the absolute value is removed from the test statistics in the p-value calculation, and (ii) the \(\geq\) sign is changed to \(\leq\) for “less than” alternatives.
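
As a sketch, the p-value formula translates directly into R once the null distribution has been enumerated. The vectors below are hypothetical placeholders for the enumerated statistics \(T_j\) and the observed statistic \(T\):

    # Hypothetical placeholders: perm.stats holds the M values T_j from the exact
    # null distribution, and obs.stat holds the observed test statistic T
    perm.stats <- c(-1.8, -0.9, -0.2, 0.4, 1.1, 2.3, 2.7, 3.0)
    obs.stat <- 2.3

    # Two-sided exact p-value: proportion of |T_j| at least as extreme as |T|
    mean(abs(perm.stats) >= abs(obs.stat))

    # One-sided ("greater") p-value: drop the absolute values
    mean(perm.stats >= obs.stat)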

Approximate Permutation Distributions

The total number of permutations \(M\) is often too large to feasibly compute all elements of the exact null distribution.

In such cases, we can conduct an approximate (or Monte Carlo) nonparametric test:

Randomly select \(R < M\) elements from \(\mathcal{T}\)

Define the approximate null distribution \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\)

Compute p-values using \(\mathcal{T}_{R+1}\) in place of \(\mathcal{T}\)

Note that \(\mathcal{T}_{R+1}\) has \(R + 1\) elements ( \(R\) permutations plus 1 observed), so the minimum possible p-value for an approximate nonparametric test is \(1/(R+1)\) . Thus, the accuracy of the Monte Carlo approximation is affected by the number of random permutations \(R\) that are sampled from \(M\) .

Monte Carlo Standard Errors

Consider a nonparametric test where the total number of possible outcomes \(M\) is very large (e.g., \(M \approx \infty\) ), and define the following notation

\(F(t)\) is the distribution function for the exact null distribution \(\mathcal{T}\)

\(G(t)\) is the distribution function for the approximate null distribution \(\mathcal{T}_{R+1}\)

Reminder: the distribution function is a probability calculation \(F(t) = P(T < t)\) .

The Monte Carlo standard error (MCSE) of a nonparametric test is defined as \[ \sigma(t) = \sqrt{ \frac{F(t) [1 - F(t)]}{R + 1} } \] which is the classic formula for the standard error of a sample proportion. Note that the MCSE is the standard deviation of \(G(t)\) , which is our (sample) estimate of the (population) probability \(F(t)\) .

Given the MCSE, a symmetric \(100(1 - \tilde{\alpha})\%\) confidence interval for \(F(t)\) can be approximated as \[ G(t) \pm Z_{1 - \tilde{\alpha}/2} \sigma(t) \] where \(Z_{1 - \tilde{\alpha}/2}\) is the quantile of a standard normal distribution that cuts-off \(\tilde{\alpha} / 2\) in the upper tail (e.g., \(Z_{0.975} \approx 1.96\) for a 95% confidence interval). Note that \(\alpha\) and \(\tilde{\alpha}\) can be different: the value of \(\alpha\) relates to the significance level of the nonparametric test, whereas the value of \(\tilde{\alpha}\) relates to the confidence of conclusions drawn from the approximate test.

Accuracy of Approximate Tests

Let \(\alpha\) denote the significance level for a nonparametric test with a one-sided alternative hypothesis ( \(\alpha\) is one-half the significance level if \(H_1\) is two-sided). Furthermore, let \(t_\alpha\) denote the value of the test statistic such that \(F(t_\alpha) = \alpha\) . To define the accuracy of the approximate test, consider the probability statement \[ P(| G(t_\alpha) - \alpha | < \alpha \delta) \approx 1 - \tilde{\alpha} \] where \(0 < \delta < 1\) quantifies the accuracy of the approximation (e.g., \(\delta = 0.1\) corresponds to 10% accuracy), and \(\tilde{\alpha}\) controls the confidence of the approximation (e.g., \(\tilde{\alpha} = 0.05\) corresponds to 95% confidence). Note that \(G(t_\alpha) \sim N(\alpha, \sigma^2(t_\alpha))\) , so the probability statement will be true when \[ \sigma(t_\alpha) = \sqrt{ \frac{\alpha (1 - \alpha)}{(R + 1)} } = \frac{\alpha \delta}{ Z_{1 - \tilde{\alpha}/2} } \] which can be guaranteed by adjusting the number of permutations \(R\) and/or the accuracy parameter \(\delta\) .

Fixing \(R\) and solving for \(\delta\) gives the accuracy of an approximate test \[ \delta = Z_{1 - \tilde{\alpha}/2} \sqrt{ \frac{(1 - \alpha)}{\alpha (R + 1)} } \] which is a function of the number of permutations \(R\) , the significance level \(\alpha\) , and confidence level \(1-\tilde{\alpha}\) . Fixing \(\delta\) and solving for \(R\) gives the number of permutations needed to achieve a test with a given accuracy \[ R + 1 = \left\lceil \frac{Z_{1 - \tilde{\alpha}/2}^2 (1 - \alpha)}{\alpha \delta^2} \right\rceil \] which is a function of the accuracy parameter \(\delta\) , the significance level \(\alpha\) , and confidence level \(1-\tilde{\alpha}\) . Note that the notation \(\lceil \cdot \rceil\) denotes the ceiling function.

Examples in R

The mcse function (in the nptest package) can be used to find (A) the accuracy of a given test, or (B) the number of permutations needed for a given accuracy.

Find \(\delta\) for a given \(R = 9999\) :

Interpretation: Using an \(\alpha = 0.05\) significance level and a two-sided alternative hypothesis, the approximate nonparametric test with \(R = 9999\) permutations has a MCSE of \(\sigma(t_{0.025}) = 0.0016\) and an accuracy of \(\delta = 0.1224\) (assuming 95% confidence).

Find \(R+1\) for a given \(\delta = 0.1\) (two-sided):

Interpretation: Using an \(\alpha = 0.05\) significance level and a two-sided alternative hypothesis, the approximate nonparametric test with accuracy \(\delta = 0.1\) has a MCSE of \(\sigma(t_{0.025}) = 0.0013\) and a minimum number of resamples \(R + 1 = 14982\) (assuming 95% confidence).

Find \(R+1\) for a given \(\delta = 0.1\) (one-sided):

Interpretation: Using an \(\alpha = 0.05\) significance level and a one-sided alternative hypothesis, the approximate nonparametric test with accuracy \(\delta = 0.1\) has a MCSE of \(\sigma(t_{0.05}) = 0.0026\) and a minimum number of resamples \(R + 1 = 7299\) (assuming 95% confidence).
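
These three interpretations can be reproduced directly from the MCSE and accuracy formulas given earlier. The sketch below is a plain R calculation of those formulas; it is not the nptest mcse function itself, whose interface may differ:

    # Plain R calculations of the MCSE/accuracy formulas (not nptest::mcse itself)
    z <- qnorm(0.975)      # 95% confidence multiplier
    a2 <- 0.05 / 2         # per-tail level for a two-sided test
    a1 <- 0.05             # level for a one-sided test

    # (A) accuracy and MCSE for R = 9999 resamples, two-sided
    R <- 9999
    z * sqrt((1 - a2) / (a2 * (R + 1)))      # delta, approximately 0.1224
    sqrt(a2 * (1 - a2) / (R + 1))            # MCSE, approximately 0.0016

    # (B) resamples needed for accuracy delta = 0.1, two-sided
    ceiling(z^2 * (1 - a2) / (a2 * 0.1^2))   # 14982

    # (C) resamples needed for accuracy delta = 0.1, one-sided
    ceiling(z^2 * (1 - a1) / (a1 * 0.1^2))   # 7299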

2 One-Sample Location Tests

For the one-sample location problem, we have \(n\) observations

\(Z_1,\ldots,Z_n \stackrel{\mathrm{iid}}{\sim}F\) if one-sample situation

\(Z_1,\ldots,Z_n \stackrel{\mathrm{iid}}{\sim}F\) with \(Z_i = X_i - Y_i\) if paired sample situation

The observations are assumed to be sampled from a continuous distribution \(F\) that depends on the location parameter \(\mu\) .

Null hypothesis is \(H_{0}: \mu = \mu_0\) where \(\mu_0\) is known

Three possible alternatives: \(H_{1}: \mu < \mu_0\) , \(H_{1}: \mu > \mu_0\) , \(H_{1}: \mu \neq \mu_0\)

Throughout our discussion, \(\mu\) will denote either the mean or median of \(F\) .

The np.loc.test function (in the nptest package) can be used to implement one-sample and paired sample location tests.

np.loc.test(x, y = NULL,
            alternative = c("two.sided", "less", "greater"),
            mu = 0, paired = FALSE, var.equal = FALSE,
            median.test = FALSE, symmetric = TRUE,
            R = 9999, parallel = FALSE, cl = NULL,
            perm.dist = TRUE)

Specifying the Four Options

Four different test statistics are available:

Note that the chosen test statistic relates to the type of test being conducted (e.g., mean or median test), as well as the assumptions made about the data (e.g., symmetric or skewed).

Student’s t test statistic (1908)

Student’s \(t\) test statistic has the form \[ T = \frac{\bar{Z} - \mu_0}{S / \sqrt{n}} \] where \(\bar{Z} = \frac{1}{n} \sum_{i = 1}^n Z_i\) is the sample mean, and \(S^2 = \frac{1}{n-1}\sum_{i = 1}^n (Z_i - \bar{Z})^2\) is the sample variance.

Johnson’s (skew-adjusted) t test statistic (1978)

Johnson proposed a correction to Student’s \(t\) test statistic for skewed data: \[ T = \frac{(\bar{Z} - \mu_0) + \frac{\hat{\mu}_3}{6 S^2 n} + \frac{\hat{\mu}_3}{3 S^4} (\bar{Z} - \mu_0)^2 }{S / \sqrt{n}} \] where \(\hat{\mu}_3 = \frac{1}{n-1}\sum_{i=1}^n (Z_i - \bar{Z})^3\) is the (estimated) third central moment of \(F\) .

Wilcoxon’s signed rank test statistic (1945)

Wilcoxon’s signed rank test statistic has the form \[ T = \frac{V - E(V)}{\sqrt{\mathrm{var}(V)}} \] where \(V = \sum_{i = 1}^n R_i \psi_i\) , \(R_i = \mathrm{rank}(|Z_i - \mu_0|)\) , and \(\psi_i = I(Z_i > \mu_0)\) . The expectation and variance (under \(H_0\) ) have the form \(E(V) = n(n+1)/4\) and \(\mathrm{var}(V) = n(n+1)(2n+1)/24\) .

Fisher’s sign test statistic (1925)

Fisher’s sign test statistic has the form \[ T = \frac{S - E(S)}{\sqrt{\mathrm{var}(S)}} \] where \(S = \sum_{i = 1}^n \psi_i\) is the number of \(Z_i\) that are greater than \(\mu_0\) . The expectation and variance (under \(H_0\) ) have the form \(E(S) = n/2\) and \(\mathrm{var}(S) = n/4\) .

Number of Possible Outcomes

For the one-sample (or paired sample) problem, there are \(M = 2^n\) possible outcomes:

Two possibilities for each \(Z_i\) : (1) \(Z_i > \mu_0\) or (2) \(Z_i < \mu_0\)

The \(Z_i\) are iid so each combination of \(\pm Z_i\) is possible

Define \(\tilde{Z}_i = Z_i - \mu_0\) to be the centered data for the \(i\) -th observation, which has mean (or median) zero under \(H_0\) . Note that we can write \[ \tilde{Z}_i = s_i | \tilde{Z}_i | \] where \(s_i = 2\psi_i - 1\) and \(\psi_i = I(Z_i > \mu_0)\) .

The \(M = 2^n\) possible outcomes correspond to all vectors of the form \[ \mathbf{s} = (s_1, s_2, \ldots, s_n)^\top \] where \(s_i \in \{-1, 1\}\) for \(i = 1,\ldots, n\) . Such vectors are referred to as “sign-flipping” vectors, given that \(s_i\) controls the sign of the centered score \(\tilde{Z}_i\) . Note that this is a form of conditional inference where we condition on the magnitudes of the centered scores, i.e., \(| \tilde{Z}_i | \ \forall i\) .

The flipn function (in the nptest package) can be used to obtain all \(M = 2^n\) possible sign-flipping vectors:
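
A minimal sketch, assuming the nptest package is installed and that flipn takes the sample size n as its argument (the next paragraph describes the column layout of the returned matrix):

    library(nptest)

    # All 2^3 = 8 sign-flipping vectors for n = 3; one vector of +/- 1 signs per column
    flipn(3)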

Exact and Approximate Null Distributions

Define \(\{\mathbf{s}_j\}_{j = 1}^M\) to be the collection of \(M = 2^n\) sign-flipping vectors for a given sample size \(n\) , where \(\mathbf{s}_j = (s_{1j}, \ldots, s_{nj})^\top\) . For example, if \(n = 3\) , then the \(j\) -th sign-flipping vector is the \(j\) -th column of the matrix returned by the flipn function (see above).

Let \(\tilde{Z}_{ij} = s_{ij} | \tilde{Z}_i |\) denote the \(j\) -th “sign-flipped” version of the \(i\) -th observation’s data, where \(s_{ij}\) is the \(i\) -th observation’s sign for the \(j\) -th sign-flipping vector.

Let \(\tilde{\mathbf{Z}}_j = (\tilde{Z}_{1j}, \ldots, \tilde{Z}_{nj})^\top\) denote the sample of the \(n\) observations’ centered and sign-flipped data for the \(j\) -th sign-flipping vector.

Let \(\mathbf{Z}_j = (Z_{1j}, \ldots, Z_{nj})^\top\) where \(Z_{ij} = \mu_0 + \tilde{Z}_{ij}\) denotes the “uncentered” and sign-flipped data for the \(j\) -th sign-flipping vector.

The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{Z}_j)\) denotes the test statistic corresponding to the \(j\) -th sign-flipping vector. Note that \(T_j\) is obtained by applying the given test statistic function \(s(\cdot)\) to the \(j\) -th sign-flipped version of the data \(\mathbf{Z}_j\) .

An approximate null distribution is formed by calculating the test statistic for \(R\) randomly sampled sign-flipping vectors (plus the observed data vector), i.e., \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\) where \(T_j = s(\mathbf{Z}_j)\) denotes the test statistic corresponding to the \(j\) -th sign-flipping vector.

Necessary Assumptions

If \(F\) is symmetric around the median \(\mu\) , the test will be exact regardless of the chosen test statistic.

If \(F\) is a skewed distribution (so mean \(\neq\) median), the test will be…

inexact but asymptotically valid when \(\mu\) is the mean (i.e., median.test = FALSE ) for Johnson’s and Student’s test statistics

inexact and asymptotically invalid when \(\mu\) is the median (i.e., median.test = TRUE ) and Wilcoxon’s test statistic is used (i.e., symmetric = TRUE )

exact when \(\mu\) is the median (i.e., median.test = TRUE ) and Fisher’s test statistic is used (i.e., symmetric = FALSE )

Helwig, N. E. (2019). Statistical nonparametric mapping: Multivariate permutation tests for location, correlation, and regression problems in neuroimaging. WIREs Computational Statistics, 11(2), e1457. https://doi.org/10.1002/wics.1457

The simulation study manipulated two data generating factors:

the distribution for \(Z_i\) (see figure below)

the sample size \(n \in \{10, 25, 50, 100, 200\}\)

Figure 1: Six univariate distributions with mean \(\mu = 0\) and variance \(\sigma^2 = 1\) .

10,000 independent datasets (replications) were generated for each of the 30 (6 distribution × 5 n) cells of the simulation design. For each replication, the null hypothesis \(H_0: \mu = 0\) was tested using the alternative hypothesis \(H_1: \mu > 0\) and an \(\alpha = 0.05\) significance level.

The significance testing results were compared using two methods:

Student’s t test using the \(t_{n−1}\) distribution for inference

Permutation test using Student’s t test statistic

For the permutation tests, the number of resamples was set at \(R = \min(2^n, 9999)\) .

The type I error rate for each combination of test statistic and data generating condition is given in the figure below.

Figure 2: Type I error rate calculated across 10,000 replications for the \(t\) test (red squares) and the permutation test (black circles). The vertical bars denote approximate 99% confidence intervals, and the dotted horizontal lines denote the nominal rate of \(\alpha = 0.05\) .

The simulation results reveal that…

the \(t\) test and the permutation test perform similarly

for symmetric distributions, the results are exact for all \(n\)

for skewed distributions, the results are asymptotically valid

Data and Hypotheses

Consider the dataset in Example 3.1 from Nonparametric Statistical Methods, 3rd Ed. (Hollander et al., 2014).

Nine psychiatric patients were treated with a tranquilizer drug. Before and after the treatment (i.e., the tranquilizer drug), the patients’ suicidal tendencies were measured using the Hamilton Depression Scale (Factor IV).

Let \(X\) denote the pre-treatment score

Let \(Y\) denote the post-treatment score

Higher scores on the Hamilton Depression Scale (Factor IV) correspond to more suicidal tendencies. We want to test if the tranquilizer significantly reduced suicidal tendencies:

\(H_0: \mu = 0\) versus \(H_1: \mu > 0\)

\(\mu\) is the mean (or median) of \(Z = X - Y\)

Enter the data into R:
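
A sketch of the data entry is given below. The values shown are the Hamilton scale measurements reproduced in the documentation for R's wilcox.test function (attributed there to Hollander & Wolfe), which appear to match this example; they yield the observed test statistic \(T = 3.0354\) reported below:

    # Hamilton Depression Scale (Factor IV) scores for the 9 patients
    x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)     # pre-treatment
    y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)  # post-treatment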

Results with Student’s Test Statistic

Student’s \(t\) test:

Permutation test using Student’s \(t\) statistic:
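
A sketch of the two calls, assuming the data vectors x and y defined above and that the default settings of np.loc.test give Student's \(t\) statistic:

    library(nptest)

    # Classic paired t test
    t.test(x, y, paired = TRUE, alternative = "greater")

    # Permutation (sign-flipping) test using Student's t statistic
    # (defaults are assumed to correspond to the mean test with Student's statistic)
    np.loc.test(x, y, paired = TRUE, alternative = "greater")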

Note that both approaches use the same observed test statistic, \(T = 3.0354\) , but the two approaches produce slightly different p-values:

\(t\) test: \(p = 0.0081\) is obtained by comparing \(T = 3.0354\) to a \(t_8\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

NP test: \(p = 0.0137\) is obtained by enumerating the exact null distribution with \(2^n = 512\) possible outcomes

Plot the exact null distribution used for the nonparametric test:
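
A sketch of one way to do this, assuming the object returned by np.loc.test stores the permutation distribution in a component named perm.dist (suggested by the perm.dist = TRUE argument) and the observed statistic in a component named statistic; both component names are assumptions:

    res <- np.loc.test(x, y, paired = TRUE, alternative = "greater")

    # Assumed component names: res$perm.dist (permutation distribution), res$statistic (observed T)
    hist(res$perm.dist, main = "Exact null distribution", xlab = "Test statistic")
    abline(v = res$statistic, lty = 2)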

Results with Wilcoxon’s Test Statistic

Wilcoxon’s signed rank test:

Permutation test using Wilcoxon’s signed rank statistic:
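
A sketch of the two calls, assuming the same x and y as above and using median.test = TRUE with the default symmetric = TRUE to request Wilcoxon's statistic (per the earlier section on assumptions):

    # Classic Wilcoxon signed rank test
    wilcox.test(x, y, paired = TRUE, alternative = "greater")

    # Permutation (sign-flipping) test using Wilcoxon's signed rank statistic
    np.loc.test(x, y, paired = TRUE, alternative = "greater", median.test = TRUE)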

Note that both approaches use the same observed test statistic, but return the result on a different scale: \(T = (V - E(V)) / \sqrt{\mathrm{var}(V)}\) . In this case, the number of elements of the exact null distribution is manageable ( \(M = 512\) ), so both functions are implementing an exact test. Consequently, we obtain the same (exact) p-value from both functions.

Results with Fisher’s Test Statistic

Fisher’s sign test:

Permutation test using Fisher’s sign statistic:
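
A sketch of the two calls, assuming the same x and y as above; the classic sign test is carried out with a binomial test on the number of positive differences, and median.test = TRUE with symmetric = FALSE is used to request Fisher's sign statistic (per the earlier section on assumptions):

    # Classic sign test via the binomial distribution
    binom.test(sum(x > y), n = length(x), alternative = "greater")

    # Permutation (sign-flipping) test using Fisher's sign statistic
    np.loc.test(x, y, paired = TRUE, alternative = "greater",
                median.test = TRUE, symmetric = FALSE)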

Note that both approaches use the same observed test statistic, but return the result on a different scale: \(T = (S - E(S)) / \sqrt{\mathrm{var}(S)}\) . In this case, the number of elements of the exact null distribution is manageable ( \(M = 512\) ), so both functions are implementing an exact test. Consequently, we obtain the same (exact) p-value from both functions.

3 Two-Sample Location Tests

For the two-sample location problem, we have \(N = m + n\) observations

\(X_1, \ldots , X_m\) are an iid random sample from population \(F_X\)

\(Y_1, \ldots , Y_n\) are an iid random sample from population \(F_Y\)

\(Z_1, \ldots , Z_N = (X_1, \ldots , X_m, Y_1, \ldots , Y_n)\) denotes the combined sample

\(\psi_1, \ldots, \psi_N = (1, \ldots, 1, 0, \ldots, 0)\) denotes the group label vector ( \(\psi_k = 1\) if \(Z_k\) is an \(X\) )

The \(X_i\) observations are assumed to be independent samples from a continuous distribution \(F_X\) that depends on the location parameter \(\mu_X\) . Similarly, the \(Y_j\) observations are assumed to be independent samples from a continuous distribution \(F_Y\) that depends on the location parameter \(\mu_Y\) . The \(X_i\) and \(Y_j\) observations are assumed to be independent of one another for all combinations of \(i,j\) .

Null hypothesis is same location \(\Leftrightarrow H_0 : \mu_X = \mu_Y\)

Three possible alternatives: \(H_1 : \mu_X < \mu_Y\) , \(H_1 : \mu_X > \mu_Y\) , \(H_1 : \mu_X \neq \mu_Y\)

Note that \(\mu_X\) and \(\mu_Y\) will denote either the means or medians of \(F_X\) and \(F_Y\) , respectively.

The np.loc.test function (in the nptest package) can be used to implement two-sample location tests.

Note that the chosen test statistic relates to the type of test being conducted (e.g., mean or median test), as well as the assumptions made about the data (e.g., equal variance or not).

Student’s two-sample \(t\) test statistic has the form \[ T = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \] where \(\bar{X} = \frac{1}{m} \sum_{i = 1}^m X_i\) and \(\bar{Y} = \frac{1}{n} \sum_{j = 1}^n Y_j\) are the sample means, and \[ S_p^2 = \frac{1}{N - 2} \left(\sum_{i=1}^m (X_i - \bar{X})^2 + \sum_{j=1}^n (Y_j - \bar{Y})^2 \right) \] is the pooled estimate of the (common) variance for the two populations.

Welch’s t test statistic (1938)

Welch proposed a modification of Student’s two-sample \(t\) test statistic for unequal variances: \[ T = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_X^2}{m} + \frac{S_Y^2}{n}}} \] where \(S_X^2 = \frac{1}{m - 1}\sum_{i=1}^m (X_i - \bar{X})^2\) and \(S_Y^2 = \frac{1}{n - 1}\sum_{j=1}^n (Y_j - \bar{Y})^2\) are the estimates of the variances for \(F_X\) and \(F_Y\) , respectively.

Wilcoxon Rank Sum test statistic (1945)

Wilcoxon’s rank sum test statistic has the form \[ T = \frac{W - E(W)}{\sqrt{\mathrm{var}(W)}} \] where \(W = \sum_{k = 1}^N R_k \psi_k\) , \(R_k = \mathrm{rank}(Z_k)\) , and \(\psi_k = I(Z_k \sim F_X)\) . The expectation and variance of \(W\) (under \(H_0\) ) have the form \(E(W) = m(N+1)/2\) and \(\mathrm{var}(W) = mn (N + 1) / 12\) .

Studentized Wilcoxon test statistic (2016)

Chung and Romano (2016) proposed a modification of Wilcoxon’s rank sum test statistic for unequal variances: \[ T = \frac{W - E(W)}{\sqrt{\widetilde{\mathrm{var}}(W)}} \] where \(W\) and \(E(W)\) are the same as before, and the modified variance term is \[ \widetilde{\mathrm{var}}(W) = (mn)^2 \left(\frac{\xi_x^2}{m} + \frac{\xi_y^2}{n} \right) \] with the two components of the variance defined as \[ \begin{split} \xi_x^2 &= \frac{1}{m-1} \sum_{i=1}^m \left(\bar{U}_i - \frac{1}{m} \sum_{i=1}^m \bar{U}_i \right)^2 \\ \xi_y^2 &= \frac{1}{n-1} \sum_{j=1}^n \left(\bar{V}_j - \frac{1}{n} \sum_{j=1}^n \bar{V}_j \right)^2 \\ \end{split} \] where \(\bar{U}_i = \frac{1}{n}\sum_{j=1}^n I(Y_j \leq X_i)\) and \(\bar{V}_j = \frac{1}{m}\sum_{i=1}^m I(X_i < Y_j)\) .

For the two-sample problem, there are \(M = {m+n \choose m}\) possible outcomes given a sample of \(N = m + n\) observations.

Test statistic depends on which \(Z_k\) are labeled \(X\) versus \(Y\)

Need to choose \(m\) of the \(N\) values to receive \(X\) labels

Let \(Z_k\) denote the \(k\) -th observation from the combined sample, which has common mean (or median) \(\mu\) under \(H_0\) .

\((Z_k, \psi_k)\) is observed where \(\psi_k = 1\) if \(Z_k\) is an \(X\) and \(\psi_k = 0\) if \(Z_k\) is a \(Y\)

The \(\psi_k\) labels are arbitrary if the two populations are equivalent, i.e., if \(F_X = F_Y\)

The \(M\) possible outcomes correspond to the \({m + n \choose m}\) vectors of the form \[ \boldsymbol\psi = (\psi_1, \psi_2, \ldots, \psi_N) \] with \(\psi_k \in \{0,1\}\) for all \(k = 1,\ldots,N\) and \(\sum_{k = 1}^N \psi_k = m\) . Such vectors are referred to as “combination” vectors given that \(\psi_k\) describes how the observations combine to form the two groups. Note that this is a form of conditional inference where we condition on the \(Z_k \ \forall k\) .

The combn function (in the utils package) can be used to obtain all \(M = {m + n \choose m}\) possible combination vectors:
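
A minimal sketch for a small case with \(m = 3\) and \(n = 2\) (the note below describes how to read the output):

    # All choose(5, 3) = 10 ways to place the m = 3 "X" labels among N = 5 observations;
    # each column lists the positions k with psi_k = 1
    combn(5, 3)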

Note: each column of the matrix returned by combn denotes which \(\psi_k\) are equal to 1 for the combination vector.

Define \(\{ \boldsymbol\psi_j \}_{j = 1}^M\) to be the collection of \(M = {m + n \choose m}\) combination vectors for the given sample sizes \((m, n)\) , where \(\boldsymbol\psi_j = (\psi_{1j}, \ldots, \psi_{Nj})\) . For example, if \(m = 3\) and \(n = 2\) , then the \(j\) -th combination vector contains 1’s in the locations denoted by the \(j\) -th column of the matrix returned by the combn function (see above) and zeros elsewhere.

Let \(\mathbf{Z} = (Z_1, \ldots, Z_N)^\top\) denote the combined vector of data

Let \(\boldsymbol\psi = (\psi_1, \ldots, \psi_N)^\top\) denote the observed combination vector

The observed test statistic can be written as a function \(T = s(\mathbf{Z}, \boldsymbol\psi)\)

The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{Z}, \boldsymbol\psi_j)\) denotes the test statistic corresponding to the \(j\) -th combination vector. Note that \(T_j\) is obtained by applying the given test statistic function \(s(\cdot, \cdot)\) to the data using the \(j\) -th combination vector.

An approximate null distribution is formed by calculating the test statistic for \(R\) random combination vectors (plus the observed combination vector), i.e., \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\) where \(T_j = s(\mathbf{Z}, \boldsymbol\psi_j)\) denotes the test statistic corresponding to the \(j\) -th combination vector.

If the location-shift model is true, i.e., if \(F_X(z - \delta) = F_Y(z)\) where \(\delta = \mu_X - \mu_Y\) is the location shift, then the permutation test is exact. This is because, assuming the location-shift model is true, the group labels are arbitrary under \(H_0\) : \(\delta = 0 \ \leftrightarrow \ F_X(z) = F_Y(z) \ \forall z\)

If the location-scale model is true, i.e., if \(F_X(z) = G([z - \mu_X] / \sigma_X)\) and \(F_Y(z) = G([z - \mu_Y] / \sigma_Y)\) , where \((\mu_X, \mu_Y)\) are the population-specific location parameters and \((\sigma_X^2, \sigma_Y^2)\) are the population-specific variance parameters, then the test will be

inexact but asymptotically valid when using the Welch or the studentized Wilcoxon test statistic (i.e., var.equal = FALSE)

inexact and asymptotically invalid when using the Student or Wilcoxon test statistic (i.e., var.equal = TRUE) … unless you get lucky

If the two groups differ more generally, the test will be

inexact but asymptotically valid using the Welch test statistic (for testing \(H_0: \mu_1 = \mu_2\) )

inexact but asymptotically valid using the Studentized Wilcoxon test statistic (for testing \(H_0: P(X < Y) = 1/2\) )

inexact and asymptotically invalid using the Student and Wilcoxon test statistics (i.e., var.equal = TRUE) … unless you get lucky

Note: in the more general scenario, the Wilcoxon test statistics (median.test = TRUE) are not testing \(H_0: \mu_1 = \mu_2\) , which is a common misconception. Instead, they are testing the null hypothesis \(H_0: P(X < Y) = 1/2\) , i.e., that the median of \(F_{X-Y}\) is equal to zero, where \(F_{X-Y}\) is the distribution of the difference score \(X - Y\) .

The simulation study manipulated three data generating factors:

the distribution for data (see top row of Figure 1)

the standard deviation of the second group: \(\sigma_2 \in \{0.5, 1, 2\}\)

the sample size of the first group: \(n_1 \in \{10, 25, 50, 100, 200\}\)

Throughout the simulation study…

the standard deviation of the first group was fixed at \(\sigma_1 = 1\)

the sample size of the second group was defined as \(n_2 = 2 n_1\)

10,000 independent datasets (replications) were generated for each of the 45 (3 distribution \(\times\) 3 \(\sigma_2\) \(\times\) 5 \(n_1\) ) cells of the simulation design. For each replication, the null hypothesis \(H_{0}: \mu_1 = \mu_2\) was tested using the alternative hypothesis \(H_{1}: \mu_1 > \mu_2\) and an \(\alpha = 0.05\) significance level.

The significance testing results were compared using four methods:

Student’s \(t\) test which assumes normality and equal variances

Welch’s \(t\) test which assumes normality and unequal variances

Permutation test using Student’s (pooled variance) \(T\) test statistic

Permutation test using Welch’s (unequal variance) \(T^*\) test statistic

For the permutation tests, the number of resamples was set at \(R = 9999\) .

Figure 3: Type I error rate calculated across 10,000 replications for the \(t\) test (red squares) and the permutation test (black circles). The unfilled points denote results using the (Student) \(T\) test statistic, whereas the filled points denote results using the (Welch) \(T^*\) test statistic. The vertical bars denote approximate 99% confidence intervals, and the dotted horizontal lines denote the nominal rate of \(\alpha = 0.05\) .

When \(\sigma_1 = \sigma_2\) (equal variance)

Permutation test produces valid results for all \(n\) for both Student and Welch test statistic

\(t\) test produces asymptotically valid results for skewed data (and valid results for normal data)

When \(\sigma_1 \neq \sigma_2\) (unequal variance)

For \(t\) and permutation tests, using Student’s test statistic produces invalid results

For \(t\) and permutation tests, using Welch’s test statistic produces asymptotically valid results

Consider the dataset in Example 4.2 from Nonparametric Statistical Methods, 3rd Ed. (Hollander et al., 2014).

\(N = 23\) patients are in an alcohol treatment program

\(m = 12\) assigned to Control condition ( \(X\) )

\(n = 11\) assigned to Social Skills Training ( \(Y\) )

The response variable is the post-treatment alcohol intake for 1 year following the program. The response is measured in centiliters of pure alcohol consumed throughout the duration of the year.

The goal is to test if the SST program reduced alcohol intake:

\(H_0: \mu_X = \mu_Y\) versus \(H_1: \mu_X > \mu_Y\) (median.test = FALSE)

\(H_0: P(X < Y) = 1/2\) versus \(H_1: P(X < Y) < 1/2 \Longleftrightarrow H_1: \mbox{median}(X-Y) > 0\) (median.test = TRUE)

R code to input the data:

Results with Student’s Test Statistic

Note that both approaches use the same observed test statistic, \(T = 3.9835\) , but the two approaches produce slightly different p-values:

\(t\) test: \(p = 0.0003\) is obtained by comparing \(T = 3.9835\) to a \(t_{21}\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

NP test: \(p = 0.0006\) is obtained by comparing \(T = 3.9835\) to the approximate null distribution with 10,000 elements

Plot the approximate null distribution used for the nonparametric test:

Results with Welch’s Test Statistic

Welch’s \(t\) test:

Permutation test using Welch’s \(t\) statistic:

Note that both approaches use the same observed test statistic, \(T = 3.9747\) , but the two approaches produce slightly different p-values:

\(t\) test: \(p = 0.0004\) is obtained by comparing \(T = 3.9747\) to a \(t_{20.599}\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

NP test: \(p = 0.0006\) is obtained by comparing \(T = 3.9747\) to the approximate null distribution with 10,000 elements

Results with Wilcoxon’s Test Statistic

Wilcoxon’s rank sum test:

Permutation test using Wilcoxon’s rank sum statistic:

Note that both approaches use the same observed test statistic, but return the result on a different scale: \(T = (W - E(W)) / \sqrt{\mathrm{var}(W)}\) . In this case, the number of elements of the exact null distribution is large ( \(M\) = 1,352,078). The wilcox.test function is implementing an exact test (because there are fewer than \(N = 50\) observations and no ties). The np.loc.test function is implementing an approximate test using \(R = 9999\) random samples of the \(M\) possible combination vectors. The p-values for the two approaches are quite similar:

Exact test (wilcox.test): \(p = 0.0005\) is obtained by comparing \(W = 117\) to the exact null distribution

NP test: \(p = 0.0009\) is obtained by comparing \(T = 3.1388\) to the approximate null distribution with 10,000 elements

Results with Studentized Wilcoxon Test Statistic

Permutation test using studentized Wilcoxon’s rank sum statistic:

4 Association / Correlation Tests

Suppose we have paired data \((X_i,Y_i) \stackrel{\mathrm{iid}}{\sim}F_{XY}\) for \(i = 1, \ldots, n\) , where \(F_{XY}\) is some bivariate distribution.

The goal is to test whether or not \(X\) and \(Y\) correlated (or associated) with one another.

\(X\) and \(Y\) are independent if and only if \(F_{XY}(x,y) = F_X(x)F_Y(y)\)

If \(X\) and \(Y\) are correlated/associated, they are dependent

Null hypothesis is \(H_0: \rho = 0\) where \(\rho = \mbox{cor}(X,Y)\)

Different definitions of \(\rho\) measure different types of association

We will focus on testing the significance of the Pearson product moment correlation coefficient between \(X\) and \(Y\) , which has the form \[ \rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \] where \(\sigma_{XY} = E[(X - \mu_X) (Y - \mu_Y)]\) is the covariance between \(X\) and \(Y\) , \(\sigma_X^2 = E[(X - \mu_X)^2]\) is the variance of \(X\) , and \(\sigma_Y^2 = E[(Y - \mu_Y)^2]\) is the variance of \(Y\) . Note that the Pearson correlation coefficient measures the degree of linear association between \(X\) and \(Y\) .

The np.cor.test function (in the nptest package) can be used to implement Pearson correlation tests.

np.cor.test(x, y, z = NULL,
            alternative = c("two.sided", "less", "greater"),
            rho = 0, independent = FALSE, partial = TRUE,
            R = 9999, parallel = FALSE, cl = NULL,
            perm.dist = TRUE)

Note: the function can also be used to test partial and semi-partial (part) correlations between \(X\) and \(Y\) controlling for \(Z = (Z_1,\ldots,Z_q)\) .

Specifying the Two Options

Two different test statistics are available:

Note that the chosen test statistic relates to the hypothesis being tested:

\(H_0: \rho = 0\) for independent = FALSE

\(H_0: F_{XY}(x,y) = F_X(x)F_Y(y)\) for independent = TRUE

Classic \(t\) test statistic (1908)

Student’s \(t\) test statistic for testing \(H_0: \rho = 0\) has the form \[ T = r \sqrt{\frac{n - 2}{1 - r^2}} \] where \[ r = \frac{S_{XY}}{S_X S_Y} = \frac{\sum_{i=1}^n (X_i - \bar{X}) (Y_i - \bar{Y})}{\sqrt{ \sum_{i=1}^n (X_i - \bar{X})^2 } \sqrt{ \sum_{i=1}^n (Y_i - \bar{Y})^2 }} \] is the sample correlation coefficient, \(S_{XY} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}) (Y_i - \bar{Y})\) is the sample covariance, \(S_X^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2\) is the sample variance of \(X\) , and \(S_Y^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2\) is the sample variance of \(Y\) .

Pitman correlation statistic (1937)

Pitman (1937) proposed using the sample correlation coefficient \(r\) as the test statistic, which is equivalent to using \[ T = r \sqrt{n} \] as the test statistic. If \(H_0\) is true, this test statistic is asymptotically equivalent to Student’s \(t\) test statistic, given that \(r \rightarrow 0\) as \(n \rightarrow \infty\) when \(H_0\) is true.

Studentized \(t\) test statistic (2017)

DiCiccio and Romano (2017) proposed a modification of the test statistic that is appropriate for testing \(H_0: \rho = 0\) when the bivariate distribution \(F_{X,Y}\) is any continuous distribution. The modified (or “studentized”) test statistic has the form \[ T = \frac{r \sqrt{n}}{\hat{\tau}} \] where \[ \hat{\tau}^2 = \frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 (Y_i - \bar{Y})^2 }{ \left[ \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 \right] \left[ \frac{1}{n} \sum_{i=1}^n(Y_i - \bar{Y})^2 \right] } \] is an estimate of the asymptotic variance of \(r \sqrt{n}\) under the assumption \(H_0: \rho = 0\) is true.

Under \(H_0\) , Pitman’s test statistic \(r \sqrt{n}\) asymptotically follows a normal distribution with mean \(0\) and variance \[ \tau^2 = \frac{E[(X - \mu_X)^2 (Y - \mu_Y)^2]}{\sigma_X^2 \sigma_Y^2} \] where \(\sigma_X^2 = E[(X - \mu_X)^2]\) is the variance of \(X\) and \(\sigma_Y^2 = E[(Y - \mu_Y)^2]\) is the variance of \(Y\) . Note that \(\tau^2 = 1\) if \(X\) and \(Y\) are independent, but \(\tau^2\) can take other values if \(X\) and \(Y\) are dependent and uncorrelated (which can occur for non-Gaussian data).

For the association / correlation problem, there are \(M = n!\) possible outcomes given a sample of \(n\) observations.

Test statistic depends on which \(X_i\) are paired with which \(Y_i\)

There are \(n!\) possible ways to permute the \(Y_i\) observations

There should be no linear association between \(X\) and \(Y\) if \(H_0: \rho = 0\) is true.

\((X_i, Y_i)\) are the observed data, which is one possible pairing of \(X_i\) ’s with \(Y_i\) ’s

All possible pairings are equally likely if \(X\) and \(Y\) are independent of one another

The \(M\) possible outcomes correspond to the \(n!\) vectors of the form \[ \boldsymbol\pi = (\pi_1, \ldots, \pi_n) \] where the \(\pi_i\) are some permutation of the integers \(1,\ldots,n\) . Such vectors are referred to as “permutation vectors” given that the \(\pi_i\) control the ordering of the \(Y\) observations, i.e., \(Y_{\pi_i}\) is used to permute the \(Y\) values. Note that this is a form of conditional inference where we condition on the \(X_i\) and \(Y_i\) values.

Reminder: the permn function (in the nptest package) can be used to obtain all \(n!\) possible permutation vectors:

Define \(\{\boldsymbol\pi_j \}_{j = 1}^M\) to be the collection of \(M = n!\) permutation vectors for the given sample size \(n\) , where \(\boldsymbol\pi_j = (\pi_{1j}, \ldots, \pi_{nj})\) . For example, if \(n = 3\) , then the \(j\) -th permutation vector is the \(j\) -th column of the matrix output by the permn function (see above).

\(\mathbf{X} = (X_1, \ldots, X_n)\) and \(\mathbf{Y} = (Y_1, \ldots, Y_n)\) are the observed data vectors

\(\boldsymbol\pi = (\pi_1, \ldots, \pi_n)\) is the observed permutation vector (with \(\pi_i = i \ \forall i\) )

The observed test statistic can be written as a function \(T = s(\mathbf{X}, \mathbf{Y})\)

The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(\mathbf{X}, \mathbf{Y}_j)\) denotes the test statistic corresponding to the j-th permutation of the \(\mathbf{Y}\) vector. Note that \(T_j\) is obtained by applying the given test statistic function \(s(\cdot, \cdot)\) to the data using the \(j\) -th permutation \(\mathbf{Y}_j = (Y_{\pi_{1j}}, \ldots, Y_{\pi_{nj}})\) .

An approximate null distribution is formed by calculating the test statistic for \(R\) random permutation vectors (plus the observed permutation vector), i.e., \(\mathcal{T}_{R+1} = \{T_j\}_{j = 1}^{R+1}\) where \(T_j = s(\mathbf{X}, \mathbf{Y}_j)\) denotes the test statistic corresponding to the \(j\) -th permutation vector.

Using the studentized \(t\) test statistic (independent = FALSE), the test of \(H_0: \rho = 0\) will be (i) exact if \(X\) and \(Y\) are independent, and (ii) asymptotically valid otherwise.

Point (i) is because each possible pairing of \(X_i\) with \(Y_i\) is equally likely when \(X\) and \(Y\) are independent

Point (ii) is because \(r \sqrt{n}\) is asymptotically normal with mean 0 and variance \(\tau^2\) under \(H_0: \rho = 0\)

Using the classic \(t\) test statistic (independent = TRUE), the test of \(H_0: F_{XY}(x,y) = F_X(x) F_Y(y)\) will be (i) exact if \(X\) and \(Y\) are independent, and (ii) invalid if \(X\) and \(Y\) are dependent but uncorrelated.

Point (i) is the same reason as before (i.e., each pairing is equally likely)

Point (ii) is because the test uses a measure of linear association to quantify dependence

Note: rejecting \(H_0: F_{XY}(x,y) = F_X(x) F_Y(y)\) does not allow you to draw conclusions about the direction of the correlation \(\rho\) , even though the sample correlation is used for the test statistic.

These results require some basic regularity conditions for the distribution \(F_{XY}\) , i.e.,

\(E([X - \mu_X] [Y - \mu_Y]) < \infty\) , \(E([X - \mu_X]^2) < \infty\) , and \(E([Y - \mu_Y]^2) < \infty\)

\(E([X - \mu_X]^2 [Y - \mu_Y]^2) < \infty\) , \(E([X - \mu_X]^4) < \infty\) , and \(E([Y - \mu_Y]^4) < \infty\)

the bivariate distribution \(F_{XY}\) (see figure below)

Figure 4: Three bivariate distributions with \(\mu_X = \mu_Y = 0\) , \(\sigma_X^2 = \sigma_Y^2 = 1\) , and \(\rho_{XY} = 0\) .

10,000 independent datasets (replications) were generated for each of the 15 (3 distribution \(\times\) 5 \(n\) ) cells of the simulation design. For each replication, the null hypothesis \(H_0: \rho(X, Y) = 0\) was tested using the alternative hypothesis \(H_1: \rho(X, Y) > 0\) and an \(\alpha = 0.05\) significance level.

The significance testing results were compared using four methods:

classic \(t\) test assuming bivariate normality

normal approximation using studentized \(t\) test statistic

permutation test using classic \(t\) test statistic

permutation test using studentized \(t\) test statistic

Figure 5: Type I error rate calculated across 10,000 replications for the parametric tests (red squares) and the permutation tests (black circles). The unfilled points denote the results using the (classic) \(T\) test statistic, whereas the filled points denote the results using the (studentized) \(T^*\) test statistic. The vertical bars denote approximate 99% confidence intervals, and the dotted horizontal lines denote the nominal rate of \(\alpha = 0.05\) .

Circular Uniform ( \(\tau < 1\) ): using classic \(t\) test or Pitman’s permutation test produces invalid results (error rates are too small), whereas using the studentized \(t\) test statistic produces asymptotically valid results.

MVN ( \(\tau = 1\) ): all methods perform equally well (b/c \(X\) and \(Y\) are independent).

MVT ( \(\tau > 1\) ): using classic \(t\) test or Pitman’s permutation test produces invalid results (error rates are too large), whereas using the studentized \(t\) test statistic produces asymptotically valid results.

Consider the dataset in Table 8.5 from Nonparametric Statistical Methods, 3rd Ed. (Hollander et al., 2014).

The dataset contains psychological test scores from \(n = 13\) pairs of twins.

  • \(X_i\) is first twin from \(i\) -th pair, and \(Y_i\) is second twin from \(i\) -th pair

Goal: test if the test scores are positively associated with one another:

\(H_0: \rho(X,Y) = 0\) versus \(H_1: \rho(X,Y) > 0\) (independent = FALSE)

\(H_0: F_{XY}(x,y) = F_X(x) F_Y(y)\) versus \(H_1: \rho(X,Y) > 0\) (independent = TRUE)

Results using Classic Test Statistic

Classic \(t\) test statistic:

Permutation test using classic \(t\) statistic:

Note that both approaches use the same observed test statistic, \(T = 2.8284\) , but the two approaches produce slightly different p-values:

\(t\) test: \(p = 0.0082\) is obtained by comparing \(T = 2.8284\) to a \(t_{11}\) distribution (note that \(t_\nu\) is Student’s \(t\) distribution with \(\nu\) d.f.)

NP test: \(p = 0.0091\) is obtained by comparing \(T = 2.8284\) to the approximate null distribution with 10,000 elements

Results using Studentized Test Statistic

Permutation test using studentized \(t\) statistic:

5 Regression Coefficient Tests

Consider a linear model of the form \[ Y_i = \alpha + \sum_{j = 1}^p \beta_j X_{ij} + \epsilon_i \quad \leftrightarrow \quad Y = \alpha + X \beta + \epsilon \] or a linear model of the form \[ Y_i = \alpha + \sum_{j = 1}^p \beta_j X_{ij} + \sum_{k = 1}^q \gamma_k Z_{ik} + \epsilon_i \quad \leftrightarrow \quad Y = \alpha + X \beta + Z \gamma + \epsilon \] where the goal is to test the null hypothesis \(H_0: \beta = \beta_0\) versus the alternative hypothesis \(H_1: \beta \neq \beta_0\) . Note that the covariates in \(Z\) are being “controlled for” (or “conditioned on”) while testing the effects of the variables in \(X\) .

The np.reg.test function (in the nptest package) can be used to implement nonparametric tests of regression coefficients.

np.reg.test(x, y, z = NULL, method = NULL, beta = NULL, homosced = FALSE,
            R = 9999, parallel = FALSE, cl = NULL, perm.dist = TRUE)

Note that the chosen test statistic relates to assumptions about the variance of the error terms:

\(F\) statistic assumes \(\mathrm{var}(\epsilon_i) = \sigma^2\)  (homoscedastic)

\(W\) statistic assumes \(\mathrm{var}(\epsilon_i) = \sigma_i^2\)  (heteroscedastic)

Least Squares Estimates

The linear model can be rewritten as \[ Y = \alpha + M \psi + \epsilon \] where \(M = (X, Z)\) denotes the combined design matrix, and \(\psi = \left[ \begin{smallmatrix} \beta \\ \gamma \end{smallmatrix} \right]\) denotes the combined coefficient vector. Note that we can write \[ \beta = S^\top \psi \] where the selection matrix \(S = \left( \begin{smallmatrix} I \\ 0 \end{smallmatrix} \right)\) returns the \(\beta\) portion of \(\psi\) .

The least squares estimate of the coefficient vector has the form \[ \hat{\psi} = \left[ \begin{smallmatrix} \hat{\beta} \\ \hat{\gamma} \end{smallmatrix} \right] = \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1} M_{\mathrm{c}}^\top Y \] where \(M_{\mathrm{c}} = C M\) is the columnwise mean-centered version of \(M\), and the matrix \(C = I - n^{-1} 1 1^\top\) is a centering matrix.

Note: the least squares estimate of the intercept has the form \[ \hat{\alpha} = \bar{Y} - \bar{M}^\top \hat{\psi} \] where \(\bar{Y}\) is the mean of the response, and \(\bar{M}\) is the mean predictor vector.

Classic \(F\) test statistic

The classic \(F\) test statistic has the form \[ F = \frac{1}{p} \left( \hat{\beta} - \beta_0 \right)^\top \hat{\Sigma}_{\hat{\beta}}^{-1} \left( \hat{\beta} - \beta_0 \right) \] where

\(\hat{\Sigma}_{\hat{\beta}} = S^\top \hat{\Sigma}_{\hat{\psi}} S\) is the (estimated) covariance matrix of \(\hat{\beta}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma^2\)

\(\hat{\Sigma}_{\hat{\psi}} = \hat{\sigma}^2 \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1}\) is the (estimated) covariance matrix of \(\hat{\psi}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma^2\)

\(\hat{\sigma}^2 = \frac{1}{n-r-1} \| \hat{\epsilon} \|^2\) is the error variance estimate assuming \(\mathrm{var}(\epsilon_i) = \sigma^2\)

\(\hat{\epsilon} = Y - \hat{Y}\) is the residual vector and \(\hat{Y} = \hat{\alpha} + M \hat{\psi}\) is the fitted value vector.

If the errors are iid \(N(0, \sigma^2)\) , then the \(F\) statistic follows an \(F_{p, n - r - 1}\) distribution under \(H_0\) .

Robust \(W\) test statistic

The robust Wald test statistic has the form \[ W = \left( \hat{\beta} - \beta_0 \right)^\top \hat{\Omega}_{\hat{\beta}}^{-1} \left( \hat{\beta} - \beta_0 \right) \] where

\(\hat{\Omega}_{\hat{\beta}} = S^\top \hat{\Omega}_{\hat{\psi}} S\) is the (estimated) asymptotic covariance matrix of \(\hat{\beta}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma_i^2\)

\(\hat{\Omega}_{\hat{\psi}} = \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1} M_{\mathrm{c}}^\top D_{\hat{\epsilon}} M_{\mathrm{c}} \left( M_{\mathrm{c}}^\top M_{\mathrm{c}} \right)^{-1}\) is the (estimated) asymptotic covariance matrix of \(\hat{\psi}\) assuming \(\mathrm{var}(\epsilon_i) = \sigma_i^2\)

\(D_{\hat{\epsilon}} = \mathrm{diag}(\hat{\epsilon}_1^2, \ldots, \hat{\epsilon}_n^2 )\) is a diagonal matrix containing the squared residuals

\(D_{\hat{\epsilon}} = \mathrm{diag}((Y_1 - \bar{Y})^2, \ldots, (Y_n - \bar{Y})^2 )\) if there are no covariates in the model

Under a variety of conditions, the \(W\) statistic asymptotically follows a \(\chi_p^2\) distribution under \(H_0\) .
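As an illustration, the following R sketch computes the classic \(F\) and robust \(W\) statistics directly from the formulas above. The data and the helper name f_and_w are hypothetical placeholders; only the formulas come from the text.

# Simulated example data: 2 tested predictors (X), 1 covariate (Z)
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 2), n, 2)
Z <- matrix(rnorm(n), n, 1)
y <- 1 + X %*% c(0.5, 0) + Z * 0.3 + rnorm(n)

f_and_w <- function(X, Z, y, beta0 = rep(0, ncol(X))) {
  M  <- cbind(X, Z)                                # combined design matrix
  Mc <- scale(M, center = TRUE, scale = FALSE)     # columnwise mean-centered M
  p  <- ncol(X); r <- ncol(M); n <- length(y)
  MtM.inv  <- solve(crossprod(Mc))
  psi.hat  <- MtM.inv %*% crossprod(Mc, y)         # least squares estimate of psi
  beta.hat <- psi.hat[1:p]
  alpha.hat <- mean(y) - colMeans(M) %*% psi.hat
  e <- y - (as.numeric(alpha.hat) + M %*% psi.hat) # residual vector

  # Classic F statistic (homoscedastic errors)
  sig2 <- sum(e^2) / (n - r - 1)
  Sigma.beta <- (sig2 * MtM.inv)[1:p, 1:p, drop = FALSE]
  Fstat <- drop(t(beta.hat - beta0) %*% solve(Sigma.beta) %*% (beta.hat - beta0)) / p

  # Robust W statistic (heteroscedastic errors, sandwich covariance)
  D <- diag(drop(e)^2)
  Omega <- MtM.inv %*% t(Mc) %*% D %*% Mc %*% MtM.inv
  Omega.beta <- Omega[1:p, 1:p, drop = FALSE]
  Wstat <- drop(t(beta.hat - beta0) %*% solve(Omega.beta) %*% (beta.hat - beta0))

  c(F = Fstat, W = Wstat)
}

f_and_w(X, Z, y)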

For the regression problem, there are \(M = n!\) possible outcomes given a sample of \(n\) observations.

  • The test statistic depends on which \(X_i = (X_{i1}, \ldots, X_{ip})^\top\) is paired with which \(Y_i\).
  • There are \(n!\) possible ways to permute the rows of \(X\).
  • If the errors are symmetric, we can permute and re-sign (so \(M = n! \, 2^n\)).
  • There should be no linear association between \(X\) and \(Y\) if \(H_0: \beta = 0\) is true.

The \(M\) possible outcomes correspond to the \(n!\) vectors of the form \[ \boldsymbol\pi = (\pi_1, \ldots, \pi_n) \] where the \(\pi_i\) are some permutation of the integers \(1,\ldots,n\) . Such vectors are referred to as “permutation vectors” given that the \(\pi_i\) control the ordering of the rows of \(X\) .

With no covariates ( \(Y = \alpha + X \beta + \epsilon\) ), the exact null distribution is formed by calculating the test statistic for all possible permutations of \(Y\) .

The observed test statistic can be written as a function \(T = s(X, Y)\)

The exact null distribution is given by \(\mathcal{T} = \{T_j\}_{j = 1}^M\) where \(T_j = s(X, Y_j)\) denotes the test statistic corresponding to the \(j\) -th permutation of the \(Y\) vector.
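A minimal R sketch of this idea for the no-covariate case, using a random subset of the \(n!\) permutations rather than the full set (the data here are simulated placeholders):

# Simulated placeholder data for Y = alpha + X*beta + error
set.seed(2)
n <- 20
x <- rnorm(n)
y <- 1 + 0.4 * x + rnorm(n)

# Test statistic as a function of the (X, Y) pairing
stat <- function(x, y) summary(lm(y ~ x))$fstatistic[1]

T.obs <- stat(x, y)

# Approximate null distribution: recompute the statistic for random permutations of Y
T.null <- replicate(4999, stat(x, sample(y)))

# Permutation p-value (observed statistic counted as part of the null set)
mean(c(T.obs, T.null) >= T.obs)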

With covariates ( \(Y = \alpha + X \beta + Z \gamma + \epsilon\) ), there are many possible permutation strategies (see Table 1).

Reproduction of Table 1 from Helwig (2019b).

With no covariates ( \(Y = \alpha + X \beta + \epsilon\) ), we need the following assumptions:

\(\{ (X_i, Y_i) \}_{i = 1}^n\) are an iid sample from some distribution \(F_{XY}\) that satisfies the assumed linear model with \(E(\epsilon_i X_i) = 0\) .

\(E(\epsilon_i) = 0\) but \(E(\epsilon_i | X_i)\) may be non-zero because \(\epsilon_i\) may depend on \(X_i\)

With covariates ( \(Y = \alpha + X \beta + Z \gamma + \epsilon\) ), we need the following assumptions:

\(\{ (X_i, Z_i, Y_i) \}_{i = 1}^n\) are an iid sample from some distribution \(F_{XZY}\) that satisfies the assumed linear model.

\(E(\epsilon_i) = 0\) and \(E(\epsilon_i | X_i, Z_i) = 0\) (i.e., errors have expectation zero)

In both cases, we make some basic regularity assumptions about the data:

The matrices \(M_{\mathrm{c}}^\top M_{\mathrm{c}}\) and \(M_{\mathrm{c}}^\top D_{\hat{\epsilon}} M_{\mathrm{c}}\) are almost surely invertible, and their expectations are nonsingular

The response and predictor variables have finite fourth moments

Helwig, N. E. (2019). Robust nonparametric tests of general linear model coefficients: A comparison of permutation methods and test statistics. NeuroImage, 201, 116030. https://doi.org/10.1016/j.neuroimage.2019.116030

The simulation study manipulated four data generating conditions:

the distribution of the data (2 levels: MVN and MVT with \(\nu = 5\))

the correlation between \(X\) and \(Z\) (3 levels: \(\rho \in \{0, 1/3, 2/3\}\) )

the number of observations: \(n \in \{10, 25, 50, 100, 200 \}\)

the true \(\beta\) coefficient (2 levels: \(\beta \in \{0, 1/4\}\) )

The multivariate normal (MVN) condition is used to explore the methods when the error terms are homoscedastic, whereas the MVT (with \(\nu = 5\) ) is included to compare the methods with heteroscedastic error terms (see Appendix A of Helwig, 2019b).

The \(\beta = 0\) condition is used to explore each method’s type I error rate, whereas the \(\beta = 1/4\) condition is used to explore each method’s power.

10,000 independent datasets (replications) were generated for each of the 60 (2 distributions \(\times\) 3 \(\rho\) \(\times\) 5 \(n\) \(\times\) 2 \(\beta\)) cells of the simulation design. For each replication, the null hypothesis \(H_0: \beta = 0\) was tested against the alternative hypothesis \(H_1: \beta \neq 0\) at an \(\alpha = 0.05\) significance level.

The significance testing results were compared using 18 different approaches:

all combinations of 2 test statistics with 8 permutation methods (16 possibilities)

the two corresponding parametric tests (2 possibilities)

The type I error rate ( \(\beta = 0\) ) for each combination of test statistic and data generating condition is given in the figure below.

Figure 5: Type I error rates for various simulation conditions. PA = parametric.

The plot of the type I error rates reveals that…

The \(F\) statistic is exact for MVN data ( \(\epsilon_i\) homoscedastic), but invalid for MVT data ( \(\epsilon_i\) heteroscedastic)

The \(W\) statistic is exact for MVN data ( \(\epsilon_i\) homoscedastic), and is asymptotically valid for MVT data ( \(\epsilon_i\) heteroscedastic)

The Still-White (SW) permutation method is invalid when \(X\) and \(Z\) are correlated

Otherwise the different permutation methods perform similarly to one another

The power ( \(\beta = 1/4\) ) for each combination of test statistic and data generating condition is given in the figure below.

Figure 6: Statistical power for various simulation conditions. PA = parametric.

The plot of the power for each approach reveals that…

Power increased as \(n\) increased for both \(F\) and \(W\) statistics

Correlation between \(X\) and \(Z\) reduced the power for both \(F\) and \(W\) statistics

For MVN data, using \(W\) statistic led to slightly reduced power (compared to \(F\) statistic), but the difference disappeared as \(n\) increased

For MVT data, the power of the \(W\) statistic was slightly reduced compared to the MVN data

We will use the SAT and College GPA example from http://onlinestatbook.com/2/case_studies/sat.html

The dataset contains information from \(n = 105\) students that graduated from a state university with a B.S. degree in computer science.

high_GPA = High school grade point average

math_SAT = Math SAT score

verb_SAT = Verbal SAT score

comp_GPA = Computer science grade point average

univ_GPA = Overall university grade point average

R code to read-in and look at the data

Is University GPA Linearly Related to High School GPA?

Consider the simple linear regression model \[ Y = \alpha + X \beta + \epsilon \] where \(Y\) is the University GPA and \(X\) is the high school GPA.

Testing \(H_0: \beta = 0\) versus \(H_1: \beta \neq 0\)

Classic \(F\) test statistic:

Permutation test with \(F\) test statistic:

Permutation test with \(W\) test statistic:

Note that the classic approach uses the same \(F\) test statistic as the permutation test that assumes homoscedasticity ( \(F = 159.5667\) ). In this case, all three tests produce similar results: we reject the null hypothesis of no linear relationship.
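A sketch of how results like these could be obtained, assuming the data have been read into a data frame named gpa with the column names listed above (the arguments follow the np.reg.test signature shown earlier; because the permutation distribution is random, p-values will vary slightly from run to run unless a seed is set):

library(nptest)

x <- gpa$high_GPA
y <- gpa$univ_GPA

# Classic parametric F test
summary(lm(y ~ x))

# Permutation test using the F statistic (assumes homoscedastic errors)
np.reg.test(x, y, homosced = TRUE)

# Permutation test using the robust W statistic (allows heteroscedastic errors)
np.reg.test(x, y, homosced = FALSE)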

Do SAT Scores Help Predict University GPA?

Consider the multiple linear regression model \[ Y = \alpha + X_1 \beta_1 + X_2 \beta_2 + Z \gamma + \epsilon \] where \(Y\) is the University GPA, \(X_1\) is the Math SAT score, \(X_2\) is the Verbal SAT score, and \(Z\) is the high school GPA.

Testing \(H_0: \beta_1 = \beta_2 = 0\) versus \(H_1: (\beta_1 \neq 0) \mbox{ and/or } (\beta_2 \neq 0)\)

Interestingly, the methods using the \(F\) test statistic produce a p-value that would fail to reject the null using a standard significance level. In contrast, the permutation test using the robust \(W\) test statistic produces a p-value of \(p = 0.0283\) , which would reject the null using a standard \(\alpha = 0.05\) significance level. The difference between the results seems to be because the error variances are heteroscedastic.

Visualize the heteroscedasticity in the data
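One simple way to do this, again assuming the hypothetical gpa data frame, is to plot the residuals against the fitted values from the multiple regression model; a fan or funnel shape suggests heteroscedastic errors:

fit <- lm(univ_GPA ~ math_SAT + verb_SAT + high_GPA, data = gpa)
plot(fitted(fit), resid(fit),
     xlab = "Fitted university GPA", ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)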

Non-Parametric Hypothesis Tests and Data Analysis

Ted Hessing, Six Sigma Study Guide

Non-parametric tests, as their name tells us, are statistical tests that do not rely on population parameters. For these types of tests, you need not characterize your population's distribution in terms of specific parameters. Non-parametric tests are also referred to as distribution-free tests because they are based on fewer assumptions (e.g., they do not assume a normal distribution).

“This always reminds me of the Ghostbusters scene when they get their first call and head into the hotel where the manager says ‘I want this thing taken care of quickly!’ Venkman of course replies ‘Hold on, we don’t even know what you have yet.’”

Parametric tests involve a specific probability distribution, and the tests involve estimation of the key parameters of that distribution. Non-parametric tests, in contrast, are particularly suited to testing hypotheses about data that are non-normal and resist transformation. Because fewer assumptions are needed, these tests are relatively easy to perform, and they are also more robust.

When to use Non-parametric testing?

Non-parametric methods can be used to study data that are ranked in order but have little or no clear numerical interpretation.

Due to the small number of assumptions involved, non-parametric tests have a wide range of applications, especially where there is only a small amount of information available about the application in question.

For non-parametric tests to be the right choice, the data should not follow a normal distribution. A common way to check this is the Anderson-Darling test, which helps determine what type of distribution the data may follow. If the test result is statistically significant (i.e., the data do not follow a normal distribution), perform a non-parametric test.

Some common situations in which the data do not follow a normal distribution, and a non-parametric test is appropriate, are:

  • When the outcome is a rank or an ordinal variable – For example in the case of movie ranking etc.
  • In case there are a number of explicit outliers – The samples may show a continuous pattern with some very extreme-ended outliers.
  • When the outcome has a clear limit of detection – the outcome can only be measured up to (or down to) some limit, so values beyond that limit are imprecise.

Applications of Non-parametric tests

  • When data does not follow parametric test conditions
  • Where you need quick data analysis
  • When the data are non-normal and resist transformation of any kind
  • When the sample size is too small

Assumptions of Non-parametric tests

We do not assume that the data follow a normal distribution when performing a non-parametric test; however, that does not mean non-parametric tests are free of assumptions. Typical assumptions are:

  • Samples are independent and derived from the same population
  • Need to have an equal shape and spread for two sample designs

Types of Non-parametric Tests

There are many types of non-parametric tests. The following are a few:

Sign Test – It is a rudimentary test that can be applied when the typical conditions for the single sample t-test are not met. The test itself is very simple and involves doing a binomial test on the signs.
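For instance, here is a minimal R sketch of the sign test carried out as a binomial test on the signs, using made-up before/after measurements:

# Hypothetical paired measurements
before <- c(12, 15, 9, 14, 11, 16, 13, 10)
after  <- c(14, 18, 9, 17, 12, 15, 16, 13)

d <- after - before
d <- d[d != 0]                 # drop zero differences
n.pos <- sum(d > 0)            # number of positive signs

# Two-sided sign test: are positive and negative signs equally likely?
binom.test(n.pos, length(d), p = 0.5)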

Mood’s Median Test (for two samples) – This is a rudimentary two-sample version of the above-mentioned sign test. It estimates whether the medians of two independent samples are equal, and it can also be applied to more than two samples.

Wilcoxon Signed-Rank Test for a Single Sample – If the requirements for the t-test are not fulfilled, this test can be used, provided the sample comes from a population with at least an ordinal level of measurement. This is also a rudimentary test. It has two subtypes: the exact test and the advanced one.

Mann-Whitney Test for Independent Samples – This is an alternative to the t-test for two independent populations, and it is equivalent to the Wilcoxon rank-sum test. This test has three variants: the exact test, the median confidence interval, and the advanced one.

Wilcoxon Signed-Rank Test for Paired Samples – This test is mainly an alternative to the t-test for paired samples i.e. if the requirements for the two paired t-tests are not satisfied then we can easily perform this test. It has two methods: the exact one and the advanced one.


McNemar Test – This test is basically a type of matched-pair test and is used to analyze data before and after an event has occurred. It tells us whether there is a significant change in the data before and after the occurrence of the event. Use McNemar’s test with paired samples where the dependent variable is dichotomous.

Runs Test – This test determines whether the sequence of a series of events is random or not. Use the one-sample or two-sample version depending on the data at hand. The two-sample version determines whether the two samples come from the same distribution.

Resampling Procedures – These work on the assumption that the original population distribution is the same as the distribution in the given sample. This lets us create a large number of samples from this pseudo-population and then draw conclusions from them.

Additional Non-Parametric Hypothesis Tests

Apart from the above non-parametric test, some of the other examples of non-parametric tests used in our everyday lives are the Chi-square Test of Independence , Kolmogorov-Smirnov (KS) test, Kruskal-Wallis Test , Mood’s Median Test , Spearman’s Rank Correlation, Kendall’s Tau Correlation, Friedman Test and the Cochran’s Q Test.

Also see the one and two sample proportion non-parametric hypothesis tests and the 1 Sample Sign Non Parametric Hypothesis Test.

Advantages of Non-parametric tests

  • Non-parametric tests are distribution free
  • An added advantage is the reduction in the effect of outliers and variance heterogeneity on our results.
  • It can be applied to nominal (such as sex, race, employment status, etc.) or ordinal scaled data ( on a 1-10 scale, 10 being delighted and 1 being extremely dissatisfied)
  • Computations are easier than the parametric test
  • Easy to understand and less time-consuming especially when the sample size is small

Disadvantages of Non-parametric tests

  • The results that they provide may be less efficient or powerful compared to the results provided by the parametric tests.
  • Non-parametric tests are useful and important in many cases, but they are difficult to compute manually.
  • Results are usually more difficult to interpret than parametric tests
Related notes:

  • Use non-parametric tests for correlation studies
  • Tests for equality of population medians – Mood’s Median, Mann-Whitney, and Kruskal-Wallis
  • Non-parametric test of equality of population variances – Levene’s test
  • Levene’s test makes an evaluation using a t-test: http://en.wikipedia.org/wiki/Levene’s_test

Non-Parametric test videos

Six Sigma Black Belt Certification Non-Parametric Tests and Data Analysis Questions:

Question: A black belt would use non-parametric statistical methods when:

(A) knowledge of the underlying distribution of the population is limited
(B) the measurement scale is either nominal or ordinal
(C) the statistical estimation is required to have higher assurance
(D) management requires substantial statistical analysis prior to implementing


A: knowledge of the underlying distribution of the population is limited.

You use non-parametric methods when you can’t identify or assume what kind of distribution you have, so A is the easy choice. Also, you can eliminate B, C, and D as they have no bearing on the problem.

Comments (4)

I would believe the correct answer is B not A.

I can run a normality test to know whether the data follow a normal distribution or not, and that’s enough information to decide whether to use non-parametric statistical methods.

However if the measurement scale is either ordinal or nominal, then by definition I have to use non-parametric statistical methods.

Thanks, Ahmed

Thanks for the question. This question comes from ASQ’s published practice exam and that is their provided answer.

I think it comes down to which is the best possible answer. While you are right in terms of B, A is the better answer as it is more encompassing.



Parametric and Non-Parametric Tests

Abdisalam Hassan Muse (PhD)

This primer provides an overview of 25 different hypothesis testing methods, including parametric and non-parametric tests, using reproducible R software. The tests are categorized based on research questions, including analysis of effects, analysis of association, analysis of difference, and analysis of dependency.

For each test, we provide a definition, hypothesis, use, and real-life applications in research. We also present R code examples to illustrate the hypothesis testing process, including data preparation, test selection, test execution, and result interpretation.

In the analysis of effects category, we cover tests such as the t-test, ANOVA, and MANOVA, which are used to determine the significance of a treatment or intervention. In the analysis of association category, we cover tests such as the Pearson correlation and chi-squared tests, which are used to determine the relationship between two variables.

In the analysis of difference category, we cover tests such as the Wilcoxon signed-rank, Kruskal-Wallis, and Friedman tests, which are used to determine the difference between two or more groups. In the analysis of dependency category, we cover tests such as the McNemar and Cochran’s Q tests, which are used to determine the dependency between two categorical variables.

The primer emphasizes the importance of reproducibility in hypothesis testing and demonstrates how to achieve this using R. We also discuss the assumptions and limitations of each test and provide guidance on how to choose the appropriate test based on the research question and data type.

Overall, this primer provides a practical guide to hypothesis testing using R, suitable for researchers and data analysts at all levels. The primer covers a wide range of tests and provides R code examples that can be easily adapted to suit individual research needs.

Module 7: Statistical Hypothesis Testing

Statistical tools and software for hypothesis testing:

PARAMETRIC TESTS

Dependent t-test

  • Definition: a test that compares the means of two related groups (e.g., pre-treatment and post-treatment) to determine whether there is a significant difference.
  • Assumptions: normality of the differences, homogeneity of variance.
  • Application: comparing the effectiveness of a new drug treatment by measuring the pre-treatment and post-treatment blood pressure of patients.
  • Real-life example: comparing the average commute times of employees before and after a change in the company’s transportation policy.

Independent t-test

  • Definition: a test that compares the means of two independent groups to determine whether there is a significant difference.
  • Assumptions: normality of the data, homogeneity of variance.
  • Application: comparing the effectiveness of two different brands of a pain reliever by measuring the pain levels of patients who receive each brand.
  • Real-life example: comparing the average sales figures of two different stores that sell the same product.

Paired z-test

  • Definition: a test that compares the means of two related groups (e.g., pre-treatment and post-treatment) to determine whether there is a significant difference, using the normal distribution.
  • Assumptions: normality of the differences, known population standard deviation.
  • Application: comparing the effectiveness of a new diet plan by measuring the pre-diet and post-diet weights of participants.
  • Real-life example: comparing the average scores of students before and after a tutoring program.

Unpaired z-test

  • Definition: a test that compares the means of two independent groups to determine whether there is a significant difference, using the normal distribution.
  • Assumptions: normality of the data, known population standard deviation.
  • Application: comparing the effectiveness of two different teaching methods by measuring the test scores of students who receive each method.
  • Real-life example: comparing the average salaries of male and female employees in a company.

One-way ANOVA

  • Definition: a test that compares the means of three or more independent groups to determine whether there is a significant difference.
  • Application: comparing the effectiveness of three different types of fertilizers by measuring the yield of crops grown with each fertilizer.
  • Real-life example: comparing the average customer satisfaction scores for three different airlines.

Two-way ANOVA

  • Definition: a test that compares the means of two or more independent groups, considering the effects of two or more categorical variables.
  • Application: comparing the effectiveness of two different advertising campaigns for three different products.
  • Real-life example: comparing the average salaries of employees in different departments of a company, considering the effects of both job title and years of experience.

MANOVA

  • Definition: a test that compares the means of two or more groups on two or more dependent variables.
  • Assumptions: normality of the data, homogeneity of variance-covariance matrices.
  • Application: comparing the effectiveness of three different treatments for a particular medical condition, measuring both pain levels and quality of life.
  • Real-life example: comparing the average scores of students on multiple tests across different subjects.

ANCOVA

  • Definition: a test that compares the means of two or more groups on a dependent variable, while controlling for the effects of one or more continuous variables.
  • Assumptions: normality of the data, homogeneity of regression slopes, homogeneity of variance.
  • Application: comparing the effectiveness of two different teaching methods, while controlling for the effect of student age.
  • Real-life example: comparing the average salaries of employees in different departments of a company, while controlling for the effect of years of experience.

MANCOVA

  • Definition: a test that compares the means of two or more groups on two or more dependent variables, while controlling for the effects of one or more continuous variables.
  • Assumptions: normality of the data, homogeneity of regression slopes, homogeneity of variance-covariance matrices.
  • Application: comparing the effectiveness of two different training programs on multiple measures of job performance, while controlling for the effect of years of experience.
  • Real-life example: comparing the average test scores of students across multiple subjects, while controlling for the effect of socioeconomic status.

Overall, these parametric tests provide useful tools for data analysis in a wide range of applications. By understanding their definitions, assumptions, and real-life examples, researchers can choose the appropriate test for their data and draw valid conclusions from their analyses.

Real-life Applications using R software

Examples of how to perform each of the tests listed above using R software:
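Those worked examples are not reproduced here, but a minimal sketch of a few of the tests, using made-up data, might look like this:

set.seed(123)

# Dependent (paired) t-test: pre vs. post measurements on the same subjects
pre  <- rnorm(20, mean = 120, sd = 10)
post <- pre - rnorm(20, mean = 5, sd = 4)
t.test(pre, post, paired = TRUE)

# Independent t-test: two separate groups
groupA <- rnorm(25, mean = 50, sd = 8)
groupB <- rnorm(25, mean = 54, sd = 8)
t.test(groupA, groupB, var.equal = TRUE)

# One-way ANOVA: one factor with three levels
yield <- c(rnorm(10, 30), rnorm(10, 33), rnorm(10, 35))
fert  <- factor(rep(c("A", "B", "C"), each = 10))
summary(aov(yield ~ fert))

# Two-way ANOVA: two categorical factors
dept   <- factor(rep(c("Sales", "IT"), each = 15))
title  <- factor(rep(c("Junior", "Senior", "Lead"), times = 10))
salary <- rnorm(30, mean = 60 + 5 * as.numeric(title))
summary(aov(salary ~ dept + title))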

NON-PARAMETRIC TESTS:

List of some common non-parametric tests, along with their alternative, application, and real-life example:

Wilcoxon rank-sum test (Mann-Whitney U test)

  • Alternative: t-test
  • Application: comparing the medians of two independent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the prices of two different brands of the same product when the prices are not normally distributed.

Wilcoxon signed-rank test

  • Alternative: paired t-test
  • Application: comparing the medians of two dependent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the pre- and post-treatment scores of patients in a clinical trial when the scores are not normally distributed.

Kruskal-Wallis test

  • Alternative: one-way ANOVA
  • Application: comparing the medians of three or more independent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the salaries of employees across three different departments of a company when the salaries are not normally distributed.

Friedman test

  • Alternative: repeated measures ANOVA
  • Application: comparing the medians of three or more dependent groups when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the satisfaction scores of customers for three different products over time when the scores are not normally distributed.

Spearman rank correlation

  • Alternative: Pearson correlation
  • Application: measuring the strength of association between two variables when one or both variables are ordinal.
  • Real-life example: measuring the correlation between the rankings of different restaurants based on their ratings and the number of customers they have.

Chi-squared test

  • Alternative: z-test or t-test for proportions
  • Application: testing the independence of two categorical variables.
  • Real-life example: determining whether there is a significant association between smoking status and lung cancer diagnosis.

Wilcoxon-Mann-Whitney test

  • Alternative: t-test or ANOVA

  • Application: comparing the distribution of two independent samples when the assumptions of normality or equal variances are not met.
  • Real-life example: comparing the distributions of the size of fish caught in two different fishing spots.

Kolmogorov-Smirnov test

  • Application: testing whether two or more samples are drawn from the same distribution.
  • Real-life example: determining whether the heights of male and female students in a school are drawn from the same distribution.

Permutation test

  • Application: testing the significance of a difference between two or more groups by randomly permuting the labels of the observations.
  • Real-life example: determining whether there is a significant difference in the mean scores of students in two different schools on a standardized test.

Sign test

  • Application: testing whether the median of a sample is equal to a specified value.
  • Real-life example: testing whether the median age of patients in a hospital is equal to 50 years.

Kendall rank correlation

  • Application: measuring the strength of association between two variables when one or both variables are ordinal, and assessing the degree of similarity of rankings between two or more judges or raters.
  • Real-life example: measuring the correlation between the rankings of different cities based on their quality of life scores.

Runs test

  • Application: testing for randomness or independence in a sequence of observations, by counting the number of runs (i.e., consecutive increasing or decreasing values) in the sequence.
  • Real-life example: testing whether the sequence of daily stock prices for a particular stock follows a random pattern.

Siegel-Tukey test

  • Application: testing for differences in dispersion or variance between two or more groups, by comparing the range of the data within each group.
  • Real-life example: comparing the variability of the salaries of employees in different departments of a company.

Mood’s median test

  • Alternative: t-test or ANOVA
  • Application: testing for differences in medians between two or more groups, by comparing the medians of the data within each group.
  • Real-life example: comparing the median ages of participants in different treatment groups in a clinical trial.

Cramer-von Mises test

  • Application: testing whether a sample comes from a specified distribution, by comparing the empirical cumulative distribution function (CDF) to the theoretical CDF.
  • Real-life example: testing whether the distribution of heights of students in a school follows a normal distribution.

Overall, non-parametric tests provide useful alternatives to parametric tests and can be used in a wide range of applications. By understanding the alternatives, applications, and real-life examples of non-parametric tests, researchers can choose the appropriate test for their data and draw valid conclusions from their analyses.

Real-life Examples using R software

Examples of how to perform each of the 15 non-parametric tests using R software:
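Those examples are not reproduced here, but a minimal sketch for several of the tests listed above, using made-up data, might look like this:

set.seed(123)

# Wilcoxon rank-sum (Mann-Whitney U) test: two independent groups
priceA <- c(19, 22, 25, 31, 24, 28)
priceB <- c(27, 30, 34, 29, 35, 33)
wilcox.test(priceA, priceB)

# Wilcoxon signed-rank test: two dependent (paired) groups
pre  <- c(55, 60, 48, 72, 66, 59, 63)
post <- c(58, 67, 50, 73, 62, 68, 71)
wilcox.test(pre, post, paired = TRUE)

# Kruskal-Wallis test: three or more independent groups
salary <- c(41, 45, 39, 52, 57, 60, 48, 50, 47)
dept   <- factor(rep(c("HR", "IT", "Ops"), each = 3))
kruskal.test(salary ~ dept)

# Friedman test: repeated measures (rows = subjects, columns = conditions)
scores <- matrix(c(7, 8, 6,
                   5, 7, 6,
                   8, 9, 7,
                   6, 7, 5), ncol = 3, byrow = TRUE)
friedman.test(scores)

# Spearman rank correlation
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)
cor.test(x, y, method = "spearman")

# Chi-squared test of independence
tab <- matrix(c(30, 10, 20, 40), nrow = 2)
chisq.test(tab)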

Thanks for your attention


Non-Parametric Statistics: Types, Tests, and Examples

  • Pragya Soni
  • May 12, 2022


Statistics, an essential element of data management and predictive analysis , is classified into two types, parametric and non-parametric. 

Parametric tests are based on assumptions about the population or data source, while non-parametric tests make far fewer such assumptions and rely more directly on the observed data. Here is a detailed blog about non-parametric statistics.

What is the Meaning of Non-Parametric Statistics ?

Unlike parametric statistics, non-parametric statistics is a branch of statistics that is not based on parametrized families of probability distributions. Non-parametric methods are either distribution-free or rely on a distribution whose parameters are left unspecified.

Non-parametric statistics is built around non-parametric tests: experiments that do not require assumptions about the population from which the sample is drawn. For this reason, non-parametric tests are also known as distribution-free tests, as they do not rely on data belonging to any particular parametric family of probability distributions.

In other words, non-parametric statistics is a statistical approach in which the data are not required to fit a normal distribution. Non-parametric statistics often use ordinal data, which rely on ranking or order rather than the numbers themselves. This applies to statistical tests, inferences, statistical models, and descriptive statistics.

Non-parametric statistics is thus defined as a statistical method where data doesn’t come from a prescribed model that is determined by a small number of parameters. Unlike normal distribution model,  factorial design and regression modeling, non-parametric statistics is a whole different content.

Unlike parametric models, non-parametric methods are quite easy to use, but they do not offer the same precision as other statistical models. Therefore, non-parametric statistics is generally preferred for studies where small changes in the numerical inputs have little or no effect on the output; because the methods work with ranks or orderings, the results are likely to stay the same even if the exact numerical data change.


How does Non-Parametric Statistics Work ?

Parametric statistics involves parameters such as the mean, standard deviation, and variance. It uses the observed data to estimate the parameters of an assumed distribution, and the data are often assumed to come from a normal distribution with unknown parameters.

Non-parametric statistics, by contrast, does not assume that the data come from a normal (or any other specified) distribution. Instead, it allows the actual data-generating process to be far from any normally distributed process and works with the data on a weaker measurement scale, such as ranks.

Types of Non-Parametric Statistics

Non-parametric statistics are further classified into two major categories. Here is the brief introduction to both of them:

1. Descriptive Statistics

Descriptive statistics is a type of non-parametric statistics. It summarizes an entire population or a sample of a population by describing measures of central tendency and variability.

2. Statistical Inference

Statistical inference is the process of drawing conclusions about a population from statistics calculated on a sample drawn from that population.

Some Examples of Non-Parametric Tests

In recent research years, non-parametric methods have gained appreciation due to their ease of use. Non-parametric statistics is applicable to a huge variety of data regardless of its mean, sample size, or other characteristics. Because non-parametric statistics uses fewer assumptions, it has a wider scope than parametric statistics.

Here are some common  examples of non-parametric statistics :

Consider the case of a financial analyst who wants to estimate the value of risk of an investment. Now, rather than making the assumption that earnings follow a normal distribution, the analyst uses a histogram to estimate the distribution by applying non-parametric statistics.

Consider another case of a researcher who wants to find out whether there is a relation between sleep cycle and health in human beings. Parametric statistics would make the process quite complicated, so rather than using a method that assumes a normal distribution for illness frequency, the researcher will opt for a non-parametric method such as quantile regression analysis.

Similarly, consider the case of another health researcher, who wants to estimate the number of babies born underweight in India, he will also employ the non-parametric measurement for data testing.

A marketer who is interested in knowing the market growth or success of a company will often employ a non-parametric approach.

A researcher testing the market to check consumer preferences for a product will also employ a non-parametric test, since ordinal responses such as agree, disagree, strongly agree, and slightly agree make a parametric analysis difficult.

Science or social science research involving categorical variables such as age group, gender, marital status, employment, or educational qualification also relies on non-parametric statistics. Non-parametric methods play an important role when the source data lack a clear numerical interpretation.


What are Non-Parametric Tests ?


Types of Non-Parametric Tests

Here is a list of non-parametric tests that are commonly conducted for statistical hypothesis testing:

Wilcoxon Rank Sum Test

The Wilcoxon test is also known as the rank-sum test or the signed-rank test. It is a type of non-parametric test that works on two paired groups, and its main focus is the comparison between those paired groups. The test calculates the difference between each pair of values and analyzes those differences.

The Wilcoxon test is classified as a statistical hypothesis test and is used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ.

Mann- Whitney U Test

The Mann-Whitney U test is also known as the Mann-Whitney-Wilcoxon test, the Wilcoxon rank-sum test, and the Wilcoxon-Mann-Whitney test. It is a non-parametric test whose null hypothesis is that a randomly selected value from one group is equally likely to be greater or smaller than a randomly selected value from the other group.

The Mann-Whitney test is usually used to compare two independent groups when the dependent variable is either ordinal or continuous but not normally distributed. For a Mann-Whitney test, four requirements must be met: the first three relate to the study design and the fourth reflects the nature of the data.

Kruskal Wallis Test

Sometimes referred to as a one-way ANOVA on ranks, the Kruskal-Wallis H test is a non-parametric test used to determine whether there are statistically significant differences between two or more groups of an independent variable. ANOVA stands for analysis of variance.

The test is named after the statisticians who developed it, William Kruskal and W. Allen Wallis. Its major purpose is to check whether the samples being compared are drawn from the same population.

Friedman Test

The Friedman test is similar to the Kruskal-Wallis test and is an alternative to the ANOVA test. The main difference is that the Friedman test works with repeated measures. It is used to detect differences between groups when the dependent variable is measured on an ordinal scale.

The Friedman test is sometimes divided into two variants, Friedman 1 and Friedman 2. It was developed by the economist Milton Friedman and is named after him. The test applies to complete block designs and is therefore a special case of the Durbin test.

Distribution Free Tests

Distribution-free tests are mathematical procedures widely used for testing statistical hypotheses. They make no assumption about the probability distribution of the variables. An important list of distribution-free tests is as follows:

  • Anderson-Darling test: checks whether a sample is drawn from a given distribution.
  • Statistical bootstrap methods: a basic non-parametric approach used to estimate the accuracy and sampling distribution of a statistic.
  • Cochran’s Q: used to check whether treatments in block designs with 0/1 outcomes have identical effects.
  • Cohen’s kappa: measures inter-rater agreement for categorical items.
  • Kaplan-Meier: estimates the survival function from lifetime data, allowing for censoring.
  • Two-way analysis Friedman test: also known as the ranking test, it is used to analyze randomized block designs.
  • Kendall’s tau: measures the statistical dependence between two variables.
  • Kolmogorov-Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution.
  • Kendall’s W: measures the degree of agreement among raters.
  • Kuiper’s test: determines whether a sample is drawn from a given distribution, while remaining sensitive to cyclic variations.
  • Log-rank test: compares the survival distributions of two right-censored samples.
  • McNemar’s test: tests paired nominal data in a contingency table to determine whether the row and column marginal frequencies are equal.
  • Median test: checks whether two samples are drawn from populations with the same median.
  • Pitman’s permutation test: yields exact p-values by examining all possible rearrangements of the labels.
  • Rank products: used to detect differentially expressed genes in replicated microarray experiments.
  • Siegel-Tukey test: tests for differences in scale between two groups.
  • Sign test: tests whether matched-pair samples are drawn from distributions with equal medians.
  • Spearman’s rank correlation: measures statistical dependence between two variables using a monotonic function.
  • Squared ranks test: tests the equality of variances between two or more groups.
  • Wald-Wolfowitz runs test: checks whether the elements of a sequence are mutually independent (random).


Advantages and Disadvantages of Non-Parametric Tests

The benefits of non-parametric tests are as follows:

It is easy to understand and apply.

It consists of short calculations.

The assumption of the population is not required.

Non-parametric tests are applicable to all kinds of data.

The limitations of non-parametric tests are:

It is less efficient than parametric tests.

Sometimes the results of non-parametric tests are not precise enough to provide an exact answer.

Applications of Non-Parametric Tests

Non-parametric tests are quite helpful in the following cases:

Where parametric tests are not giving sufficient results.

When the testing hypothesis is not based on the sample.

For the quicker analysis of the sample.

When the data is unscaled.

The current scenario of research is based on fluctuating inputs, thus, non-parametric statistics and tests become essential for in-depth research and data analysis .


Non-Parametric Test

Non-parametric tests are experiments that do not require assumptions about the underlying population. They do not rely on data belonging to any particular parametric family of probability distributions. Non-parametric methods are also called distribution-free tests since they do not assume a specific distribution for the population. In this article, we will discuss what a non-parametric test is, along with the different methods, merits, demerits, and examples of non-parametric testing methods.

Table of Contents:

  • Non-parametric T-Test
  • Non-parametric Paired T-Test
  • Mann Whitney U Test
  • Wilcoxon Signed-Rank Test
  • Kruskal Wallis Test
  • Advantages and Disadvantages
  • Applications
What is a Non-parametric Test?

Non-parametric tests are the mathematical methods used in statistical hypothesis testing, which do not make assumptions about the frequency distribution of variables that are to be evaluated. The non-parametric experiment is used when there are skewed data, and it comprises techniques that do not depend on data pertaining to any particular distribution.

The word non-parametric does not mean that these models do not have any parameters. The fact is, the characteristics and number of parameters are pretty flexible and not predefined. Therefore, these models are called distribution-free models.

Non-Parametric T-Test

Whenever some assumptions about the given population are uncertain, we use non-parametric tests, which serve as the counterparts of parametric tests. When data are not normally distributed, or when they are on an ordinal level of measurement, we have to use non-parametric tests for analysis. The basic rule is to use a parametric t-test for normally distributed data and a non-parametric test for skewed data.

Non-Parametric Paired T-Test

The paired sample t-test is used to compare two mean scores that come from the same group. It applies when the independent variable has two levels and those levels are repeated measures. When its assumptions fail, a non-parametric alternative such as the Wilcoxon signed-rank test (described below) is used instead.

Non-parametric Test Methods

Four different non-parametric techniques, namely the Mann Whitney U test, the sign test, the Wilcoxon signed-rank test, and the Kruskal Wallis test, are discussed here in detail. Non-parametric tests are based entirely on the ranks assigned to the ordered data. The four types of non-parametric test are summarized below with their uses, null hypothesis, test statistic, and decision rule.

Kruskal Wallis test is used to compare the continuous outcome in greater than two independent samples.

Null hypothesis, H 0 :  K Population medians are equal.

Test statistic:

If N is the total sample size, k is the number of comparison groups, R j is the sum of the ranks in the jth group and n j is the sample size in the jth group, then the test statistic, H is given by:

\(H = \left( \frac{12}{N(N+1)}\sum_{j=1}^{k} \frac{R_{j}^{2}}{n_{j}}\right) - 3(N+1)\)

Decision Rule: Reject the null hypothesis H 0 if H ≥ critical value
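As a small worked illustration in R (with made-up data and no tied values), H can be computed directly from this formula and checked against kruskal.test, which should report the same statistic when there are no ties:

# Three hypothetical groups, no tied values
g1 <- c(12, 15, 18)
g2 <- c(20, 25, 30)
g3 <- c(10, 22, 28)

values <- c(g1, g2, g3)
groups <- rep(1:3, times = c(3, 3, 3))
N <- length(values)

r  <- rank(values)              # ranks of the pooled sample
Rj <- tapply(r, groups, sum)    # rank sum in each group
nj <- tapply(r, groups, length) # sample size in each group

H <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)
H                                # 3.2 for these data

kruskal.test(list(g1, g2, g3))   # chi-squared statistic should match H (no ties)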

The sign test is used to compare a continuous outcome in paired samples or two matched samples.

Null hypothesis, H 0 : Median difference should be zero 

Test statistic: The test statistic of the sign test is the smaller of the number of positive or negative signs.

Decision Rule: Reject the null hypothesis if the smaller of the number of positive or negative signs is less than or equal to the critical value from the table.

Mann Whitney U test is used to compare the continuous outcomes in the two independent samples. 

Null hypothesis, H 0 : The two populations are equal.

If R 1 and R 2 are the sum of the ranks in group 1 and group 2 respectively, then the test statistic “U” is the smaller of:

\(U_{1} = n_{1}n_{2} + \frac{n_{1}(n_{1}+1)}{2} - R_{1}\)

\(U_{2} = n_{1}n_{2} + \frac{n_{2}(n_{2}+1)}{2} - R_{2}\)

Decision Rule: Reject the null hypothesis if the test statistic, U is less than or equal to critical value from the table.
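A small worked illustration in R (with made-up data) that follows these formulas:

# Hypothetical samples
group1 <- c(7, 3, 9, 5)            # n1 = 4
group2 <- c(12, 8, 10, 6, 14)      # n2 = 5

n1 <- length(group1); n2 <- length(group2)
r  <- rank(c(group1, group2))      # ranks of the pooled sample
R1 <- sum(r[1:n1])                 # rank sum for group 1
R2 <- sum(r[-(1:n1)])              # rank sum for group 2

U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 <- n1 * n2 + n2 * (n2 + 1) / 2 - R2
U  <- min(U1, U2)                  # test statistic: the smaller of U1 and U2
c(U1 = U1, U2 = U2, U = U)

# For comparison, wilcox.test reports an equivalent rank-based statistic and a p-value
wilcox.test(group1, group2)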

Wilcoxon signed-rank test is used to compare the continuous outcome in the two matched samples or the paired samples.

Null hypothesis, H 0 : Median difference should be zero.

Test statistic: The test statistic W is defined as the smaller of W+ and W-.

Where W+ and W- are the sums of the positive and the negative ranks of the different scores.

Decision Rule: Reject the null hypothesis if the test statistic, W is less than or equal to the critical value from the table.
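A short R illustration with made-up paired data, computing W+ and W- by hand and comparing with wilcox.test:

# Hypothetical paired scores
before <- c(10, 14, 9, 16, 12, 18, 11)
after  <- c(12, 13, 14, 22, 9, 25, 15)

d  <- after - before
d  <- d[d != 0]                    # drop zero differences
rk <- rank(abs(d))                 # rank the absolute differences

W.plus  <- sum(rk[d > 0])          # sum of ranks for positive differences
W.minus <- sum(rk[d < 0])          # sum of ranks for negative differences
W <- min(W.plus, W.minus)          # test statistic: the smaller sum
c(W.plus = W.plus, W.minus = W.minus, W = W)

wilcox.test(after, before, paired = TRUE)   # p-value for the same comparison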

Advantages and Disadvantages of Non-Parametric Test

The advantages of the non-parametric test are:

  • Easily understandable
  • Short calculations
  • Assumption of distribution is not required
  • Applicable to all types of data

The disadvantages of the non-parametric test are:

  • Less efficient as compared to parametric test
  • The results may or may not provide an accurate answer because they are distribution free

Applications of Non-Parametric Test

The conditions when non-parametric tests are used are listed below:

  • When parametric tests are not satisfied.
  • When testing the hypothesis, it does not have any distribution.
  • For quick data analysis.
  • When unscaled data is available.

Frequently Asked Questions on Non-Parametric Test

What is meant by a non-parametric test.

The non-parametric test is one of the methods of statistical analysis, which does not require any distribution to meet the required assumptions, that has to be analyzed. Hence, the non-parametric test is called a distribution-free test.

What is the advantage of a non-parametric test?

The advantage of nonparametric tests over the parametric test is that they do not consider any assumptions about the data.

Is Chi-square a non-parametric test?

Yes, the Chi-square test is a non-parametric test in statistics, and it is called a distribution-free test.

Mention the different types of non-parametric tests.

The different types of non-parametric test are: Kruskal Wallis Test Sign Test Mann Whitney U test Wilcoxon signed-rank test

When to use the parametric and non-parametric test?

If the mean of the data more accurately represents the centre of the distribution, and the sample size is large enough, we can use a parametric test. Whereas, if the median of the data more accurately represents the centre of the distribution, we can use a non-parametric test.


Non-parametric Hypothesis Tests (Psychology)

Contents:

  • 1 What is a Non-parametric Test?
  • 1.1 Important Note
  • 2 Sign Test
  • 2.1 Worked Example
  • 3 Mann-Whitney U-Test
  • 3.1 Worked Example
  • 4 Wilcoxon Matched Pairs Test

What is a Non-parametric Test?

Parametric hypothesis tests are based on the assumption that the data of interest has an underlying Normal distribution. The Normal distribution has the form of a symmetric bell-shaped curve, so naturally we need our data to be symmetric for a parametric test to be appropriate. However, sometimes our data is asymmetric so we must use a non-parametric test .

It is a traditional alternative approach because it makes few or no assumptions about the distribution of the data or population. Many non-parametric tests are based on ranks given to the original numerical scores/data. Usually non-parametric tests are regarded as relatively easy to perform but some problems can occur. It can be cumbersome to carry out such tests when working with large amounts of data. In psychological data, there are quite restricted ranges of scores, which can result in the same value appearing several times in a set of data. Tests based on rank can become more complicated with increased tied scores.

Important Note

The examples covered on this page do not necessarily have the best experimental designs. They are also purely hypothetical and any results or data are not from any real studies nor experiments. The purpose of them is to demonstrate how to use the various hypothesis tests covered in this section.

Example: Ranking Data

To rank data we must order a set of scores from smallest to largest. The smallest score is given rank 1, the second smallest score is given 2 and so on. It is purely the sample size that affects the ranks and not the actual numerical values of the scores.

Imagine you have collected a sample of ten students' exam scores (out of fifty) and wish to rank them.

You collect the following scores: $25, 49, 12, 40, 35, 43, 28, 30, 45, 18$.

If we sort them into ascending order, we get: $12, 18, 25, 28, 30, 35, 40, 43, 45, 49$.

These are now in ranked order and we can put them into a table:

[Table: the ten exam scores with their ranks]

The sign test is similar to the paired/related t -test , as it takes the differences between the two related samples of scores. However, you consider the sign of the difference, rather than the size of difference.

  • First, you delete any case where the scores are the same in both groups (so zero differences), they can be ignored in the sign test.
  • Subtract the second group's scores away from the first group's. Remember to include the sign of the difference ($+$ or $-$).
  • Now count the number of differences which have a positive sign and then count the number of differences with a negative sign.
  • Take the smaller number.
  • Look up the significance of the smaller number in a significance table. You look at the row containing the sum of the positive and negative signs (the total number of differences ignoring zero differences.) Your value must be in the range specified in the table for it to be statistically significant.
  • Report your findings and form your conclusion.

Worked Example


A study has been conducted into the effects of alcohol and reaction time.  Ten participants are asked to watch a video and press a button every time a small red circle appears on the screen.  The total time between the circles appearing and when the button is pressed is recorded for each participant.  If a participant fails to press the button at any time, a time of $5$ seconds is added onto the total time.  

A week later the participants are then asked to repeat the task of watching the video and pressing the button when every red circle appears. However, this time they drink an alcoholic drink containing $2$ units of alcohol $15$ minutes beforehand. The total times are recorded again. Below is a table of the resulting times, to the nearest second.

'Perform a hypothesis test to see if alcohol has an effect on reaction times.'

[Table: reaction times (seconds) without and with alcohol for the ten participants]

The hypothesis we wish to test is if alcohol has an effect on reaction time. The null hypothesis $H_0$ is that alcohol has no effect on reaction time.

Firstly, remove any rows from the table which have identical scores. In this instance, the fourth participant has the same time under and not under the influence of alcohol. We then calculate the differences by subtracting the first column from the second column.

[Table: differences in reaction time with their signs]

We can count that $2$ differences have a negative sign, whereas $7$ differences have a positive sign. (Remember, we deleted data from one of our ten participants.) So we use $2$ as our value to compare with significance tables.

[Table: critical values for the sign test at the $5\%$ level]

Looking at the $9 - 11$ row, we can see that the smaller number needs to be either $0$ or $1$ for a statistically significant result. Our value is $2$, so our result is not statistically significant and we fail to reject the null hypothesis. There is not enough evidence to suggest that alcohol has an effect on reaction time. Perhaps a study with more participants should be carried out.

A concise way of reporting our findings could be:

'Reactions times were slightly slower after consuming alcohol $(\bar{X}=24.444)$ (3 d.p.) compared to when alcohol was not consumed $(\bar{X}=24.111)$ (3 d.p.). However, this did not reach statistical significance, so it was not possible to reject the null hypothesis that alcohol has no effect on reaction time in this particular sample $($sign test$, n = 9, p$ ns$)$.'

Note: ns means not significant.
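
Under the null hypothesis of no effect, each non-zero difference is equally likely to be positive or negative, so the sign test is simply a binomial test on the sign counts. Below is a minimal sketch in Python using the counts from this worked example; scipy has no dedicated sign-test function, so scipy.stats.binomtest is used in place of a significance table:

```python
from scipy.stats import binomtest

n_positive = 7   # differences with a positive sign
n_negative = 2   # differences with a negative sign (zero differences already removed)

# Two-sided sign test: how surprising is a 7/2 split if + and - are equally likely?
result = binomtest(min(n_positive, n_negative), n=n_positive + n_negative, p=0.5)
print(result.pvalue)   # roughly 0.18 here, so not significant at the 5% level
```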

Mann-Whitney U-Test

The Mann-Whitney $U$-test is perhaps the most common non-parametric test for unrelated samples of scores. You would use it when the two groups are independent of each other, for example if you were testing two different groups of people in a conformity study. It can be used when the two groups are different sizes and also when they are the same size.

  • First, we state our null and alternative hypotheses.
  • Next, we rank all of the scores (from both groups) from the smallest to largest. Equal scores are allocated the average of the ranks they would have if there were tiny differences between them. For example, say there are two scores of $13$. If there were just one score of $13$ it would have the rank $7$ in this particular example. However, since there are two scores of $13$, we instead assign the rank $\dfrac{7+8}{2} = 7.5$ to both scores.
  • Next we sum the ranks for each group. You call the sum of the ranks for the larger group $R_1$ and for the smaller sized group, $R_2$. If both groups are equally sized then we can label them whichever way round we like.
  • We then input $R_1$ and $R_2$ and also $N_1$ and $N_2$, the respective sizes of each group, into the following formula:

\begin{equation} U = (N_1 \times N_2) + \dfrac{N_1 \times (N_1+1)}{2} - R_1 \end{equation}

  • Then we compare the value of $U$ to significance tables. You find the intersection of the column with the value of $N_1$ and the row with the value of $N_2$. In this intersection there will be two ranges of values of $U$ which are significant at the $5\%$ level. If our value is within one of these ranges, then we have a significant result and we reject the null hypothesis. If our value is not in either range, the result is not significant: there is no evidence that the independent variable is related to the dependent variable, and we fail to reject $H_0$.
  • As a check, we also need to examine the means of the two groups, to see which has the higher scores on the dependent variable.
  • We then report our results.

A study into the effect of exercise on memory was carried out. One group (of size $8$) spent $15$ minutes sitting in a chair (No exercise group), whereas the other group (of size $10$) spent $15$ minutes playing dodgeball (Exercise group). They were then shown $50$ random objects over a $4$ minute period and asked to recall as many items as they possibly could in $2$ minutes. The number of objects they could remember was recorded as their score. The results are in the table below.

Perform a Mann-Whitney $U$-test to see if there is a difference between the two groups.

| No exercise group | Exercise group |
|---|---|
| 32 | 21 |
| 17 | 45 |
| 19 | 33 |
| 28 | 29 |
| 25 | 27 |
| 31 | 41 |
| 21 | 36 |
| 30 | 39 |
|    | 28 |
|    | 34 |

Here we have, \begin{align} H_0:& \text{Exercise has no effect on memory}.\\ H_1:& \text{Exercise has an effect on memory}.\\ \end{align} Now we need to assign ranks to each score.

An easy way to do this is write all the scores in ascending order and then write their corresponding ranks next to them and then put these back into a table.

So we have:

\begin{align} 17 - & 1\\ 19 - & 2\\ 21 - & 3.5\\ 21 - & 3.5\\ 25 - & 5\\ 27 - & 6\\ 28 - & 7.5\\ 28 - & 7.5\\ 29 - & 9\\ 30 - & 10\\ 31 - & 11\\ 32 - & 12\\ 33 - & 13\\ 34 - & 14\\ 36 - & 15\\ 39 - & 16\\ 41 - & 17\\ 45 - & 18\\ \end{align}

Note, the two scores of $21$ have a rank of $\frac{(3 + 4)}{2} = 3.5$ and the two scores of $28$ have a rank of $\frac{(7 + 8)}{2} = 7.5$.

We now can arrange these into a table.

| No exercise score | Rank | Exercise score | Rank |
|---|---|---|---|
| 32 | 12 | 21 | 3.5 |
| 17 | 1 | 45 | 18 |
| 19 | 2 | 33 | 13 |
| 28 | 7.5 | 29 | 9 |
| 25 | 5 | 27 | 6 |
| 31 | 11 | 41 | 17 |
| 21 | 3.5 | 36 | 15 |
| 30 | 10 | 39 | 16 |
|    |    | 28 | 7.5 |
|    |    | 34 | 14 |

Now we can calculate $R_1$ and $R_2$. The 'Exercise' group is larger in size so we use those ranks to calculate $R_1$ and we use the smaller 'No exercise' group's ranks to calculate $R_2$. $N_1 = 10$ and $N_2 = 8$.

\begin{align} R_1 &= 3.5 + 18 + 13 + 9 + 6 + 17 + 15 + 16 + 7.5 + 14\\ &= 119\\ R_2 &= 12 + 1 + 2 + 7.5 + 5 + 11+ 3.5 + 10\\ &= 52.\\ \end{align}

Now we can calculate our $U$-value:

\begin{align} U &= (N_1 \times N_2) + \dfrac{N_1 \times (N_1 +1)}{2} - R_1\\ &= (10 \times 8) + \dfrac{10 \times (10+1)}{2} - 119\\ &= 80 + \dfrac{110}{2} - 119\\ &= 16.\\ \end{align}

We then compare it to a significance table.

[Table: critical values of $U$ for the Mann-Whitney test at the $5\%$ level]

We can see that the $U$-value of $16$ lies within the range $0 - 17$, thus we have a significant result at the $5\%$ level. This suggests we have evidence that exercise does have an effect on memory. Note: the mean scores for the 'Exercise' and 'No exercise' groups are respectively $33.3$ and $25.375$.

In a report, we would state our findings as follows.

'It was found that the scores on the memory test were significantly higher $(U=16, n = 18, p<0.05)$ in the exercise group $(\bar{X}=33.3)$ than in the no exercise group $(\bar{X}=25.375)$.'
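
The same test can be run in software. Below is a minimal sketch with scipy.stats.mannwhitneyu, using the group scores implied by the ranks above; note that scipy reports the $U$ statistic for the first sample, so the smaller of $U$ and $N_1 \times N_2 - U$ is taken to match the tables:

```python
from scipy.stats import mannwhitneyu

exercise    = [21, 45, 33, 29, 27, 41, 36, 39, 28, 34]   # n = 10, mean 33.3
no_exercise = [32, 17, 19, 28, 25, 31, 21, 30]           # n = 8, mean 25.375

res = mannwhitneyu(exercise, no_exercise, alternative="two-sided")
u_small = min(res.statistic, len(exercise) * len(no_exercise) - res.statistic)
print(u_small, res.pvalue)   # U = 16.0 with these scores, p < 0.05
```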

Wilcoxon Matched Pairs Test

The Wilcoxon matched pairs test, also known as the Wilcoxon signed ranks test, is similar to the sign test. The only alteration is that we rank the differences ignoring their signs (but we do keep a note of them). As the name implies, we use the Wilcoxon matched pairs test on related data, so each sample or group will be equal in size.

  • Calculate the difference scores between your two samples of data. We then remove difference scores of zero.
  • Rank them. If difference scores are tied, use the same method as in the Mann-Whitney test: assign each tied difference score the average of the ranks they would occupy if it were possible to separate them.
  • The ranks of the differences can now have the sign of the difference reattached (we will use superscripts - see example below).
  • The sum of the positive ranks are calculated.
  • The sum of the negative ranks are calculated.
  • You then choose the smaller sum of ranks and we call this our $T$-value, which we compare with significance tables. You choose the row which has the number of pairs of scores in your sample.
  • Report your findings and make your conclusion.

Consider the example with alcohol and reaction time in the Sign test section above . This time we shall perform the Wilcoxon Matched Pairs Test.


We are testing the same hypotheses as above.

We already calculated the differences in the Sign test example, so now we just need to assign the ranks and attach the signs as superscripts.

[Table: differences and their signed ranks for the nine participants with non-zero differences]

To calculate the rank of $1$ we first count up the number of $1$'s in the table (both $+1$ and $-1$ are included in this). We find that there are $3$. So the rank of these becomes $\dfrac{1+2+3}{3}=2$ as there are three $1$'s so they take the average value of the three individual ranks. Then we attach the signs as superscripts. Hence the rank of $+1$ is $2^+$ and the rank of $-1$ is $2^-$.

The sum of the positive ranks is: $2 + 2 + 4.5 + 4.5 + 6 + 7 + 8 = 34$.

The sum of the negative ranks is: $2 + 9 = 11$.

Here the smaller sum of ranks is $T = 11$, which we compare to a significance table.

[Table: critical values of $T$ for the Wilcoxon matched pairs test]

Since $11$ does not lie in the range $0 - 6$, we can conclude that our value is not statistically significant. There is no evidence here to suggest that alcohol has an effect on reaction time, so we fail to reject the null hypothesis. Once again, these results suggest more experiments should be carried out with changes to the experimental design, such as using more participants or increasing the units of alcohol.

An accurate report of our findings would be:

'The reaction times for the alcohol group $(\bar{X} = 25.444)$ (3 d.p.) were slower than for the no alcohol group $(\bar{X}=24.111)$ (3 d.p.). However, this difference did not reach statistical significance, so we cannot reject the null hypothesis that alcohol has no effect on reaction times $(T = 11, n = 9, p > 0.05,$ ns$)$.'
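
In software, the Wilcoxon matched pairs test is available as scipy.stats.wilcoxon. The original reaction times are not listed here, so this minimal sketch uses hypothetical paired data:

```python
from scipy.stats import wilcoxon

# Hypothetical paired measurements (e.g., reaction times without and with alcohol)
no_alcohol = [22, 25, 27, 24, 23, 26, 21, 25, 24, 28]
alcohol    = [23, 26, 27, 26, 22, 28, 23, 26, 24, 29]

# wilcoxon() drops zero differences, ranks the remaining differences ignoring sign,
# and uses the smaller of the positive and negative rank sums as the statistic T
T, p = wilcoxon(no_alcohol, alcohol)
print(T, p)
```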


Mood’s Median Non-Parametric Hypothesis Test: A Complete Guide

May 17th, 2024

In statistical research, teams often need to compare the central tendencies of groups or samples. ANOVA is frequently used for this, but it requires normality and homogeneity of variance. When data contain outliers or are clearly non-normal, non-parametric tests are often the better choice. One such test is Mood's median test, named after its originator.

The test compares the medians of independent groups, which makes it useful for exploratory analyses and for skewed data. It examines central tendency without requiring the data to be normally distributed.

It is particularly valued when non-normality or outliers undermine the assumptions of parametric tests, because it is robust to such aberrations. By detecting differences in medians reliably even when the data are irregular, Mood's median test can reveal effects that other methods obscure, and it can guide the next steps of an analysis when standard parametric methods are not appropriate.

Key Highlights

  • Mood's median test is a non-parametric test that compares the medians of two or more independent groups or samples. It is an alternative when the normality and homogeneity assumptions of one-way ANOVA are not met.
  • Its test statistic follows a chi-squared distribution and is used to test the hypothesis of equal medians across groups. Because it is largely unaffected by outliers or skew, it is suitable for non-normal data.
  • It can be applied to two or more samples; its assumptions include independent observations and continuous or ordinal data from broadly similar underlying distributions.
  • It returns a test statistic and a p-value, from which researchers determine whether the groups differ significantly. It is applicable where departures from assumptions undermine standard techniques.
  • By flagging differences in medians reliably despite irregularities in the data, Mood's median test supports decision-making and optimization.

What is Mood’s Median Test?

Mood's median test is a non-parametric test that compares the medians of groups or samples, unlike parametric tests, which require the data to follow a specific distribution.

Because it does not require normally distributed data, it can be used where those prerequisites are limiting, and it extends the two-sample comparison of medians to any number of groups.

The null hypothesis is that the population medians are all equal; the alternative is that at least one differs. The test is applied to independent groups, for example different treatments or demographic groups, including data that are not normally distributed.

Typical applications include:

  • comparing several groups of test subjects
  • assessing treatment effects when the data are not normally distributed
  • analyzing data where the usual parametric assumptions do not hold

While ANOVA is more powerful at detecting changes in central tendency when the data are normally distributed, Mood's median test can detect differences in medians without that assumption.

Proposed by Alexander Mood in 1954, the test statistic is approximately chi-squared distributed as sample sizes grow, so the test provides valid conclusions without distributional stipulations. For teams working with data that defy parametric assumptions, it is a practical choice.

Assumptions of Mood’s Median Test

Before running it, it’s important to check that the assumptions of the test are met. Violating these assumptions can lead to invalid results and conclusions. The key assumptions are:

  • Random Samples : The data must be collected using random sampling from the respective populations. This ensures the representativeness of the samples.
  • Independent Observations : The observations within each sample should be independent of each other. There should be no relationship between the observations that could influence the values.
  • Continuous or Ordinal Data : It requires the data to be continuous (measured on an interval or ratio scale) or ordinal (ranked data).
  • Similar Shape Distributions : While it does not require the distributions to be normal, the distributions should have similar shapes and spread. Dissimilar shapes can affect the validity of the results.
  • No Outliers : Extreme outliers in the data can significantly influence the median values and distort the test results. It’s recommended to check for and handle any outliers before conducting the test.
  • Tied Values : It can handle tied values (observations with the same value) within the samples. However, an excessive number of ties can reduce the test’s power and sensitivity.

Checking these assumptions is crucial as violations can increase the risk of Type I (false positive) or Type II (false negative) errors. Various graphical and statistical methods, such as histograms , boxplots , and normality tests , can be used to assess the assumptions.

If assumptions are violated, appropriate data transformations or non-parametric alternatives may be considered.

Hypothesis Testing in Mood’s Median Test

The Mood’s median test is a non-parametric hypothesis test that allows you to determine if the medians of two or more groups differ. It tests the null hypothesis that the medians of the groups are equal, against the alternative that at least one population median is different.

Null Hypothesis

The null hypothesis (H0) states that the medians of all groups are equal. Mathematically, this can be represented as:

H0: Median1 = Median2 = … = Mediank

Where k is the number of groups being compared.

Alternative Hypothesis

The alternative hypothesis (Ha) states that at least one median is different from the others. There are three possible alternative hypotheses:

1) Two-tailed test : At least one median differs

Ha : Not all medians are equal

2) Upper-tailed test : At least one median is larger  

Ha : At least one median is larger than the others

3) Lower-tailed test : At least one median is smaller

Ha : At least one median is smaller than the others

The choice between one-tailed or two-tailed depends on the research question.

Test Statistic

Mood’s median test uses a chi-square test statistic to evaluate the null hypothesis. The test statistic follows a chi-square distribution with k-1 degrees of freedom when the null is true.

The test statistic is calculated from the number of observations above and below the grand median in each group. Larger deviations from the expected counts indicate greater evidence against the null hypothesis of equal medians.

The p-value is the probability of observing a test statistic as extreme as the one calculated, assuming the null hypothesis is true. A small p-value (typically <0.05) indicates strong evidence against the null, allowing you to reject it.

Interpretation

If the p-value is less than the chosen significance level (e.g. 0.05), you reject the null hypothesis. This means at least one group median is statistically different from the others. Effect sizes and confidence intervals help quantify the median differences.

The test makes no assumptions about the distribution shapes, making it a robust non-parametric alternative to the one-way ANOVA when data violates normality assumptions.

Performing Mood’s Median Test

To perform it, there are several steps to follow. First, you need to state the null and alternative hypotheses. The null hypothesis (H0) is that the medians of the groups are equal, while the alternative hypothesis (Ha) is that at least one median is different.

Next, you’ll need to combine all the data points across groups and find the overall median. This combined median serves as the test criterion. 

For each group, count how many data points are greater than, less than, or equal to the combined median. These counts form the frequencies needed to calculate the test statistic.

It follows a chi-square distribution with k-1 degrees of freedom, where k is the number of groups. Calculate this test statistic based on the frequency counts and degrees of freedom.

Compare the test statistic to the critical value from the chi-square distribution for your chosen alpha level (e.g. 0.05). If the test statistic exceeds the critical value, you reject the null hypothesis. Otherwise, you fail to reject it.

Calculating the test statistic can be tedious by hand for larger sample sizes. Most statistical software like R, Python, Minitab , etc. have built-in functions to run Mood’s median test and provide the p-value directly. The p-value approach is equivalent – if p < alpha, reject H0.

It’s good practice to report the test statistic value, degrees of freedom, p-value, sample sizes, and your conclusion about the null hypothesis. Effect sizes can also provide more insight into the practical significance beyond statistical significance.
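
To make these steps concrete, here is a minimal Python sketch of the procedure (grand median, above/below counts per group, chi-square on the resulting table). The data are purely illustrative, and the handling of values exactly equal to the grand median is only one of several possible conventions; for real analyses a dedicated implementation such as scipy.stats.median_test (shown further below) is preferable:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative data for three independent groups
groups = {
    "A": [42, 37, 39, 44, 36, 38],
    "B": [40, 39, 38, 37, 31, 43],
    "C": [35, 36, 34, 40, 33, 37],
}

# Step 1: grand median of the pooled data
pooled = np.concatenate(list(groups.values()))
grand_median = np.median(pooled)

# Step 2: count observations above vs. at-or-below the grand median in each group
table = [[int(np.sum(np.array(g) > grand_median)),
          int(np.sum(np.array(g) <= grand_median))] for g in groups.values()]

# Step 3: chi-square test on the 2 x k table (k - 1 degrees of freedom)
chi2, p, dof, expected = chi2_contingency(table)
print(f"grand median = {grand_median}, chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```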

Mood’s Median Test in Statistical Software

It can be performed using various statistical software packages. While the test calculations can be done manually, using software is much more efficient, especially for larger datasets. Here are some examples of how to implement it in popular statistical programs:

Mood’s Median Test in R

In R, Mood's median test is available through the mood.medtest() function from the RVAideMemoire package (note that base R's mood.test() is a different procedure, Mood's two-sample test of scale, not the median test). Here is an example:

install.packages("RVAideMemoire")

library(RVAideMemoire)

# Example data

x1 <- c(42, 37, 39, 44, 36, 38)

x2 <- c(40, 39, 38, 37, 31, 43)

# Perform Mood's median test: pool the values and supply a grouping factor

values <- c(x1, x2)

groups <- factor(rep(c("x1", "x2"), times = c(length(x1), length(x2))))

mood.medtest(values, groups)

This will output the test statistic, p-value, and other relevant metrics for Mood’s median test on the two sample vectors x1 and x2.

Mood’s Median Test in Python

For Python, the scipy.stats module provides the median_test() function to conduct the test. Here’s an example:


from scipy import stats

# Example data

x1 = [42, 37, 39, 44, 36, 38] 

x2 = [40, 39, 38, 37, 31, 43]

stats.median_test(x1, x2)

The median_test() function returns the chi-square statistic, the p-value, the grand median of the pooled data, and the contingency table used for the test.

Mood’s Median Test in Excel

Excel does not have a built-in function for this test. However, you can use add-ins or write custom VBA code to perform the test.

The Real Statistics Resource Pack provides a Mood’s Median Test data analysis tool for Excel.

Mood’s Median Test in SPSS, SAS, Minitab

Most major statistical software like SPSS , SAS, and Minitab provide the functionality to run this test, albeit through different function names and syntax. Refer to the respective documentation for implementation details.

No matter which software you use, be sure to verify the assumptions of this test before interpreting the results. Additionally, report the test statistic, p-value, sample sizes, and any other relevant metrics when presenting your findings.

Comparing Mood’s Median Test

When choosing a statistical test, it’s important to understand how Mood’s median test compares to other commonly used non-parametric tests like the Wilcoxon rank-sum test , the Kruskal-Wallis test , and the analysis of variance (ANOVA).

Mood’s Median Test vs Wilcoxon Rank-Sum Test

Both Mood’s median test and the Wilcoxon rank-sum test are non-parametric alternatives to the two-sample t-test . However, the Wilcoxon test assumes that the distributions have the same shape, while Mood’s test does not require this assumption.

Mood’s test is preferred when you cannot make the equal distribution shape assumption.

Mood’s Median Test vs Kruskal-Wallis Test

The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA for comparing more than two independent groups.

Like the Wilcoxon test, the Kruskal-Wallis test assumes that the distributions have the same shape. Mood's median test can be used when this assumption is violated, making it more robust for certain data sets.

Mood’s Median Test vs ANOVA

The key difference is that ANOVA is a parametric test that requires assumptions like normality and homogeneity of variances. Mood's median test is a non-parametric alternative when these assumptions are not met; it tests for differences in medians rather than means.

While sacrificing some statistical power compared to parametric tests when assumptions are met, this test is a robust option for non-normal data or heterogeneous variances across groups. The choice depends on whether the parametric assumptions can be reasonably satisfied.

Post-Hoc Analysis

If Mood’s median test detects a statistically significant difference among groups, post-hoc tests may be needed to determine which specific groups differ. Options include pairwise Mood’s median tests with a multiplicity adjustment.

Additional Considerations

While this is a useful non-parametric alternative to the one-way ANOVA, there are some additional points to keep in mind:

Power and Sample Size

Like other statistical tests, the power of Mood’s median test to detect an effect depends on the sample size.

With small samples, the test may not have enough power to find a significant difference even if one exists. Researchers should perform power analysis ahead of data collection to ensure adequate sample sizes.

Mood’s median test can handle tied observations within groups. However, it cannot deal with ties across different groups of medians. If there are ties across medians, the test may not be valid and an alternative like the Kruskal-Wallis test should be used instead.

If the overall test is significant, indicating differences between some of the medians, post-hoc tests are needed to determine which specific pairs of groups differ. Common post-hoc approaches include the Mann-Whitney U test or Dunn’s test .

Assumption Violations

While Mood’s test has fewer assumptions than the one-way ANOVA, the assumptions of random sampling and independence of observations still apply. Violations can increase the chance of false positives or false negatives.

Effect Size

Like other hypothesis tests, a significant p-value does not convey the degree of difference between groups. Effect sizes like the probability of superiority should be calculated and interpreted along with the p-value.

When reporting the results, good practice involves stating the test statistic value, degrees of freedom, p-value, sample sizes, medians, and effect size estimate. Graphical displays like boxplots can also aid interpretation.

Overall, Mood’s median test is a robust non-parametric tool. Still, careful checking of assumptions, appropriate sample sizing, post-hoc testing if needed, and comprehensive reporting of results is recommended for valid inference.



Difference Between Parametric and Nonparametric Test


A parametric test assumes specific information about the population's parameters; the nonparametric test, on the other hand, is one where the researcher has no such knowledge of the population parameters. Read this article in full to learn the significant differences between parametric and nonparametric tests.

Definition of Parametric Test

The parametric test is a hypothesis test which provides generalisations for making statements about the mean of the parent population. A t-test based on Student's t-statistic is often used in this regard.

The t-statistic rests on the underlying assumption that the variable is normally distributed and that the mean is known or assumed to be known. The population variance is estimated from the sample. It is assumed that the variables of interest in the population are measured on an interval scale.

Definition of Nonparametric Test

The nonparametric test is defined as a hypothesis test which is not based on underlying distributional assumptions, i.e. it does not require the population's distribution to be described by specific parameters.

The test is mainly based on differences in medians. Because it makes no assumption about the population distribution, it is also known as a distribution-free test. It assumes that the variables are measured on a nominal or ordinal level and is used when the independent variables are non-metric.

Key Differences Between Parametric and Nonparametric Tests

The fundamental differences between parametric and nonparametric test are discussed in the following points:

  • A statistical test, in which specific assumptions are made about the population parameter is known as the parametric test. A statistical test used in the case of non-metric independent variables is called nonparametric test.
  • In the parametric test, the test statistic is based on distribution. On the other hand, the test statistic is arbitrary in the case of the nonparametric test.
  • In the parametric test, it is assumed that the variables of interest are measured on an interval or ratio scale, whereas in the nonparametric test they are measured on a nominal or ordinal scale.
  • In general, the measure of central tendency in the parametric test is mean, while in the case of the nonparametric test is median.
  • In the parametric test, there is complete information about the population. Conversely, in the nonparametric test, there is no information about the population.
  • The applicability of parametric test is for variables only, whereas nonparametric test applies to both variables and attributes.
  • For measuring the degree of association between two quantitative variables, Pearson's coefficient of correlation is used in the parametric test, while Spearman's rank correlation is used in the nonparametric test.


Choosing between a parametric and a nonparametric test is not easy for a researcher conducting statistical analysis. If the information about the population is completely known, by way of its parameters, the appropriate test is a parametric test; if there is no such knowledge about the population and the hypothesis still needs to be tested, a nonparametric test is used.


Non-Parametric Statistics in Python: Exploring Distributions and Hypothesis Testing


Non-parametric statistics make minimal assumptions about the distribution of the data, in contrast to parametric statistics; they rely instead on ranks and signs.

Non-parametric statistics focus on analyzing data without making strong assumptions about the underlying distribution. Python offers various methods for exploring data distributions, such as histograms, kernel density estimation (KDE), and Q-Q plots. Apart from this, non-parametric hypothesis testing techniques like the Wilcoxon rank-sum test, Kruskal-Wallis test, and chi-square test allow for inferential analysis without relying on parametric assumptions.

In this article, we have divided non-parametric statistics into two parts – Methods for Exploring the underlying distribution and Hypothesis Testing and Inference.


Exploring Data Distributions

Exploration of distribution helps us visualize the data and pin it to a theoretical distribution. It also helps us summarize the stats.

In this section, we will look at histograms, kernel density estimation, and Q-Q plots, and implement each of them in Python.

Visualizing Data with Histograms

Histograms are used to visualize the distribution of numerical data. The histogram gives us the range and shows the frequency of the range. They are very similar to Bar charts. Let us understand it further with Python code.
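
A minimal sketch of the kind of histogram code described here, using numpy for simulated data and matplotlib for plotting (the data are purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate some skewed sample data
rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=500)

# Histogram: bin the data and show the frequency of each range
plt.hist(data, bins=30, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of sample data")
plt.show()
```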

Let us look at the output for the above code.

Histogram Output

Estimating Probability Density with Kernel Density Estimation

Kernel Density Estimation (KDE) approximates the random variable’s probability density function (pdf). It provides us with continuous and much smoother visualization of the distribution. Let us look at the Python code for the same.
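
A minimal sketch of a kernel density estimate using scipy.stats.gaussian_kde on simulated data (again, purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=300)

# Fit a Gaussian kernel density estimate and evaluate it on a grid of points
kde = gaussian_kde(data)
grid = np.linspace(data.min() - 1, data.max() + 1, 200)

plt.plot(grid, kde(grid))
plt.xlabel("Value")
plt.ylabel("Estimated density")
plt.title("Kernel density estimate")
plt.show()
```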

Let us look at the output of the above code.

Kernel Density Estimation Plot

Comparing Distributions with Q-Q Plots

Q-Q plots, or quantile-quantile plots, are used to compare two probability distributions. They help us visualize whether two datasets come from the same population or share the same distribution. Let us look at the Python code for the same.
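
A minimal sketch of a Q-Q plot against the normal distribution, using scipy.stats.probplot on simulated data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=10, scale=2, size=200)

# Q-Q plot of the sample quantiles against theoretical normal quantiles
stats.probplot(data, dist="norm", plot=plt)
plt.title("Q-Q plot against the normal distribution")
plt.show()
```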

Let us look at the output of the plot.

Q Q Plot Output

Now let us move on and see what are the methods for Hypothesis Testing and Inference.

Non-Parametric Hypothesis Testing and Inference

In Hypothesis testing and inference for non-parametric statistics, minimal assumptions about the underlying distribution are made and more focus is on rank-based statistics.

Under this subheading, we will learn about the Wilcoxon rank-sum, Kruskal-Wallis, and chi-square tests. Let us learn all of these with their Python implementations.

Comparing Means with Wilcoxon Rank-Sum Test

The Wilcoxon rank-sum test, also known as the Mann-Whitney U test, is a non-parametric statistical test used to compare two independent groups. Rather than comparing means directly, it asks whether values in one group tend to be larger than values in the other. In the code below, we have two datasets, and we want to conclude whether they differ. Let us look at the code below.
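
A minimal sketch using scipy.stats.ranksums on two illustrative samples (the values are arbitrary):

```python
from scipy.stats import ranksums   # Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test)

# Two illustrative independent samples
group1 = [42, 37, 39, 44, 36, 38]
group2 = [40, 39, 38, 37, 31, 43]

stat, p = ranksums(group1, group2)
print(f"statistic = {stat:.3f}, p-value = {p:.3f}")
# Compare the printed p-value with 0.05; the discussion below assumes it is larger
```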

Let us look at its output.

Wilcox Rank Sum Test

Since the p-value is greater than 0.05, we fail to reject the null hypothesis: there is no evidence of a difference between the two datasets.

One-way ANOVA on ranks/Kruskal-Wallis Test

One-way ANOVA on ranks, also known as the Kruskal-Wallis test, is a non-parametric test for comparing three or more independent groups. It does not assume normally distributed data. Let us look at the code below, where we have assumed three datasets.
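
A minimal sketch with scipy.stats.kruskal on three illustrative samples:

```python
from scipy.stats import kruskal

# Three illustrative independent samples
group1 = [42, 37, 39, 44, 36, 38]
group2 = [40, 39, 38, 37, 31, 43]
group3 = [35, 36, 34, 40, 33, 37]

stat, p = kruskal(group1, group2, group3)
print(f"H = {stat:.3f}, p-value = {p:.3f}")
# The discussion below assumes the printed p-value is greater than 0.05
```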

Let us look at the output of the code below.

Kruskal-Wallis Test Output

We fail to reject the null hypothesis since the p-value is greater than 0.05.

Testing Categorical Variables with Chi-Square Test

The chi-square test assesses the difference between observed and expected frequencies of one or more categorical variables. It is used both as a goodness-of-fit test and as a test of independence between categorical variables. Let us look at the code below.
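
A minimal sketch with scipy.stats.chi2_contingency on an illustrative 2 × 2 contingency table:

```python
from scipy.stats import chi2_contingency

# Illustrative contingency table: counts of two categorical variables
#               Outcome A   Outcome B
observed = [[20, 15],      # Group 1
            [18, 17]]      # Group 2

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p:.3f}")
# The discussion below assumes the printed p-value is greater than 0.05
```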

Chi Square Test Output

Since the p-value is more than 0.05, we cannot conclude that the two categorical variables are dependent.

Here we go! Now you know what non-parametric tests are. In this article, we have learned about the exploration of distribution and hypothesis testing of data without considering any parameters. We also learned about different kinds of tests to compare different datasets.

Hope you enjoyed reading it!!


Key points of statistics for clinically oriented research

Kernpunkte der Statistik für klinisch orientierte Forschung

  • AGA-Komitee-Hefte
  • Open access
  • Published: 23 May 2024

Brenda Laky PhD & AGA Research Committee

Processing of research data is a crucial point in every scientific study. The basis of each statistical test is a clear research question and hypothesis. Then the scientist must consider which type of data are to be collected (nominal, ordinal, continuous). It is recommended that study-specific spreadsheets are developed before data collection, which consider the type of data being collected, how the data are to be analyzed, and which software will be used. Then, once the data have been gathered, tests for normality should be applied before proceeding with the specific statistical tests. This article provides a brief overview of common statistical tests based on fictional examples.



Statistical considerations should not only be applied after data collection: an exact statistical analysis plan including sample size calculation, a well-described primary research question, and an entire data evaluation plan should already be incorporated into the study protocol!

Furthermore, precisely defined data are a prerequisite to being able to perform statistical analyses. The best way to do this is to create an Excel (Microsoft, Redmond, WA, USA) or SPSS (IBM Corp., Armonk, NY, USA) file including all data to be collected prior to data collection. In the next step, starting with the primary question (primary endpoint), specific questions can be formulated according to the PICO(T) and FINER criteria. The PICO acronym stands for patients (P) with a defined health problem, an intervention (I) with a specific treatment, with a comparison (C; e.g., control intervention) and an outcome (O; e.g., outcome measure), and, if applicable, for a time (T). The planned study design and methods can and should then be “refined” with the FINER method [1].

Choice of statistical test

The choice of statistical test depends on the following:

  • The reason for testing:
    • describing a group/variable;
    • relationship between (a) groups with respect to one variable or (b) variables;
    • group comparison between 2 or ≥ 3 groups;
    • group comparison between independent (e.g., comparison between treatment A vs. B) or paired groups (e.g., comparison between before and after treatment).
  • The type of variables:
    • nominal: bi- (e.g., yes/no) or polynominal (e.g., treatment A, B, C, D);
    • ordinal: non-parametric (not normally distributed, e.g., school grading system: very good/good/satisfactory/insufficient);
    • continuous: parametric (normally distributed, e.g., age in years).

The distribution of metric data can be graphically represented using histograms. Measures of central tendency (e.g., mean, median) describe the center of the data, and measures of dispersion (e.g., standard deviation, interquartile range) indicate how much the data vary. Normally distributed quantitative data should be reported as mean and standard deviation. Outliers within a data series can distort the mean (i.e., move it up or down) and, thus, affect distribution. Therefore, non-parametric continuous and ordinal data should be reported as median with interquartile range.

To make the selection easier, the following chart may help when planning a study (Fig.  1 ).

Fig. 1: Statistical tests cheat chart (ANOVA: analysis of variance; H0: null hypothesis)

However, before a statistical test can be applied, the prerequisites must always be checked. For example, continuous variables must always be checked to see if they are normally distributed. If they are normally distributed, a parametric test should be chosen; if they are not normally distributed, a non-parametric test is more appropriate.


When the right test is clear, data can be analyzed. There are good online tutorials for this (e.g., https://peterstatistics.com/CrashCourse/ shows how to do certain analyses with MS Excel, SPSS, or R). Besides well-known software like MS Excel, SPSS, SAS, and R (R Foundation for Statistical Computing, Vienna, Austria), the software Bluesky statistics (see https://www.blueskystatistics.com/ ) is a good free alternative.

Statistical analysis plan

Every protocol needs a statistical analysis plan in advance! This should include the following points:

The statistical test planned for the primary question/hypothesis. If the primary question/hypothesis requires more than one test to answer the research question, multiple testing must be considered and, hence, appropriate correction procedures (e.g., Bonferroni) must be applied.

Sample size calculation , if required; sample size calculations are not required for pilot projects and retrospective studies because such study types serve as the basis for prospective studies. In addition to the primary question/hypothesis, an approximate estimate of the primary outcome from preliminary studies, reviews, and the literature is required for sample size calculation. For prospective studies, however, an a priori sample size calculation is always important. Once a primary research question has been precisely defined, a sample size analysis can be performed, e.g., by using G*Power [2, 3]. Data needed for the sample size analysis should be taken from previous pilot studies or from similar studies that have already been published.

Descriptive statistics to describe demographic data.

For each further exploratory question, the planned test should be described ( analysis of secondary outcome parameters ). Exploratory, because a study should generally answer only one primary question/hypothesis at a time and all further questions serve to generate further hypotheses and, thus, could be used for future studies’ sample size calculations.

Practical approach

The following sections outline the procedure in practice.

Formulation of the research idea (general question).

Example : Does arthroscopic therapy improve orthopedic pathology more than endoprosthetic treatment?

Formulation of the primary question/hypothesis according to the PICO method:

  • P: patients with specific orthopedic pathologies
  • I: arthroscopic therapy (treatment A)
  • C: endoprosthetic treatment (treatment B)
  • O: outcome score (difference between pre- and postoperative)
  • T: preoperative and 5 years postoperative

Primary question

Does arthroscopic therapy improve the outcome score difference from preoperative to 5 years postoperative more than endoprosthetic treatment?

Null hypothesis (H0)

The mean outcome score difference is equal between treatment A and treatment B.

Alternative hypothesis (HA)

Possibility 1: the mean outcome score difference differs between treatment A and B (a difference is expected, but one does not yet know which of the two treatments might be better).

Possibility 2: the mean outcome score difference is smaller for treatment A than for treatment B (the hypothesis is that treatment A will show a smaller mean outcome score difference than treatment B).

Possibility 3: the mean outcome score difference is larger for treatment A than for treatment B (the hypothesis is that treatment A will show a larger mean outcome score difference than treatment B).

Assuming the outcome score is a continuous variable, it would first need to be checked whether it is normally distributed (e.g., using Kolmogorov–Smirnov test or Shapiro–Wilk test). If the outcome score is parametric, the independent t‑test can be used to compare the means of the two independent groups. If the outcome score is not normally distributed, the Mann–Whitney test must be used.
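
As a rough illustration of this decision in Python (the scores below are hypothetical, not from any real study):

```python
from scipy import stats

# Hypothetical outcome score differences (pre to 5 years post) in the two treatment groups
treatment_a = [12, 15, 9, 14, 11, 13, 16, 10]   # arthroscopic therapy
treatment_b = [8, 11, 7, 12, 9, 10, 6, 13]      # endoprosthetic treatment

# Check normality in each group (Shapiro-Wilk); a small p-value suggests non-normal data
normal_a = stats.shapiro(treatment_a).pvalue > 0.05
normal_b = stats.shapiro(treatment_b).pvalue > 0.05

if normal_a and normal_b:
    stat, p = stats.ttest_ind(treatment_a, treatment_b)                              # independent t-test
else:
    stat, p = stats.mannwhitneyu(treatment_a, treatment_b, alternative="two-sided")  # Mann-Whitney test
print(stat, p)
```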

Formulation and selection of statistical tests of all further questions

Other subsidiary questions of this project could be formulated as follows:

Question 3a

T: preoperative and 2 years postoperative (all other PICO elements as for the primary question)

Does arthroscopic therapy improve the outcome score difference from preoperative to 2 years postoperative more than endoprosthetic treatment?

Statistical test: as for the primary question, first check the outcome score for normality, then use the independent t-test (normally distributed) or the Mann-Whitney test (not normally distributed).

Question 3b

satisfaction with treatment (satisfied/unsatisfied)

Is there a difference between arthroscopic and endoprosthetic treatment in terms of satisfaction with the treatment?

Since the outcome is a binomial variable (satisfaction with treatment: satisfied or dissatisfied) to be compared between two independent groups (treatment A vs. B), either the chi-square or Fisher exact test (in case of studies with small numbers of cases or when the number of observations in cells is small) must be applied.

Question 3c

outcome score 2 years after arthroscopic therapy (postoperative)

outcome score before arthroscopic therapy (preoperative)

outcome score comparison between pre- and 2‑year postoperative

Is there a difference between outcome score from pre- to 2‑year postoperative?

Here, a continuous variable (outcome score) is to be compared before and 2 years after arthroscopic therapy. Since this is a comparison of two outcome score results of one group of patients at different timepoints (paired), first it must be checked whether the outcome score results are normally distributed (e.g., using Kolmogorov–Smirnov test or Shapiro–Wilk test). If the outcome score is normally distributed and thus, parametric, the paired t‑test can be used for comparison of means between before and after arthroscopic therapy. If the outcome score is not normally distributed, the Wilcoxon signed-rank test must be used as data are non-parametrically distributed.
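
A corresponding sketch for this paired comparison, again with hypothetical scores:

```python
from scipy import stats

# Hypothetical outcome scores before and 2 years after arthroscopic therapy (same patients)
pre  = [45, 52, 48, 50, 47, 55, 49, 51]
post = [60, 63, 55, 58, 59, 66, 57, 62]

# Normality check on the paired differences
diffs = [b - a for a, b in zip(pre, post)]

if stats.shapiro(diffs).pvalue > 0.05:
    stat, p = stats.ttest_rel(pre, post)    # paired t-test
else:
    stat, p = stats.wilcoxon(pre, post)     # Wilcoxon signed-rank test
print(stat, p)
```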

Practical conclusion

Checklist regarding the statistical analysis plan:

Define primary hypothesis according to the PICO plus FINER method.

Examination of data. Depending on the data type (nominal, ordinal, continuous) and considering the prerequisites of the respective statistical test (e.g., parametric, non-parametric), the adequate statistical test can then be selected.

If the PICOs are precisely defined and the adequate statistical test is selected, a sample size calculation can be performed if required.

References

1. Neugebauer E, Mutschler W, Claes L (2004) Von der Idee zur Publikation – Eine Anleitung zum erfolgreichen wissenschaftlichen Arbeiten, 1st edn. Georg Thieme Verlag KG

2. Faul F, Erdfelder E, Lang A‑G, Buchner A (2007) G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res 39:175–191

3. Faul F, Erdfelder E, Buchner A, Lang A‑G (2009) Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav Res 41:1149–1160


Open access funding provided by Medical University of Vienna.

Author information

Authors and affiliations.

Austrian Research Group for Regenerative and Orthopedic Medicine (AURROM), Hartmanngasse 15/10, 1050, Vienna, Austria

Brenda Laky PhD

Austrian Society of Regenerative Medicine, Wollzeile 3/2.1, Vienna, Austria

Medical Faculty, Sigmund Freud University Vienna, Freudplatz 3, 1020, Vienna, Austria

Center for Clinical Research, University Clinic of Dentistry, Medical University of Vienna, Vienna, Austria


Corresponding author

Correspondence to Brenda Laky PhD .

Ethics declarations

Conflict of interest.

B. Laky and the AGA Research Committee declare that they have no competing interests.

For this article no studies with human participants or animals were performed by any of the authors. All studies mentioned were in accordance with the ethical standards indicated in each case.

Additional information

D. Günther, Köln

E. Herbst, Münster



About this article

Laky, B., AGA Research Committee. Key points of statistics for clinically oriented research. Arthroskopie (2024). https://doi.org/10.1007/s00142-024-00685-8


Accepted : 15 April 2024

Published : 23 May 2024

DOI : https://doi.org/10.1007/s00142-024-00685-8


Keywords

  • Data interpretation, statistical
  • Data collection
  • Data visualization
  • Sample size
  • Research design


Statistics > Methodology

Title: A Non-Parametric Box-Cox Approach to Robustifying High-Dimensional Linear Hypothesis Testing

Abstract: The mainstream theory of hypothesis testing in high-dimensional regression typically assumes the underlying true model is a low-dimensional linear regression model, yet the Box-Cox transformation is a regression technique commonly used to mitigate anomalies like non-additivity and heteroscedasticity. This paper introduces a more flexible framework, the non-parametric Box-Cox model with unspecified transformation, to address model mis-specification in high-dimensional linear hypothesis testing while preserving the interpretation of regression coefficients. Model estimation and computation in high dimensions poses challenges beyond traditional sparse penalization methods. We propose the constrained partial penalized composite probit regression method for sparse estimation and investigate its statistical properties. Additionally, we present a computationally efficient algorithm using augmented Lagrangian and coordinate majorization descent for solving regularization problems with folded concave penalization and linear constraints. For testing linear hypotheses, we propose the partial penalized composite likelihood ratio test, score test and Wald test, and show that their limiting distributions under null and local alternatives follow generalized chi-squared distributions with the same degrees of freedom and noncentral parameter. Extensive simulation studies are conducted to examine the finite sample performance of the proposed tests. Our analysis of supermarket data illustrates potential discrepancies between our testing procedures and standard high-dimensional methods, highlighting the importance of our robustified approach.


IMAGES

  1. Parametric and Nonparametric Test with key differences

    hypothesis testing example non parametric

  2. Parametric Versus Nonparametric Test

    hypothesis testing example non parametric

  3. Parametric and Non-Paramtric test in Statistics

    hypothesis testing example non parametric

  4. Non-Parametric Hypothesis Testing in Excel, with the QI Macros

    hypothesis testing example non parametric

  5. Assumptions Of Nonparametric Tests

    hypothesis testing example non parametric

  6. Parametric and Non-Paramtric test in Statistics

    hypothesis testing example non parametric

VIDEO

  1. Bivariate Analysis: Hypothesis tests (Parametric Non-parametric tests)

  2. Session 8- Hypothesis testing by Non Parametric Tests (7/12/23)

  3. 6-19: Hypothesis Testing: Central Tendency for Non-Normal Distributions (Non-Parametric Overview)

  4. Statistics Part 4

  5. Chapter 09: Hypothesis testing: non-directional worked example

  6. Parametric Tests & Interpretation of Result

COMMENTS

  1. Nonparametric Tests vs. Parametric Tests

    Non-Parametric Test (Kruskal-Wallis H-test): The results show a significant difference in the distribution of returns across the portfolios (p-values < 0.05). Given that my data does not meet the normality assumption required for parametric tests, I am inclined to rely on the non-parametric test results.

