Hypothesis Testing Calculator


The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead. To switch from σ known to σ unknown, click on $\boxed{\sigma}$ and select $\boxed{s}$ in the Hypothesis Testing Calculator.

Test statistic when $\sigma$ is known: $ z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} $
Test statistic when $\sigma$ is unknown: $ t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}} $
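
As a rough illustration of the two test statistic formulas, here is a minimal Python sketch. The function names and the sample numbers (n = 36, sample mean 52, hypothesized mean 50, standard deviation 6) are assumptions made up for this example; they do not come from the calculator.

```python
import math

def z_statistic(x_bar, mu_0, sigma, n):
    """z = (x_bar - mu_0) / (sigma / sqrt(n)), used when sigma is known."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

def t_statistic(x_bar, mu_0, s, n):
    """t = (x_bar - mu_0) / (s / sqrt(n)), used when sigma is unknown.

    Returns the statistic together with its degrees of freedom, n - 1.
    """
    t = (x_bar - mu_0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical sample values, chosen only to keep the arithmetic simple.
z = z_statistic(x_bar=52, mu_0=50, sigma=6, n=36)
t, df = t_statistic(x_bar=52, mu_0=50, s=6, n=36)
print(z, t, df)
```

The arithmetic is identical in both cases; what changes is the distribution (z or t with n - 1 degrees of freedom) that the statistic is later compared against.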

Next, the test statistic is used to conduct the test using either the p-value approach or the critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail, or two-tailed. The form can be identified from the alternative hypothesis ($H_a$). A less-than sign in the alternative hypothesis indicates a lower tail test, a greater-than sign indicates an upper tail test, and a not-equal sign indicates a two-tailed test. To switch from a lower tail test to an upper tail or two-tailed test, click on $\boxed{\geq}$ and select $\boxed{\leq}$ or $\boxed{=}$, respectively.

Lower Tail Test: $H_0 \colon \mu \geq \mu_0$, $H_a \colon \mu < \mu_0$
Upper Tail Test: $H_0 \colon \mu \leq \mu_0$, $H_a \colon \mu > \mu_0$
Two-Tailed Test: $H_0 \colon \mu = \mu_0$, $H_a \colon \mu \neq \mu_0$

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged regardless of whether it's a lower tail, upper tail, or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.
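
The p-value approach for a z test can be sketched in a few lines of Python using the standard library's `statistics.NormalDist`. The helper names `z_p_value` and `reject_null` are hypothetical, and the sketch covers only the σ-known case; a t test would need the t distribution with n - 1 degrees of freedom instead.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a z statistic; tail is 'lower', 'upper', or 'two'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        return std_normal.cdf(z)                 # P(Z <= z)
    if tail == "upper":
        return 1 - std_normal.cdf(z)             # P(Z >= z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails beyond |z|
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

def reject_null(p_value, alpha=0.05):
    """Reject H0 when the p-value is less than or equal to alpha."""
    return p_value <= alpha

# Hypothetical test statistic z = 2.0 in an upper tail test.
p = z_p_value(2.0, "upper")
print(round(p, 4), reject_null(p))
```

The decision rule in `reject_null` is the same for all three forms of the test; only the p-value calculation depends on the tail.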

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal to the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.

Lower Tail Test: If $z \leq -z_\alpha$ (σ known) or $t \leq -t_\alpha$ (σ unknown), reject $H_0$.
Upper Tail Test: If $z \geq z_\alpha$ (σ known) or $t \geq t_\alpha$ (σ unknown), reject $H_0$.
Two-Tailed Test: If $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$ (σ known), or $t \leq -t_{\alpha/2}$ or $t \geq t_{\alpha/2}$ (σ unknown), reject $H_0$.
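
The rejection rules for the z test can be written out as a short Python sketch. It assumes σ is known, so the critical values come from the standard normal distribution via `statistics.NormalDist.inv_cdf`; the function name `reject_by_critical_value` is made up for illustration.

```python
from statistics import NormalDist

def reject_by_critical_value(z, alpha, tail):
    """Apply the critical value rule for a z test; True means reject H0."""
    std_normal = NormalDist()
    if tail == "lower":
        # Critical value -z_alpha leaves an area of alpha in the lower tail.
        return z <= std_normal.inv_cdf(alpha)
    if tail == "upper":
        # Critical value z_alpha leaves an area of alpha in the upper tail.
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # Critical values +/- z_{alpha/2} leave alpha/2 in each tail.
        z_half = std_normal.inv_cdf(1 - alpha / 2)
        return z <= -z_half or z >= z_half
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# The same hypothetical statistic can lead to different decisions by tail.
print(reject_by_critical_value(1.8, 0.05, "upper"))
print(reject_by_critical_value(1.8, 0.05, "two"))
```

With α = 0.05 the upper tail critical value is about 1.645 and the two-tailed critical values are about ±1.96, which is why a statistic of 1.8 is rejected in the first call but not the second.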

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we'd like to accept the null hypothesis when the null hypothesis is true. A Type II Error is committed if you accept the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.

Accept $H_0$ when $H_0$ is true: correct conclusion
Accept $H_0$ when $H_a$ is true: Type II Error
Reject $H_0$ when $H_0$ is true: Type I Error
Reject $H_0$ when $H_a$ is true: correct conclusion

Hypothesis testing is closely related to the statistical area of confidence intervals. If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Interval Calculator. The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Population Calculator. The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.
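
One common way to compute the probability of a Type II Error for a z test is sketched below. This is an assumption about the general textbook method, not a description of how the calculator itself is implemented; the function name, parameter names, and example values are all hypothetical. The idea is to find the rejection cutoff for the sample mean under the null hypothesis, then ask how likely the sample mean is to miss that cutoff when the true mean is some other value.

```python
import math
from statistics import NormalDist

def type_ii_error(mu_0, mu_true, sigma, n, alpha=0.05, tail="upper"):
    """Probability of not rejecting H0 when the true mean is mu_true (z test)."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)  # standard error of the sample mean

    def prob_mean_below(cutoff):
        # P(sample mean <= cutoff) when the true population mean is mu_true
        return std_normal.cdf((cutoff - mu_true) / se)

    if tail == "upper":  # H0 is rejected when x_bar >= mu_0 + z_alpha * se
        return prob_mean_below(mu_0 + std_normal.inv_cdf(1 - alpha) * se)
    if tail == "lower":  # H0 is rejected when x_bar <= mu_0 - z_alpha * se
        return 1 - prob_mean_below(mu_0 - std_normal.inv_cdf(1 - alpha) * se)
    if tail == "two":    # H0 is rejected outside mu_0 +/- z_{alpha/2} * se
        z_half = std_normal.inv_cdf(1 - alpha / 2)
        return (prob_mean_below(mu_0 + z_half * se)
                - prob_mean_below(mu_0 - z_half * se))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Hypothetical inputs: H0 mean 50, true mean 52, sigma 6, n 36, alpha 0.05.
beta = type_ii_error(mu_0=50, mu_true=52, sigma=6, n=36)
print(round(beta, 3))
```

The result depends on the chosen true mean: the closer it sits to the hypothesized value, the larger the Type II Error probability becomes.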



Math Calculators, Lessons and Formulas


T-Test calculator


The Student's t-test is used to determine if the means of two data sets differ significantly. This calculator will generate a step-by-step explanation of how to apply the t-test.

Groups Have Equal Variance (default)
Groups Have Unequal Variance (Welch t-test)
Two Tailed Test (default)
One Tailed Test
0.05 (default)
Unpaired T Test (default)
Paired (Dependent) T Test


Twelve younger adults and twelve older adults conducted a life satisfaction test. The data are presented in the table below. Compute the appropriate t-test.

Are the means of the two data sets significantly different at significance level $\alpha = 0.05$?

Welcome to MathPortal. This website's owner is mathematician Miloš Petrović. I designed this website and wrote all the calculators, lessons, and formulas .

If you want to contact me or have any questions, write to me using the contact form or email me at [email protected]



T test calculator

A t test compares the means of two groups. There are several types of two sample t tests, and this calculator focuses on the three most common: unpaired, Welch's, and paired t tests. Directions for using the calculator are listed below, along with more information about two sample t tests and help on which is appropriate for your analysis. NOTE: This is not the same as a one sample t test; for that, you need the One sample t test calculator.

1. Choose data entry format

Caution: Changing format will erase your data.

2. Choose a test

Help me choose

3. Enter data

Help me arrange the data

4. View the results

What is a t test?

A t test is used to measure the difference between exactly two means. Its focus is on the same numeric data variable rather than counts or correlations between multiple variables. If you are taking the average of a sample of measurements, t tests are the most commonly used method to evaluate that data. It is particularly useful for small samples of less than 30 observations. For example, you might compare whether systolic blood pressure differs between a control and treated group, between men and women, or any other two groups.

This calculator uses a two-sample t test, which compares two datasets to see if their means are statistically different. That is different from a one sample t test , which compares the mean of your sample to some proposed theoretical value.

The most general formula for a t test is composed of two means ($M_1$ and $M_2$) and the overall standard error ($SE$) of the two samples:

$t = \dfrac{M_1 - M_2}{SE}$

See our video on How to Perform a Two-sample t test for an intuitive explanation of t tests and an example.
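
To make the "difference of means over standard error" idea concrete, here is a minimal Python sketch of the unpaired and Welch's t statistics. It uses the standard textbook formulas (pooled variance for the equal-variance case, the Welch-Satterthwaite approximation for degrees of freedom otherwise); it is not taken from this calculator's code, and the function name and data are invented for illustration. It stops at the t statistic and degrees of freedom, since the P value requires the t distribution.

```python
import math
from statistics import mean, variance

def two_sample_t(group_1, group_2, equal_variance=True):
    """Return (t, df) for an unpaired t test, or Welch's t test if
    equal_variance is False."""
    n1, n2 = len(group_1), len(group_2)
    m1, m2 = mean(group_1), mean(group_2)
    v1, v2 = variance(group_1), variance(group_2)  # sample variances (n - 1)

    if equal_variance:
        # Pool the two variances, weighting each by its degrees of freedom.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's version keeps the variances separate.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / se, df

# Hypothetical measurements for two small groups.
control = [1, 2, 3, 4, 5]
treated = [2, 3, 4, 5, 6]
print(two_sample_t(control, treated))
print(two_sample_t(control, treated, equal_variance=False))
```

When both groups have the same size and the same sample variance, as in this made-up data, the two versions give the same t statistic and the same degrees of freedom; they diverge as the variances or group sizes become unequal.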

How to use the t test calculator

  • Choose your data entry format. This will change how section 3 on the page looks. The first two options are for entering your data points themselves, either manually or by copy & paste. The last two are for entering the means for each group, along with the number of observations (N) and either the standard error of that mean (SEM) or the standard deviation of the dataset (SD). If you have already calculated these summary statistics, the latter options will save you time.
  • Choose a test from the three options: Unpaired t test, Welch's unpaired t test, or Paired t test. Use our Ultimate Guide to t tests if you are unsure which is appropriate, as it includes a section on "How do I know which t test to use?". Notice not all options are available if you enter means only.
  • Enter data for the test, based on the format you chose in Step 1.
  • Click Calculate Now and View the results. All options will perform a two-tailed test .


Common t test confusion

In addition to the number of t test options, t tests are often confused with completely different techniques as well. Here's how to keep them all straight.

Correlation and regression are used to measure how much two factors move together. While t tests are part of regression analysis, they are focused on only one factor by comparing means in different samples.

ANOVA is used for comparing means across three or more total groups. In contrast, t tests compare means between exactly two groups.

Finally, contingency tables compare counts of observations within groups rather than a calculated average. Because t tests compare the means of a continuous variable between groups, they do not apply to count data; contingency tables use methods such as chi square instead.

Assumptions of t tests

Because there are several versions of t tests, it's important to check the assumptions to figure out which is best suited for your project. Here are our analysis checklists for unpaired t tests and paired t tests , which are the two most common. These (and the ultimate guide to t tests ) go into detail on the basic assumptions underlying any t test:

  • Exactly two groups
  • Sample is normally distributed
  • Independent observations
  • Unequal or equal variance?
  • Paired or unpaired data?

Interpreting results

The three different options for t tests have slightly different interpretations, but they all hinge on hypothesis testing and P values. You need to select a significance threshold for your P value (often 0.05) before doing the test.

While P values can be easy to misinterpret , they are the most commonly used method to evaluate whether there is evidence of a difference between the sample of data collected and the null hypothesis. Once you have run the correct t test, look at the resulting P value. If the test result is less than your threshold, you have enough evidence to conclude that the data are significantly different.

If the test result is larger or equal to your threshold, you cannot conclude that there is a difference. However, you cannot conclude that there was definitively no difference either. It's possible that a dataset with more observations would have resulted in a different conclusion.

Depending on the test you run, you may see other statistics that were used to calculate the P value, including the mean difference, t statistic, degrees of freedom, and standard error. The confidence interval and a review of your dataset is given as well on the results page.

Graphing t tests

This calculator does not provide a chart or graph of t tests. However, graphing is an important part of analysis because it can help explain the results of the t test and highlight any potential outliers. See our Prism guide for some graphing tips for both unpaired and paired t tests.

Prism is built for customized, publication quality graphics and charts. For t tests we recommend simply plotting the datapoints themselves and the mean, or an estimation plot . Another popular approach is to use a violin plot, like those available in Prism.

For more information

Our ultimate guide to t tests includes examples, links, and intuitive explanations on the subject. It is quite simply the best place to start if you're looking for more about t tests!

If you enjoyed this calculator, you will love using Prism for analysis. Take a free 30-day trial to do more with your data, such as:

  • Clear guidance to pick the right t test and detailed results summaries
  • Custom, publication quality t test graphics, violin plots, and more
  • More t test options, including normality testing as well as nested and multiple t tests
  • Non-parametric test alternatives such as Wilcoxon, Mann-Whitney, and Kolmogorov-Smirnov

Check out our video on how to perform a t test in Prism , for an example from start to finish!

Remember, this page is just for two sample t tests. If you only have one sample, you need to use this calculator instead.


P Value Calculator

Enter the values and calculate the P-value from the statistical test you performed.


Use this P value calculator to calculate either one or two-tailed P values from statistical scores such as Z score, T score, F ratio score, Pearson (R) score, chi-square value, and Tukey Q score. With that, it automatically determines whether your results are statistically significant or not based on your chosen significance level.

What Is P-Value?

“The p-value is the probability of obtaining results at least as extreme as the observed ones, assuming the null hypothesis is true”


Null Hypothesis (H0):

The null hypothesis states that there is no difference between the observed value and the expected value.

Alternative Hypothesis (H1):

The alternative hypothesis states that there is a difference between the expected and the observed value. In other words, it proposes that there is an effect in the data.

A low p-value means your results would be unlikely to occur by chance alone if the null hypothesis were true.

How To Calculate P Value?

There are different statistical tests (Z score, T score, chi-square, etc.), and each test requires different parameters to calculate the p-value. A p-value is based on the probability distribution of the test statistic under the null hypothesis (H₀).

P value from Z Score:

A z score tells you how far a specific point is from the average (mean) value, measured in standard deviations. The Z score is based on the standard normal distribution. It can be used for both large and small samples as long as the data follow the normal distribution.

$Z = \dfrac{X - \mu}{\sigma}$

P Value From T Score:

A t score, like a z score, is a standardized score used in statistics to measure the distance of a value from the mean. It is expressed in terms of standard deviations.

$t = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$

  • Positive T-score: The value is above the mean. The higher the t score, the further above the mean the value is
  • Negative T-score: The value is below the mean
  • T-score of 0: The value is equal to the mean

P value From Chi-Square ($\chi^2$):

A chi-square test is used to determine the relationship between the categorical variables. With the help of the chi-square test, you can determine whether there’s a statistically significant difference or not between what you expected and what you observed in your data, especially when analyzing surveys with categorized answers. It helps you understand how likely the results are due to chance. 

Chi-square distribution:

If the value of the difference is large, then it suggests a relationship between the variables. It does not provide any information about the direction (positive or negative).

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
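
The chi-square statistic is a direct sum over categories, which a short Python sketch makes clear. The function name and the survey counts below are assumptions made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical survey: 60 answers spread over three categories,
# with 20 expected in each category under the null hypothesis.
observed = [18, 22, 20]
expected = [20, 20, 20]
print(chi_square_statistic(observed, expected))
```

Each category contributes its squared deviation scaled by the expected count, so the statistic is always non-negative and carries no sign information, which matches the remark that it says nothing about direction.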

P Value From F Statistic:

The F statistic is used in conjunction with an F-test to assess the difference between the variances of two or more groups (populations or samples). There are different F tests, so the interpretation depends on the type of test used and the associated degrees of freedom. The interpretation of the F test depends on the resulting p-value, so take care with this step whether you calculate by hand or with the p value calculator. If the p-value is low, the variances are likely different. A higher p-value indicates that the null hypothesis of equal variances cannot be rejected.

$F = \dfrac{s_1^2}{s_2^2}$

The degrees of freedom for the numerator are $df_1 = n_1 - 1$ and the degrees of freedom for the denominator are $df_2 = n_2 - 1$.

  • $s_1^2$ indicates the first sample variance
  • $s_2^2$ represents the second sample variance
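
As a small sketch of the variance-ratio form of the F statistic, the following Python function computes the ratio of two sample variances together with both degrees of freedom. The function name and the two samples are invented for this example.

```python
from statistics import variance

def f_statistic(sample_1, sample_2):
    """Return (F, df_1, df_2), where F is the ratio of the sample variances."""
    f = variance(sample_1) / variance(sample_2)  # variances use n - 1
    return f, len(sample_1) - 1, len(sample_2) - 1

# Hypothetical samples; the first is the second with every value doubled.
f, df_1, df_2 = f_statistic([2, 4, 6, 8], [1, 2, 3, 4])
print(f, df_1, df_2)
```

Doubling every value multiplies the variance by four, so the ratio here works out to 4 with 3 degrees of freedom in both the numerator and the denominator.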

The F statistic is commonly used in:

  • Analysis of Variance (ANOVA)
  • Regression Analysis 

P-Value From Pearson (r) Score:

The Pearson (r) score is a statistical measure of the degree of linear relationship between two quantitative variables. It takes a value between -1 and +1, indicating the strength and direction of the relationship. The positive and negative signs show the direction of the association. You can use the r score and the degrees of freedom (N-2) to find the P value.

  • Strong Positive Correlation: A value near +1 means that as one variable increases, the other tends to increase as well
  • Strong Negative Correlation: A value near -1 means that as one variable increases, the other tends to decrease
  • Weak or No Linear Correlation: A value near 0 indicates a weak or no linear relationship between the variables

Finding p value from the Pearson (r) score involves the following steps:

Calculate the test statistic (t)

$t = \dfrac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$

Determine the degrees of freedom (df) = n−2

Use the t-distribution table to determine the critical t-value and interpolate (if necessary)

$y = y_1 + \dfrac{(x - x_1)(y_2 - y_1)}{x_2 - x_1}$
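
The two calculations in these steps, converting r to a t statistic and interpolating linearly between two table entries, can be sketched in Python as follows. The function names are hypothetical, and the numbers passed to `interpolate` are placeholder points rather than real t-table values.

```python
import math

def t_from_r(r, n):
    """Return (t, df) with t = r * sqrt(n - 2) / sqrt(1 - r^2) and df = n - 2."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

def interpolate(x, x1, y1, x2, y2):
    """Linear interpolation: y = y1 + (x - x1) * (y2 - y1) / (x2 - x1)."""
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

# Hypothetical correlation r = 0.6 from n = 18 paired observations.
t, df = t_from_r(0.6, 18)
print(round(t, 4), df)

# Placeholder points (2, 10) and (3, 20); estimate y halfway between them.
print(interpolate(2.5, 2, 10, 3, 20))
```

In practice `x1` and `x2` would be the two tabulated t-values that bracket the computed statistic, and `y1` and `y2` the tail probabilities listed for them, so the interpolated `y` is an approximate p-value.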

[Table omitted: t-distribution critical values for one-tailed and two-tailed tests]

Source: tutorialspoint.com.

Approximate p value

P-Value From Tukey q (Studentized Range Distribution) Score:

Tukey's HSD (Honestly Significant Difference) test compares the groups in the data pairwise to determine which differences between group means are statistically significant.

To find p value from Tukey q Score:

  • Determine the Tukey Q Score: It shows the magnitude of the difference between two groups
  • Calculate the Degrees of Freedom(df): It relies upon the number of groups that are compared and the sample size
  • Studentized Range Distribution Table: Use the table that contains the calculated q-score and degrees of freedom
  • Approximate the interpolation: If the q score does not match with any value of the table then use the interpolation to find p value

The steps above show how p-values can be calculated from various statistics. For more convenient calculations, you can use our p-value calculator, which takes the score and the appropriate distribution and returns the p-value directly.


How to interpret p-value results?

  • When the p-value is lower than the significance level, the result is statistically significant
  • When the p-value is higher than the significance level, the result is not statistically significant and there is not enough evidence to reject the null hypothesis

When To Use A One-Tailed or Two-Tailed Test?

It depends upon the expectation of results and the direction of the effect.

  • One-Tailed Test: This test is used when you have the directional hypothesis. It means you expect that the result can happen in a specific direction. For example, you are a teacher and now you are going to use a new technique. You expect that the result of the new method will have a positive impact.
  • Two-Tailed Test: It is used when you want to perform the test but don't have any directional hypothesis. For instance, you are testing the effect of a medicine but you don't know whether it will increase or decrease the blood pressure.

What Is The Difference Between P-Value And Significance Level?

  • P value: The p-value represents the strength of the evidence against the null hypothesis. It is a probability calculated from your data: the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A low p-value indicates that the obtained result is unlikely to be due to chance alone.
  • Significance Level: The significance level is the threshold of evidence required to reject the null hypothesis. It equals the probability of rejecting the null hypothesis when it is true (a Type I error). It is typically set to 0.05 but can be changed based on the study


From the source of Wikipedia.org: P-Value .


Statistics By Jim

Making statistics intuitive

One-Tailed and Two-Tailed Hypothesis Tests Explained

By Jim Frost

Choosing whether to perform a one-tailed or a two-tailed hypothesis test is one of the methodology decisions you might need to make for your statistical analysis. This choice can have critical implications for the types of effects it can detect, the statistical power of the test, and potential errors.

In this post, you’ll learn about the differences between one-tailed and two-tailed hypothesis tests and their advantages and disadvantages. I include examples of both types of statistical tests. In my next post, I cover the decision between one and two-tailed tests in more detail.

What Are Tails in a Hypothesis Test?

First, we need to cover some background material to understand the tails in a test. Typically, hypothesis tests take all of the sample data and convert it to a single value, which is known as a test statistic. You’re probably already familiar with some test statistics. For example, t-tests calculate t-values. F-tests, such as ANOVA, generate F-values. The chi-square test of independence and some distribution tests produce chi-square values. All of these values are test statistics. For more information, read my post about Test Statistics.

These test statistics follow a sampling distribution. Probability distribution plots display the probabilities of obtaining test statistic values when the null hypothesis is correct. On a probability distribution plot, the portion of the shaded area under the curve represents the probability that a value will fall within that range.

The graph below displays a sampling distribution for t-values. The two shaded regions cover the two-tails of the distribution.

Plot that displays critical regions in the two tails of the distribution.

Keep in mind that this t-distribution assumes that the null hypothesis is correct for the population. Consequently, the peak (most likely value) of the distribution occurs at t=0, which represents the null hypothesis in a t-test. Typically, the null hypothesis states that there is no effect. As t-values move further away from zero, they represent larger effect sizes. When the null hypothesis is true for the population, obtaining samples that exhibit a large apparent effect becomes less likely, which is why the probabilities taper off for t-values further from zero.

Related posts : How t-Tests Work and Understanding Probability Distributions

Critical Regions in a Hypothesis Test

In hypothesis tests, critical regions are ranges of the distributions where the values represent statistically significant results. Analysts define the size and location of the critical regions by specifying both the significance level (alpha) and whether the test is one-tailed or two-tailed.

Consider the following two facts:

  • The significance level is the probability of rejecting a null hypothesis that is correct.
  • The sampling distribution for a test statistic assumes that the null hypothesis is correct.

Consequently, to represent the critical regions on the distribution for a test statistic, you merely shade the appropriate percentage of the distribution. For the common significance level of 0.05, you shade 5% of the distribution.

Related posts : Significance Levels and P-values and T-Distribution Table of Critical Values

Two-Tailed Hypothesis Tests

Two-tailed hypothesis tests are also known as nondirectional and two-sided tests because you can test for effects in both directions. When you perform a two-tailed test, you split the significance level percentage between both tails of the distribution. In the example below, I use an alpha of 5% and the distribution has two shaded regions of 2.5% (2 * 2.5% = 5%).

When a test statistic falls in either critical region, your sample data are sufficiently incompatible with the null hypothesis that you can reject it for the population.

In a two-tailed test, the generic null and alternative hypotheses are the following:

  • Null : The effect equals zero.
  • Alternative :  The effect does not equal zero.

The specifics of the hypotheses depend on the type of test you perform because you might be assessing means, proportions, or rates.

Example of a two-tailed 1-sample t-test

Suppose we perform a two-sided 1-sample t-test where we compare the mean strength (4.1) of parts from a supplier to a target value (5). We use a two-tailed test because we care whether the mean is greater than or less than the target value.

To interpret the results, simply compare the p-value to your significance level. If the p-value is less than the significance level, you know that the test statistic fell into one of the critical regions, but which one? Just look at the estimated effect. In the output below, the t-value is negative, so we know that the test statistic fell in the critical region in the left tail of the distribution, indicating the mean is less than the target value. Now we know this difference is statistically significant.

Statistical output from a two-tailed 1-sample t-test.

We can conclude that the population mean for part strength is less than the target value. However, the test had the capacity to detect a positive difference as well. You can also assess the confidence interval. With a two-tailed hypothesis test, you’ll obtain a two-sided confidence interval. The confidence interval tells us that the population mean is likely to fall between 3.372 and 4.828. This range excludes the target value (5), which is another indicator of significance.

Advantages of two-tailed hypothesis tests

You can detect both positive and negative effects. Two-tailed tests are standard in scientific research where discovering any type of effect is usually of interest to researchers.

One-Tailed Hypothesis Tests

One-tailed hypothesis tests are also known as directional and one-sided tests because you can test for effects in only one direction. When you perform a one-tailed test, the entire significance level percentage goes into the extreme end of one tail of the distribution.

In the examples below, I use an alpha of 5%. Each distribution has one shaded region of 5%. When you perform a one-tailed test, you must determine whether the critical region is in the left tail or the right tail. The test can detect an effect only in the direction that has the critical region. It has absolutely no capacity to detect an effect in the other direction.

In a one-tailed test, you have two options for the null and alternative hypotheses, which correspond to where you place the critical region.

You can choose either of the following sets of generic hypotheses:

  • Null : The effect is less than or equal to zero.
  • Alternative : The effect is greater than zero.

Plot that displays a single critical region for a one-tailed test.

  • Null : The effect is greater than or equal to zero.
  • Alternative : The effect is less than zero.

Plot that displays a single critical region in the left tail for a one-tailed test.

Again, the specifics of the hypotheses depend on the type of test you perform.

Notice how for both possible null hypotheses the tests can’t distinguish between zero and an effect in a particular direction. For example, in the example directly above, the null combines “the effect is greater than or equal to zero” into a single category. That test can’t differentiate between zero and greater than zero.
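To see where the critical regions start for an alpha of 5%, here is a minimal sketch that uses the standard normal distribution from Python's standard library. Using z instead of the t-distribution is an assumption made to keep the example dependency-free, and the function name `critical_values` is hypothetical. The actual cutoffs for a t-test depend on the degrees of freedom.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the z value(s) that bound the critical region.

    tail is "two", "right", or "left". The standard normal distribution
    stands in for the test's real sampling distribution (assumption).
    """
    z = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        cut = z.inv_cdf(1 - alpha / 2)  # alpha is split evenly between both tails
        return (-cut, cut)
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)  # all of alpha sits in the right tail
    if tail == "left":
        return (z.inv_cdf(alpha),)      # all of alpha sits in the left tail
    raise ValueError("tail must be 'two', 'right', or 'left'")


for tail in ("two", "right", "left"):
    print(tail, [round(v, 3) for v in critical_values(0.05, tail)])
```

Because a one-tailed test puts the whole 5% in one tail, its cutoff (about 1.645 for z) sits closer to zero than the two-tailed cutoffs (about ±1.96).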

Example of a one-tailed 1-sample t-test

Suppose we perform a one-tailed 1-sample t-test. We’ll use a similar scenario as before where we compare the mean strength of parts from a supplier (102) to a target value (100). Imagine that we are considering a new parts supplier. We will use them only if the mean strength of their parts is greater than our target value. There is no need for us to differentiate between whether their parts are equally strong or less strong than the target value—either way we’d just stick with our current supplier.

Consequently, we’ll choose the alternative hypothesis that states the mean difference is greater than zero (Population mean – Target value > 0). The null hypothesis states that the difference between the population mean and target value is less than or equal to zero.

Statistical output for a one-tailed 1-sample t-test.

To interpret the results, compare the p-value to your significance level. If the p-value is less than the significance level, you know that the test statistic fell into the critical region. For this study, the statistically significant result supports the notion that the population mean is greater than the target value of 100.

Confidence intervals for a one-tailed test are similarly one-sided. You’ll obtain either an upper bound or a lower bound. In this case, we get a lower bound, which indicates that the population mean is likely to be greater than or equal to 100.631. There is no upper limit to this range.

A lower bound matches our goal of determining whether the new parts are stronger than our target value. The fact that the lower bound (100.631) is higher than the target value (100) indicates that these results are statistically significant.
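The agreement between the lower bound and the one-tailed test follows from the formulas. In the sketch below, $s$ is the sample standard deviation, $n$ is the sample size, and $t_{\alpha,\,n-1}$ is the one-tailed critical value. This notation is assumed here and does not come from the software output.

```latex
\begin{aligned}
\text{Reject } H_0\colon \mu \le \mu_0
  &\iff t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} > t_{\alpha,\,n-1} \\
  &\iff \bar{x} - t_{\alpha,\,n-1}\,\frac{s}{\sqrt{n}} > \mu_0 \\
  &\iff \text{lower bound} > \mu_0
\end{aligned}
```

In the example, the lower bound of 100.631 exceeds $\mu_0 = 100$, so the test rejects the null.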

This test is unable to detect a negative difference even when the sample mean represents a very negative effect.

Advantages and disadvantages of one-tailed hypothesis tests

One-tailed tests have more statistical power to detect an effect in one direction than a two-tailed test with the same design and significance level. One-tailed tests occur most frequently for studies where one of the following is true:

  • Effects can exist in only one direction.
  • Effects can exist in both directions but the researchers only care about an effect in one direction. There is no drawback to failing to detect an effect in the other direction. (Not recommended.)

The disadvantage of one-tailed tests is that they have no statistical power to detect an effect in the other direction.

As part of your pre-study planning process, determine whether you’ll use the one- or two-tailed version of a hypothesis test. To learn more about this planning process, read 5 Steps for Conducting Scientific Studies with Statistical Analyses.

This post explains the differences between one-tailed and two-tailed statistical hypothesis tests. How these forms of hypothesis tests function is clear and based on mathematics. However, there is some debate about when you can use one-tailed tests. My next post explores this decision in much more depth and explains the different schools of thought and my opinion on the matter: When Can I Use One-Tailed Hypothesis Tests.

If you’re learning about hypothesis testing and like the approach I use in my blog, check out my Hypothesis Testing book! You can find it at Amazon and other retailers.

Cover image of my Hypothesis Testing: An Intuitive Guide ebook.

Reader Interactions

August 23, 2024 at 1:28 pm

Thank you so much. This is very helpful.

June 26, 2022 at 12:14 pm

Hi, can you help me figure out the null and alternative hypotheses for the following statement? Some claimed that the real average expenditure on beverages by the general population is at least $10.

February 19, 2022 at 6:02 am

Thank you for the thorough explanation. I’m still struggling to wrap my mind around the t-table and the relation between the alpha values for one or two tail probability and the confidence levels on the bottom (I’m understanding it so wrongly that for me it should be the opposite, like one tail 0.05 should correspond to 95% CI and two tailed 0.025 should correspond to 95% because then you get the 2.5% on each side). In my mind, if I picture the one tail diagram with an alpha of 0.05, I see the remaining 95% inside the diagram, but for a one tail I only see 90% CI paired with a 5% alpha… where did the other 5% go? I tried to understand when you said we should just double the alpha for a one tail probability in order to find the CI but I still can’t picture it. I have been trying to understand this. Like if you only have one tail and there is 0.05, shouldn’t the rest be on the other side? Why is it then 90%… I know I’m missing a point and I can’t figure it out and it’s so frustrating…

February 23, 2022 at 10:01 pm

The alpha is the total shaded area. So, if the alpha = 0.05, you know that 5% of the distribution is shaded. The number of tails tells you how to divide the shaded areas. Is it all in one region (1-tailed) or do you split the shaded regions in two (2-tailed)?

So, for a one-tailed test with an alpha of 0.05, the 5% shading is all in one tail. If alpha = 0.10, then it’s 10% on one side. If it’s two-tailed, then you need to split that 10% into two–5% in both tails. Hence, the 5% in a one-tailed test is the same as a two-tailed test with an alpha of 0.10 because that test has the same 5% on one side (but there’s another 5% in the other tail).

It’s similar for CIs. However, for CIs, you shade the middle rather than the extremities. I write about that in one of my articles about hypothesis testing and confidence intervals.

I’m not sure if I’m answering your question or not.

February 17, 2022 at 1:46 pm

I ran a post hoc Dunnett’s test alpha=0.05 after a significant Anova test in Proc Mixed using SAS. I want to determine if the means for treatment (t1, t2, t3) are significantly less than the means for control (p=pathogen). The code for the dunnett’s test is – LSmeans trt / diff=controll (‘P’) adjust=dunnett CL plot=control; I think the lower bound one tailed test is the correct test to run but I’m not 100% sure. I’m finding conflicting information online. In the output table for the dunnett’s test the mean difference between the control and the treatments is t1=9.8, t2=64.2, and t3=56.5. The control mean estimate is 90.5. The adjusted p-value by treatment is t1(p=0.5734), t2 (p=.0154) and t3(p=.0245). The adjusted lower bound confidence limit in order from t1-t3 is -38.8, 13.4, and 7.9. The adjusted upper bound for all tests is infinity. The graphical output for the dunnett’s test in SAS is difficult to understand for those of us who are beginner SAS users. All treatments appear as a vertical line below the horizontal line for control at 90.5 with t2 and t3 in the shaded area. For treatment 1 the shaded area is above the line for control. Looking at just the output table I would say that t2 and t3 are significantly lower than the control. I guess I would like to know if my interpretation of the outputs is correct that treatments 2 and 3 are statistically significantly lower than the control? Should I have used an upper bound one tailed test instead?

November 10, 2021 at 1:00 am

Thanks Jim. Please help me understand how a two tailed testing can be used to minimize errors in research

July 1, 2021 at 9:19 am

Hi Jim, Thanks for posting such a thorough and well-written explanation. It was extremely useful to clear up some doubts.

May 7, 2021 at 4:27 pm

Hi Jim, I followed your instructions for the Excel add-in. Thank you. I am very new to statistics and sort of enjoy it as I enter week number two in my class. I am to determine whether each of three scenarios calls for a one- or two-tailed test and why. The problem is stated:

30% of mole biopsies are unnecessary. Last month at his clinic, 210 out of 634 had benign biopsy results. Is there enough evidence to reject the dermatologist’s claim?

Part two, the wording changes to “more than 30% of biopsies,” and part three, the wording changes to “less than 30% of biopsies…”

I am not asking for the problem to be solved for me, but I cannot seem to find the direction needed. I know the elements I am dealing with are =30%, greater than 30%, and less than 30%. 210 and 634. I just don’t know what to do with the information. I can’t seem to find an example of a similar problem to work with.

May 9, 2021 at 9:22 pm

As I detail in this post, a two-tailed test tells you whether an effect exists in either direction. Or, is it different from the null value in either direction. For the first example, the wording suggests you’d need a two-tailed test to determine whether the population proportion is ≠ 30%. Whenever you just need to know ≠, it suggests a two-tailed test because you’re covering both directions.

For part two, because it’s in one direction (greater than), you need a one-tailed test. Same for part three but it’s less than. Look in this blog post to see how you’d construct the null and alternative hypotheses for these cases. Note that you’re working with a proportion rather than the mean, but the principles are the same! Just plug your scenario and the concept of proportion into the wording I use for the hypotheses.

I hope that helps!
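To make the three versions of the hypotheses concrete, here is a rough sketch that runs the biopsy numbers (210 out of 634 against a null value of 30%) through a one-proportion z-test. The normal approximation is an assumption made for simplicity. Statistical software may use an exact binomial method instead, and the function name `one_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Normal-approximation z-test for one proportion.

    Returns the z statistic and the p-value for each form of the alternative.
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error under the null value p0
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf(z)
    return {
        "z": z,
        "two_tailed": 2 * min(cdf, 1 - cdf),  # alternative: p != p0
        "upper_tail": 1 - cdf,                # alternative: p > p0
        "lower_tail": cdf,                    # alternative: p < p0
    }


result = one_proportion_z_test(successes=210, n=634, p0=0.30)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under this approximation, the same data fall below an alpha of 0.05 for the upper-tail alternative but not for the two-tailed one, which is why the choice of tails needs to be made before looking at the data.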

April 11, 2021 at 9:30 am

Hello Jim, great website! I am using a statistics program (SPSS) that does NOT compute one-tailed t-tests. I am trying to compare two independent groups and have justifiable reasons why I only care about one direction. Can I do the following? Use SPSS for two-tailed tests to calculate the t & p values. Then report the p-value as p/2 when it is in the predicted direction (e.g , SPSS says p = .04, so I report p = .02), and report the p-value as 1 – (p/2) when it is in the opposite direction (e.g., SPSS says p = .04, so I report p = .98)? If that is incorrect, what do you suggest (hopefully besides changing statistics programs)? Also, if I want to report confidence intervals, I realize that I would only have an upper or lower bound, but can I use the CI’s from SPSS to compute that? Thank you very much!

April 11, 2021 at 5:42 pm

Yes, for p-values, that’s absolutely correct for both cases.

For confidence intervals, if you take one endpoint of a two-sided CI, it becomes a one-sided bound with half the significance level. For example, one endpoint of a two-sided 95% CI (alpha = 0.05) is a one-sided 97.5% bound (alpha = 0.025).

Consequently, to obtain a one-sided bound with your desired confidence level, you need to take your desired significance level (e.g., 0.05) and double it. Then subtract it from 1. So, if you’re using a significance level of 0.05, double that to 0.10 and then subtract from 1 (1 – 0.10 = 0.90). 90% is the confidence level you want to use for the two-sided CI. After obtaining the two-sided CI, use one of the endpoints depending on the direction of your hypothesis (i.e., upper or lower bound). That produces the one-sided bound with the confidence level that you want. For our example, we calculated a 95% one-sided bound.
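Both conversions in this reply are simple arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes a symmetric test statistic such as t or z, which is what makes halving the two-tailed p-value valid.

```python
def one_tailed_p(two_tailed_p, effect_in_predicted_direction):
    """Convert a two-tailed p-value to a one-tailed p-value (symmetric statistic)."""
    half = two_tailed_p / 2
    return half if effect_in_predicted_direction else 1 - half


def two_sided_level_for_one_sided_bound(alpha):
    """Confidence level to request for a two-sided CI so that one of its
    endpoints is a one-sided bound with confidence 1 - alpha."""
    return 1 - 2 * alpha


print(one_tailed_p(0.04, effect_in_predicted_direction=True))
print(one_tailed_p(0.04, effect_in_predicted_direction=False))
print(round(two_sided_level_for_one_sided_bound(0.05), 2))
```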

March 3, 2021 at 8:27 am

Hi Jim. I used the one-tailed(right) statistical test to determine an anomaly in the below problem statement: On a daily basis, I calculate the (mapped_%) in a common field between two tables.

The way I used the t-test is: On any particular day, I calculate the sample_mean, S.D and sample_count (n=30) for the last 30 days including the current day. My null hypothesis, H0 (pop. mean)=95 and H1>95 (alternate hypothesis). So, I calculate the t-stat based on the sample_mean, pop.mean, sample S.D and n. I then choose the t-crit value for 0.05 from my t-distribution table for dof(n-1). On the current day if my abs.(t-stat)>t-crit, then I reject the null hypothesis and I say the mapped_pct on that day has passed the t-test.

I get some weird results here, where if my mapped_pct is as low as 6%-8% in all the past 30 days, the t-test still gets a “pass” result. Could you help on this? If my hypothesis needs to be changed.

I would basically look for the mapped_pct >95, if it worked on a static trigger. How can I use the t-test effectively in this problem statement?
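One possible explanation for the symptom in this comment, offered as an illustration rather than a definitive diagnosis: a right-tailed test rejects only when the t-statistic is large and positive, so taking the absolute value turns it into a rule that also rejects for very low means. The sketch below uses made-up numbers (a mean of 7 with a standard deviation of 1), hypothetical function names, and the standard table value of 1.699 for a one-tailed alpha of 0.05 with 29 degrees of freedom.

```python
from math import sqrt


def t_statistic(sample_mean, sample_sd, n, mu0):
    """1-sample t statistic for testing a mean against the null value mu0."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


def reject_right_tailed(t_stat, t_crit):
    """H0: mu <= mu0 vs H1: mu > mu0. Only large positive t values reject."""
    return t_stat > t_crit


def reject_with_abs(t_stat, t_crit):
    """The rule described in the comment: abs() also rejects for large negative t."""
    return abs(t_stat) > t_crit


T_CRIT = 1.699  # one-tailed, alpha = 0.05, 29 df, from a standard t-table

# Hypothetical data: mapped_pct averaging 7% when the null value is 95%.
t_low = t_statistic(sample_mean=7.0, sample_sd=1.0, n=30, mu0=95.0)
print(reject_with_abs(t_low, T_CRIT), reject_right_tailed(t_low, T_CRIT))
```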

December 18, 2020 at 8:23 pm

Hello Dr. Jim, I am wondering if there is evidence in one of your books or other source you could provide, which supports that it is OK not to divide alpha level by 2 in one-tailed hypotheses. I need the source for supporting evidence in a Portfolio exercise and couldn’t find one.

I am grateful for your reply and for your statistics knowledge sharing!

November 27, 2020 at 10:31 pm

If I did a one directional F test ANOVA(one tail ) and wanted to calculate a confidence interval for each individual groups (3) mean . Would I use a one tailed or two tailed t , within my confidence interval .

November 29, 2020 at 2:36 am

Hi Bashiru,

F-tests for ANOVA will always be one-tailed for the reasons I discuss in this post. To learn more, read my post about F-tests in ANOVA.

For the differences between your groups, I would not use t-tests because the family-wise error rate quickly grows out of hand. To learn more about how to compare group means while controlling the family-wise error rate, read my post about using post hoc tests with ANOVA. Typically, these are two-sided intervals but you’d be able to use one-sided.

November 26, 2020 at 10:51 am

Hi Jim, I had a question about the formulation of the hypotheses. When you want to test if a beta = 1 or a beta = 0. What will be the null hypotheses? I’m having trouble with finding out. Because in most cases beta = 0 is the null hypotheses but in this case you want to test if beta = 0. so i’m having my doubts can it in this case be the alternative hypotheses or is it still the null hypotheses?

Kind regards, Noa

November 27, 2020 at 1:21 am

Typically, the null hypothesis represents no effect or no relationship. As an analyst, you’re hoping that your data have enough evidence to reject the null and favor the alternative.

Assuming you’re referring to beta as in regression coefficients, zero represents no relationship. Consequently, beta = 0 is the null hypothesis.

You might hope that beta = 1, but you don’t usually include that in your alternative hypotheses. The alternative hypothesis usually states that it does not equal no effect. In other words, there is an effect but it doesn’t state what it is.

There are some exceptions to the above but I’m writing about the standard case.

November 22, 2020 at 8:46 am

Your articles are a help to intro to econometrics students. Keep up the good work! More power to you!

November 6, 2020 at 11:25 pm

Hello Jim. Can you help me with these please?

Write the null and alternative hypothesis using a 1-tailed and 2-tailed test for each problem. (In paragraph and symbols)

A teacher wants to know if there is a significant difference in the performance in MAT C313 between her morning and afternoon classes.

It is known that in our university canteen, the average waiting time for a customer to receive and pay for his/her order is 20 minutes. Additional personnel has been added and now the management wants to know if the average waiting time had been reduced.

November 8, 2020 at 12:29 am

I cover how to write the hypotheses for the different types of tests in this post. So, you just need to figure which type of test you need to use. In your case, you want to determine whether the mean waiting time is less than the target value of 20 minutes. That’s a 1-sample t-test because you’re comparing a mean to a target value (20 minutes). You specifically want to determine whether the mean is less than the target value. So, that’s a one-tailed test. And, you’re looking for a mean that is “less than” the target.

So, go to the one-tailed section in the post and look for the hypotheses for the effect being less than. That’s the one with the critical region on the left side of the curve.

Now, you need to include your own information. In your case, you’re comparing the sample estimate to a population mean of 20. The 20 minutes is your null hypothesis value. Use the symbol mu μ to represent the population mean.

You put all that together and you get the following:

Null: μ ≥ 20
Alternative: μ < 20

You can use H0 to denote the null hypothesis and H1 or HA to denote the alternative hypothesis if that’s what you’ve been using in class.

October 17, 2020 at 12:11 pm

I was just wondering if you could please help with clarifying what the hypothesises would be for say income for gamblers and, age of gamblers. I am struggling to find which means would be compared.

October 17, 2020 at 7:05 pm

Those are both continuous variables, so you’d use either correlation or regression for them. For both of those analyses, the hypotheses are the following:

  • Null : The correlation or regression coefficient equals zero (i.e., there is no relationship between the variables).
  • Alternative : The coefficient does not equal zero (i.e., there is a relationship between the variables).

When the p-value is less than your significance level, you reject the null and conclude that a relationship exists.

October 17, 2020 at 3:05 am

I was asked to choose between a one-tailed and a two-tailed test for dummy variables and to justify the choice. How do I do that, and what does it mean?

October 17, 2020 at 7:11 pm

I don’t have enough information to answer your question. A dummy variable is also known as an indicator variable, which is a binary variable that indicates the presence or absence of a condition or characteristic. If you’re using this variable in a hypothesis test, I’d presume that you’re using a proportions test, which is based on the binomial distribution for binary data.

Choosing between a one-tailed or two-tailed test depends on subject area issues and, possibly, your research objectives. Typically, use a two-tailed test unless you have a very good reason to use a one-tailed test. To understand when you might use a one-tailed test, read my post about when to use a one-tailed hypothesis test.

October 16, 2020 at 2:07 pm

In your one-tailed example, Minitab describes the hypotheses as “Test of mu = 100 vs > 100”. Any idea why Minitab says the null is “=” rather than “= or less than”? No ASCII character for it?

October 16, 2020 at 4:20 pm

I’m not entirely sure even though I used to work there! I know we had some discussions about how to represent that hypothesis but I don’t recall the exact reasoning. I suspect that it has to do with the conclusions that you can draw. Let’s focus on the failing to reject the null hypothesis. If the test statistic falls in that region (i.e., it is not significant), you fail to reject the null. In this case, all you know is that you have insufficient evidence to say it is different than 100. I’m pretty sure that’s why they use the equal sign because it might as well be one.

Mathematically, I think using ≤ is more accurate, which you can really see when you look at the distribution plots. That’s why I phrase the hypotheses using ≤ or ≥ as needed. However, in terms of the interpretation, the “less than” portion doesn’t really add anything of importance. You can conclude that it’s equal to 100 or greater than 100, but not less than 100.

October 15, 2020 at 5:46 am

Thank you so much for your timely feedback. It helps a lot

October 14, 2020 at 10:47 am

How can i use one tailed test at 5% alpha on this problem?

A manufacturer of cellular phone batteries claims that when fully charged, the mean life of his product lasts for 26 hours with a standard deviation of 5 hours. Mr X, a regular distributor, randomly picked and tested 35 of the batteries. His test showed that the average life of his sample is 25.5 hours. Is there a significant difference between the average life of all the manufacturer’s batteries and the average battery life of his sample?

October 14, 2020 at 8:22 pm

I don’t think you’d want to use a one-tailed test. The goal is to determine whether the sample is significantly different than the manufacturer’s population average. You’re not saying significantly greater than or less than, which would be a one-tailed test. As phrased, you want a two-tailed test because it can detect a difference in either direction.

It sounds like you need to use a 1-sample t-test to test the mean. During this test, enter 26 as the test mean. The procedure will tell you if the sample mean of 25.5 hours is significantly different from that test mean. Similarly, you’d need a one variance test to determine whether the sample standard deviation is significantly different from the test value of 5 hours.

For both of these tests, compare the p-value to your alpha of 0.05. If the p-value is less than this value, your results are statistically significant.
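As a rough numerical illustration of the two-tailed comparison, the sketch below treats the manufacturer's claimed standard deviation of 5 hours as a known population value and runs a z-test. That is an assumption that differs from the t-test suggested above, which would use the sample standard deviation, and the problem statement does not give one. The function name `two_tailed_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n):
    """z statistic and two-tailed p-value, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-tailed: count the area beyond |z| in both tails.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_tailed_z_test(sample_mean=25.5, mu0=26, sigma=5, n=35)
print(f"z = {z:.3f}, two-tailed p = {p_value:.3f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```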

September 22, 2020 at 4:16 am

Hi Jim, I didn’t get an idea that when to use two tail test and one tail test. Will you please explain?

September 22, 2020 at 10:05 pm

I have a complete article dedicated to that: When Can I Use One-Tailed Tests.

Basically, start with the assumption that you’ll use a two-tailed test but then consider scenarios where a one-tailed test can be appropriate. I talk about all of that in the article.

If you have questions after reading that, please don’t hesitate to ask!

July 31, 2020 at 12:33 pm

Thank you so so much for this webpage.

I have two scenarios that I need some clarification. I will really appreciate it if you can take a look:

So I have several of materials that I know when they are tested after production. My hypothesis is that the earlier they are tested after production, the higher the mean value I should expect. At the same time, the later they are tested after production, the lower the mean value. Since this is more like a “greater or lesser” situation, I should use one tail. Is that the correct approach?

On the other hand, I have several mix of materials that I don’t know when they are tested after production. I only know the mean values of the test. And I only want to know whether one mean value is truly higher or lower than the other, I guess I want to know if they are only significantly different. Should I use two tail for this? If they are not significantly different, I can judge based on the mean values of test alone. And if they are significantly different, then I will need to do other type of analysis. Also, when I get my P-value for two tail, should I compare it to 0.025 or 0.05 if my confidence level is 0.05?

Thank you so much again.

July 31, 2020 at 11:19 pm

For your first scenario, if you absolutely know that the mean must be lower the later the material is tested, that it cannot be higher, that would be a situation where you can use a one-tailed test. However, if that’s not a certainty and you’re just guessing, use a two-tailed test. If you’re measuring different items at the different times, use the independent 2-sample t-test. However, if you’re measuring the same items at two time points, use the paired t-test. If it’s appropriate, using the paired t-test will give you more statistical power because it accounts for the variability between items. For more information, see my post about when it’s ok to use a one-tailed test.

For the mix of materials, use a two-tailed test because the effect truly can go either direction.

Always compare the p-value to your full significance level regardless of whether it’s a one or two-tailed test. Don’t divide the significance level in half.

June 17, 2020 at 2:56 pm

Is it possible to reach opposite conclusions if we use the critical value method and the p-value method? Secondly, if we perform a one-tailed test and use the p-value method to reach a conclusion about H0, do we need to convert the two-tailed sig value into a one-tailed sig value? That can be done just by dividing it by 2.

June 18, 2020 at 5:17 pm

The p-value method and critical value method will always agree as long as you’re not changing anything about the methodology.

If you’re using statistical software, you don’t need to make any adjustments. The software will do that for you.

However, if you’re calculating it by hand, you’ll need to take your significance level and then look in the table for your test statistic for a one-tailed test. For example, you’ll want to look up 5% for a one-tailed test rather than a two-tailed test. That’s not as simple as dividing by two. In this article, I show examples of one-tailed and two-tailed tests for the same degrees of freedom. The t critical value for the two-tailed test is +/- 2.086 while for the one-sided test it is 1.725. It is true that the probability associated with those critical values doubles for the one-tailed test (2.5% -> 5%), but the critical value itself is not half (2.086 -> 1.725). Study the first several graphs in this article to see why that is true.

For the p-value, you can take a two-tailed p-value and divide by 2 to determine the one-sided p-value, provided the observed effect is in the direction that your alternative hypothesis specifies. However, if you’re using statistical software, it does that for you.
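The tail areas behind those two critical values can be checked numerically. The sketch below integrates the t-distribution density with Simpson's rule using only Python's standard library. The degrees of freedom are not stated in this reply, so df = 20 is an inference from standard t-tables, where 2.086 and 1.725 appear in that row. The function names are hypothetical.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using composite Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("t must be non-negative")
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 0.5 - area_zero_to_t  # the density is symmetric around zero


DF = 20  # inferred from standard t-tables (assumption)
print(round(t_upper_tail(2.086, DF), 3))  # area in one tail of the two-tailed test
print(round(t_upper_tail(1.725, DF), 3))  # area in the single tail of the one-tailed test
```

The area beyond 2.086 is roughly 2.5% and the area beyond 1.725 is roughly 5%, so the tail probability doubles even though the critical value shrinks by far less than half.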

June 11, 2020 at 3:46 pm

Hello Jim, if you have the time I’d be grateful if you could shed some clarity on this scenario:

“A researcher believes that aromatherapy can relieve stress but wants to determine whether it can also enhance focus. To test this, the researcher selected a random sample of students to take an exam in which the average score in the general population is 77. Prior to the exam, these students studied individually in a small library room where a lavender scent was present. If students in this group scored significantly above the average score in general population [is this one-tailed or two-tailed hypothesis?], then this was taken as evidence that the lavender scent enhanced focus.”

Thank you for your time if you do decide to respond.

June 11, 2020 at 4:00 pm

It’s unclear from the information provided whether the researchers used a one-tailed or two-tailed test. It could be either. A two-tailed test can detect effects in both directions, so it could definitely detect an average group score above the population score. However, you could also detect that effect using a one-tailed test if it was set up correctly. So, there’s not enough information in what you provided to know for sure. It could be either.

However, that’s irrelevant to answering the question. The tricky part, as I see it, is that you’re not entirely sure about why the scores are higher. Are they higher because the lavender scent increased concentration or are they higher because the subjects have lower stress from the lavender? Or, maybe it’s not even related to the scent but some other characteristic of the room or testing conditions in which they took the test. You just know the scores are higher but not necessarily why they’re higher.

I’d say that, no, it’s not necessarily evidence that the lavender scent enhanced focus. There are competing explanations for why the scores are higher. Also, it would be best to do this as an experiment with a control and treatment group where subjects are randomly assigned to either group. That process helps establish causality rather than just correlation and helps rule out competing explanations for why the scores are higher.

By the way, I spend a lot of time on these issues in my Introduction to Statistics ebook.

June 9, 2020 at 1:47 pm

If a left tail test has an alpha value of 0.05 how will you find the value in the table

April 19, 2020 at 10:35 am

Hi Jim, My question is in regards to the results in the table in your example of the one-sample T (Two-Tailed) test above. What about the P-value? The P-value listed is .018. I’m assuming that is compared to an alpha of 0.025, correct?

In regression analysis, when I get a test statistic for the predictive variable of -2.099 and a p-value of 0.039. Am I comparing the p-value to an alpha of 0.025 or 0.05? Now if I run a Bootstrap for coefficients analysis, the results say the sig (2-tail) is 0.098. What are the critical values and alpha in this case? I’m trying to reconcile what I am seeing in both tables.

Thanks for your help.

April 20, 2020 at 3:24 am

Hi Marvalisa,

For one-tailed tests, you don’t need to divide alpha in half. If you can tell your software to perform a one-tailed test, it’ll do all the calculations necessary so you don’t need to adjust anything. So, if you’re using an alpha of 0.05 for a one-tailed test and your p-value is 0.04, it is significant. The procedures adjust the p-values automatically and it all works out. So, whether you’re using a one-tailed or two-tailed test, you always compare the p-value to the alpha with no need to adjust anything. The procedure does that for you!

The exception would be if for some reason your software doesn’t allow you to specify that you want to use a one-tailed test instead of a two-tailed test. Then, as long as the effect is in the direction you hypothesized, you divide the p-value from a two-tailed test in half to get the p-value for a one-tailed test. You’d still compare it to your original alpha.

For regression, the same thing applies. If you want to use a one-tailed test for a coefficient, just divide the p-value in half if you can’t tell the software that you want a one-tailed test. The default is two-tailed. If your software has the option for one-tailed tests for any procedure, including regression, it’ll adjust the p-value for you. So, in the normal course of things, you won’t need to adjust anything.

March 26, 2020 at 12:00 pm

Hey Jim, for a one-tailed hypothesis test with a .05 confidence level, should I use a 95% confidence interval or a 90% confidence interval? Thanks

March 26, 2020 at 5:05 pm

You should use a one-sided 95% confidence interval. One-sided CIs have either an upper OR lower bound but remain unbounded on the other side.

March 16, 2020 at 4:30 pm

This is not applicable to the subject but… When performing tests of equivalence, we look at the confidence interval of the difference between two groups, and we perform two one-sided t-tests for equivalence..

March 15, 2020 at 7:51 am

Thanks for this illustrative blogpost. I had a question on one of your points though.

By definition of H1 and H0, a two-sided alternate hypothesis is that there is a difference in means between the test and control. Not that anything is ‘better’ or ‘worse’.

Just because we observed a negative result in your example, does not mean we can conclude it’s necessarily worse, but instead just ‘different’.

Therefore while it enables us to spot the fact that there may be differences between test and control, we cannot make claims about directional effects. So I struggle to see why they actually need to be used instead of one-sided tests.

What’s your take on this?

March 16, 2020 at 3:02 am

Hi Dominic,

If you’ll notice, I carefully avoid stating better or worse because in a general sense you’re right. However, given the context of a specific experiment, you can conclude whether a negative value is better or worse. As always in statistics, you have to use your subject-area knowledge to help interpret the results. In some cases, a negative value is a bad result. In other cases, it’s not. Use your subject-area knowledge!

I’m not sure why you think that you can’t make claims about directional effects? Of course you can!

As for why you shouldn’t use one-tailed tests for most cases, read my post When Can I Use One-Tailed Tests. That should answer your questions.

May 10, 2019 at 12:36 pm

Your website is absolutely amazing Jim, you seem like the nicest guy for doing this and I like how there’s no ulterior motive (I wasn’t automatically signed up for emails or anything when leaving this comment). I study economics and found econometrics really difficult at first, but your website explains it so clearly that it’s been a big asset to my studies. Keep up the good work!

May 10, 2019 at 2:12 pm

Thank you so much, Jack. Your kind words mean a lot!

April 26, 2019 at 5:05 am

Hy Jim I really need your help now pls

One-tailed and two- tailed hypothesis, is it the same or twice, half or unrelated pls

April 26, 2019 at 11:41 am

Hi Anthony,

I describe how the hypotheses are different in this post. You’ll find your answers.

February 8, 2019 at 8:00 am

Thank you for your blog Jim, I have a Statistics exam soon and your articles let me understand a lot!

February 8, 2019 at 10:52 am

You’re very welcome! I’m happy to hear that it’s been helpful. Best of luck on your exam!

January 12, 2019 at 7:06 am

Hi Jim, when you say the target value is 5, do you mean that the population mean is 5 and we are trying to validate it with the sample mean of 4.1 using hypothesis tests? If so, how can we measure a population parameter as 5 when it is almost impossible to measure a population parameter? Please clarify.

January 12, 2019 at 6:57 pm

When you set a target for a one-sample test, it’s based on a value that is important to you. It’s not a population parameter or anything like that. The example in this post uses a case where we need parts that are stronger on average than a value of 5. We derive the value of 5 by using our subject area knowledge about what is required for a situation. Given our product knowledge for the hypothetical example, we know it should be 5 or higher. So, we use that in the hypothesis test and determine whether the population mean is greater than that target value.

When you perform a one-sample test, a target value is optional. If you don’t supply a target value, you simply obtain a confidence interval for the range of values that the parameter is likely to fall within. But sometimes there is a meaningful number that you want to test for specifically.

I hope that clarifies the rationale behind the target value!

November 15, 2018 at 8:08 am

I understand that in Psychology a one-tailed hypothesis is preferred. Is that so?

November 15, 2018 at 11:30 am

No, there’s no overall preference for one-tailed hypothesis tests in statistics. That would be a study-by-study decision based on the types of possible effects. For more information about this decision, read my post: When Can I Use One-Tailed Tests?

November 6, 2018 at 1:14 am

I’m grateful to you for the explanations on One tail and Two tail hypothesis test. This opens my knowledge horizon beyond what an average statistics textbook can offer. Please include more examples in future posts. Thanks

November 5, 2018 at 10:20 am

Thank you. I will search it as well.

Stan Alekman

November 4, 2018 at 8:48 pm

Jim, what is the difference between the central and non-central t-distributions w/respect to hypothesis testing?

November 5, 2018 at 10:12 am

Hi Stan, this is something I will need to look into. I know central t-distribution is the common Student t-distribution, but I don’t have experience using non-central t-distributions. There might well be a blog post in that–after I learn more!

November 4, 2018 at 7:42 pm

this is awesome.

P Value from Z Score Calculator

This is very easy: just stick your Z score in the box marked Z score, select your significance level and whether you're testing a one or two-tailed hypothesis (if you're not sure, go with the defaults), then press the button!

If you need to derive a Z score from raw data, you can find a Z test calculator here .

Enter your z score value, and then press the button.
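The conversion this calculator performs can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's actual code; the function name `p_value_from_z` is hypothetical, and the one-tailed case assumes the alternative hypothesis points in the same direction as the sign of the z score.

```python
from statistics import NormalDist


def p_value_from_z(z: float, two_tailed: bool = True) -> float:
    """Return the p-value for a z score under the standard normal distribution."""
    # Probability of a value at least as extreme as |z| in a single tail.
    one_tail = 1.0 - NormalDist().cdf(abs(z))
    # A two-tailed test counts both tails, so the probability doubles.
    return 2.0 * one_tail if two_tailed else one_tail


z = 1.96  # illustrative z score
p = p_value_from_z(z)
print(f"z = {z}, two-tailed p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A result is then called significant when the p-value is below the chosen significance level.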

Additional Z Statistic Calculators

If you're interested in using the z statistic for hypothesis testing and the like, then we have a number of other calculators that might help you.

  • Z-Test Calculator for a Single Sample
  • Z-Test Calculator for 2 Population Proportions
  • Z Score Calculator for a Single Raw Value (Includes z from p)

Two Sample t test calculator

Instructions: Use this calculator to work through a two-sample t-test, showing all the steps. To run the test, you need to provide two independent samples in the spreadsheet below. You can either type the data or simply paste them from Excel.

Two-sample t-test calculator

This calculator will allow you to get all the details and steps related to the calculation of a two-sample t-test. The process for conducting a t-test is relatively simple, but it often requires a lot of calculations, which this calculator shows in detail.

The first step in using this calculator is to type or paste your data into the spreadsheet. You can have your data originally in Excel and then paste it in, no problem. After you type or paste the data, all you need to do is click on "Calculate" to get all the steps shown.

There are lots of subtleties involved in conducting a t-test. Certain distributional assumptions need to be met, and you need to assess whether or not the population standard deviations can be assumed to be equal. Once the assumptions are cleared, we can proceed with the test statistic calculation.

Independent t-test Calculator with Samples

Usually there are two different forms that can lead to calculating an independent t-test. You can either have two samples, or you can have the data already summarized. For the latter, use this independent t-test calculator with summarized data .

For the case of two samples, you will first need to conduct descriptive statistics calculations in order to get a summary of the provided independent samples.

Steps for running an independent t-test

  • Step 1: Identify the samples provided. Those samples need to be at least approximately normal
  • Step 2: Formal normality tests are usually beyond what is required, in which case you can create a histogram of each sample to see whether it looks at least approximately bell-shaped
  • Step 3: If you do need to formally test for the normality of the samples, you can use this normality test calculator
  • Step 4: Before running the test, assess whether the population standard deviations can be assumed to be equal or not
  • Step 5: Once you have cleared the assumptions (if needed), you can proceed with running the actual t-test

Why do we need to test for the equality of population variances? This is because there is the need to find the standard error for the test, and it turns out that the optimal choice for the standard error depends on whether the population standard deviations are equal or not.

That is a rather technical topic, but in layman terms, if the population variances are equal, then the best choice is to basically pool the available sample variances to get a good standard error estimate.

But if they are not equal, things get a bit more complicated, and some technical corrections are needed, which is what you see reflected in the fact that the formula used is different, and the degrees of freedom are different too.

What is the t-value in a 2 sample test?

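The formula used for the independent samples t-test will depend upon whether or not the population variances are assumed to be equal. If they are assumed to be unequal, the formula used is

\[t = \frac{\bar X_1 - \bar X_2}{\sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }}\]

But, if the population variances are assumed to be equal, you then need to use the following formula:

\[t = \frac{\bar X_1 - \bar X_2}{\sqrt{ \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}(\frac{1}{n_1}+\frac{1}{n_2}) } }\]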

Equality of Population Variances

When should you assume equality of population variances? There is a formal test, the F-test for equality of variances, which this calculator conducts if you select the option.

Sometimes rules of thumb are used instead, such as dividing the larger sample variance by the smaller one and assuming that the population variances are equal if this ratio is less than 3, or another rule like that. That is not a completely bad idea, but if you really need to know, it is best to run a formal test.

What are the steps for computing the t-test formula?

  • Step 1: Assess whether or not the population variances are equal. Run an F-test for equality of variances if needed
  • Step 2: Depending on whether equality of population variances is assumed or not, you will choose the right formula for the t-test
  • Step 3: For unequal population variances, you use \(t = \frac{\bar X_1 - \bar X_2}{\sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }}\)
  • Step 4: For equal population variances, you use \(t = \frac{\bar X_1 - \bar X_2}{\sqrt{ \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}(\frac{1}{n_1}+\frac{1}{n_2}) } }\)
  • Step 5: Based on the number of degrees of freedom and type of tail, you compute the corresponding p-value, and if the p-value is less than the significance level, the null hypothesis is rejected
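The two formulas in Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration rather than the calculator's own code; the function names `welch_t` and `pooled_t` are hypothetical, and both functions take summary statistics (means, sample standard deviations, sample sizes) as inputs.

```python
import math


def welch_t(mean1, mean2, s1, s2, n1, n2):
    """t statistic when the population variances are assumed to be unequal."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / standard_error


def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """t statistic when the population variances are assumed to be equal."""
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Illustrative summary statistics (hypothetical values).
print(welch_t(10.0, 9.0, 2.0, 1.0, 12, 15))
print(pooled_t(10.0, 9.0, 2.0, 1.0, 12, 15))
```

When the two sample sizes are equal, both functions return the same t value; the two approaches then differ only in the degrees of freedom used to look up the p-value.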

The number of degrees of freedom when equal population variances are assumed is \(df = n_1 + n_2 - 2\), where \(n_1\) and \(n_2\) are the corresponding sample sizes. For unequal variances, the calculation of the degrees of freedom is a lot more complicated.
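For the unequal-variances case, a commonly used approximation is the Welch-Satterthwaite formula. The sketch below assumes that this is the correction meant here (it reproduces the value df = 10.266 quoted in the worked example on this page); `welch_df` is a hypothetical helper name.

```python
def welch_df(s1: float, s2: float, n1: int, n2: int) -> float:
    """Approximate degrees of freedom for the unequal-variances t-test.

    Assumption: the Welch-Satterthwaite approximation is used.
    """
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Summary statistics from the two-schools example on this page.
print(round(welch_df(3.0714, 0.8165, 10, 10), 3))
```

The result is generally not an integer, which is why the worked example reports fractional degrees of freedom.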

Is this a t-test calculator with steps?

Yes! This calculator will show you all the steps of the way, from the calculation of descriptive statistics, to the testing for equality of variances (if required) to the use of the proper t-test formula to the discussion and conclusions.

Why is this test statistic calculator useful? Time! You will save lots of time because an independent samples t-test requires a whole lot of calculations.

 Two Independent Samples T-Test

What is an example of a 2 sample t-test?

Suppose that a teacher believes that the average height of eighth graders differs between two different schools. There is a sample of n = 10 kids for each school, for which their sample heights (in inches) are available:

School 1: 60, 62, 59, 63, 65, 64, 68, 67, 61, 60

School 2: 60, 61, 61, 61, 60, 59, 59, 60, 60, 59

Is there enough evidence to claim that the population mean heights for two schools are different, at the 0.05 significance level?

Solution: The following sample information has been provided:

| School 1 | School 2 |
|---|---|
| 60 | 60 |
| 62 | 61 |
| 59 | 61 |
| 63 | 61 |
| 65 | 60 |
| 64 | 59 |
| 68 | 59 |
| 67 | 60 |
| 61 | 60 |
| 60 | 59 |

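In order to conduct a two-independent samples t-test, we need to compute descriptive statistics of the samples:

| | School 1 | School 2 |
|---|---|---|
| | 60 | 60 |
| | 62 | 61 |
| | 59 | 61 |
| | 63 | 61 |
| | 65 | 60 |
| | 64 | 59 |
| | 68 | 59 |
| | 67 | 60 |
| | 61 | 60 |
| | 60 | 59 |
| Mean | 62.9 | 60 |
| St. Dev. | 3.0714 | 0.8165 |
| n | 10 | 10 |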
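These descriptive statistics can be reproduced with Python's standard `statistics` module. This is only a quick check of the numbers, assuming the sample standard deviation (n - 1 in the denominator) is the one intended; the variable names are illustrative.

```python
from statistics import mean, stdev

school_1 = [60, 62, 59, 63, 65, 64, 68, 67, 61, 60]
school_2 = [60, 61, 61, 61, 60, 59, 59, 60, 60, 59]

for name, heights in (("School 1", school_1), ("School 2", school_2)):
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{name}: mean = {mean(heights):.1f}, "
          f"st. dev. = {stdev(heights):.4f}, n = {len(heights)}")
```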

Summarizing, the following descriptive statistics will be used in the calculation of the t-statistic:

Sample Mean 1 \((\bar X_1)\) = \(62.9\)
Sample Standard Deviation 1 \((s_1)\) = \(3.0714\)
Sample Size 1 \((n_1)\) = \(10\)
Sample Mean 2 \((\bar X_2)\) = \(60\)
Sample Standard Deviation 2 \((s_2)\) = \(0.8165\)
Sample Size 2 \((n_2)\) = \(10\)
Significance Level \((\alpha)\) = \(0.05\)

(1) Null and Alternative Hypotheses

The following null and alternative hypotheses need to be tested:

\(H_0: \mu_1 = \mu_2\)

\(H_a: \mu_1 \neq \mu_2\)

This corresponds to a two-tailed test, for which a t-test for two population means, with two independent samples, with unknown population standard deviations will be used.

Testing for Equality of Variances

An F-test is used to test for the equality of variances. The following F-ratio is obtained:

\[F = \frac{s_1^2}{s_2^2} = \frac{3.0714^2}{0.8165^2} = 14.15\]

The critical values are \(F_L = 0.248\) and \(F_U = 4.026\), and since \(F = 14.15\), then the null hypothesis of equal variances is rejected.
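As a quick check, the F-ratio and the decision can be reproduced in Python. The critical values 0.248 and 4.026 are copied from the text above rather than computed, since the standard library has no F distribution; the variable names are illustrative.

```python
from statistics import variance

school_1 = [60, 62, 59, 63, 65, 64, 68, 67, 61, 60]
school_2 = [60, 61, 61, 61, 60, 59, 59, 60, 60, 59]

# Ratio of the two sample variances (each uses n - 1 in the denominator).
f_ratio = variance(school_1) / variance(school_2)

# Critical values quoted in the text for alpha = 0.05 (not computed here).
f_lower, f_upper = 0.248, 4.026
reject_equal_variances = f_ratio < f_lower or f_ratio > f_upper

print(f"F = {f_ratio:.2f}, reject equal variances: {reject_equal_variances}")
```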

(2) Rejection Region

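Based on the information provided, the significance level is \(\alpha = 0.05\), and the degrees of freedom are \(df = 10.266\). In fact, the degrees of freedom are computed as follows, assuming that the population variances are unequal:

\[df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} = 10.266\]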

Hence, it is found that the critical value for this two-tailed test is \(t_c = 2.22\), for \(\alpha = 0.05\) and \(df = 10.266\).

The rejection region for this two-tailed test is \(R = \{t: |t| > 2.22\}\).

(3) Test Statistics

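Since it is assumed that the population variances are unequal, the t-statistic is computed as follows:

\[t = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{62.9 - 60}{\sqrt{\frac{3.0714^2}{10} + \frac{0.8165^2}{10}}} = 2.886\]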

(4) Decision about the null hypothesis

Since it is observed that \(|t| = 2.886 > t_c = 2.22\), it is then concluded that the null hypothesis is rejected.

Using the P-value approach: The p-value is \(p = 0.0158\), and since \(p = 0.0158 < 0.05\), it is concluded that the null hypothesis is rejected.

(5) Conclusion

It is concluded that the null hypothesis \(H_0\) is rejected. Therefore, there is enough evidence to claim that the population mean \(\mu_1\) is different from \(\mu_2\), at the \(\alpha = 0.05\) significance level.

Confidence Interval

The 95% confidence interval for the difference in means is \(0.669 < \mu_1 - \mu_2 < 5.131\).
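The interval can be reproduced from the summary statistics as the observed difference plus or minus the critical value times the standard error. The sketch below takes the critical value 2.22 from the text instead of computing it, and the variable names are illustrative.

```python
import math

mean_1, s_1, n_1 = 62.9, 3.0714, 10
mean_2, s_2, n_2 = 60.0, 0.8165, 10
t_critical = 2.22  # quoted in the text for alpha = 0.05 and df = 10.266

# Standard error of the difference in means (unequal variances).
standard_error = math.sqrt(s_1**2 / n_1 + s_2**2 / n_2)
margin = t_critical * standard_error

difference = mean_1 - mean_2
ci_lower, ci_upper = difference - margin, difference + margin
print(f"95% CI for the difference in means: ({ci_lower:.3f}, {ci_upper:.3f})")
```

Because the interval does not contain 0, it agrees with the decision to reject the null hypothesis of equal means.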


Other statistical tests of interest

There is an abundance of related statistical tests that you can use. You can try, for example, this paired t-test calculator. You can also use this t-test for two samples when you have summarized sample data instead. In that case, the sample data provided are usually the sample means, sample standard deviations and sample sizes.

Other types of t-test calculators include the t-test for one sample. For different types of statistics, you can try this ANOVA calculator, which is similar to the t-test except that with ANOVA you can compare more than 2 groups.

Critical Value Calculator

Use this calculator for critical values to easily convert a significance level to its corresponding Z value, T score, F-score, or Chi-square value. Outputs the critical region as well. The tool supports one-tailed and two-tailed significance tests / probability values.

  • Using the critical value calculator
  • What is a critical value?
  • T critical value calculation
  • Z critical value calculation
  • F critical value calculation

    Using the critical value calculator

If you want to perform a statistical test of significance (a.k.a. significance test, statistical significance test), you need to determine the value of the test statistic corresponding to the desired significance level. You need to know the desired error probability (p-value threshold; common values are 0.05, 0.01, 0.001) corresponding to the significance level of the test. If the level is stated as a confidence percentage, simply subtract it from 100%. For example, a 95% confidence level results in a probability of 100% - 95% = 5% = 0.05.

Then you need to know the shape of the error distribution of the statistic of interest (not to be mistaken with the distribution of the underlying data!). Our critical value calculator supports statistics which are either:

  • Z-distributed (normally distributed, e.g. absolute difference of means)
  • T-distributed (Student's T distribution, usually appropriate for small sample sizes, practically equivalent to the normal for sample sizes over 30)
  • Χ²-distributed (Chi square distribution, often used in goodness-of-fit tests, but also for tests of homogeneity or independence)
  • F-distributed (Fisher-Snedecor distribution, usually used in analysis of variance (ANOVA))

Then, for distributions other than the normal one (Z), you need to know the degrees of freedom . For the F statistic there are two separate degrees of freedom - one for the numerator and one for the denominator.

Finally, to determine a critical region, one needs to know whether they are testing a point null versus a composite alternative (on both sides) or a composite null (covering one side of the distribution) versus a composite alternative (covering the other). Basically, it comes down to whether the inference is going to contain claims regarding the direction of the effect or not. Should one want to claim anything about the direction of the effect, the corresponding null hypothesis is directional as well (one-sided hypothesis).

Depending on the type of test - one-tailed or two-tailed, the calculator will output the critical value or values and the corresponding critical region. For one-sided tests it will output both possible regions, whereas for a two-sided test it will output the union of the two critical regions on the opposite sides of the distribution.

    What is a critical value?

A critical value (or values) is a point on the support of an error distribution which bounds a critical region from above or below. If the statistic falls below or above a critical value (depending on the type of hypothesis, but it has to fall inside the critical region) then the test is declared statistically significant at the corresponding significance level. For example, in a two-tailed Z test with critical values -1.96 and 1.96 (corresponding to a 0.05 significance level) the critical regions are from -∞ to -1.96 and from 1.96 to +∞. Therefore, if the statistic falls below -1.96 or above 1.96, the test is statistically significant.
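The two-tailed Z example can be expressed as a short Python sketch. It is an illustration built on the standard library's `NormalDist`, not the calculator's implementation; the function names are hypothetical.

```python
from statistics import NormalDist


def two_tailed_z_critical(alpha: float) -> float:
    """Positive critical value; the critical regions are below -c and above +c."""
    # Each tail holds alpha / 2 of the probability.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def in_critical_region(z_statistic: float, alpha: float = 0.05) -> bool:
    """True when the statistic falls in either tail of the critical region."""
    return abs(z_statistic) > two_tailed_z_critical(alpha)


print(round(two_tailed_z_critical(0.05), 2))
print(in_critical_region(2.10), in_critical_region(-1.50))
```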

You can think of the critical value as a cutoff point beyond which events are considered rare enough to count as evidence against the specified null hypothesis. It is a value achieved by a distance function with probability equal to or greater than the significance level under the specified null hypothesis. In an error-probabilistic framework, a proper distance function based on a test statistic takes the generic form [1] :

\[d(X) = \frac{\bar{X} - \mu_0}{\sigma_x}\]

\(\bar{X}\) (read "X bar") is the observed mean / treatment group mean, \(\mu_0\) is the arithmetic mean of the population baseline or the control, while \(\sigma_x\) is the standard error of the mean (SEM, or standard deviation of the error of the mean).

Here is how it looks in practice when the error is normally distributed (Z distribution) with a one-tailed null and alternative hypotheses and a significance level α set to 0.05:

[Figure: one-tailed Z critical value]

And here is the same significance level when applied to a point null and a two-tailed alternative hypothesis:

[Figure: two-tailed Z critical value]

The distance function would vary depending on the distribution of the error: Z, T, F, or Chi-square (Χ²). The calculation of a particular critical value based on a supplied probability and error distribution is simply a matter of calculating the inverse cumulative distribution function (inverse CDF, also called the quantile function) of the respective distribution. This can be a difficult task, most notably for the T distribution [2].

    T critical value calculation

The T-distribution is often preferred in the social sciences, psychiatry, economics, and other sciences where low sample sizes are a common occurrence. Certain clinical studies also fall under this umbrella. This stems from the fact that for sample sizes over 30 it is practically equivalent to the normal distribution which is easier to work with. It was proposed by William Gosset, a.k.a. Student, in 1908 [3] , which is why it is also referred to as "Student's T distribution".

To find the critical t value, one needs to compute the inverse cumulative distribution function of the T distribution. To do that, the significance level and the degrees of freedom need to be known. The degrees of freedom represent the number of values in the final calculation of a statistic that are free to vary whilst the statistic remains fixed at a certain value.

It should be noted that there is not, in fact, a single T-distribution, but there are infinitely many T-distributions, each with a different level of degrees of freedom. Below are some key values of the T-distribution with 1 degree of freedom, assuming a one-tailed T test is to be performed. These are often used as critical values to define rejection regions in hypothesis testing.

Probability to T critical value table

| Probability value | Degrees of Freedom | T critical value |
|---|---|---|
| 0.2000 | 1 | 1.3764 |
| 0.1000 | 1 | 3.0777 |
| 0.0500 | 1 | 6.3138 |
| 0.0250 | 1 | 12.7062 |
| 0.0200 | 1 | 15.8946 |
| 0.0100 | 1 | 31.8205 |
| 0.0010 | 1 | 318.3088 |
| 0.0005 | 1 | 636.6193 |
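For the special case of 1 degree of freedom, the T distribution coincides with the standard Cauchy distribution, whose inverse CDF has a simple closed form, so the table can be checked with a few lines of Python. This shortcut applies only to df = 1 and is not how a general-purpose calculator would work; `t_critical_df1` is a hypothetical name.

```python
import math


def t_critical_df1(p: float) -> float:
    """Upper-tail critical value of the T distribution with 1 degree of freedom."""
    # With df = 1 the T distribution is the standard Cauchy distribution,
    # whose quantile function is tan(pi * (q - 1/2)); here q = 1 - p.
    return math.tan(math.pi * (0.5 - p))


for p in (0.2, 0.1, 0.05, 0.025, 0.02, 0.01, 0.001, 0.0005):
    print(f"{p:.4f}  {t_critical_df1(p):.4f}")
```

For other degrees of freedom no such closed form exists in general, which is why numerical methods or tables are used.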

    Z critical value calculation

The Z-score is a statistic showing how many standard deviations away from the mean a given observation is. It is often called a standard score, z-value, normal score, or standardized variable. A Z critical value is just a particular cutoff in the error distribution of a normally-distributed statistic.

Z critical values are computed by using the inverse cumulative distribution function of the standard normal distribution with a mean (μ) of zero and standard deviation (σ) of one. Below are some commonly encountered probability values (significance levels) and their corresponding Z values for the critical region, assuming a one-tailed hypothesis.

Probability to Z critical value table

| Probability value | Z critical value |
|---|---|
| 0.2000 | 0.8416 |
| 0.1000 | 1.2816 |
| 0.0500 | 1.6449 |
| 0.0250 | 1.9600 |
| 0.0200 | 2.0537 |
| 0.0100 | 2.3263 |
| 0.0010 | 3.0902 |
| 0.0005 | 3.2905 |

The critical region defined by each of these would span from the Z value to plus infinity for the right-tailed case, and from minus infinity to minus the Z critical value in the left-tailed case. Our calculator for critical value will both find the critical z value(s) and output the corresponding critical regions for you.
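The one-tailed values in the table, together with the regions just described, can be reproduced with the standard library's `NormalDist`. This is a sketch for checking the numbers, not the calculator's code; `z_critical` and `critical_region` are hypothetical names.

```python
from statistics import NormalDist


def z_critical(p: float) -> float:
    """One-tailed Z critical value for tail probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def critical_region(p: float, tail: str = "right") -> tuple:
    """Return (lower bound, upper bound) of the one-tailed critical region."""
    z = z_critical(p)
    if tail == "right":
        return (z, float("inf"))
    return (float("-inf"), -z)


for p in (0.2, 0.1, 0.05, 0.025, 0.02, 0.01, 0.001, 0.0005):
    print(f"{p:.4f}  {z_critical(p):.4f}")
```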

Chi Square (Χ²) critical value calculation

Chi square distributed errors are commonly encountered in goodness-of-fit tests and homogeneity tests, but also in tests for independence in contingency tables. Since the distribution is based on the squares of scores, it only contains non-negative values. Calculating the inverse cumulative distribution function of the distribution is required in order to convert a desired probability (significance) to a chi square critical value.

Just like the T and F distributions, there is a different chi square distribution corresponding to different degrees of freedom. Hence, to calculate a Χ² critical value one needs to supply the degrees of freedom for the statistic of interest.

    F critical value calculation

F distributed errors are commonly encountered in analysis of variance (ANOVA), which is very common in the social sciences. The distribution, also referred to as the Fisher-Snedecor distribution, only contains non-negative values, similar to the Χ² one. Similar to the T distribution, there is no single F-distribution to speak of. A different F distribution is defined for each pair of degrees of freedom - one for the numerator and one for the denominator.

Calculating the inverse cumulative distribution function of the F distribution specified by the two degrees of freedom is required in order to convert a desired probability (significance) to a critical value. There is no simple closed-form solution for finding a critical value of F, and while there are tables, using a calculator is the preferred approach nowadays.


[1] Mayo D.G., Spanos A. (2010) – "Error Statistics", in P. S. Bandyopadhyay & M. R. Forster (Eds.), Philosophy of Statistics, (7, 152–198). Handbook of the Philosophy of Science. The Netherlands: Elsevier.

[2] Shaw T.W. (2006) – "Sampling Student's T distribution – use of the inverse cumulative distribution function", Journal of Computational Finance 9(4):37-73, DOI:10.21314/JCF.2006.150

[3] "Student" [William Sealy Gosset] (1908) – "The probable error of a mean", Biometrika 6(1):1–25. DOI:10.1093/biomet/6.1.1

Cite this calculator & page

If you'd like to cite this online calculator resource and information as provided on the page, you can use the following citation: Georgiev G.Z., "Critical Value Calculator", [online] Available at: https://www.gigacalculator.com/calculators/critical-value-calculator.php [Accessed Date: 24 Aug, 2024].

The author of this tool

Georgi Z. Georgiev

Two Sample t-test: Definition, Formula, and Example

A two sample t-test is used to determine whether or not two population means are equal.

This tutorial explains the following:

  • The motivation for performing a two sample t-test.
  • The formula to perform a two sample t-test.
  • The assumptions that should be met to perform a two sample t-test.
  • An example of how to perform a two sample t-test.

Two Sample t-test: Motivation

Suppose we want to know whether or not the mean weight between two different species of turtles is equal. Since there are thousands of turtles in each population, it would be too time-consuming and costly to go around and weigh each individual turtle.

Instead, we might take a simple random sample of 15 turtles from each population and use the mean weight in each sample to determine if the mean weight is equal between the two populations:

[Figure: two sample t-test example]

However, it’s virtually guaranteed that the mean weight between the two samples will be at least a little different. The question is whether or not this difference is statistically significant . Fortunately, a two sample t-test allows us to answer this question.

Two Sample t-test: Formula

A two-sample t-test always uses the following null hypothesis:

  • H₀: μ₁ = μ₂ (the two population means are equal)

The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

  • H₁ (two-tailed): μ₁ ≠ μ₂ (the two population means are not equal)
  • H₁ (left-tailed): μ₁ < μ₂ (population 1 mean is less than population 2 mean)
  • H₁ (right-tailed): μ₁ > μ₂ (population 1 mean is greater than population 2 mean)

We use the following formula to calculate the test statistic t:

Test statistic: t = (x̄₁ – x̄₂) / (s_p · √(1/n₁ + 1/n₂))

where x̄₁ and x̄₂ are the sample means, n₁ and n₂ are the sample sizes, and where s_p is calculated as:

s_p = √( [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁+n₂-2) )

where s₁² and s₂² are the sample variances.

If the p-value that corresponds to the test statistic t with (n₁+n₂-2) degrees of freedom is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01) then you can reject the null hypothesis.

Two Sample t-test: Assumptions

For the results of a two sample t-test to be valid, the following assumptions should be met:

  • The observations in one sample should be independent of the observations in the other sample.
  • The data should be approximately normally distributed.
  • The two samples should have approximately the same variance. If this assumption is not met, you should instead perform Welch’s t-test .
  • The data in both samples was obtained using a random sampling method .

Two Sample t-test : Example

Suppose we want to know whether or not the mean weight between two different species of turtles is equal. To test this, we will perform a two sample t-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose we collect a random sample of turtles from each population with the following information:

  • Sample size n₁ = 40
  • Sample mean weight x̄₁ = 300
  • Sample standard deviation s₁ = 18.5
  • Sample size n₂ = 38
  • Sample mean weight x̄₂ = 305
  • Sample standard deviation s₂ = 16.7

Step 2: Define the hypotheses.

We will perform the two sample t-test with the following hypotheses:

  • H₀: μ₁ = μ₂ (the two population means are equal)
  • H₁: μ₁ ≠ μ₂ (the two population means are not equal)

Step 3: Calculate the test statistic t.

First, we will calculate the pooled standard deviation s_p:

s_p = √( [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁+n₂-2) ) = √( [(40-1)18.5² + (38-1)16.7²] / (40+38-2) ) = 17.647

Next, we will calculate the test statistic t:

t = (x̄₁ – x̄₂) / (s_p · √(1/n₁ + 1/n₂)) = (300-305) / (17.647 · √(1/40 + 1/38)) = -1.2508

Step 4: Calculate the p-value of the test statistic t.

According to the T Score to P Value Calculator, the two-tailed p-value associated with t = -1.2508 and degrees of freedom = n₁+n₂-2 = 40+38-2 = 76 is 0.21484.
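Steps 3 and 4 can be reproduced with a short Python sketch that uses only the standard library. Because the standard library has no t distribution, the p-value here is obtained by numerically integrating the t density with Simpson's rule; that is an assumption made for illustration (a statistics package would normally be used), and the helper names `t_pdf` and `two_tailed_p` are hypothetical.

```python
import math

n1, mean1, s1 = 40, 300.0, 18.5
n2, mean2, s2 = 38, 305.0, 16.7

# Step 3: pooled standard deviation and test statistic.
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_stat = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # probability between -|t| and +|t|
    return 1.0 - central_area


# Step 4: p-value for the observed statistic.
p_value = two_tailed_p(t_stat, df)
print(f"s_p = {pooled_sd:.3f}, t = {t_stat:.4f}, df = {df}, p = {p_value:.5f}")
```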

Step 5: Draw a conclusion.

Since this p-value is not less than our significance level α = 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the mean weight of turtles between these two populations is different.

Note:  You can also perform this entire two sample t-test by simply using the Two Sample t-test Calculator .

Additional Resources

The following tutorials explain how to perform a two-sample t-test using different statistical programs:

  • How to Perform a Two Sample t-test in Excel
  • How to Perform a Two Sample t-test in SPSS
  • How to Perform a Two Sample t-test in Stata
  • How to Perform a Two Sample t-test in R
  • How to Perform a Two Sample t-test in Python
  • How to Perform a Two Sample t-test on a TI-84 Calculator

Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

2 Replies to “Two Sample t-test: Definition, Formula, and Example”

I like the detailed information and simplified in the way I can understand and relate easily. Thank you

It seems a pair of parentheses is missing in the pooled standard deviation formula. Under the square root you have (n1-1)s1^2 + (n2-1)s2^2 / (n1+n2-2), but it should be [(n1-1)s1^2 + (n2-1)s2^2] / (n1+n2-2). I used square brackets.

Statistics - Hypothesis Testing a Mean (Two Tailed)

A population mean is the average of a numerical value across a population.

Hypothesis tests are used to check a claim about the size of that population mean.

Hypothesis Testing a Mean

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic

For example:

  • Population : Nobel Prize winners
  • Category : Age when they received the prize.

And we want to check the claim:

"The average age of Nobel Prize winners when they received the prize is not 60"

By taking a sample of 30 randomly selected Nobel Prize winners we could find that:

  • The mean age in the sample (\(\bar{x}\)) is 62.1
  • The standard deviation of age in the sample (\(s\)) is 13.46

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for a hypothesis test of a mean are:

  • The sample is randomly selected
  • And either the population data is normally distributed or the sample size is large enough

A moderately large sample size, like 30, is typically large enough.

In the example, the sample size was 30 and it was randomly selected, so the conditions are fulfilled.

Note: Checking if the data is normally distributed can be done with specialized statistical tests.

2. Defining the Claims

We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.

The claim was:

In this case, the parameter is the mean age of Nobel Prize winners when they received the prize (\(\mu\)).

The null and alternative hypotheses are then:

Null hypothesis : The average age is 60.

Alternative hypothesis : The average age is not 60.

Which can be expressed with symbols as:

\(H_{0}\): \(\mu = 60 \)

\(H_{1}\): \(\mu \neq 60 \)

This is a 'two-tailed' test, because the alternative hypothesis claims that the mean is different from the value in the null hypothesis, in either direction.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.


3. Deciding the Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is a percentage probability of accidentally making the wrong conclusion.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that when the null hypothesis is actually true:

We expect to reject it 5 out of 100 times.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population mean is:

\(\displaystyle \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} \)

\(\bar{x}-\mu\) is the difference between the sample mean (\(\bar{x}\)) and the claimed population mean (\(\mu\)).

\(s\) is the sample standard deviation .

\(n\) is the sample size.

In our example:

The claimed (\(H_{0}\)) population mean (\(\mu\)) was \( 60 \)

The sample mean (\(\bar{x}\)) was \(62.1\)

The sample standard deviation (\(s\)) was \(13.46\)

The sample size (\(n\)) was \(30\)

So the test statistic (TS) is then:

\(\displaystyle \frac{62.1-60}{13.46} \cdot \sqrt{30} = \frac{2.1}{13.46} \cdot \sqrt{30} \approx 0.156 \cdot 5.477 = \underline{0.855}\)

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic.

With R use built-in math and statistics functions to calculate the test statistic.
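As a minimal Python sketch of the calculation above, the test statistic can be computed directly from the summary statistics. Only the standard `math` module is needed for this step; the variable names are illustrative and not part of the original tutorial.

```python
import math

# Summary statistics from the Nobel Prize example in the text
x_bar = 62.1   # sample mean
mu_0 = 60      # population mean claimed in the null hypothesis
s = 13.46      # sample standard deviation
n = 30         # sample size

# Test statistic: (x_bar - mu_0) / s * sqrt(n), written as a ratio to the standard error
standard_error = s / math.sqrt(n)
test_statistic = (x_bar - mu_0) / standard_error

print(round(test_statistic, 3))
```

The printed value should agree with the hand calculation of roughly 0.855.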

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The P-value approach compares the P-value of the test statistic with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).

For a population mean test, the critical value (CV) is a T-value from a student's t-distribution .

This critical T-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tails of the student's t-distribution.

Because the claim is that the population mean is different from 60, the rejection region is split into both the left and right tail:

The student's t-distribution is adjusted for the uncertainty from smaller samples.

This adjustment is called degrees of freedom (df), which is the sample size \((n) - 1\)

In this case the degrees of freedom (df) is: \(30 - 1 = \underline{29} \)

Choosing a significance level (\(\alpha\)) of 0.05, or 5%, we can find the critical T-value from a T-table , or with a programming language function:

Note: Because this is a two-tailed test the tail area (\(\alpha\)) needs to be split in half (divided by 2).

With Python use the Scipy Stats library t.ppf() function to find the T-Value for an \(\alpha\)/2 = 0.025 at 29 degrees of freedom (df).

With R use the built-in qt() function to find the t-value for an \(\alpha\)/2 = 0.025 at 29 degrees of freedom (df).
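A short Python sketch of the critical value lookup, assuming SciPy is installed. It computes both the lower and the upper critical value, since a two-tailed test uses both tails; the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05   # significance level chosen in the text
df = 29        # degrees of freedom: sample size 30 minus 1

# Two-tailed test: put alpha / 2 in each tail
lower_cv = stats.t.ppf(alpha / 2, df)       # left-tail critical value
upper_cv = stats.t.ppf(1 - alpha / 2, df)   # right-tail critical value

print(round(lower_cv, 3), round(upper_cv, 3))
```

Because the t-distribution is symmetric about 0, the two values differ only in sign.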

Using either method we can find that the critical T-Value is \(\approx \underline{-2.045}\)

For a two-tailed test we need to check if the test statistic (TS) is smaller than the negative critical value (-CV), or bigger than the positive critical value (CV).

If the test statistic is smaller than the negative critical value, the test statistic is in the rejection region .

If the test statistic is bigger than the positive critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).

Here, the test statistic (TS) was \(\approx \underline{0.855}\) and the critical values were \(\approx \underline{\pm 2.045}\)

Here is an illustration of this test in a graph:

Since the test statistic is between the critical values we keep the null hypothesis.

This means that the sample data does not support the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data does not support the claim that "The average age of Nobel Prize winners when they received the prize is not 60" at a 5% significance level .

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).

The test statistic was found to be \( \approx \underline{0.855} \)

For a population mean test, the test statistic is a T-Value from a student's t-distribution.

Because this is a two-tailed test, we need to find the P-value of a T-value bigger than 0.855 and multiply it by 2 .

The student's t-distribution is adjusted according to degrees of freedom (df), which is the sample size \((30) - 1 = \underline{29}\)

We can find the P-value using a T-table , or with a programming language function:

With Python use the Scipy Stats library t.cdf() function to find the P-value of a T-value bigger than 0.855 for a two-tailed test at 29 degrees of freedom (df):

With R use the built-in pt() function to find the P-value of a T-Value bigger than 0.855 for a two-tailed test at 29 degrees of freedom (df):
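A minimal Python sketch of this lookup, assuming SciPy is available. The upper-tail area beyond the T-value is computed as 1 minus the CDF and then doubled for the two-tailed test.

```python
from scipy import stats

t_value = 0.855   # test statistic from the example
df = 29           # degrees of freedom

# Area to the right of the T-value, doubled because the test is two-tailed
upper_tail = 1 - stats.t.cdf(t_value, df)
p_value = 2 * upper_tail

print(round(p_value, 4))
```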

Using either method we can find that the P-value is \(\approx \underline{0.3996}\)

This tells us that the significance level (\(\alpha\)) would need to be larger than 0.3996, or 39.96%, to reject the null hypothesis.

This P-value is bigger than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is kept at all of these significance levels.

The sample data does not support the claim that "The average age of Nobel Prize winners when they received the prize is not 60" at a 10%, 5%, or 1% significance level .

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide the outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a two tailed hypothesis test for a mean.

Here, the sample size is 30, the sample mean is 62.1, the sample standard deviation is 13.46, and the test is for a mean different from 60.
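One way this could look in Python is sketched below, assuming SciPy is installed. The helper name `two_tailed_t_test` is hypothetical and only wraps the formulas already used in this tutorial; it works from summary statistics rather than raw data.

```python
import math
from scipy import stats

def two_tailed_t_test(x_bar, s, n, mu_0):
    """Two-tailed one-sample t-test from summary statistics (hypothetical helper)."""
    df = n - 1
    t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
    # sf() is the survival function, 1 - cdf(); abs() handles negative statistics
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value

t_stat, df, p_value = two_tailed_t_test(x_bar=62.1, s=13.46, n=30, mu_0=60)
print(round(t_stat, 3), df, round(p_value, 4))

# Compare with a chosen significance level
alpha = 0.05
reject_null = p_value < alpha
print(reject_null)
```

With the example numbers, the P-value comes out close to 0.40, so the null hypothesis is kept at the 5% level. The result can differ slightly from the value found with the rounded T-value of 0.855.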

With R use built-in math and statistics functions to find the P-value for a two-tailed hypothesis test for a mean.

Left-Tailed and Right-Tailed Tests

This was an example of a two-tailed test, where the alternative hypothesis claimed that the parameter is different from the null hypothesis claim.

You can check out an equivalent step-by-step guide for other types here:

  • Left-Tailed Test
  • Right-Tailed Test


One-Sample t-Test Calculator

This calculator will conduct a complete one-sample t-test, given the sample mean, the sample size, the hypothesized mean, and the sample standard deviation. The results generated by the calculator include the t-statistic, the degrees of freedom, the critical t-values for both one-tailed (directional) and two-tailed (non-directional) hypotheses, and the one-tailed and two-tailed probability values associated with the test. Please enter the necessary parameter values, and then click 'Calculate'.


p-value Calculator


Welcome to our p-value calculator! You will never again have to wonder how to find the p-value, as here you can determine the one-sided and two-sided p-values from test statistics, following all the most popular distributions: normal, t-Student, chi-squared, and Snedecor's F.

P-values appear all over science, yet many people find the concept a bit intimidating. Don't worry – in this article, we will explain not only what the p-value is but also how to interpret p-values correctly . Have you ever been curious about how to calculate the p-value by hand? We provide you with all the necessary formulae as well!

🙋 If you want to revise some basics from statistics, our normal distribution calculator is an excellent place to start.

Formally, the p-value is the probability that the test statistic will produce values at least as extreme as the value it produced for your sample. It is crucial to remember that this probability is calculated under the assumption that the null hypothesis H₀ is true!

More intuitively, p-value answers the question:

Assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?

It is the alternative hypothesis that determines what "extreme" actually means, so the p-value depends on the alternative hypothesis that you state: left-tailed, right-tailed, or two-tailed. In the formulas below, S stands for a test statistic, x for the value it produced for a given sample, and Pr(event | H₀) is the probability of an event, calculated under the assumption that H₀ is true:

Left-tailed test: p-value = Pr(S ≤ x | H₀)

Right-tailed test: p-value = Pr(S ≥ x | H₀)

Two-tailed test:

p-value = 2 × min{Pr(S ≤ x | H₀), Pr(S ≥ x | H₀)}

(By min{a,b}, we denote the smaller number out of a and b.)

If the distribution of the test statistic under H₀ is symmetric about 0, then: p-value = 2 × Pr(S ≥ |x| | H₀)

or, equivalently: p-value = 2 × Pr(S ≤ -|x| | H₀)

As a picture is worth a thousand words, let us illustrate these definitions. Here, we use the fact that the probability can be neatly depicted as the area under the density curve for a given distribution. We give two sets of pictures: one for a symmetric distribution and the other for a skewed (non-symmetric) distribution.

  • Symmetric case: normal distribution:

p-values for symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

  • Non-symmetric case: chi-squared distribution:

p-values for non-symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

In the last picture (two-tailed p-value for skewed distribution), the area of the left-hand side is equal to the area of the right-hand side.

To determine the p-value, you need to know the distribution of your test statistic under the assumption that the null hypothesis is true . Then, with the help of the cumulative distribution function ( cdf ) of this distribution, we can express the probability of the test statistics being at least as extreme as its value x for the sample:

Left-tailed test:

p-value = cdf(x).

Right-tailed test:

p-value = 1 - cdf(x).

Two-tailed test:

p-value = 2 × min{cdf(x), 1 - cdf(x)}.

If the distribution of the test statistic under H₀ is symmetric about 0, then a two-sided p-value can be simplified to p-value = 2 × cdf(-|x|), or, equivalently, as p-value = 2 - 2 × cdf(|x|).
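The three cdf formulas above can be sketched as a single Python function. This is an illustration only: the names `p_value_from_cdf` and `standard_normal_cdf` are hypothetical, and the example assumes a continuous test statistic, for which Pr(S ≥ x) equals 1 - cdf(x).

```python
import math

def p_value_from_cdf(cdf, x, tail):
    """p-value for an observed statistic x, given the cdf under the null hypothesis."""
    lower = cdf(x)          # Pr(S <= x | H0)
    upper = 1.0 - lower     # Pr(S >= x | H0) for a continuous distribution
    if tail == "left":
        return lower
    if tail == "right":
        return upper
    if tail == "two":
        return 2.0 * min(lower, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def standard_normal_cdf(z):
    """cdf of N(0,1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example: a statistic of 1.96 under a standard normal null distribution
for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_cdf(standard_normal_cdf, 1.96, tail), 4))
```

Passing the cdf in as an argument keeps the function independent of any particular distribution, so the same code can serve the normal, t, chi-squared, and F cases discussed later.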

The probability distributions that are most widespread in hypothesis testing tend to have complicated cdf formulae, and finding the p-value by hand may not be possible. You'll likely need to resort to a computer or to a statistical table, where people have gathered approximate cdf values.

Well, you now know how to calculate the p-value, but… why do you need to calculate this number in the first place? In hypothesis testing, the p-value approach is an alternative to the critical value approach . Recall that the latter requires researchers to pre-set the significance level, α, which is the probability of rejecting the null hypothesis when it is true (so of type I error ). Once you have your p-value, you just need to compare it with any given α to quickly decide whether or not to reject the null hypothesis at that significance level, α. For details, check the next section, where we explain how to interpret p-values.

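As we have mentioned above, the p-value is the answer to the following question:

Assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?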

What does that mean for you? Well, you've got two options:

  • A high p-value means that your data is highly compatible with the null hypothesis; and
  • A small p-value provides evidence against the null hypothesis , as it means that your result would be very improbable if the null hypothesis were true.

However, it may happen that the null hypothesis is true, but your sample is highly unusual! For example, imagine we studied the effect of a new drug and got a p-value of 0.03 . This means that in 3% of similar studies, random chance alone would still be able to produce the value of the test statistic that we obtained, or a value even more extreme, even if the drug had no effect at all!

The question "what is p-value" can also be answered as follows: p-value is the smallest level of significance at which the null hypothesis would be rejected. So, if you now want to make a decision on the null hypothesis at some significance level α , just compare your p-value with α :

  • If p-value ≤ α, then you reject the null hypothesis and accept the alternative hypothesis; and
  • If p-value > α, then you don't have enough evidence to reject the null hypothesis.

Obviously, the fate of the null hypothesis depends on α . For instance, if the p-value was 0.03 , we would reject the null hypothesis at a significance level of 0.05 , but not at a level of 0.01 . That's why the significance level should be stated in advance and not adapted conveniently after the p-value has been established! A significance level of 0.05 is the most common value, but there's nothing magical about it. Here, you can see what too strong a faith in the 0.05 threshold can lead to. It's always best to report the p-value, and allow the reader to make their own conclusions.

Also, bear in mind that subject area expertise (and common sense) is crucial. Otherwise, by mindlessly applying statistical principles, you can easily arrive at a statistically significant result even though the conclusion is 100% untrue.

As our p-value calculator is here at your service, you no longer need to wonder how to find p-value from all those complicated test statistics! Here are the steps you need to follow:

Pick the alternative hypothesis : two-tailed, right-tailed, or left-tailed.

Tell us the distribution of your test statistic under the null hypothesis: is it N(0,1), t-Student, chi-squared, or Snedecor's F? If you are unsure, check the sections below, as they are devoted to these distributions.

If needed, specify the degrees of freedom of the test statistic's distribution.

Enter the value of test statistic computed for your data sample.

Our calculator determines the p-value from the test statistic and provides the decision to be made about the null hypothesis. The standard significance level is 0.05 by default.

Go to the advanced mode if you need to increase the precision with which the calculations are performed or change the significance level .

In terms of the cumulative distribution function (cdf) of the standard normal distribution, which is traditionally denoted by Φ , the p-value is given by the following formulae:

Left-tailed z-test:

p-value = Φ(Z_score)

Right-tailed z-test:

p-value = 1 - Φ(Z_score)

Two-tailed z-test:

p-value = 2 × Φ(−|Z_score|)

or, equivalently:

p-value = 2 - 2 × Φ(|Z_score|)
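As a hedged Python illustration of the z-test formulas, the sketch below uses `scipy.stats.norm`, assuming SciPy is installed. The function name `z_p_values` is hypothetical. It uses `norm.sf` (the survival function, 1 - Φ) for the upper tail instead of subtracting from 1.

```python
from scipy.stats import norm

def z_p_values(z_score):
    """Left-, right-, and two-tailed p-values for a Z-score (hypothetical helper)."""
    left = norm.cdf(z_score)            # Φ(Z)
    right = norm.sf(z_score)            # 1 - Φ(Z)
    two = 2 * norm.cdf(-abs(z_score))   # 2 × Φ(−|Z|)
    return left, right, two

left, right, two = z_p_values(1.96)
print(round(left, 4), round(right, 4), round(two, 4))

# Far in the tail, 1 - cdf underflows to zero while sf keeps a tiny positive value
naive_tail = 1 - norm.cdf(10)
stable_tail = norm.sf(10)
print(naive_tail, stable_tail)
```

The design choice here is numerical: both expressions are mathematically identical, but the survival function avoids losing very small tail probabilities to floating-point rounding.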

🙋 To learn more about Z-tests, head to Omni's Z-test calculator .

We use the Z-score if the test statistic approximately follows the standard normal distribution N(0,1) . Thanks to the central limit theorem, you can count on the approximation if you have a large sample (say at least 50 data points) and treat your distribution as normal.

A Z-test most often refers to testing the population mean , or the difference between two population means, in particular between two proportions. You can also find Z-tests in maximum likelihood estimations.

The p-value from the t-score is given by the following formulae, in which cdf_{t,d} stands for the cumulative distribution function of the t-Student distribution with d degrees of freedom:

Left-tailed t-test:

p-value = cdf_{t,d}(t_score)

Right-tailed t-test:

p-value = 1 - cdf_{t,d}(t_score)

Two-tailed t-test:

p-value = 2 × cdf_{t,d}(−|t_score|)

or, equivalently:

p-value = 2 - 2 × cdf_{t,d}(|t_score|)

Use the t-score option if your test statistic follows the t-Student distribution . This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails – the exact shape depends on the parameter called the degrees of freedom . If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from the normal distribution N(0,1).
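To get a feel for how quickly the t-Student distribution approaches N(0,1), the following Python sketch (assuming SciPy is available) compares the 97.5th percentile of each distribution for a few degrees of freedom. The particular df values are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

z_quantile = norm.ppf(0.975)   # two-tailed 5% critical value for N(0,1)

# Difference between the t and normal critical values as df grows
gaps = {}
for df in (5, 30, 100, 1000):
    t_quantile = t.ppf(0.975, df)
    gaps[df] = t_quantile - z_quantile
    print(df, round(t_quantile, 3), round(gaps[df], 3))
```

The gap is always positive, which reflects the heavier tails of the t-Student distribution, and it shrinks as the degrees of freedom increase.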

The most common t-tests are those for population means with an unknown population standard deviation, or for the difference between means of two populations , with either equal or unequal yet unknown population standard deviations. There's also a t-test for paired (dependent) samples .

🙋 To get more insights into t-statistics, we recommend using our t-test calculator .

Use the χ²-score option when performing a test in which the test statistic follows the χ²-distribution .

This distribution arises if, for example, you take the sum of squared variables, each following the normal distribution N(0,1). Remember to check the number of degrees of freedom of the χ²-distribution of your test statistic!

How to find the p-value from a chi-square score? You can do it with the help of the following formulae, in which cdf_{χ²,d} denotes the cumulative distribution function of the χ²-distribution with d degrees of freedom:

Left-tailed χ²-test:

p-value = cdf_{χ²,d}(χ²_score)

Right-tailed χ²-test:

p-value = 1 - cdf_{χ²,d}(χ²_score)

Remember that χ²-tests for goodness-of-fit and independence are right-tailed tests! (see below)

Two-tailed χ²-test:

p-value = 2 × min{cdf_{χ²,d}(χ²_score), 1 - cdf_{χ²,d}(χ²_score)}

(By min{a,b}, we denote the smaller of the numbers a and b.)
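A possible Python sketch of these χ² formulas, assuming SciPy is installed. The helper name `chi2_p_value` is hypothetical, and the default of a right-tailed test is a choice made here because goodness-of-fit and independence tests are right-tailed.

```python
from scipy.stats import chi2

def chi2_p_value(score, d, tail="right"):
    """p-value for a chi-squared score with d degrees of freedom (hypothetical helper)."""
    lower = chi2.cdf(score, d)   # cdf at the observed score
    upper = chi2.sf(score, d)    # 1 - cdf at the observed score
    if tail == "left":
        return lower
    if tail == "right":
        return upper
    if tail == "two":
        # The distribution is not symmetric, so take twice the smaller tail
        return 2 * min(lower, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Example: score 3.841 with 1 degree of freedom
print(round(chi2_p_value(3.841, 1), 4))
print(round(chi2_p_value(3.841, 1, tail="two"), 4))
```

The score 3.841 is the familiar 5% critical value for one degree of freedom, so the right-tailed result should be close to 0.05.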

The most popular tests which lead to a χ²-score are the following:

Testing whether the variance of normally distributed data has some pre-determined value. In this case, the test statistic has the χ²-distribution with n - 1 degrees of freedom, where n is the sample size. This can be a one-tailed or two-tailed test .

Goodness-of-fit test checks whether the empirical (sample) distribution agrees with some expected probability distribution. In this case, the test statistic follows the χ²-distribution with k - 1 degrees of freedom, where k is the number of classes into which the sample is divided. This is a right-tailed test .

Independence test is used to determine if there is a statistically significant relationship between two variables. In this case, its test statistic is based on the contingency table and follows the χ²-distribution with (r - 1)(c - 1) degrees of freedom, where r is the number of rows, and c is the number of columns in this contingency table. This also is a right-tailed test .

Finally, the F-score option should be used when you perform a test in which the test statistic follows the F-distribution , also known as the Fisher–Snedecor distribution. The exact shape of an F-distribution depends on two degrees of freedom .

To see where those degrees of freedom come from, consider the independent random variables X and Y, which both follow the χ²-distributions with d₁ and d₂ degrees of freedom, respectively. In that case, the ratio (X/d₁)/(Y/d₂) follows the F-distribution, with (d₁, d₂)-degrees of freedom. For this reason, the two parameters d₁ and d₂ are also called the numerator and denominator degrees of freedom.

The p-value from F-score is given by the following formulae, where we let cdf_{F,d₁,d₂} denote the cumulative distribution function of the F-distribution, with (d₁, d₂)-degrees of freedom:

Left-tailed F-test:

p-value = cdf_{F,d₁,d₂}(F_score)

Right-tailed F-test:

p-value = 1 - cdf_{F,d₁,d₂}(F_score)

Two-tailed F-test:

p-value = 2 × min{cdf_{F,d₁,d₂}(F_score), 1 - cdf_{F,d₁,d₂}(F_score)}
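The F-score formulas could be sketched in Python as follows, assuming SciPy is installed. The helper name `f_p_value` is hypothetical. The check at the end relies on a standard fact not stated in the text: the square of a t-distributed variable with d degrees of freedom follows the F-distribution with (1, d) degrees of freedom.

```python
from scipy.stats import f, t

def f_p_value(score, d1, d2, tail="right"):
    """p-value for an F-score with (d1, d2) degrees of freedom (hypothetical helper)."""
    lower = f.cdf(score, d1, d2)
    upper = f.sf(score, d1, d2)   # 1 - cdf
    if tail == "left":
        return lower
    if tail == "right":
        return upper
    if tail == "two":
        return 2 * min(lower, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Right-tailed F-test on a squared t-score versus a two-tailed t-test
t_score, d = 0.855, 29
p_from_f = f_p_value(t_score ** 2, 1, d)
p_from_t = 2 * t.sf(abs(t_score), d)
print(round(p_from_f, 4), round(p_from_t, 4))
```

The two printed values should agree, which is a quick sanity check that the right-tailed F formula is wired up correctly.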

Below we list the most important tests that produce F-scores. All of them are right-tailed tests .

A test for the equality of variances in two normally distributed populations . Its test statistic follows the F-distribution with (n - 1, m - 1) -degrees of freedom, where n and m are the respective sample sizes.

ANOVA is used to test the equality of means in three or more groups that come from normally distributed populations with equal variances. We arrive at the F-distribution with (k - 1, n - k) -degrees of freedom, where k is the number of groups, and n is the total sample size (in all groups together).

A test for overall significance of regression analysis . The test statistic has an F-distribution with (k - 1, n - k) -degrees of freedom, where n is the sample size, and k is the number of variables (including the intercept).

With the presence of the linear relationship having been established in your data sample with the above test, you can calculate the coefficient of determination, R², which indicates the strength of this relationship. You can do it by hand or use our coefficient of determination calculator.

A test to compare two nested regression models. The test statistic follows the F-distribution with (k₂ - k₁, n - k₂)-degrees of freedom, where k₁ and k₂ are the numbers of variables in the smaller and bigger models, respectively, and n is the sample size.

You may notice that the F-test of an overall significance is a particular form of the F-test for comparing two nested models: it tests whether our model does significantly better than the model with no predictors (i.e., the intercept-only model).

Can p-value be negative?

No, the p-value cannot be negative. This is because probabilities cannot be negative, and the p-value is the probability of the test statistic satisfying certain conditions.

What does a high p-value mean?

A high p-value means that under the null hypothesis, there's a high probability that for another sample, the test statistic will generate a value at least as extreme as the one observed in the sample you already have. A high p-value doesn't allow you to reject the null hypothesis.

What does a low p-value mean?

A low p-value means that under the null hypothesis, there's little probability that for another sample, the test statistic will generate a value at least as extreme as the one observed for the sample you already have. A low p-value is evidence in favor of the alternative hypothesis – it allows you to reject the null hypothesis.


Statistics LibreTexts

7.4: Hypothesis Tests for a Single Population Mean


  • Hannah Seidler-Wright
  • Chaffey College


In previous lessons, we have learned that there are two fundamental forms of statistical inference: confidence intervals and hypothesis tests. In the last section, we used confidence intervals to estimate a single population mean. We will now apply the four step hypothesis testing process to test hypotheses about the value of a single population mean. To explore this concept, we will examine examples of water contamination across the US.

The Flint Water Crisis

The water crisis in Flint, Michigan, is an example of environmental injustice that began in 2014. The city decided to switch its drinking water supply from Detroit’s system to the water from the Flint River to save money. This switch was made despite inadequate treatment and testing of the water from the river. As a result, the community was deeply impacted. Their water supply turned yellow and began to smell and taste of sewage. Members of the community suffered from skin rashes and hair loss. Later, independent research would reveal that the contaminated water was contributing to dangerous increases of blood lead levels in the city’s youth, and the lead level in the water greatly exceeded the EPA’s standard.

Eventually, in 2016, with the efforts of community activists and scientists, a federal judge sided with Flint residents and ordered clean bottled water to be delivered to citizens. The following year, the city was ordered to replace the city’s lead pipes with funding from the state and the community was provided with resources to support their health and well-being. However, the fight to bring safe access to the water supply is ongoing.

Dirty water spilling out of a large glass carboy on its side.

Read more about the water crisis in Flint, Michigan: https://www.nrdc.org/stories/flint-water-crisis-everything-you-need-know

Use Statistics to Defend Environmental Justice

To test whether the water was safe, the scientists likely conducted hypothesis tests. Now you will conduct a hypothesis test, like those scientists who fought for the citizens in Flint.

Step 1 Determine the hypotheses

Let \(\mu\) represent the _____________ lead level for the entire water supply.

\(H_0:\mu=\underline{\ \ \ \ \ \ \ \ \ \ }\text{ ppb}\)

Select the appropriate alternative hypothesis:

  • \(H_a: \mu<15 \text{ ppb}\)
  • \(H_a: \mu>15 \text{ ppb}\)
  • \(H_a: \mu \neq 15 \text{ ppb}\)

Select the appropriate test, and explain why:

  • Left-tailed
  • Right-tailed

Step 2 Collect sample data

The sample mean, \(\bar{x}\), is __________ ppb.

The sample standard deviation, s, is __________ ppb.

The sample size, n, is _________ households.

There are __________ degrees of freedom.

Explain why the conditions of the Central Limit Theorem are met:

Step 3 Assess the evidence

Which distribution will we use to find the P-value?

  • The normal distribution because the population standard deviation is given.
  • The student’s T-distribution because the population standard deviation is unknown.

The test statistic, rounded to three decimal places, is

\[T=\frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}=\frac{}{\frac{}{}}=\underline{\ \ \ \ \ \ \ \ \ \ }\nonumber\]

Below is the graph of the T-distribution with _______ degrees of freedom. Label the T-statistic on the horizontal axis. Then shade the region that represents the P-value.


Use https://www.desmos.com/calculator to find the P-value:

  • Enter tdist(199) in the first line.
  • Check the box for Find Cumulative Probability (CDF)
  • The minimum and maximum will default to \(-\infty\) and \(\infty\) respectively. Enter the T-score in the min or max so that the graph matches the graph above.

Step 4 State a conclusion in context

The level of significance is \(\alpha=\underline{\ \ \ \ \ \ \ \ \ \ }\). The P-value (rounded to three decimal places) is 0.004. Fill in the blank with \(\leq\) or \(>\):

\[0.004\ \underline{\ \ \ \ \ \ \ \ \ \ }\ 0.01\nonumber\]

Defend the citizens of Flint:

The evidence supports the claim that the true _____________ lead level for the entire water supply in Flint is ___________________ _______________________ 15 ppb. Therefore, by the current standards, the lead level in Flint’s water supply is dangerously high.

Summary of Hypothesis Testing Process for a Single Population Mean

Step 1: determine the hypotheses.

In order to test a claim about a population parameter, we create two opposing hypotheses. We call these the null hypothesis, \(H_0\), and the alternative hypothesis, \(H_a\). Let \(\mu\) represent a given population mean.

The Null Hypothesis

In every hypothesis test, we assume that the null hypothesis is true. The null hypothesis is always a statement of equality and therefore, should always contain an equal symbol (=). When a test involves a single population mean, the null hypothesis will be

\[H_0: \mu=value\nonumber\]

The Alternative Hypothesis

The alternative hypothesis is a claim implied by the research question and is an inequality. The alternative hypothesis states that the population mean is greater than (>), less than (<), or not equal (≠) to the assumed value in the null hypothesis.

When a test involves a single population mean, the alternative hypothesis will be one of the following:

\[\begin{aligned} & H_a: \mu>\text { value } \\ & H_a: \mu<\text { value } \\ & H_a: \mu \neq \text { value } \end{aligned}\]

Step 2: Collect Sample Data

During a hypothesis test, we want to determine whether a sample statistic is unusual. Therefore, we must think about probabilities from a sampling distribution.

In a previous lesson, we learned about the sampling distribution of sample means. The Central Limit Theorem says that a sampling distribution of sample means is approximately normal if either the sample size, n, is greater than 30 or sampling was performed from a normally distributed population. In the second step of a hypothesis test, we verify that the sampling distribution is approximately normal and we identify or compute any sample statistics.
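The Central Limit Theorem condition can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the sample size of 40, and the number of repetitions are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 40              # sample size, greater than 30
num_samples = 5000  # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

center = statistics.mean(sample_means)   # should be near the population mean, 1
spread = statistics.stdev(sample_means)  # should be near 1 / sqrt(40), about 0.158

print(round(center, 2), round(spread, 2))
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to the population standard deviation divided by \(\sqrt{n}\).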

Step 3: Assess the Evidence

This step is all about probability. Since the sampling distribution is approximately normal (as determined in step 2), and the population standard deviation is likely unknown, we can compute a T-score and use the Student's T-distribution to find probabilities. The sampling distribution of sample means has mean

\[\mu_{\bar{x}}=\mu\nonumber\]

and standard error

\[\sigma_{\bar{x}}=\frac{s}{\sqrt{n}}\nonumber\]

where \(\mu\) is the assumed population mean, \(s\) is the sample standard deviation, and \(n\) is the sample size. The test statistic is

\[T=\frac{x-\mu}{\sigma} \text { which translates to } T=\frac{\bar{x}-\mu_{\bar{x}}}{\sigma_{\bar{x}}}=\frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}\nonumber\]

when looking at the sampling distribution of sample means.
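As a quick check of the formula, here is a minimal sketch that computes the T-score from summary statistics. The function name `t_score` and all of the numbers are hypothetical and chosen so the arithmetic is easy to follow.

```python
import math

def t_score(sample_mean, assumed_mean, sample_sd, n):
    """T = (x-bar - mu) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - assumed_mean) / standard_error

# Hypothetical values: x-bar = 52, mu = 50, s = 8, n = 64.
# The standard error is 8 / sqrt(64) = 1, so T = (52 - 50) / 1 = 2.
t = t_score(sample_mean=52.0, assumed_mean=50.0, sample_sd=8.0, n=64)
print(t)  # 2.0
```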


Step 4: State a Conclusion

Hypothesis tests are all about making decisions. We use the P-value to make a decision about the null and alternative hypotheses.

We compare our P-value to a level of significance. The level of significance, denoted \(\alpha\) (the Greek letter “alpha”), is how unlikely a sample statistic needs to be to convince us about a claim. It is also the level of risk we accept in being wrong.

We have only two possible conclusions:

  • If the P-value \(\leq \alpha\), we reject the null hypothesis and support the alternative hypothesis.
  • If the P-value \(> \alpha\), we fail to reject the null hypothesis and cannot support the alternative hypothesis. This does not make the null hypothesis true. We cannot prove the null hypothesis because sample data cannot reveal the true value of the population mean.
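The comparison of the P-value to \(\alpha\) can be written as a short sketch. The function name `conclusion` and the wording of the returned strings are illustrative assumptions; the values 0.004 and 0.01 are the ones compared in the Flint example earlier.

```python
def conclusion(p_value, alpha):
    """Apply the decision rule: reject H0 only when the P-value is at most alpha."""
    if p_value <= alpha:
        return "reject H0"
    # Failing to reject does not show that H0 is true.
    return "fail to reject H0"

print(conclusion(0.004, 0.01))  # reject H0
```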













Let \(\mu\) represent _____________________________________________________________________


Which type of test and why?

The sampling distribution is approximately normal because ______________________________


Go to https://www.desmos.com/calculator. Type C= into the first line and copy and paste the data.

\(\bar{x}=\operatorname{mean}(C)=\underline{\ \ \ \ \ \ \ \ \ \ }\text { ounces }\)

\(s=\operatorname{stdev}(C)=\underline{\ \ \ \ \ \ \ \ \ \ }\text { ounces }\)

\(n=\underline{\ \ \ \ \ \ \ \ \ \ }\text{ and }df=\underline{\ \ \ \ \ \ \ \ \ \ }\)
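For readers who prefer to check the Desmos results another way, the same sample statistics can be computed with Python's standard library. This is a sketch only: the list `C` below holds hypothetical measurements in ounces, not the data for this activity.

```python
import statistics

# Hypothetical data in ounces; replace with the values pasted into Desmos.
C = [11.8, 12.1, 12.0, 11.9, 12.3, 11.7, 12.2, 12.0]

x_bar = statistics.mean(C)  # sample mean
s = statistics.stdev(C)     # sample standard deviation (divides by n - 1)
n = len(C)                  # sample size
df = n - 1                  # degrees of freedom

print(x_bar, round(s, 3), n, df)
```

Note that `statistics.stdev` is the sample standard deviation, which is the \(s\) the T-score formula calls for; `statistics.pstdev` would give the population version instead.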


P-value is _________________________

Compare the P-value and the significance level. Make a decision, and state a conclusion in context:

