Z-Test for Statistical Hypothesis Testing Explained

z test hypothesis testing examples

The Z-test is a statistical hypothesis test used to determine where the distribution of the test statistic we are measuring, like the mean , is part of the normal distribution .

There are multiple types of Z-tests, however, we’ll focus on the easiest and most well known one, the one sample mean test. This is used to determine if the difference between the mean of a sample and the mean of a population is statistically significant.

What Is a Z-Test?

A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution.  

The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations away a raw score or sample statistics is from the populations’ mean.

Z-tests are the most common statistical tests conducted in fields such as healthcare and data science . Therefore, it’s an essential concept to understand.

Requirements for a Z-Test

In order to conduct a Z-test, your statistics need to meet a few requirements, including:

  • A Sample size that’s greater than 30. This is because we want to ensure our sample mean comes from a distribution that is normal. As stated by the c entral limit theorem , any distribution can be approximated as normally distributed if it contains more than 30 data points.
  • The standard deviation and mean of the population is known .
  • The sample data is collected/acquired randomly .

More on Data Science:   What Is Bootstrapping Statistics?

Z-Test Steps

There are four steps to complete a Z-test. Let’s examine each one.

4 Steps to a Z-Test

  • State the null hypothesis.
  • State the alternate hypothesis.
  • Choose your critical value.
  • Calculate your Z-test statistics. 

1. State the Null Hypothesis

The first step in a Z-test is to state the null hypothesis, H_0 . This what you believe to be true from the population, which could be the mean of the population, μ_0 :

Null hypothesis equation generated in LaTeX.

2. State the Alternate Hypothesis

Next, state the alternate hypothesis, H_1 . This is what you observe from your sample. If the sample mean is different from the population’s mean, then we say the mean is not equal to μ_0:

Alternate hypothesis equation generated in LaTeX.

3. Choose Your Critical Value

Then, choose your critical value, α , which determines whether you accept or reject the null hypothesis. Typically for a Z-test we would use a statistical significance of 5 percent which is z = +/- 1.96 standard deviations from the population’s mean in the normal distribution:

Z-test critical value plot.

This critical value is based on confidence intervals.

4. Calculate Your Z-Test Statistic

Compute the Z-test Statistic using the sample mean, μ_1 , the population mean, μ_0 , the number of data points in the sample, n and the population’s standard deviation, σ :

Z-test statistic equation generated in LaTeX.

If the test statistic is greater (or lower depending on the test we are conducting) than the critical value, then the alternate hypothesis is true because the sample’s mean is statistically significant enough from the population mean.

Another way to think about this is if the sample mean is so far away from the population mean, the alternate hypothesis has to be true or the sample is a complete anomaly.

More on Data Science: Basic Probability Theory and Statistics Terms to Know

Z-Test Example

Let’s go through an example to fully understand the one-sample mean Z-test.

A school says that its pupils are, on average, smarter than other schools. It takes a sample of 50 students whose average IQ measures to be 110. The population, or the rest of the schools, has an average IQ of 100 and standard deviation of 20. Is the school’s claim correct?

The null and alternate hypotheses are:

Null hypothesis and alternate hypothesis generated in LaTeX.

Where we are saying that our sample, the school, has a higher mean IQ than the population mean.

Now, this is what’s called a right-sided, one-tailed test as our sample mean is greater than the population’s mean. So, choosing a critical value of 5 percent, which equals a Z-score of 1.96 , we can only reject the null hypothesis if our Z-test statistic is greater than 1.96.

If the school claimed its students’ IQs were an average of 90, then we would use a left-tailed test, as shown in the figure above. We would then only reject the null hypothesis if our Z-test statistic is less than -1.96.

Computing our Z-test statistic, we see:

Z-test statistic equation generated in LaTeX.

Therefore, we have sufficient evidence to reject the null hypothesis, and the school’s claim is right.

Hope you enjoyed this article on Z-tests. In this post, we only addressed the most simple case, the one-sample mean test. However, there are other types of tests, but they all follow the same process just with some small nuances.  

Built In’s expert contributor network publishes thoughtful, solutions-oriented stories written by innovative tech professionals. It is the tech industry’s definitive destination for sharing compelling, first-person accounts of problem-solving on the road to innovation.

Great Companies Need Great People. That's Where We Come In.

Logo for Maricopa Open Digital Press

10 Chapter 10: Hypothesis Testing with Z

Setting up the hypotheses.

When setting up the hypotheses with z, the parameter is associated with a sample mean (in the previous chapter examples the parameters for the null used 0). Using z is an occasion in which the null hypothesis is a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the US, as our null value and test for differences against that. For now, we will focus on testing a value of a single mean against what we expect from the population.

Using birthweight as an example, our null hypothesis takes the form: H 0 : μ = 7.47 Notice that we are testing the value for μ, the population parameter, NOT the sample statistic ̅X (or M). We are referring to the data right now in raw form (we have not standardized it using z yet). Again, using inferential statistics, we are interested in understanding the population, drawing from our sample observations. For the research question, we have a mean value from the sample to use, we have specific data is – it is observed and used as a comparison for a set point.

As mentioned earlier, the alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. We will set the criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative.

If we expect our obtained sample mean to be above or below the null hypothesis value (knowing which direction), we set a directional hypothesis. O ur alternative hypothesis takes the form based on the research question itself. In our example with birthweight, this could be presented as H A : μ > 7.47 or H A : μ < 7.47. 

Note that we should only use a directional hypothesis if we have a good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative hypothesis. In our birthweight example, this could be set as H A : μ ≠ 7.47

In working with data for this course we will need to set a critical value of the test statistic for alpha (α) for use of test statistic tables in the back of the book. This is determining the critical rejection region that has a set critical value based on α.

Determining Critical Value from α

We set alpha (α) before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use.

When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a non-directional hypothesis . To test the significance of a non-directional hypothesis, we have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. We call this a two-tailed test .

z test hypothesis testing examples

Figure 1. showing a 2-tail test for non-directional hypothesis for z for area C is the critical rejection region.

When a research hypothesis predicts a direction for the effect, it is called a directional hypothesis . To test the significance of a directional hypothesis, we have to consider the possibility that the sample could be extreme at one-tail of the comparison distribution. We call this a one-tailed test .

z test hypothesis testing examples

Figure 2. showing a 1-tail test for a directional hypothesis (predicting an increase) for z for area C is the critical rejection region.

Determining Cutoff Scores with Two-Tailed Tests

Typically we specify an α level before analyzing the data. If the data analysis results in a probability value below the α level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected. In other words, if our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis ; if not, we fail to reject the null (we never “accept” the null). According to this perspective, if a result is significant, then it does not matter how significant it is. Moreover, if it is not significant, then it does not matter how close to being significant it is. Therefore, if the 0.05 level is being used, then probability values of 0.049 and 0.001 are treated identically. Similarly, probability values of 0.06 and 0.34 are treated identically. Note we will discuss ways to address effect size (which is related to this challenge of NHST).

When setting the probability value, there is a special complication in a two-tailed test. We have to divide the significance percentage between the two tails. For example, with a 5% significance level, we reject the null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%. A one-tailed test does have such an extreme value but with a one-tailed test only one side of the distribution is considered.

z test hypothesis testing examples

Figure 3. Critical value differences in one and two-tail tests. Photo Credit

Let’s re view th e set critical values for Z.

We discussed z-scores and probability in chapter 8.  If we revisit the z-score for 5% and 1%, we can identify the critical regions for the critical rejection areas from the unit standard normal table.

  • A two-tailed test at the 5% level has a critical boundary Z score of +1.96 and -1.96
  • A one-tailed test at the 5% level has a critical boundary Z score of +1.64 or -1.64
  • A two-tailed test at the 1% level has a critical boundary Z score of +2.58 and -2.58
  • A one-tailed test at the 1% level has a critical boundary Z score of +2.33 or -2.33.

Review: Critical values, p-values, and significance level

There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effec t. The value laid out in H 0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.

Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure 4.

image

Figure 4: The rejection region for a one-tailed test

The shaded rejection region takes us 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

The rejection region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z crit (“z-crit”) or z* (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve like we did in Unit 1. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 corresponds to 0.0405 and z = 1.65 corresponds to 0.0495, so .05 is exactly in between them) if we go to the right and -1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing then shading the distribution is helpful for keeping directionality straight.

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z* = ±1.96. This is shown in Figure 5.

image

Figure 5: Two-tailed rejection region

Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z-scores in this way, the obtained value of z (sometimes called z-obtained) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis.

Calculate the test statistic: Z

Now that we understand setting up the hypothesis and determining the outcome, let’s examine hypothesis testing with z!  The next step is to carry out the study and get the actual results for our sample. Central to hypothesis test is comparison of the population and sample means. To make our calculation and determine where the sample is in the hypothesized distribution we calculate the Z for the sample data.

Make a decision

To decide whether to reject the null hypothesis, we compare our sample’s Z score to the Z score that marks our critical boundary. If our sample Z score falls inside the rejection region of the comparison distribution (is greater than the z-score critical boundary) we reject the null hypothesis.

The formula for our z- statistic has not changed:

z test hypothesis testing examples

To formally test our hypothesis, we compare our obtained z-statistic to our critical z-value. If z obt > z crit , that means it falls in the rejection region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2) and so we reject H 0 . If z obt < z crit , we fail to reject. Remember that as z gets larger, the corresponding area under the curve beyond z gets smaller. Thus, the proportion, or p-value, will be smaller than the area for α, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.

Conversely, if we fail to reject, we know that the proportion will be larger than α because the z-statistic will not be as far into the tail. This is illustrated for a one- tailed test in Figure 6.

image

Figure 6. Relation between α, z obt , and p

When the null hypothesis is rejected, the effect is said to be statistically significant . Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

Review: Steps of the Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remained of the textbook and course, and though the hypothesis and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining in normal English what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.

Step 3: Compute the Test Statistic

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.

Example: Movie Popcorn

Let’s see how hypothesis testing works in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ =0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn’t want bags overfilled or under filled, so he looks for differences in both directions. This scenario has all of the information we need to begin our hypothesis testing procedure.

Our manager is looking for a difference in the mean cups of popcorn bags compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:

H 0 : There is no difference in the cups of popcorn bags from this employee H 0 : μ = 8.00

Notice that we phrase the hypothesis in terms of the population parameter μ, which in this case would be the true average cups of bags filled by the new employee.

Our assumption of no difference, the null hypothesis, is that this mean is exactly

the same as the known population mean value we want it to match, 8.00. Now let’s do the alternative:

H A : There is a difference in the cups of popcorn bags from this employee H A : μ ≠ 8.00

In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed alternative hypothesis that there is a difference.

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = 0.05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z* = ±1.96. This will be the criteria we use to test our hypothesis. We can now draw out our distribution so we can visualize the rejection region and make sure it makes sense

image

Figure 7: Rejection region for z* = ±1.96

Step 3: Calculate the Test Statistic

Now we come to our formal calculations. Let’s say that the manager collects data and finds that the average cups of this employee’s popcorn bags is ̅X = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z:

So our test statistic is z = -2.50, which we can draw onto our rejection region distribution:

image

Figure 8: Test statistic location

Looking at Figure 5, we can see that our obtained z-statistic falls in the rejection region. We can also directly compare it to our critical value: in terms of absolute value, -2.50 > -1.96, so we reject the null hypothesis. We can now write our conclusion:

When we write our conclusion, we write out the words to communicate what it actually means, but we also include the average sample size we calculated (the exact location doesn’t matter, just somewhere that flows naturally and makes sense) and the z-statistic and p-value. We don’t know the exact p-value, but we do know that because we rejected the null, it must be less than α.

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is.

For mean differences like we calculated here, our effect size is Cohen’s d :

z test hypothesis testing examples

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weakness of hypothesis testing. Whenever you find a significant result, you should always calculate an effect size

Table 1. Interpretation of Cohen’s d

Example: Office Temperature

Let’s do another example to solidify our understanding. Let’s say that the office building you work in is supposed to be kept at 74 degree Fahrenheit but is allowed

to vary by 1 degree in either direction. You suspect that, as a cost saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H 0 : There is no difference in the average building temperature H 0 : μ = 74

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

H A : The average building temperature is higher than claimed H A : μ > 74

image

Now that you have everything set up, you spend one week collecting temperature data:

You calculate the average of these scores to be 𝑋̅ = 76.6 degrees. You use this to calculate the test statistic, using μ = 74 (the supposed average temperature), σ = 1.00 (how much the temperature should vary), and n = 5 (how many data points you collected):

z = 76.60 − 74.00 = 2.60    = 5.78

          1.00/√5            0.45

This value falls so far into the tail that it cannot even be plotted on the distribution!

image

Figure 7: Obtained z-statistic

You compare your obtained z-statistic, z = 5.77, to the critical value, z* = 1.645, and find that z > z*. Therefore you reject the null hypothesis, concluding: Based on 5 observations, the average temperature (𝑋̅ = 76.6 degrees) is statistically significantly higher than it is supposed to be, z = 5.77, p < .05.

d = (76.60-74.00)/ 1= 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!

Example: Different Significance Level

First, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = 0.01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value: H 0 : The average score does not differ from the population H 0 : μ = 50

We will assume a two-tailed test: H A : The average score does differ H A : μ ≠ 50

We have seen the critical values for z-tests at α = 0.05 levels of significance several times. To find the values for α = 0.01, we will go to the standard normal table and find the z-score cutting of 0.005 (0.01 divided by 2 for a two-tailed test) of the area in the tail, which is z crit * = ±2.575. Notice that this cutoff is much higher than it was for α = 0.05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.

We can now calculate our test statistic.  The average of 10 scores is M = 60.40 with a µ = 60. We will use σ = 10 as our known population standard deviation. From this information, we calculate our z-statistic as:

Our obtained z-statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:

Notice two things about the end of the conclusion. First, we wrote that p is greater than instead of p is less than, like we did in the previous two examples. This is because we failed to reject the null hypothesis. We don’t know exactly what the p- value is, but we know it must be larger than the α level we used to test our hypothesis. Second, we used 0.01 instead of the usual 0.05, because this time we tested at a different level. The number you compare to the p-value should always be the significance level you test at. Because we did not detect a statistically significant effect, we do not need to calculate an effect size. Note: some statisticians will suggest to always calculate effects size as a possibility of Type II error. Although insignificant, calculating d = (60.4-60)/10 = .04 which suggests no effect (and not a possibility of Type II error).

Review Considerations in Hypothesis Testing

Errors in hypothesis testing.

Keep in mind that rejecting the null hypothesis is not an all-or-nothing decision. The Type I error rate is affected by the α level: the lower the α level the lower the Type I error rate. It might seem that α is the probability of a Type I error. However, this is not correct. Instead, α is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error. The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error.

Statistical Power

The statistical power of a research design is the probability of rejecting the null hypothesis given the sample size and expected relationship strength. Statistical power is the complement of the probability of committing a Type II error. Clearly, researchers should be interested in the power of their research designs if they want to avoid making Type II errors. In particular, they should make sure their research design has adequate power before collecting data. A common guideline is that a power of .80 is adequate. This means that there is an 80% chance of rejecting the null hypothesis for the expected relationship strength.

Given that statistical power depends primarily on relationship strength and sample size, there are essentially two steps you can take to increase statistical power: increase the strength of the relationship or increase the sample size. Increasing the strength of the relationship can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design). The usual strategy, however, is to increase the sample size. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.

Inferential statistics uses data from a sample of individuals to reach conclusions about the whole population. The degree to which our inferences are valid depends upon how we selected the sample (sampling technique) and the characteristics (parameters) of population data. Statistical analyses assume that sample(s) and population(s) meet certain conditions called statistical assumptions.

It is easy to check assumptions when using statistical software and it is important as a researcher to check for violations; if violations of statistical assumptions are not appropriately addressed then results may be interpreted incorrectly.

Learning Objectives

Having read the chapter, students should be able to:

  • Conduct a hypothesis test using a z-score statistics, locating critical region, and make a statistical decision including.
  • Explain the purpose of measuring effect size and power, and be able to compute Cohen’s d.

Exercises – Ch. 10

  • List the main steps for hypothesis testing with the z-statistic. When and why do you calculate an effect size?
  • z = 1.99, two-tailed test at α = 0.05
  • z = 1.99, two-tailed test at α = 0.01
  • z = 1.99, one-tailed test at α = 0.05
  • You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with μ = 78 and σ = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 8 weeks’ worth of score data: 82, 74, 62, 68, 79, 94, 90, 81, 80.
  • A study examines self-esteem and depression in teenagers.  A sample of 25 teens with a low self-esteem are given the Beck Depression Inventory.  The average score for the group is 20.9.  For the general population, the average score is 18.3 with σ = 12.  Use a two-tail test with α = 0.05 to examine whether teenagers with low self-esteem show significant differences in depression.
  • You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $12 (μ = 42, σ = 12). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the α = 0.05 level of significance.

Answers to Odd- Numbered Exercises – Ch. 10

1. List hypotheses. Determine critical region. Calculate z.  Compare z to critical region. Draw Conclusion.  We calculate an effect size when we find a statistically significant result to see if our result is practically meaningful or important

5. Step 1: H 0 : μ = 42 “My average tips does not differ from other servers”, H A : μ ≠ 42 “My average tips do differ from others”

Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

Z-test Calculator

What is a z-test, when do i use z-tests, z-test formula, p-value from z-test, two-tailed z-test and one-tailed z-test, z-test critical values & critical regions, how to use the one-sample z-test calculator, z-test examples.

This Z-test calculator is a tool that helps you perform a one-sample Z-test on the population's mean . Two forms of this test - a two-tailed Z-test and a one-tailed Z-tests - exist, and can be used depending on your needs. You can also choose whether the calculator should determine the p-value from Z-test or you'd rather use the critical value approach!

Read on to learn more about Z-test in statistics, and, in particular, when to use Z-tests, what is the Z-test formula, and whether to use Z-test vs. t-test. As a bonus, we give some step-by-step examples of how to perform Z-tests!

Or you may also check our t-statistic calculator , where you can learn the concept of another essential statistic. If you are also interested in F-test, check our F-statistic calculator .

A one sample Z-test is one of the most popular location tests. The null hypothesis is that the population mean value is equal to a given number, μ 0 \mu_0 μ 0 ​ :

We perform a two-tailed Z-test if we want to test whether the population mean is not μ 0 \mu_0 μ 0 ​ :

and a one-tailed Z-test if we want to test whether the population mean is less/greater than μ 0 \mu_0 μ 0 ​ :

Let us now discuss the assumptions of a one-sample Z-test.

You may use a Z-test if your sample consists of independent data points and:

the data is normally distributed , and you know the population variance ;

the sample is large , and data follows a distribution which has a finite mean and variance. You don't need to know the population variance.

The reason these two possibilities exist is that we want the test statistics that follow the standard normal distribution N ( 0 , 1 ) \mathrm N(0, 1) N ( 0 , 1 ) . In the former case, it is an exact standard normal distribution, while in the latter, it is approximately so, thanks to the central limit theorem.

The question remains, "When is my sample considered large?" Well, there's no universal criterion. In general, the more data points you have, the better the approximation works. Statistics textbooks recommend having no fewer than 50 data points, while 30 is considered the bare minimum.

Let x 1 , . . . , x n x_1, ..., x_n x 1 ​ , ... , x n ​ be an independent sample following the normal distribution N ( μ , σ 2 ) \mathrm N(\mu, \sigma^2) N ( μ , σ 2 ) , i.e., with a mean equal to μ \mu μ , and variance equal to σ 2 \sigma ^2 σ 2 .

We pose the null hypothesis, H 0  ⁣  ⁣ :  ⁣  ⁣   μ = μ 0 \mathrm H_0 \!\!:\!\! \mu = \mu_0 H 0 ​ :   μ = μ 0 ​ .

We define the test statistic, Z , as:

x ˉ \bar x x ˉ is the sample mean, i.e., x ˉ = ( x 1 + . . . + x n ) / n \bar x = (x_1 + ... + x_n) / n x ˉ = ( x 1 ​ + ... + x n ​ ) / n ;

μ 0 \mu_0 μ 0 ​ is the mean postulated in H 0 \mathrm H_0 H 0 ​ ;

n n n is sample size; and

σ \sigma σ is the population standard deviation.

In what follows, the uppercase Z Z Z stands for the test statistic (treated as a random variable), while the lowercase z z z will denote an actual value of Z Z Z , computed for a given sample drawn from N(μ,σ²).

If H 0 \mathrm H_0 H 0 ​ holds, then the sum S n = x 1 + . . . + x n S_n = x_1 + ... + x_n S n ​ = x 1 ​ + ... + x n ​ follows the normal distribution, with mean n μ 0 n \mu_0 n μ 0 ​ and variance n 2 σ n^2 \sigma n 2 σ . As Z Z Z is the standardization (z-score) of S n / n S_n/n S n ​ / n , we can conclude that the test statistic Z Z Z follows the standard normal distribution N ( 0 , 1 ) \mathrm N(0, 1) N ( 0 , 1 ) , provided that H 0 \mathrm H_0 H 0 ​ is true. By the way, we have the z-score calculator if you want to focus on this value alone.

If our data does not follow a normal distribution, or if the population standard deviation is unknown (and thus in the formula for Z Z Z we substitute the population standard deviation σ \sigma σ with sample standard deviation), then the test statistics Z Z Z is not necessarily normal. However, if the sample is sufficiently large, then the central limit theorem guarantees that Z Z Z is approximately N ( 0 , 1 ) \mathrm N(0, 1) N ( 0 , 1 ) .

In the sections below, we will explain to you how to use the value of the test statistic, z z z , to make a decision , whether or not you should reject the null hypothesis . Two approaches can be used in order to arrive at that decision: the p-value approach, and critical value approach - and we cover both of them! Which one should you use? In the past, the critical value approach was more popular because it was difficult to calculate p-value from Z-test. However, with help of modern computers, we can do it fairly easily, and with decent precision. In general, you are strongly advised to report the p-value of your tests!

Formally, the p-value is the smallest level of significance at which the null hypothesis could be rejected. More intuitively, p-value answers the questions: provided that I live in a world where the null hypothesis holds, how probable is it that the value of the test statistic will be at least as extreme as the z z z - value I've got for my sample? Hence, a small p-value means that your result is very improbable under the null hypothesis, and so there is strong evidence against the null hypothesis - the smaller the p-value, the stronger the evidence.

To find the p-value, you have to calculate the probability that the test statistic, Z Z Z , is at least as extreme as the value we've actually observed, z z z , provided that the null hypothesis is true. (The probability of an event calculated under the assumption that H 0 \mathrm H_0 H 0 ​ is true will be denoted as P r ( event ∣ H 0 ) \small \mathrm{Pr}(\text{event} | \mathrm{H_0}) Pr ( event ∣ H 0 ​ ) .) It is the alternative hypothesis which determines what more extreme means :

  • Two-tailed Z-test: extreme values are those whose absolute value exceeds ∣ z ∣ |z| ∣ z ∣ , so those smaller than − ∣ z ∣ -|z| − ∣ z ∣ or greater than ∣ z ∣ |z| ∣ z ∣ . Therefore, we have:

The symmetry of the normal distribution gives:

  • Left-tailed Z-test: extreme values are those smaller than z z z , so
  • Right-tailed Z-test: extreme values are those greater than z z z , so

To compute these probabilities, we can use the cumulative distribution function, (cdf) of N ( 0 , 1 ) \mathrm N(0, 1) N ( 0 , 1 ) , which for a real number, x x x , is defined as:

Also, p-values can be nicely depicted as the area under the probability density function (pdf) of N ( 0 , 1 ) \mathrm N(0, 1) N ( 0 , 1 ) , due to:

With all the knowledge you've got from the previous section, you're ready to learn about Z-tests.

  • Two-tailed Z-test:

From the fact that Φ ( − z ) = 1 − Φ ( z ) \Phi(-z) = 1 - \Phi(z) Φ ( − z ) = 1 − Φ ( z ) , we deduce that

The p-value is the area under the probability distribution function (pdf) both to the left of − ∣ z ∣ -|z| − ∣ z ∣ , and to the right of ∣ z ∣ |z| ∣ z ∣ :

two-tailed p value

  • Left-tailed Z-test:

The p-value is the area under the pdf to the left of our z z z :

left-tailed p value

  • Right-tailed Z-test:

The p-value is the area under the pdf to the right of z z z :

right-tailed p value

The decision as to whether or not you should reject the null hypothesis can be now made at any significance level, α \alpha α , you desire!

if the p-value is less than, or equal to, α \alpha α , the null hypothesis is rejected at this significance level; and

if the p-value is greater than α \alpha α , then there is not enough evidence to reject the null hypothesis at this significance level.

The critical value approach involves comparing the value of the test statistic obtained for our sample, z z z , to the so-called critical values . These values constitute the boundaries of regions where the test statistic is highly improbable to lie . Those regions are often referred to as the critical regions , or rejection regions . The decision of whether or not you should reject the null hypothesis is then based on whether or not our z z z belongs to the critical region.

The critical regions depend on a significance level, α \alpha α , of the test, and on the alternative hypothesis. The choice of α \alpha α is arbitrary; in practice, the values of 0.1, 0.05, or 0.01 are most commonly used as α \alpha α .

Once we agree on the value of α \alpha α , we can easily determine the critical regions of the Z-test:

To decide the fate of H 0 \mathrm H_0 H 0 ​ , check whether or not your z z z falls in the critical region:

If yes, then reject H 0 \mathrm H_0 H 0 ​ and accept H 1 \mathrm H_1 H 1 ​ ; and

If no, then there is not enough evidence to reject H 0 \mathrm H_0 H 0 ​ .

As you see, the formulae for the critical values of Z-tests involve the inverse, Φ − 1 \Phi^{-1} Φ − 1 , of the cumulative distribution function (cdf) of N ( 0 , 1 ) \mathrm N(0, 1) N ( 0 , 1 ) .

Our calculator reduces all the complicated steps:

Choose the alternative hypothesis: two-tailed or left/right-tailed.

In our Z-test calculator, you can decide whether to use the p-value or critical regions approach. In the latter case, set the significance level, α \alpha α .

Enter the value of the test statistic, z z z . If you don't know it, then you can enter some data that will allow us to calculate your z z z for you:

  • sample mean x ˉ \bar x x ˉ (If you have raw data, go to the average calculator to determine the mean);
  • tested mean μ 0 \mu_0 μ 0 ​ ;
  • sample size n n n ; and
  • population standard deviation σ \sigma σ (or sample standard deviation if your sample is large).

Results appear immediately below the calculator.

If you want to find z z z based on p-value , please remember that in the case of two-tailed tests there are two possible values of z z z : one positive and one negative, and they are opposite numbers. This Z-test calculator returns the positive value in such a case. In order to find the other possible value of z z z for a given p-value, just take the number opposite to the value of z z z displayed by the calculator.

To make sure that you've fully understood the essence of Z-test, let's go through some examples:

  • A bottle filling machine follows a normal distribution. Its standard deviation, as declared by the manufacturer, is equal to 30 ml. A juice seller claims that the volume poured in each bottle is, on average, one liter, i.e., 1000 ml, but we suspect that in fact the average volume is smaller than that...

Formally, the hypotheses that we set are the following:

H 0  ⁣ :   μ = 1000  ml \mathrm H_0 \! : \mu = 1000 \text{ ml} H 0 ​ :   μ = 1000  ml

H 1  ⁣ :   μ < 1000  ml \mathrm H_1 \! : \mu \lt 1000 \text{ ml} H 1 ​ :   μ < 1000  ml

We went to a shop and bought a sample of 9 bottles. After carefully measuring the volume of juice in each bottle, we've obtained the following sample (in milliliters):

1020 , 970 , 1000 , 980 , 1010 , 930 , 950 , 980 , 980 \small 1020, 970, 1000, 980, 1010, 930, 950, 980, 980 1020 , 970 , 1000 , 980 , 1010 , 930 , 950 , 980 , 980 .

Sample size: n = 9 n = 9 n = 9 ;

Sample mean: x ˉ = 980   m l \bar x = 980 \ \mathrm{ml} x ˉ = 980   ml ;

Population standard deviation: σ = 30   m l \sigma = 30 \ \mathrm{ml} σ = 30   ml ;

And, therefore, p-value = Φ ( − 2 ) ≈ 0.0228 \text{p-value} = \Phi(-2) \approx 0.0228 p-value = Φ ( − 2 ) ≈ 0.0228 .

As 0.0228 < 0.05 0.0228 \lt 0.05 0.0228 < 0.05 , we conclude that our suspicions aren't groundless; at the most common significance level, 0.05, we would reject the producer's claim, H 0 \mathrm H_0 H 0 ​ , and accept the alternative hypothesis, H 1 \mathrm H_1 H 1 ​ .

We tossed a coin 50 times. We got 20 tails and 30 heads. Is there sufficient evidence to claim that the coin is biased?

Clearly, our data follows Bernoulli distribution, with some success probability p p p and variance σ 2 = p ( 1 − p ) \sigma^2 = p (1-p) σ 2 = p ( 1 − p ) . However, the sample is large, so we can safely perform a Z-test. We adopt the convention that getting tails is a success.

Let us state the null and alternative hypotheses:

H 0  ⁣ :   p = 0.5 \mathrm H_0 \! : p = 0.5 H 0 ​ :   p = 0.5 (the coin is fair - the probability of tails is 0.5 0.5 0.5 )

H 1  ⁣ :   p ≠ 0.5 \mathrm H_1 \! : p \ne 0.5 H 1 ​ :   p  = 0.5 (the coin is biased - the probability of tails differs from 0.5 0.5 0.5 )

In our sample we have 20 successes (denoted by ones) and 30 failures (denoted by zeros), so:

Sample size n = 50 n = 50 n = 50 ;

Sample mean x ˉ = 20 / 50 = 0.4 \bar x = 20/50 = 0.4 x ˉ = 20/50 = 0.4 ;

Population standard deviation is given by σ = 0.5 × 0.5 \sigma = \sqrt{0.5 \times 0.5} σ = 0.5 × 0.5 ​ (because 0.5 0.5 0.5 is the proportion p p p hypothesized in H 0 \mathrm H_0 H 0 ​ ). Hence, σ = 0.5 \sigma = 0.5 σ = 0.5 ;

  • And, therefore

Since 0.1573 > 0.1 0.1573 \gt 0.1 0.1573 > 0.1 we don't have enough evidence to reject the claim that the coin is fair , even at such a large significance level as 0.1 0.1 0.1 . In that case, you may safely toss it to your Witcher or use the coin flip probability calculator to find your chances of getting, e.g., 10 heads in a row (which are extremely low!).

What is the difference between Z-test vs t-test?

We use a t-test for testing the population mean of a normally distributed dataset which had an unknown population standard deviation . We get this by replacing the population standard deviation in the Z-test statistic formula by the sample standard deviation, which means that this new test statistic follows (provided that H₀ holds) the t-Student distribution with n-1 degrees of freedom instead of N(0,1) .

When should I use t-test over the Z-test?

For large samples, the t-Student distribution with n degrees of freedom approaches the N(0,1). Hence, as long as there are a sufficient number of data points (at least 30), it does not really matter whether you use the Z-test or the t-test, since the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test instead of Z-test .

How do I calculate the Z test statistic?

To calculate the Z test statistic:

  • Compute the arithmetic mean of your sample .
  • From this mean subtract the mean postulated in null hypothesis .
  • Multiply by the square root of size sample .
  • Divide by the population standard deviation .
  • That's it, you've just computed the Z test statistic!

Books vs e-books

Helium balloons, mcnemar's test.

  • Biology ( 97 )
  • Chemistry ( 97 )
  • Construction ( 142 )
  • Conversion ( 283 )
  • Ecology ( 28 )
  • Everyday life ( 259 )
  • Finance ( 564 )
  • Food ( 65 )
  • Health ( 438 )
  • Math ( 656 )
  • Physics ( 508 )
  • Sports ( 102 )
  • Statistics ( 182 )
  • Other ( 179 )
  • Discover Omni ( 40 )

Introduction to Statistics and Data Analysis

6 hypothesis testing: the z-test.

We’ve all had the experience of standing at a crosswalk waiting staring at a pedestrian traffic light showing the little red man. You’re waiting for the little green man so you can cross. After a little while you’re still waiting and there aren’t any cars around. You might think ‘this light is really taking a long time’, but you continue waiting. Minutes pass and there’s still no little green man. At some point you come to the conclusion that the light is broken and you’ll never see that little green man. You cross on the little red man when it’s clear.

You may not have known this but you just conducted a hypothesis test. When you arrived at the crosswalk, you assumed that the light was functioning properly, although you will always entertain the possibility that it’s broken. In terms of hypothesis testing, your ‘null hypothesis’ is that the light is working and your ‘alternative hypothesis’ is that it’s broken. As time passes, it seems less and less likely that light is working properly. Eventually, the probability of the light working given how long you’ve been waiting becomes so low that you reject the null hypothesis in favor of the alternative hypothesis.

This sort of reasoning is the backbone of hypothesis testing and inferential statistics. It’s also the point in the course where we turn the corner from descriptive statistics to inferential statistics. Rather than describing our data in terms of means and plots, we will now start using our data to make inferences, or generalizations, about the population that our samples are drawn from. In this course we’ll focus on standard hypothesis testing where we set up a null hypothesis and determine the probability of our observed data under the assumption that the null hypothesis is true (the much maligned p-value). If this probability is small enough, then we conclude that our data suggests that the null hypothesis is false, so we reject it.

In this chapter, we’ll introduce hypothesis testing with examples from a ‘z-test’, when we’re comparing a single mean to what we’d expect from a population with known mean and standard deviation. In this case, we can convert our observed mean into a z-score for the standard normal distribution. Hence the name z-test.

It’s time to introduce the hypothesis test flow chart . It’s pretty self explanatory, even if you’re not familiar with all of these hypothesis tests. The z-test is (1) based on means, (2) with only one mean, and (3) where we know \(\sigma\) , the standard deviation of the population. Here’s how to find the z-test in the flow chart:

z test hypothesis testing examples

6.1 Women’s height example

Let’s work with the example from the end of the last chapter where we started with the fact that the heights of US women has a mean of 63 and a standard deviation of 2.5 inches. We calculated that the average height of the 122 women in Psych 315 is 64.7 inches. We then used the central limit theorem and calculated the probability of a random sample 122 heights from this population having a mean of 64.7 or greater is 2.4868996^{-14}. This is a very, very small number.

Here’s how we do it using R:

Let’s think of our sample as a random sample of UW psychology students, which is a reasonable assumption since all psychology students have to take a statistics class. What does this sample say about the psychology students that are women at UW compared to the US population? It could be that these psychology students at UW have the same mean and standard deviation as the US population, but our sample just happens to have an unusual number of tall women, but we calculated that the probability of this happening is really low. Instead, it makes more sense that the population that we’re drawing from has a mean that’s greater than the US population mean. Notice that we’re making a conclusion about the whole population of women psychology students based on our one sample.

Using the terminology of hypothesis testing, we first assumed the null hypothesis that UW women psych students have the same mean (and standard deviation) as the US population. The null hypothesis is written as:

\[ H_{0}: \mu = 63 \] In this example, our alternative hypothesis is that the mean of our population is larger than the mean of null hypothesis population. We write this as:

\[ H_{A}: \mu > 63 \]

Next, after obtaining a random sample and calculate the mean, we calculate the probability of drawing a mean this large (or larger) from the null hypothesis distribution.

If this probability is low enough, we reject the null hypothesis in favor of the alternative hypothesis. When our probability allows us to reject the null hypothesis, we say that our observed results are ‘statistically significant’.

In statistics terms, we never say we ‘accept that alternative hypothesis’ as true. All we can say is that we don’t think the null hypothesis is true. I know it’s subtle, but in science can never prove that a hypothesis is true or not. There’s always the possibility that we just happened to grab an unusual sample from the null hypothesis distribution.

6.2 The hated p<.05

The probability that we obtain our observed mean or greater given that the null hypothesis is true is called the p-value. How improbable is improbable enough to reject the null hypothesis? The p-value for our example above on women’s heights is astronomically low, so it’s clear that we should reject \(H_{0}\) .

The p-value that’s on the border of rejection is called the alpha ( \(\alpha\) ) value. We reject \(H_{0}\) when our p-value is less than \(\alpha\) .

You probably know that the most common value of alpha is \(\alpha = .05\) .

The first publication of this value dates back to Sir Ronald Fisher, in his seminal 1925 book Statistical Methods for Research Workers where he states:

“It is convenient to take this point as a limit in judging whether a deviation is considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant.” (p. 47)

If you read the chapter on the normal distribution, then you should know that 95% of the area under the normal distribution lies within \(\pm\) two standard deviations of the mean. So the probability of obtaining a sample that exceeds two standard deviations from the mean (in either direction) is .05.

6.3 IQ example

Let’s do an example using IQ scores. IQ scores are normalized to have a mean of 100 and a standard deviation of 15 points. Because they’re normalized, they are a rare example of a population which has a known mean and standard deviation. In the next chapter we’ll discuss the t-test, which is used in the more common situation when we don’t know the population standard deviation.

Suppose you have the suspicion that graduate students have higher IQ’s than the general population. You have enough time to go and measure the IQ’s of 25 randomly sampled grad students and obtain a mean of 105. Is this difference between our this observed mean and 100 statistically significant using an alpha value of \(\alpha = 0.05\) ?

Here the null hypothesis is:

\[ H_{0}: \mu = 100\]

And the alternative hypothesis is:

\[ H_{A}: \mu > 100 \]

We know that the parameters for the null hypothesis are:

\[ \mu = 100 \] and \[ \sigma = 15 \]

From this, we can calculate the probability of observing our mean of 105 or higher using the central limit theorem and what we know about the normal distribution:

\[ \sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3 \] From this, we can calculate the probability of our observed mean using R’s ‘pnorm’ function. Here’s how to do the whole thing in R.

Since our p-value of 0.0478 is (just barely) less than our chosen value of \(\alpha = 0.05\) as our criterion, we reject \(H_{0}\) for this (contrived) example and conclude that our observed mean of 105 is significantly greater than 100, so our study suggests that the average graduate student has a higher IQ than the overall population.

You should feel uncomfortable making such a hard, binary decision for such a borderline case. After all, if we had chosen our second favorite value of alpha, \(\alpha = .01\) , we would have failed to reject \(H_{0}\) . This discomfort is a primary reason why statisticians are moving away from this discrete decision making process. Later on we’ll discuss where things are going, including reporting effect sizes, and using confidence intervals.

6.4 Alpha values vs. critical values

Using R’s qnorm function, we can find the z-score for which only 5% of the area lies above:

So the probability of a randomly sampled z-score exceeding 1.644854 is less than 5%. It follows that if we convert our observed mean into z-score values, we will reject \(H_{0}\) if and only if our z-score is greater than 1.644854. This value is called the ‘critical value’ because it lies on the boundary between rejecting and failing to reject \(H_{0}\) .

In our last example, the z-score for our observed mean is:

\[ z = \frac{X-\mu}{\frac{\sigma}{\sqrt{n}}} = \frac{105 - 100}{3} = 1.67 \] Our z-score is just barely greater than the critical value of 1.644854, which makes sense because our p-value is just barely less than 0.05.

Sometimes you’ll see textbooks will compare critical values to observed scores for the decision making process in hypothesis testing. This dates back to days were computers were less available and we had to rely on tables instead. There wasn’t enough space in a book to hold complete tables which prohibited the ability to look up a p-value for any observed value. Instead only critical values for specific values of alpha were included. If you look at really old papers, you’ll see statistics reported as \(p<.05\) or \(p<.01\) instead of actual p-values for this reason.

It may help to visualize the relationship between p-values, alpha values and critical values like this:

z test hypothesis testing examples

The red shaded region is the upper 5% of the standard normal distribution which starts at the critical value of z=1.644854. This is sometimes called the ‘rejection region’. The blue vertical line is drawn at our observed value of z=1.67. You can see that the red line falls just inside the rejection region, so we Reject \(H_{0}\) !

6.5 One vs. two-tailed tests

Recall that our alternative hypothesis was to reject if our mean IQ was significantly greater than the null hypothesis mean: \(H_{A}: \mu > 100\) . This implies that the situation where \(\mu < 100\) is never even in consideration, which is weird. In science, we’re trying to understand the true state of the world. Although we have a hunch that grad student IQ’s are higher than average, there is always the possibility that they are lower than average. If our sample came up with an IQ well below 100, we’d simply fail to reject \(H_{0}\) and move on. This feels like throwing out important information.

The test we just ran is called a ‘one-tailed’ test because we only reject \(H_{0}\) if our results fall in one of the two tails of the population distribution.

Instead, it might make more sense to reject \(H_{0}\) if we get either an unusually large or small score. This means we need two critical values - one above and one below zero. At first thought you might think we just duplicate our critical value from a one-tailed test to the other side. But will double the area of the rejection region. That’s not a good thing because if \(H_{0}\) is true, there’s actually a \(2\alpha\) probability that we’ll draw a score in the rejection region.

Instead, we divide the area into two tails, each containing an area of \(\frac{\alpha}{2}\) . For \(\alpha\) = 0.05, we can find the critical value of z with qnorm:

So with a two-tailed test at \(\alpha = 0.05\) we reject \(H_{0}\) if our observed z-score is either above z = 1.96 or less than -1.96. This is that value around 2 that Sir Ronald Fischer was talking about!

Here’s what the critical regions and observed value of z looks like for our example with a two-tailed test:

z test hypothesis testing examples

You can see that splitting the area of \(\alpha = 0.05\) into two halves increased the critical value in the positive direction from 1.64 to 1.96, making it harder to reject \(H_{0}\) . For our example, this changes our decision: our observed value of z = 1.67 no longer falls into the rejection region, so now we fail to reject \(H_{0}\) .

If we now fail to reject \(H_{0}\) , what about the p-value? Remember, for a one-tailed test, p = \(\alpha\) if our observed z-score lands right on the critical value of z. The same is true for a two-tailed test. But the z-score moved so that the area above that score is \(\frac{\alpha}{2}\) . So for a two-tailed test, in order to have a p-value of \(\alpha\) when our z-score lands right on the critical value, we need to double p-value hat we’d get for a one-tailed test.

For our example, the p-value for the one tailed test was \(p=0.0478\) . So if we use a two-tailed test, our p-value is \((2)(0.0478) = 0.0956\) . This value is greater than \(\alpha\) = 0.05, which makes sense because we just showed above that we fail to reject \(H_{0}\) with a two tailed test.

Which is the right test, one-tailed or two-tailed? Ideally, as scientists, we should be agnostic about the results of our experiment. But in reality, we all know that the results are more interesting if they are statistically significant. So you can imagine that for this example, given a choice between one and two-tailed, we’d choose a one-tailed test so that we can reject \(H_{0}\) .

There are two problems with this. First, we should never adjust our choice of hypothesis test after we observe the data. That would be an example of ‘p-hacking’, a topic we’ll discuss later. Second, most statisticians these days strongly recommend against one-tailed tests. The only reason for a one-tailed test is if there is no logical or physical possibility for a population mean to fall below the null hypothesis mean.

Statology

Statistics Made Easy

One Sample Z-Test: Definition, Formula, and Example

A  one sample z-test is used to test whether the mean of a population is less than, greater than, or equal to some specific value.

This test assumes that the population standard deviation is known.

This tutorial explains the following:

  • The formula to perform a one sample z-test.
  • The assumptions of a one sample z-test.
  • An example of how to perform a one sample z-test.

Let’s jump in!

One Sample Z-Test: Formula

A one sample z-test will always use one of the following null and alternative hypotheses:

1. Two-Tailed Z-Test

  • H 0 :  μ = μ 0 (population mean is equal to some hypothesized value μ 0 )
  • H A : μ ≠ μ 0 (population mean is not equal to some hypothesized value μ 0 )

2. Left-Tailed Z-Test

  • H 0 :  μ ≥ μ 0 (population mean is greater than or equal to some hypothesized value μ 0 )
  • H A : μ < μ 0 (population mean is less than some hypothesized value μ 0 )

3. Right-Tailed Z-Test

  • H 0 :  μ ≤ μ 0 (population mean is less than or equal to some hypothesized value μ 0 )
  • H A : μ > μ 0 (population mean is greaer than some hypothesized value μ 0 )

We use the following formula to calculate the z test statistic:

z = ( x – μ 0 ) / (σ/√ n )

  • x : sample mean
  • μ 0 : hypothesized population mean
  • σ: population standard deviation
  • n:  sample size

If the p-value that corresponds to the z test statistic is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01) then you can reject the null hypothesis .

One Sample Z-Test: Assumptions

For the results of a one sample z-test to be valid, the following assumptions should be met:

  • The data are continuous (not discrete).
  • The data is a simple random sample from the population of interest.
  • The data in the population is approximately normally distributed .
  • The population standard deviation is known.

One Sample Z-Test : Example

Suppose the IQ in a population is normally distributed with a mean of μ = 100 and standard deviation of σ = 15.

A scientist wants to know if a new medication affects IQ levels, so she recruits 20 patients to use it for one month and records their IQ levels at the end of the month:

To test this, she will perform a one sample z-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose she collects a simple random sample with the following information:

  • n (sample size) = 20
  •   x (sample mean IQ) = 103.05

Step 2: Define the hypotheses.

She will perform the one sample z-test with the following hypotheses:

  • H 0 :  μ = 100
  • H A :  μ ≠ 100

Step 3: Calculate the z test statistic.

The z test statistic is calculated as:

  • z = (x – μ) / (σ√ n )
  • z = (103.05 – 100) / (15/√ 20 )
  • z = 0.90933

Step 4: Calculate the p-value of the z test statistic.

According to the Z Score to P Value Calculator , the two-tailed p-value associated with z = 0.90933 is 0.36318 .

Step 5: Draw a conclusion.

Since the p-value (0.36318) is not less than the significance level (.05), the scientist will fail to reject the null hypothesis.

There is not sufficient evidence to say that the new medication significantly affects IQ level.

Note:  You can also perform this entire one sample z-test by using the One Sample Z-Test Calculator .

Additional Resources

The following tutorials explain how to perform a one sample z-test using different statistical software:

How to Perform Z-Tests in Excel How to Perform Z-Tests in R How to Perform Z-Tests in Python

' src=

Published by Zach

Leave a reply cancel reply.

Your email address will not be published. Required fields are marked *

  • Search Search Please fill out this field.

What Is a Z-Test?

Understanding z-tests, one-sample z-test example.

  • Z-Test FAQs

The Bottom Line

  • Corporate Finance
  • Financial Analysis

Z-Test Definition: Its Uses in Statistics Simply Explained With Example

James Chen, CMT is an expert trader, investment adviser, and global market strategist.

z test hypothesis testing examples

Investopedia / Julie Bang

A z-test is a statistical test used to determine whether two population means are different when the variances are known and the sample size is large.

The test statistic is assumed to have a normal distribution , and nuisance parameters such as standard deviation should be known in order for an accurate z-test to be performed.

Key Takeaways

  • A z-test is a statistical test to determine whether two population means are different when the variances are known and the sample size is large.
  • A z-test is a hypothesis test in which the z-statistic follows a normal distribution. 
  • A z-statistic, or z-score, is a number representing the result from the z-test.
  • Z-tests are closely related to t-tests, but t-tests are best performed when an experiment has a small sample size.
  • Z-tests assume the standard deviation is known, while t-tests assume it is unknown.

The z-test is also a hypothesis test in which the z-statistic follows a normal distribution. The z-test is best used for greater-than-30 samples because, under the central limit theorem , as the number of samples gets larger, the samples are considered to be approximately normally distributed.

When conducting a z-test, the null and alternative hypotheses, alpha and z-score should be stated. Next, the test statistic should be calculated, and the results and conclusion stated. A z-statistic, or z-score, is a number representing how many standard deviations above or below the mean population a score derived from a z-test is.

Examples of tests that can be conducted as z-tests include a one-sample location test, a two-sample location test, a paired difference test, and a maximum likelihood estimate. Z-tests are closely related to t-tests, but t-tests are best performed when an experiment has a small sample size. Also, t-tests assume the standard deviation is unknown, while z-tests assume it is known. If the standard deviation of the population is unknown, the assumption of the sample variance equaling the population variance is made.

Assume an investor wishes to test whether the average daily return of a stock is greater than 3%. A simple random sample of 50 returns is calculated and has an average of 2%. Assume the standard deviation of the returns is 2.5%. Therefore, the null hypothesis is when the average, or mean, is equal to 3%.

Conversely, the alternative hypothesis is whether the mean return is greater or less than 3%. Assume an alpha of 0.05% is selected with a two-tailed test . Consequently, there is 0.025% of the samples in each tail, and the alpha has a critical value of 1.96 or -1.96. If the value of z is greater than 1.96 or less than -1.96, the null hypothesis is rejected.

The value for z is calculated by subtracting the value of the average daily return selected for the test, or 1% in this case, from the observed average of the samples. Next, divide the resulting value by the standard deviation divided by the square root of the number of observed values.

Therefore, the test statistic is:

(0.02 - 0.01) ÷ (0.025 ÷ √ 50) = 2.83

The investor rejects the null hypothesis since z is greater than 1.96 and concludes that the average daily return is greater than 1%.

What's the Difference Between a T-Test and Z-Test?

Z-tests are closely related to t-tests, but t-tests are best performed when the data consists of a small sample size, i.e., less than 30. Also, t-tests assume the standard deviation is unknown, while z-tests assume it is known.

When Should You Use a Z-Test?

If the standard deviation of the population is unknown and the sample size is greater than or equal to 30, then the assumption of the sample variance equaling the population variance should be made using the z-test. Regardless of the sample size, if the population standard deviation for a variable remains unknown, a t-test should be used instead.

What Is a Z-Score?

A z-score, or z-statistic, is a number representing how many standard deviations above or below the mean population the score derived from a z-test is. Essentially, it is a numerical measurement that describes a value's relationship to the mean of a group of values. If a z-score is 0, it indicates that the data point's score is identical to the mean score. A z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.

What Is Central Limit Theorem (CLT)?

In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample approximates a normal distribution (also known as a “bell curve”) as the sample size becomes larger, assuming that all samples are identical in size, and regardless of the population distribution shape. Sample sizes equal to or greater than 30 are considered sufficient for the CLT to predict the characteristics of a population accurately. The z-test's fidelity relies on the CLT holding.

A z-test is used in hypothesis testing to evaluate whether a finding or association is statistically significant or not. In particular, it tests whether two means are the same (the null hypothesis). A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be employed.

z test hypothesis testing examples

  • Terms of Service
  • Editorial Policy
  • Privacy Policy
  • Your Privacy Choices

By clicking “Accept All Cookies”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts.

  • Python for Data Science
  • Data Analysis
  • Machine Learning
  • Deep Learning
  • Deep Learning Interview Questions
  • ML Projects
  • ML Interview Questions

Related Articles

  • Explore Our Geeks Community
  • Machine Learning Mathematics

Linear Algebra and Matrix

  • Scalar and Vector
  • Python Program to Add Two Matrices
  • Python program to multiply two matrices
  • Vector Operations
  • Product of Vectors
  • Scalar Product of Vectors
  • Dot and Cross Products on Vectors
  • Transpose a matrix in Single line in Python
  • Transpose of a Matrix
  • Adjoint and Inverse of a Matrix
  • How to inverse a matrix using NumPy
  • Determinant of a Matrix
  • Program to find Normal and Trace of a matrix
  • Data Science | Solving Linear Equations
  • Data Science - Solving Linear Equations with Python
  • System of Linear Equations
  • System of Linear Equations in three variables using Cramer's Rule
  • Eigenvalues
  • Applications of Eigenvalues and Eigenvectors
  • How to compute the eigenvalues and right eigenvectors of a given square array using NumPY?

Statistics for Machine Learning

  • Descriptive Statistic
  • Measures of Central Tendency
  • Measures of Dispersion | Types, Formula and Examples
  • Mean, Variance and Standard Deviation
  • Calculate the average, variance and standard deviation in Python using NumPy
  • Random Variables
  • Difference between Parametric and Non-Parametric Methods
  • Probability Distribution
  • Confidence Interval
  • Mathematics | Covariance and Correlation
  • Program to find correlation coefficient
  • Robust Correlation
  • Normal Probability Plot
  • Quantile Quantile plots
  • True Error vs Sample Error
  • Bias-Variance Trade Off - Machine Learning
  • Understanding Hypothesis Testing
  • Paired T-Test - A Detailed Overview
  • P-value in Machine Learning
  • F-Test in Statistics
  • Residual Leverage Plot (Regression Diagnostic)
  • Difference between Null and Alternate Hypothesis
  • Mann and Whitney U test
  • Wilcoxon Signed Rank Test
  • Kruskal Wallis Test
  • Friedman Test
  • Mathematics | Probability

Probability and Probability Distributions

  • Mathematics - Law of Total Probability
  • Bayes's Theorem for Conditional Probability
  • Mathematics | Probability Distributions Set 1 (Uniform Distribution)
  • Mathematics | Probability Distributions Set 4 (Binomial Distribution)
  • Mathematics | Probability Distributions Set 5 (Poisson Distribution)
  • Uniform Distribution Formula
  • Mathematics | Probability Distributions Set 2 (Exponential Distribution)
  • Mathematics | Probability Distributions Set 3 (Normal Distribution)
  • Mathematics | Beta Distribution Model
  • Gamma Distribution Model in Mathematics
  • Chi-Square Test for Feature Selection - Mathematical Explanation
  • Student's t-distribution in Statistics
  • Python - Central Limit Theorem
  • Mathematics | Limits, Continuity and Differentiability
  • Implicit Differentiation

Calculus for Machine Learning

  • Engineering Mathematics - Partial Derivatives
  • Advanced Differentiation
  • How to find Gradient of a Function using Python?
  • Optimization techniques for Gradient Descent
  • Higher Order Derivatives
  • Taylor Series
  • Application of Derivative - Maxima and Minima | Mathematics
  • Absolute Minima and Maxima
  • Optimization for Data Science
  • Unconstrained Multivariate Optimization
  • Lagrange Multipliers
  • Lagrange's Interpolation
  • Linear Regression in Machine learning
  • Ordinary Least Squares (OLS) using statsmodels

Regression in Machine Learning

Different tests are used in statistics to compare distinct samples or groups and make conclusions about populations. These tests, also referred to as statistical tests, concentrate on examining the probability or possibility of acquiring the observed data under particular premises or hypotheses. They offer a framework for evaluating the evidence for or against a given hypothesis. 

A statistical test starts with the formulation of a null hypothesis (H0) and an alternative hypothesis (Ha). The alternative hypothesis proposes a particular link or effect, whereas the null hypothesis reflects the default assumption and often states no effect or no difference.

The p-value indicates the likelihood of observing the data or more extreme results assuming the null hypothesis is true. Researchers compare the calculated p-value to a predetermined significance level, often denoted as α, to make a decision regarding the null hypothesis. If the p-value is smaller than α, the results are considered statistically significant, leading to the rejection of the null hypothesis in favor of the alternative hypothesis.

The p-value is calculated using a variety of statistical tests, including the Z-test, T-test , Chi-squared test , ANOVA , Z-test , and F-test , among others. In this article, we will focus on explaining the Z-test.

What is Z-Test?

Z-test is a statistical test that is used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (>30).

Z-test can also be defined as a statistical method that is used to determine whether the distribution of the test statistics can be approximated using the normal distribution or not. It is the method to determine whether two sample means are approximately the same or different when their variance is known and the sample size is large (should be >= 30).

The Z-test compares the difference between the sample mean and the population means by considering the standard deviation of the sampling distribution. The resulting Z-score represents the number of standard deviations that the sample mean deviates from the population mean. This Z-Score is also known as Z-Statistics, and can be formulated as:

\text{Z-Score} = \frac{\bar{x}-\mu}{\sigma}

The average family annual income in India is 200k, with a standard deviation of 5k, and the average family annual income in Delhi is 300k.

Then Z-Score for Delhi will be.

\begin{aligned} \text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma} \\&=\frac{300-200}{5} \\&=20 \end{aligned}

This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).

When to Use Z-test:

  • The sample size should be greater than 30. Otherwise, we should use the t-test.
  • Samples should be drawn at random from the population.
  • The standard deviation of the population should be known.
  • Samples that are drawn from the population should be independent of each other.
  • The data should be normally distributed , however, for a large sample size, it is assumed to have a normal distribution because central limit theorem

Hypothesis Testing

A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis testing is a way to validate the claim of an experiment.

  • Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H 0 .
  • Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that is different from the claimed value. It is denoted by H A .

Level of significance: It means the degree of significance in which we accept or reject the null hypothesis. Since in most of the experiments 100% accuracy is not possible for accepting or rejecting a hypothesis, we, therefore, select a level of significance. It is denoted by alpha (∝).

Steps to perform Z-test:

  • First, identify the null and alternate hypotheses.
  • Determine the level of significance (∝).
  • Find the critical value of z in the z-test using

Z  =  \frac{(\overline{X}- \mu)}{\left ( \sigma /\sqrt{n} \right )}

  • n: sample size.
  • Now compare with the hypothesis and decide whether to reject or not reject the null hypothesis

Type of Z-test

  • Left-tailed Test: In this test, our region of rejection is located to the extreme left of the distribution. Here our null hypothesis is that the claimed value is less than or equal to the mean population value.

z test hypothesis testing examples

  • Right-tailed Test: In this test, our region of rejection is located to the extreme right of the distribution. Here our null hypothesis is that the claimed value is less than or equal to the mean population value.

z test hypothesis testing examples

  • Two-tailed test: In this test, our region of rejection is located to both extremes of the distribution. Here our null hypothesis is that the claimed value is equal to the mean population value.

z test hypothesis testing examples

Below is an example of performing the z-test:

Example One-Tailed Test:

 A school claimed that the students who study that are more intelligent than the average school. On calculating the IQ scores of 50 students, the average turns out to be 110. The mean of the population IQ is 100 and the standard deviation is 15. State whether the claim of the principal is right or not at a 5% significance level.

H_0 : \mu  = 100

  • Now, we look up to the z-table. For the value of ∝=0.05, the z-score for the right-tailed test is 1.645.
  • Here 4.71 >1.645, so we reject the null hypothesis. 
  • If the z-test statistics are less than the z-score, then we will not reject the null hypothesis.

Code Implementations

Two-sampled z-test:.

In this test, we have provided 2 normally distributed and independent populations, and we have drawn samples at random from both populations. Here, we consider u 1 and u 2 to be the population mean, and X 1 and X 2 to be the observed sample mean. Here, our null hypothesis could be like this:

H_{0} : \mu_{1} -\mu_{2} = 0

and alternative hypothesis

H_{1} :  \mu_{1} - \mu_{2} \ne 0

and the formula for calculating the z-test score:

Z = \frac{\left ( \overline{X_{1}} - \overline{X_{2}} \right ) - \left ( \mu_{1} - \mu_{2} \right )}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}}

There are two groups of students preparing for a competition: Group A and Group B. Group A has studied offline classes, while Group B has studied online classes. After the examination, the score of each student comes. Now we want to determine whether the online or offline classes are better.

Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10 Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12

Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

Step 1: Null & Alternate Hypothesis

 \mu_1 -\mu_2 = 0

Step 2: Significance Label

\alpha = 0.05

Step 3: Z-Score

\begin{aligned} \text{Z-score} &= \frac{(x_1-x_2)-(\mu_1 -\mu_2)} {\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_1}}}  \\ &= \frac{(75-80)-0} {\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}} \\ &= \frac{-5} {\sqrt{2+2.4}} \\ &= \frac{-5} {2.0976} \\&=-2.384 \end{aligned}

Step 4: Check to Critical Z-Score value in the Z-Table for apha/2 = 0.025

  •  Critical Z-Score = 1.96

Step 5: Compare with the absolute Z-Score value

  • absolute(Z-Score) > Critical Z-Score
  • Reject the null hypothesis. There is a significant difference between the online and offline classes.

Type 1 error and Type II error:

  • Type I error: Type 1 error has occurred when we reject the null hypothesis, even when the hypothesis is true. This error is denoted by alpha.
  • Type II error: Type II error occurred when we didn’t reject the null hypothesis, even when the hypothesis is false. This error is denoted by beta.

Feeling lost in the world of random DSA topics, wasting time without progress? It's time for a change! Join our DSA course, where we'll guide you on an exciting journey to master DSA efficiently and on schedule. Ready to dive in? Explore our Free Demo Content and join our DSA course, trusted by over 100,000 geeks!

  • DSA in Java
  • DSA in Python
  • DSA in JavaScript

Please Login to comment...

  • Engineering Mathematics
  • Mathematical
  • pawan_kumar_gunjan
  • How To Make Money With Online Tutoring
  • Top 7 Greatest Women Boxers of all Time 2024
  • 10 EASY Ways To Make Money as a Teenager [2024 ]
  • Top 12 AI Tools for LinkedIn [2024]
  • 10 Best Free AI Art Generators to Create Image From Text [Free & Paid]

Improve your Coding Skills with Practice

 alt=

What kind of Experience do you want to share?

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

9.E: Hypothesis Testing with One Sample (Exercises)

  • Last updated
  • Save as PDF
  • Page ID 1146

These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.

9.1: Introduction

9.2: null and alternative hypotheses.

Some of the following statements refer to the null hypothesis, some to the alternate hypothesis.

State the null hypothesis, \(H_{0}\), and the alternative hypothesis. \(H_{a}\), in terms of the appropriate parameter \((\mu \text{or} p)\).

  • The mean number of years Americans work before retiring is 34.
  • At most 60% of Americans vote in presidential elections.
  • The mean starting salary for San Jose State University graduates is at least $100,000 per year.
  • Twenty-nine percent of high school seniors get drunk each month.
  • Fewer than 5% of adults ride the bus to work in Los Angeles.
  • The mean number of cars a person owns in her lifetime is not more than ten.
  • About half of Americans prefer to live away from cities, given the choice.
  • Europeans have a mean paid vacation each year of six weeks.
  • The chance of developing breast cancer is under 11% for women.
  • Private universities' mean tuition cost is more than $20,000 per year.
  • \(H_{0}: \mu = 34; H_{a}: \mu \neq 34\)
  • \(H_{0}: p \leq 0.60; H_{a}: p > 0.60\)
  • \(H_{0}: \mu \geq 100,000; H_{a}: \mu < 100,000\)
  • \(H_{0}: p = 0.29; H_{a}: p \neq 0.29\)
  • \(H_{0}: p = 0.05; H_{a}: p < 0.05\)
  • \(H_{0}: \mu \leq 10; H_{a}: \mu > 10\)
  • \(H_{0}: p = 0.50; H_{a}: p \neq 0.50\)
  • \(H_{0}: \mu = 6; H_{a}: \mu \neq 6\)
  • \(H_{0}: p ≥ 0.11; H_{a}: p < 0.11\)
  • \(H_{0}: \mu \leq 20,000; H_{a}: \mu > 20,000\)

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin? The alternative hypothesis is:

  • \(p < 0.30\)
  • \(p \leq 0.30\)
  • \(p \geq 0.30\)
  • \(p > 0.30\)

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 attended the midnight showing. An appropriate alternative hypothesis is:

  • \(p = 0.20\)
  • \(p > 0.20\)
  • \(p < 0.20\)
  • \(p \leq 0.20\)

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The null and alternative hypotheses are:

  • \(H_{0}: \bar{x} = 4.5, H_{a}: \bar{x} > 4.5\)
  • \(H_{0}: \mu \geq 4.5, H_{a}: \mu < 4.5\)
  • \(H_{0}: \mu = 4.75, H_{a}: \mu > 4.75\)
  • \(H_{0}: \mu = 4.5, H_{a}: \mu > 4.5\)

9.3: Outcomes and the Type I and Type II Errors

State the Type I and Type II errors in complete sentences given the following statements.

  • The mean number of cars a person owns in his or her lifetime is not more than ten.
  • Private universities mean tuition cost is more than $20,000 per year.
  • Type I error: We conclude that the mean is not 34 years, when it really is 34 years. Type II error: We conclude that the mean is 34 years, when in fact it really is not 34 years.
  • Type I error: We conclude that more than 60% of Americans vote in presidential elections, when the actual percentage is at most 60%.Type II error: We conclude that at most 60% of Americans vote in presidential elections when, in fact, more than 60% do.
  • Type I error: We conclude that the mean starting salary is less than $100,000, when it really is at least $100,000. Type II error: We conclude that the mean starting salary is at least $100,000 when, in fact, it is less than $100,000.
  • Type I error: We conclude that the proportion of high school seniors who get drunk each month is not 29%, when it really is 29%. Type II error: We conclude that the proportion of high school seniors who get drunk each month is 29% when, in fact, it is not 29%.
  • Type I error: We conclude that fewer than 5% of adults ride the bus to work in Los Angeles, when the percentage that do is really 5% or more. Type II error: We conclude that 5% or more adults ride the bus to work in Los Angeles when, in fact, fewer that 5% do.
  • Type I error: We conclude that the mean number of cars a person owns in his or her lifetime is more than 10, when in reality it is not more than 10. Type II error: We conclude that the mean number of cars a person owns in his or her lifetime is not more than 10 when, in fact, it is more than 10.
  • Type I error: We conclude that the proportion of Americans who prefer to live away from cities is not about half, though the actual proportion is about half. Type II error: We conclude that the proportion of Americans who prefer to live away from cities is half when, in fact, it is not half.
  • Type I error: We conclude that the duration of paid vacations each year for Europeans is not six weeks, when in fact it is six weeks. Type II error: We conclude that the duration of paid vacations each year for Europeans is six weeks when, in fact, it is not.
  • Type I error: We conclude that the proportion is less than 11%, when it is really at least 11%. Type II error: We conclude that the proportion of women who develop breast cancer is at least 11%, when in fact it is less than 11%.
  • Type I error: We conclude that the average tuition cost at private universities is more than $20,000, though in reality it is at most $20,000. Type II error: We conclude that the average tuition cost at private universities is at most $20,000 when, in fact, it is more than $20,000.

For statements a-j in Exercise 9.109 , answer the following in complete sentences.

  • State a consequence of committing a Type I error.
  • State a consequence of committing a Type II error.

When a new drug is created, the pharmaceutical company must subject it to testing before receiving the necessary permission from the Food and Drug Administration (FDA) to market the drug. Suppose the null hypothesis is “the drug is unsafe.” What is the Type II Error?

  • To conclude the drug is safe when in, fact, it is unsafe.
  • Not to conclude the drug is safe when, in fact, it is safe.
  • To conclude the drug is safe when, in fact, it is safe.
  • Not to conclude the drug is unsafe when, in fact, it is unsafe.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing. The Type I error is to conclude that the percent of EVC students who attended is ________.

  • at least 20%, when in fact, it is less than 20%.
  • 20%, when in fact, it is 20%.
  • less than 20%, when in fact, it is at least 20%.
  • less than 20%, when in fact, it is less than 20%.

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average?

The Type II error is not to reject that the mean number of hours of sleep LTCC students get per night is at least seven when, in fact, the mean number of hours

  • is more than seven hours.
  • is at most seven hours.
  • is at least seven hours.
  • is less than seven hours.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test, the Type I error is:

  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is higher
  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is the same
  • to conclude that the mean hours per week currently is 4.5, when in fact, it is higher
  • to conclude that the mean hours per week currently is no higher than 4.5, when in fact, it is not higher

9.4: Distribution Needed for Hypothesis Testing

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average? The distribution to be used for this test is \(\bar{X} \sim\) ________________

  • \(N\left(7.24, \frac{1.93}{\sqrt{22}}\right)\)
  • \(N\left(7.24, 1.93\right)\)

9.5: Rare Events, the Sample, Decision and Conclusion

The National Institute of Mental Health published an article stating that in any one-year period, approximately 9.5 percent of American adults suffer from depression or a depressive illness. Suppose that in a survey of 100 people in a certain town, seven of them suffered from depression or a depressive illness. Conduct a hypothesis test to determine if the true proportion of people in that town suffering from depression or a depressive illness is lower than the percent in the general adult American population.

  • Is this a test of one mean or proportion?
  • State the null and alternative hypotheses. \(H_{0}\) : ____________________ \(H_{a}\) : ____________________
  • Is this a right-tailed, left-tailed, or two-tailed test?
  • What symbol represents the random variable for this test?
  • In words, define the random variable for this test.
  • \(x =\) ________________
  • \(n =\) ________________
  • \(p′ =\) _____________
  • Calculate \(\sigma_{x} =\) __________. Show the formula set-up.
  • State the distribution to use for the hypothesis test.
  • Find the \(p\text{-value}\).
  • Reason for the decision:
  • Conclusion (write out in a complete sentence):

9.6: Additional Information and Full Hypothesis Test Examples

For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in [link] . Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's \(t\) - distribution for one of the following homework problems, you may assume that the underlying population is normally distributed. (In general, you must first prove that assumption, however.)

A particular brand of tires claims that its deluxe tire averages at least 50,000 miles before it needs to be replaced. From past studies of this tire, the standard deviation is known to be 8,000. A survey of owners of that tire design is conducted. From the 28 tires surveyed, the mean lifespan was 46,500 miles with a standard deviation of 9,800 miles. Using \(\alpha = 0.05\), is the data highly inconsistent with the claim?

  • \(H_{0}: \mu \geq 50,000\)
  • \(H_{a}: \mu < 50,000\)
  • Let \(\bar{X} =\) the average lifespan of a brand of tires.
  • normal distribution
  • \(z = -2.315\)
  • \(p\text{-value} = 0.0103\)
  • Check student’s solution.
  • alpha: 0.05
  • Decision: Reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the mean lifespan of the tires is less than 50,000 miles.
  • \((43,537, 49,463)\)

From generation to generation, the mean age when smokers first start to smoke varies. However, the standard deviation of that age remains constant of around 2.1 years. A survey of 40 smokers of this generation was done to see if the mean starting age is at least 19. The sample mean was 18.1 with a sample standard deviation of 1.3. Do the data support the claim at the 5% level?

The cost of a daily newspaper varies from city to city. However, the variation among prices remains steady with a standard deviation of 20¢. A study was done to test the claim that the mean cost of a daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard deviation of 18¢. Do the data support the claim at the 1% level?

  • \(H_{0}: \mu = $1.00\)
  • \(H_{a}: \mu \neq $1.00\)
  • Let \(\bar{X} =\) the average cost of a daily newspaper.
  • \(z = –0.866\)
  • \(p\text{-value} = 0.3865\)
  • \(\alpha: 0.01\)
  • Decision: Do not reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is greater than 0.01.
  • Conclusion: There is sufficient evidence to support the claim that the mean cost of daily papers is $1. The mean cost could be $1.
  • \(($0.84, $1.06)\)

An article in the San Jose Mercury News stated that students in the California state university system take 4.5 years, on average, to finish their undergraduate degrees. Suppose you believe that the mean time is longer. You conduct a survey of 49 students and obtain a sample mean of 5.1 with a sample standard deviation of 1.2. Do the data support your claim at the 1% level?

The mean number of sick days an employee takes per year is believed to be about ten. Members of a personnel department do not believe this figure. They randomly survey eight employees. The number of sick days they took for the past year are as follows: 12; 4; 15; 3; 11; 8; 6; 8. Let \(x =\) the number of sick days they took for the past year. Should the personnel team believe that the mean number is ten?

  • \(H_{0}: \mu = 10\)
  • \(H_{a}: \mu \neq 10\)
  • Let \(\bar{X}\) the mean number of sick days an employee takes per year.
  • Student’s t -distribution
  • \(t = –1.12\)
  • \(p\text{-value} = 0.300\)
  • \(\alpha: 0.05\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05.
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean number of sick days is not ten.
  • \((4.9443, 11.806)\)

In 1955, Life Magazine reported that the 25 year-old mother of three worked, on average, an 80 hour week. Recently, many groups have been studying whether or not the women's movement has, in fact, resulted in an increase in the average work week for women (combining employment and at-home work). Suppose a study was done to determine if the mean work week has increased. 81 women were surveyed with the following results. The sample mean was 83; the sample standard deviation was ten. Does it appear that the mean work week has increased for women at the 5% level?

Your statistics instructor claims that 60 percent of the students who take her Elementary Statistics class go through life feeling more enriched. For some reason that she can't quite figure out, most people don't believe her. You decide to check this out on your own. You randomly survey 64 of her past Elementary Statistics students and find that 34 feel more enriched as a result of her class. Now, what do you think?

  • \(H_{0}: p \geq 0.6\)
  • \(H_{a}: p < 0.6\)
  • Let \(P′ =\) the proportion of students who feel more enriched as a result of taking Elementary Statistics.
  • normal for a single proportion
  • \(p\text{-value} = 0.1308\)
  • Conclusion: There is insufficient evidence to conclude that less than 60 percent of her students feel more enriched.

The “plus-4s” confidence interval is \((0.411, 0.648)\)

A Nissan Motor Corporation advertisement read, “The average man’s I.Q. is 107. The average brown trout’s I.Q. is 4. So why can’t man catch brown trout?” Suppose you believe that the brown trout’s mean I.Q. is greater than four. You catch 12 brown trout. A fish psychologist determines the I.Q.s as follows: 5; 4; 7; 3; 6; 4; 5; 3; 6; 3; 8; 5. Conduct a hypothesis test of your belief.

Refer to Exercise 9.119 . Conduct a hypothesis test to see if your decision and conclusion would change if your belief were that the brown trout’s mean I.Q. is not four.

  • \(H_{0}: \mu = 4\)
  • \(H_{a}: \mu \neq 4\)
  • Let \(\bar{X}\) the average I.Q. of a set of brown trout.
  • two-tailed Student's t-test
  • \(t = 1.95\)
  • \(p\text{-value} = 0.076\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05
  • Conclusion: There is insufficient evidence to conclude that the average IQ of brown trout is not four.
  • \((3.8865,5.9468)\)

According to an article in Newsweek , the natural ratio of girls to boys is 100:105. In China, the birth ratio is 100: 114 (46.7% girls). Suppose you don’t believe the reported figures of the percent of girls born in China. You conduct a study. In this study, you count the number of girls and boys born in 150 randomly chosen recent births. There are 60 girls and 90 boys born of the 150. Based on your study, do you believe that the percent of girls born in China is 46.7?

A poll done for Newsweek found that 13% of Americans have seen or sensed the presence of an angel. A contingent doubts that the percent is really that high. It conducts its own survey. Out of 76 Americans surveyed, only two had seen or sensed the presence of an angel. As a result of the contingent’s survey, would you agree with the Newsweek poll? In complete sentences, also give three reasons why the two polls might give different results.

  • \(H_{a}: p < 0.13\)
  • Let \(P′ =\) the proportion of Americans who have seen or sensed angels
  • –2.688
  • \(p\text{-value} = 0.0036\)
  • Reason for decision: The \(p\text{-value}\)e is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the percentage of Americans who have seen or sensed an angel is less than 13%.

The“plus-4s” confidence interval is (0.0022, 0.0978)

The mean work week for engineers in a start-up company is believed to be about 60 hours. A newly hired engineer hopes that it’s shorter. She asks ten engineering friends in start-ups for the lengths of their mean work weeks. Based on the results that follow, should she count on the mean work week to be shorter than 60 hours?

Data (length of mean work week): 70; 45; 55; 60; 65; 55; 55; 60; 50; 55.

Use the “Lap time” data for Lap 4 (see [link] ) to test the claim that Terri finishes Lap 4, on average, in less than 129 seconds. Use all twenty races given.

  • \(H_{0}: \mu \geq 129\)
  • \(H_{a}: \mu < 129\)
  • Let \(\bar{X} =\) the average time in seconds that Terri finishes Lap 4.
  • Student's t -distribution
  • \(t = 1.209\)
  • Conclusion: There is insufficient evidence to conclude that Terri’s mean lap time is less than 129 seconds.
  • \((128.63, 130.37)\)

Use the “Initial Public Offering” data (see [link] ) to test the claim that the mean offer price was $18 per share. Do not use all the data. Use your random number generator to randomly survey 15 prices.

The following questions were written by past students. They are excellent problems!

"Asian Family Reunion," by Chau Nguyen

Every two years it comes around.

We all get together from different towns.

In my honest opinion,

It's not a typical family reunion.

Not forty, or fifty, or sixty,

But how about seventy companions!

The kids would play, scream, and shout

One minute they're happy, another they'll pout.

The teenagers would look, stare, and compare

From how they look to what they wear.

The men would chat about their business

That they make more, but never less.

Money is always their subject

And there's always talk of more new projects.

The women get tired from all of the chats

They head to the kitchen to set out the mats.

Some would sit and some would stand

Eating and talking with plates in their hands.

Then come the games and the songs

And suddenly, everyone gets along!

With all that laughter, it's sad to say

That it always ends in the same old way.

They hug and kiss and say "good-bye"

And then they all begin to cry!

I say that 60 percent shed their tears

But my mom counted 35 people this year.

She said that boys and men will always have their pride,

So we won't ever see them cry.

I myself don't think she's correct,

So could you please try this problem to see if you object?

  • \(H_{0}: p = 0.60\)
  • \(H_{a}: p < 0.60\)
  • Let \(P′ =\) the proportion of family members who shed tears at a reunion.
  • –1.71
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of family members who shed tears at a reunion is less than 0.60. However, the test is weak because the \(p\text{-value}\) and alpha are quite close, so other tests should be done.
  • We are 95% confident that between 38.29% and 61.71% of family members will shed tears at a family reunion. \((0.3829, 0.6171)\). The“plus-4s” confidence interval (see chapter 8) is \((0.3861, 0.6139)\)

Note that here the “large-sample” \(1 - \text{PropZTest}\) provides the approximate \(p\text{-value}\) of 0.0438. Whenever a \(p\text{-value}\) based on a normal approximation is close to the level of significance, the exact \(p\text{-value}\) based on binomial probabilities should be calculated whenever possible. This is beyond the scope of this course.

"The Problem with Angels," by Cyndy Dowling

Although this problem is wholly mine,

The catalyst came from the magazine, Time.

On the magazine cover I did find

The realm of angels tickling my mind.

Inside, 69% I found to be

In angels, Americans do believe.

Then, it was time to rise to the task,

Ninety-five high school and college students I did ask.

Viewing all as one group,

Random sampling to get the scoop.

So, I asked each to be true,

"Do you believe in angels?" Tell me, do!

Hypothesizing at the start,

Totally believing in my heart

That the proportion who said yes

Would be equal on this test.

Lo and behold, seventy-three did arrive,

Out of the sample of ninety-five.

Now your job has just begun,

Solve this problem and have some fun.

"Blowing Bubbles," by Sondra Prull

Studying stats just made me tense,

I had to find some sane defense.

Some light and lifting simple play

To float my math anxiety away.

Blowing bubbles lifts me high

Takes my troubles to the sky.

POIK! They're gone, with all my stress

Bubble therapy is the best.

The label said each time I blew

The average number of bubbles would be at least 22.

I blew and blew and this I found

From 64 blows, they all are round!

But the number of bubbles in 64 blows

Varied widely, this I know.

20 per blow became the mean

They deviated by 6, and not 16.

From counting bubbles, I sure did relax

But now I give to you your task.

Was 22 a reasonable guess?

Find the answer and pass this test!

  • \(H_{0}: \mu \geq 22\)
  • \(H_{a}: \mu < 22\)
  • Let \(\bar{X} =\) the mean number of bubbles per blow.
  • –2.667
  • \(p\text{-value} = 0.00486\)
  • Conclusion: There is sufficient evidence to conclude that the mean number of bubbles per blow is less than 22.
  • \((18.501, 21.499)\)

"Dalmatian Darnation," by Kathy Sparling

A greedy dog breeder named Spreckles

Bred puppies with numerous freckles

The Dalmatians he sought

Possessed spot upon spot

The more spots, he thought, the more shekels.

His competitors did not agree

That freckles would increase the fee.

They said, “Spots are quite nice

But they don't affect price;

One should breed for improved pedigree.”

The breeders decided to prove

This strategy was a wrong move.

Breeding only for spots

Would wreak havoc, they thought.

His theory they want to disprove.

They proposed a contest to Spreckles

Comparing dog prices to freckles.

In records they looked up

One hundred one pups:

Dalmatians that fetched the most shekels.

They asked Mr. Spreckles to name

An average spot count he'd claim

To bring in big bucks.

Said Spreckles, “Well, shucks,

It's for one hundred one that I aim.”

Said an amateur statistician

Who wanted to help with this mission.

“Twenty-one for the sample

Standard deviation's ample:

They examined one hundred and one

Dalmatians that fetched a good sum.

They counted each spot,

Mark, freckle and dot

And tallied up every one.

Instead of one hundred one spots

They averaged ninety six dots

Can they muzzle Spreckles’

Obsession with freckles

Based on all the dog data they've got?

"Macaroni and Cheese, please!!" by Nedda Misherghi and Rachelle Hall

As a poor starving student I don't have much money to spend for even the bare necessities. So my favorite and main staple food is macaroni and cheese. It's high in taste and low in cost and nutritional value.

One day, as I sat down to determine the meaning of life, I got a serious craving for this, oh, so important, food of my life. So I went down the street to Greatway to get a box of macaroni and cheese, but it was SO expensive! $2.02 !!! Can you believe it? It made me stop and think. The world is changing fast. I had thought that the mean cost of a box (the normal size, not some super-gigantic-family-value-pack) was at most $1, but now I wasn't so sure. However, I was determined to find out. I went to 53 of the closest grocery stores and surveyed the prices of macaroni and cheese. Here are the data I wrote in my notebook:

Price per box of Mac and Cheese:

  • 5 stores @ $2.02
  • 15 stores @ $0.25
  • 3 stores @ $1.29
  • 6 stores @ $0.35
  • 4 stores @ $2.27
  • 7 stores @ $1.50
  • 5 stores @ $1.89
  • 8 stores @ 0.75.

I could see that the cost varied but I had to sit down to figure out whether or not I was right. If it does turn out that this mouth-watering dish is at most $1, then I'll throw a big cheesy party in our next statistics lab, with enough macaroni and cheese for just me. (After all, as a poor starving student I can't be expected to feed our class of animals!)

  • \(H_{0}: \mu \leq 1\)
  • \(H_{a}: \mu > 1\)
  • Let \(\bar{X} =\) the mean cost in dollars of macaroni and cheese in a certain town.
  • Student's \(t\)-distribution
  • \(t = 0.340\)
  • \(p\text{-value} = 0.36756\)
  • Conclusion: The mean cost could be $1, or less. At the 5% significance level, there is insufficient evidence to conclude that the mean price of a box of macaroni and cheese is more than $1.
  • \((0.8291, 1.241)\)

"William Shakespeare: The Tragedy of Hamlet, Prince of Denmark," by Jacqueline Ghodsi

THE CHARACTERS (in order of appearance):

  • HAMLET, Prince of Denmark and student of Statistics
  • POLONIUS, Hamlet’s tutor
  • HOROTIO, friend to Hamlet and fellow student

Scene: The great library of the castle, in which Hamlet does his lessons

(The day is fair, but the face of Hamlet is clouded. He paces the large room. His tutor, Polonius, is reprimanding Hamlet regarding the latter’s recent experience. Horatio is seated at the large table at right stage.)

POLONIUS: My Lord, how cans’t thou admit that thou hast seen a ghost! It is but a figment of your imagination!

HAMLET: I beg to differ; I know of a certainty that five-and-seventy in one hundred of us, condemned to the whips and scorns of time as we are, have gazed upon a spirit of health, or goblin damn’d, be their intents wicked or charitable.

POLONIUS If thou doest insist upon thy wretched vision then let me invest your time; be true to thy work and speak to me through the reason of the null and alternate hypotheses. (He turns to Horatio.) Did not Hamlet himself say, “What piece of work is man, how noble in reason, how infinite in faculties? Then let not this foolishness persist. Go, Horatio, make a survey of three-and-sixty and discover what the true proportion be. For my part, I will never succumb to this fantasy, but deem man to be devoid of all reason should thy proposal of at least five-and-seventy in one hundred hold true.

HORATIO (to Hamlet): What should we do, my Lord?

HAMLET: Go to thy purpose, Horatio.

HORATIO: To what end, my Lord?

HAMLET: That you must teach me. But let me conjure you by the rights of our fellowship, by the consonance of our youth, but the obligation of our ever-preserved love, be even and direct with me, whether I am right or no.

(Horatio exits, followed by Polonius, leaving Hamlet to ponder alone.)

(The next day, Hamlet awaits anxiously the presence of his friend, Horatio. Polonius enters and places some books upon the table just a moment before Horatio enters.)

POLONIUS: So, Horatio, what is it thou didst reveal through thy deliberations?

HORATIO: In a random survey, for which purpose thou thyself sent me forth, I did discover that one-and-forty believe fervently that the spirits of the dead walk with us. Before my God, I might not this believe, without the sensible and true avouch of mine own eyes.

POLONIUS: Give thine own thoughts no tongue, Horatio. (Polonius turns to Hamlet.) But look to’t I charge you, my Lord. Come Horatio, let us go together, for this is not our test. (Horatio and Polonius leave together.)

HAMLET: To reject, or not reject, that is the question: whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous statistics, or to take arms against a sea of data, and, by opposing, end them. (Hamlet resignedly attends to his task.)

(Curtain falls)

"Untitled," by Stephen Chen

I've often wondered how software is released and sold to the public. Ironically, I work for a company that sells products with known problems. Unfortunately, most of the problems are difficult to create, which makes them difficult to fix. I usually use the test program X, which tests the product, to try to create a specific problem. When the test program is run to make an error occur, the likelihood of generating an error is 1%.

So, armed with this knowledge, I wrote a new test program Y that will generate the same error that test program X creates, but more often. To find out if my test program is better than the original, so that I can convince the management that I'm right, I ran my test program to find out how often I can generate the same error. When I ran my test program 50 times, I generated the error twice. While this may not seem much better, I think that I can convince the management to use my test program instead of the original test program. Am I right?

  • \(H_{0}: p = 0.01\)
  • \(H_{a}: p > 0.01\)
  • Let \(P′ =\) the proportion of errors generated
  • Normal for a single proportion
  • Decision: Reject the null hypothesis
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of errors generated is more than 0.01.

The“plus-4s” confidence interval is \((0.004, 0.144)\).

"Japanese Girls’ Names"

by Kumi Furuichi

It used to be very typical for Japanese girls’ names to end with “ko.” (The trend might have started around my grandmothers’ generation and its peak might have been around my mother’s generation.) “Ko” means “child” in Chinese characters. Parents would name their daughters with “ko” attaching to other Chinese characters which have meanings that they want their daughters to become, such as Sachiko—happy child, Yoshiko—a good child, Yasuko—a healthy child, and so on.

However, I noticed recently that only two out of nine of my Japanese girlfriends at this school have names which end with “ko.” More and more, parents seem to have become creative, modernized, and, sometimes, westernized in naming their children.

I have a feeling that, while 70 percent or more of my mother’s generation would have names with “ko” at the end, the proportion has dropped among my peers. I wrote down all my Japanese friends’, ex-classmates’, co-workers, and acquaintances’ names that I could remember. Following are the names. (Some are repeats.) Test to see if the proportion has dropped for this generation.

Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko, Hiroko, Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo, Kayoko, Kazumi, Keiko, Keiko, Kei, Kumi, Kumiko, Kyoko, Kyoko, Madoka, Maho, Mai, Maiko, Maki, Miki, Miki, Mikiko, Mina, Minako, Miyako, Momoko, Nana, Naoko, Naoko, Naoko, Noriko, Rieko, Rika, Rika, Rumiko, Rei, Reiko, Reiko, Sachiko, Sachiko, Sachiyo, Saki, Sayaka, Sayoko, Sayuri, Seiko, Shiho, Shizuka, Sumiko, Takako, Takako, Tomoe, Tomoe, Tomoko, Touko, Yasuko, Yasuko, Yasuyo, Yoko, Yoko, Yoko, Yoshiko, Yoshiko, Yoshiko, Yuka, Yuki, Yuki, Yukiko, Yuko, Yuko.

"Phillip’s Wish," by Suzanne Osorio

My nephew likes to play

Chasing the girls makes his day.

He asked his mother

If it is okay

To get his ear pierced.

She said, “No way!”

To poke a hole through your ear,

Is not what I want for you, dear.

He argued his point quite well,

Says even my macho pal, Mel,

Has gotten this done.

It’s all just for fun.

C’mon please, mom, please, what the hell.

Again Phillip complained to his mother,

Saying half his friends (including their brothers)

Are piercing their ears

And they have no fears

He wants to be like the others.

She said, “I think it’s much less.

We must do a hypothesis test.

And if you are right,

I won’t put up a fight.

But, if not, then my case will rest.”

We proceeded to call fifty guys

To see whose prediction would fly.

Nineteen of the fifty

Said piercing was nifty

And earrings they’d occasionally buy.

Then there’s the other thirty-one,

Who said they’d never have this done.

So now this poem’s finished.

Will his hopes be diminished,

Or will my nephew have his fun?

  • \(H_{0}: p = 0.50\)
  • \(H_{a}: p < 0.50\)
  • Let \(P′ =\) the proportion of friends that has a pierced ear.
  • –1.70
  • \(p\text{-value} = 0.0448\)
  • Reason for decision: The \(p\text{-value}\) is less than 0.05. (However, they are very close.)
  • Conclusion: There is sufficient evidence to support the claim that less than 50% of his friends have pierced ears.
  • Confidence Interval: \((0.245, 0.515)\): The “plus-4s” confidence interval is \((0.259, 0.519)\).

"The Craven," by Mark Salangsang

Once upon a morning dreary

In stats class I was weak and weary.

Pondering over last night’s homework

Whose answers were now on the board

This I did and nothing more.

While I nodded nearly napping

Suddenly, there came a tapping.

As someone gently rapping,

Rapping my head as I snore.

Quoth the teacher, “Sleep no more.”

“In every class you fall asleep,”

The teacher said, his voice was deep.

“So a tally I’ve begun to keep

Of every class you nap and snore.

The percentage being forty-four.”

“My dear teacher I must confess,

While sleeping is what I do best.

The percentage, I think, must be less,

A percentage less than forty-four.”

This I said and nothing more.

“We’ll see,” he said and walked away,

And fifty classes from that day

He counted till the month of May

The classes in which I napped and snored.

The number he found was twenty-four.

At a significance level of 0.05,

Please tell me am I still alive?

Or did my grade just take a dive

Plunging down beneath the floor?

Upon thee I hereby implore.

Toastmasters International cites a report by Gallop Poll that 40% of Americans fear public speaking. A student believes that less than 40% of students at her school fear public speaking. She randomly surveys 361 schoolmates and finds that 135 report they fear public speaking. Conduct a hypothesis test to determine if the percent at her school is less than 40%.

  • \(H_{0}: p = 0.40\)
  • \(H_{a}: p < 0.40\)
  • Let \(P′ =\) the proportion of schoolmates who fear public speaking.
  • –1.01
  • \(p\text{-value} = 0.1563\)
  • Conclusion: There is insufficient evidence to support the claim that less than 40% of students at the school fear public speaking.
  • Confidence Interval: \((0.3241, 0.4240)\): The “plus-4s” confidence interval is \((0.3257, 0.4250)\).

Sixty-eight percent of online courses taught at community colleges nationwide were taught by full-time faculty. To test if 68% also represents California’s percent for full-time faculty teaching the online classes, Long Beach City College (LBCC) in California, was randomly selected for comparison. In the same year, 34 of the 44 online courses LBCC offered were taught by full-time faculty. Conduct a hypothesis test to determine if 68% represents California. NOTE: For more accurate results, use more California community colleges and this past year's data.

According to an article in Bloomberg Businessweek , New York City's most recent adult smoking rate is 14%. Suppose that a survey is conducted to determine this year’s rate. Nine out of 70 randomly chosen N.Y. City residents reply that they smoke. Conduct a hypothesis test to determine if the rate is still 14% or if it has decreased.

  • \(H_{0}: p = 0.14\)
  • \(H_{a}: p < 0.14\)
  • Let \(P′ =\) the proportion of NYC residents that smoke.
  • –0.2756
  • \(p\text{-value} = 0.3914\)
  • At the 5% significance level, there is insufficient evidence to conclude that the proportion of NYC residents who smoke is less than 0.14.
  • Confidence Interval: \((0.0502, 0.2070)\): The “plus-4s” confidence interval (see chapter 8) is \((0.0676, 0.2297)\).

The mean age of De Anza College students in a previous term was 26.6 years old. An instructor thinks the mean age for online students is older than 26.6. She randomly surveys 56 online students and finds that the sample mean is 29.4 with a standard deviation of 2.1. Conduct a hypothesis test.

Registered nurses earned an average annual salary of $69,110. For that same year, a survey was conducted of 41 California registered nurses to determine if the annual salary is higher than $69,110 for California nurses. The sample average was $71,121 with a sample standard deviation of $7,489. Conduct a hypothesis test.

  • \(H_{0}: \mu = 69,110\)
  • \(H_{0}: \mu > 69,110\)
  • Let \(\bar{X} =\) the mean salary in dollars for California registered nurses.
  • \(t = 1.719\)
  • \(p\text{-value}: 0.0466\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean salary of California registered nurses exceeds $69,110.
  • \(($68,757, $73,485)\)

La Leche League International reports that the mean age of weaning a child from breastfeeding is age four to five worldwide. In America, most nursing mothers wean their children much earlier. Suppose a random survey is conducted of 21 U.S. mothers who recently weaned their children. The mean weaning age was nine months (3/4 year) with a standard deviation of 4 months. Conduct a hypothesis test to determine if the mean weaning age in the U.S. is less than four years old.

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin?

After conducting the test, your decision and conclusion are

  • Reject \(H_{0}\): There is sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Reject \(H_{0}\): There is sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing.

At a 1% level of significance, an appropriate conclusion is:

  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is more than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is at least 20%.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test.

At a significance level of \(a = 0.05\), what is the correct conclusion?

  • There is enough evidence to conclude that the mean number of hours is more than 4.75
  • There is enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.75

Instructions: For the following ten exercises,

Hypothesis testing: For the following ten exercises, answer each question.

State the null and alternate hypothesis.

State the \(p\text{-value}\).

State \(\alpha\).

What is your decision?

Write a conclusion.

Answer any other questions asked in the problem.

According to the Center for Disease Control website, in 2011 at least 18% of high school students have smoked a cigarette. An Introduction to Statistics class in Davies County, KY conducted a hypothesis test at the local high school (a medium sized–approximately 1,200 students–small city demographic) to determine if the local high school’s percentage was lower. One hundred fifty students were chosen at random and surveyed. Of the 150 students surveyed, 82 have smoked. Use a significance level of 0.05 and using appropriate statistical evidence, conduct a hypothesis test and state the conclusions.

A recent survey in the N.Y. Times Almanac indicated that 48.8% of families own stock. A broker wanted to determine if this survey could be valid. He surveyed a random sample of 250 families and found that 142 owned some type of stock. At the 0.05 significance level, can the survey be considered to be accurate?

  • \(H_{0}: p = 0.488\) \(H_{a}: p \neq 0.488\)
  • \(p\text{-value} = 0.0114\)
  • \(\alpha = 0.05\)
  • Reject the null hypothesis.
  • At the 5% level of significance, there is enough evidence to conclude that 48.8% of families own stocks.
  • The survey does not appear to be accurate.

Driver error can be listed as the cause of approximately 54% of all fatal auto accidents, according to the American Automobile Association. Thirty randomly selected fatal accidents are examined, and it is determined that 14 were caused by driver error. Using \(\alpha = 0.05\), is the AAA proportion accurate?

The US Department of Energy reported that 51.7% of homes were heated by natural gas. A random sample of 221 homes in Kentucky found that 115 were heated by natural gas. Does the evidence support the claim for Kentucky at the \(\alpha = 0.05\) level in Kentucky? Are the results applicable across the country? Why?

  • \(H_{0}: p = 0.517\) \(H_{0}: p \neq 0.517\)
  • \(p\text{-value} = 0.9203\).
  • \(\alpha = 0.05\).
  • Do not reject the null hypothesis.
  • At the 5% significance level, there is not enough evidence to conclude that the proportion of homes in Kentucky that are heated by natural gas is 0.517.
  • However, we cannot generalize this result to the entire nation. First, the sample’s population is only the state of Kentucky. Second, it is reasonable to assume that homes in the extreme north and south will have extreme high usage and low usage, respectively. We would need to expand our sample base to include these possibilities if we wanted to generalize this claim to the entire nation.

For Americans using library services, the American Library Association claims that at most 67% of patrons borrow books. The library director in Owensboro, Kentucky feels this is not true, so she asked a local college statistic class to conduct a survey. The class randomly selected 100 patrons and found that 82 borrowed books. Did the class demonstrate that the percentage was higher in Owensboro, KY? Use \(\alpha = 0.01\) level of significance. What is the possible proportion of patrons that do borrow books from the Owensboro Library?

The Weather Underground reported that the mean amount of summer rainfall for the northeastern US is at least 11.52 inches. Ten cities in the northeast are randomly selected and the mean rainfall amount is calculated to be 7.42 inches with a standard deviation of 1.3 inches. At the \(\alpha = 0.05 level\), can it be concluded that the mean rainfall was below the reported average? What if \(\alpha = 0.01\)? Assume the amount of summer rainfall follows a normal distribution.

  • \(H_{0}: \mu \geq 11.52\) \(H_{a}: \mu < 11.52\)
  • \(p\text{-value} = 0.000002\) which is almost 0.
  • At the 5% significance level, there is enough evidence to conclude that the mean amount of summer rain in the northeaster US is less than 11.52 inches, on average.
  • We would make the same conclusion if alpha was 1% because the \(p\text{-value}\) is almost 0.

A survey in the N.Y. Times Almanac finds the mean commute time (one way) is 25.4 minutes for the 15 largest US cities. The Austin, TX chamber of commerce feels that Austin’s commute time is less and wants to publicize this fact. The mean for 25 randomly selected commuters is 22.1 minutes with a standard deviation of 5.3 minutes. At the \(\alpha = 0.10\) level, is the Austin, TX commute significantly less than the mean commute time for the 15 largest US cities?

A report by the Gallup Poll found that a woman visits her doctor, on average, at most 5.8 times each year. A random sample of 20 women results in these yearly visit totals

3; 2; 1; 3; 7; 2; 9; 4; 6; 6; 8; 0; 5; 6; 4; 2; 1; 3; 4; 1

At the \(\alpha = 0.05\) level can it be concluded that the sample mean is higher than 5.8 visits per year?

  • \(H_{0}: \mu \leq 5.8\) \(H_{a}: \mu > 5.8\)
  • \(p\text{-value} = 0.9987\)
  • At the 5% level of significance, there is not enough evidence to conclude that a woman visits her doctor, on average, more than 5.8 times a year.

According to the N.Y. Times Almanac the mean family size in the U.S. is 3.18. A sample of a college math class resulted in the following family sizes:

5; 4; 5; 4; 4; 3; 6; 4; 3; 3; 5; 5; 6; 3; 3; 2; 7; 4; 5; 2; 2; 2; 3; 2

At \(\alpha = 0.05\) level, is the class’ mean family size greater than the national average? Does the Almanac result remain valid? Why?

The student academic group on a college campus claims that freshman students study at least 2.5 hours per day, on average. One Introduction to Statistics class was skeptical. The class took a random sample of 30 freshman students and found a mean study time of 137 minutes with a standard deviation of 45 minutes. At α = 0.01 level, is the student academic group’s claim correct?

  • \(H_{0}: \mu \geq 150\) \(H_{0}: \mu < 150\)
  • \(p\text{-value} = 0.0622\)
  • \(\alpha = 0.01\)
  • At the 1% significance level, there is not enough evidence to conclude that freshmen students study less than 2.5 hours per day, on average.
  • The student academic group’s claim appears to be correct.

9.7: Hypothesis Testing of a Single Mean and Single Proportion

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis and alternate hypothesis (H o ) and (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

Step 1: state your null and alternate hypothesis, step 2: collect data, step 3: perform a statistical test, step 4: decide whether to reject or fail to reject your null hypothesis, step 5: present your findings, other interesting articles, frequently asked questions about hypothesis testing.

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

  • Academic style
  • Vague sentences
  • Style consistency

See an example

z test hypothesis testing examples

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved January 29, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Is this article helpful?

Rebecca Bevans

Rebecca Bevans

Other students also liked, choosing the right statistical test | types & examples, understanding p values | definition and examples, what is your plagiarism score.

  • Prompt Library
  • DS/AI Trends
  • Stats Tools
  • Interview Questions
  • Generative AI
  • Machine Learning
  • Deep Learning

Z-tests for Hypothesis testing: Formula & Examples

Different types of Z-test - One sample and two samples

Z-tests are statistical hypothesis testing techniques that are used to determine whether the null hypothesis relating to comparing sample means or proportions with that of population at a given significance level can be rejected or otherwise based on the z-statistics or z-score. As a data scientist , you must get a good understanding of the z-tests and its applications to test the hypothesis for your statistical models. In this blog post, we will discuss an overview of different types of z-tests and related concepts with the help of examples. You may want to check my post on hypothesis testing titled – Hypothesis testing explained with examples

Table of Contents

What are Z-tests & Z-statistics?

Z-tests can be defined as statistical hypothesis testing techniques that are used to quantify the hypothesis testing related to claim made about the population parameters such as mean and proportion. Z-test uses the sample data to test the hypothesis about the population parameters (mean or proportion). There are different types of Z-tests which are used to estimate the population mean or proportion, or, perform hypotheses testing related to samples’ means or proportions.

Different types of Z-tests 

There are following different types of Z-tests which are used to perform different types of hypothesis testing.  

Different types of Z-test - One sample and two samples

  • One-sample Z-test for means
  • Two-sample Z-test for means
  • One sample Z-test for proportion
  • Two sample Z-test for proportions

Four variables are involved in the Z-test for performing hypothesis testing for different scenarios. They are as follows:

  • An independent variable that is called the “sample” and assumed to be normally distributed;
  • A dependent variable that is known as the test statistic (Z) and calculated based on sample data
  • Different types of Z-test that can be used for performing hypothesis testing
  • A significance level or “alpha” is usually set at 0.05 but can take the values such as 0.01, 0.05, 0.1

When to use Z-test – Explained with examples

The following are different scenarios when Z-test can be used:

  • Compare the sample or a single group with that of the population with respect to the parameter, mean. This is called as one-sample Z-test for means. For example, whether the student of a particular school has been scoring marks in Mathematics which is statistically significant than the other schools. This can also be thought of as a hypothesis test to check whether the sample belongs to the population or otherwise.
  • Compare two groups with respect to the population parameter, mean. This is called as two-samples Z-test for means. For example, you want to compare class X students from different schools and determine if students of one school are better than others based on their score of Mathematics.
  • Compare hypothesized proportion of the population to that of population theoritical proportion. For example, whether the unemployment rate of a given state is different than the well-established rate for the ccountry
  • Compare the proportion of one population with the proportion of othe rproportion. For example, whether the efficacy rate of vaccination in two different population are statistically significant or otherwise.

Z-test Interview Questions 

Here is a list of a few interview questions you may expect in your data scientists interview:

  • What is Z-test?
  • What is Z-statistics or Z-score?
  • When to use Z-test vs other tests such as T-test or Chi-square test?
  • What is Z-distribution?
  • What is the difference between Z-distribution and T-distribution?
  • What is sampling distribution?
  • What are different types of Z-tests?
  • Explain different types of Z-tests with the help of real-world examples?
  • What’s the difference two samples Z-test for means and two-samples Z-test for proportions? Explain with one example each.
  • As data scientists, give some scenarios when you would like to use Z-test when building machine learning models?

Recent Posts

Ajitesh Kumar

  • Amazon (AWS) Machine Learning / AI Services List - January 28, 2024
  • How is ChatGPT Trained to Generate Desired Responses? - January 27, 2024
  • Machine Learning Lifecycle Example: From Data to Deployment - January 27, 2024

Ajitesh Kumar

Leave a reply cancel reply.

Your email address will not be published. Required fields are marked *

  • Search for:

ChatGPT Prompts (250+)

  • Design User Flows for App
  • Generate a List of Competitors for App
  • Create a User Journey Map for App
  • Generate Design Ideas for App
  • Generate Visual Design Ideas for App
  • Amazon (AWS) Machine Learning / AI Services List
  • How is ChatGPT Trained to Generate Desired Responses?
  • Machine Learning Lifecycle Example: From Data to Deployment
  • Retrieval Augmented Generation (RAG) & LLM: Examples
  • AI-Ready Data Explained with Examples

Data Science / AI Trends

  • • Custom LLM and AI Agents (RAG) On Structured + Unstructured Data - AI Brain For Your Organization
  • • Guides, papers, lecture, notebooks and resources for prompt engineering
  • • Common tricks to make LLMs efficient and stable
  • • Machine learning in finance
  • • New short course on advanced retrieval for RAG (retrieval augmented generation)

Free Online Tools

  • Create Scatter Plots Online for your Excel Data
  • Histogram / Frequency Distribution Creation Tool
  • Online Pie Chart Maker Tool
  • Z-test vs T-test Decision Tool
  • Independent samples t-test calculator

Recent Comments

Useful for Doctor profession

Thank you for pointing out correctly. Corrected!

I think the Pe where the 3x3 contingency table is miscalculated

Nice question to help us

Thanks for the article! Very well explained. Examples made it easier to understand. We have a dataset with 500 diseases…

z test hypothesis testing examples

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

S.3.3 hypothesis testing examples.

  • Example: Right-Tailed Test
  • Example: Left-Tailed Test
  • Example: Two-Tailed Test

Brinell Hardness Scores

An engineer measured the Brinell hardness of 25 pieces of ductile iron that were subcritically annealed. The resulting data were:

The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is greater than 170. Therefore, he was interested in testing the hypotheses:

H 0 : μ = 170 H A : μ > 170

The engineer entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

Descriptive Statistics

$\mu$: mean of Brinelli

Null hypothesis    H₀: $\mu$ = 170 Alternative hypothesis    H₁: $\mu$ > 170

The output tells us that the average Brinell hardness of the n = 25 pieces of ductile iron was 172.52 with a standard deviation of 10.31. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 10.31 by the square root of n = 25, is 2.06). The test statistic t * is 1.22, and the P -value is 0.117.

If the engineer set his significance level α at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were greater than 1.7109 (determined using statistical software or a t -table):

t distribution graph for df = 24 and a right tailed test of .05 significance level

Since the engineer's test statistic, t * = 1.22, is not greater than 1.7109, the engineer fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

If the engineer used the P -value approach to conduct his hypothesis test, he would determine the area under a t n - 1 = t 24 curve and to the right of the test statistic t * = 1.22:

t distribution graph of right tailed test showing the p-value of 0117 for a t-value of 1.22

In the output above, Minitab reports that the P -value is 0.117. Since the P -value, 0.117, is greater than \(\alpha\) = 0.05, the engineer fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

Note that the engineer obtains the same scientific conclusion regardless of the approach used. This will always be the case.

Height of Sunflowers

A biologist was interested in determining whether sunflower seedlings treated with an extract from Vinca minor roots resulted in a lower average height of sunflower seedlings than the standard height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and subsequently obtained the following heights:

The biologist's hypotheses are:

H 0 : μ = 15.7 H A : μ < 15.7

The biologist entered her data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. She obtained the following output:

$\mu$: mean of Height

Null hypothesis    H₀: $\mu$ = 15.7 Alternative hypothesis    H₁: $\mu$ < 15.7

The output tells us that the average height of the n = 33 sunflower seedlings was 13.664 with a standard deviation of 2.544. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 13.664 by the square root of n = 33, is 0.443). The test statistic t * is -4.60, and the P -value, 0.000, is to three decimal places.

Minitab Note. Minitab will always report P -values to only 3 decimal places. If Minitab reports the P -value as 0.000, it really means that the P -value is 0.000....something. Throughout this course (and your future research!), when you see that Minitab reports the P -value as 0.000, you should report the P -value as being "< 0.001."

If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t * were less than -1.6939 (determined using statistical software or a t -table):s-3-3

Since the biologist's test statistic, t * = -4.60, is less than -1.6939, the biologist rejects the null hypothesis. That is, the test statistic falls in the "critical region." There is sufficient evidence, at the α = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

If the biologist used the P -value approach to conduct her hypothesis test, she would determine the area under a t n - 1 = t 32 curve and to the left of the test statistic t * = -4.60:

t-distribution for left tailed test with significance level of 0.05 shown in left tail

In the output above, Minitab reports that the P -value is 0.000, which we take to mean < 0.001. Since the P -value is less than 0.001, it is clearly less than \(\alpha\) = 0.05, and the biologist rejects the null hypothesis. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

t-distribution graph for left tailed test with a t-value of -4.60 and left tail area of 0.000

Note again that the biologist obtains the same scientific conclusion regardless of the approach used. This will always be the case.

Gum Thickness

A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an inch. A quality control specialist regularly checks this claim. On one production run, he took a random sample of n = 10 pieces of gum and measured their thickness. He obtained:

The quality control specialist's hypotheses are:

H 0 : μ = 7.5 H A : μ ≠ 7.5

The quality control specialist entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

$\mu$: mean of Thickness

Null hypothesis    H₀: $\mu$ = 7.5 Alternative hypothesis    H₁: $\mu \ne$ 7.5

The output tells us that the average thickness of the n = 10 pieces of gums was 7.55 one-hundredths of an inch with a standard deviation of 0.1027. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 0.1027 by the square root of n = 10, is 0.0325). The test statistic t * is 1.54, and the P -value is 0.158.

If the quality control specialist sets his significance level \(\alpha\) at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were less than -2.2616 or greater than 2.2616 (determined using statistical software or a t -table):

t-distribution graph of two tails with a significance level of .05 and t values of -2.2616 and 2.2616

Since the quality control specialist's test statistic, t * = 1.54, is not less than -2.2616 nor greater than 2.2616, the quality control specialist fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all of the manufacturer's spearmint gum differs from 7.5 one-hundredths of an inch.

If the quality control specialist used the P -value approach to conduct his hypothesis test, he would determine the area under a t n - 1 = t 9 curve, to the right of 1.54 and to the left of -1.54:

t-distribution graph for a two tailed test with t values of -1.54 and 1.54, the corresponding p-values are 0.0789732 on both tails

In the output above, Minitab reports that the P -value is 0.158. Since the P -value, 0.158, is greater than \(\alpha\) = 0.05, the quality control specialist fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all pieces of spearmint gum differs from 7.5 one-hundredths of an inch.

Note that the quality control specialist obtains the same scientific conclusion regardless of the approach used. This will always be the case.

In our review of hypothesis tests, we have focused on just one particular hypothesis test, namely that concerning the population mean \(\mu\). The important thing to recognize is that the topics discussed here — the general idea of hypothesis tests, errors in hypothesis testing, the critical value approach, and the P -value approach — generally extend to all of the hypothesis tests you will encounter.

Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.

What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution . It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It indicates that there is a statistical significance between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It also indicates the probability of making an error in rejecting or not rejecting the null hypothesis.This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\) or significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% to 5%.

Hypothesis Testing Critical region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.

Hypothesis Testing Formula

Depending upon the type of data available and the size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The hypothesis testing formula for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a way of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the mean of two samples. It is used to compute the z test statistic. The formulas are given as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known. Instead, the sample standard deviation is known. The mean of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic has a greater value than the critical value then the null hypothesis is rejected

Right Tail Hypothesis Testing

Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic has a value lesser than the critical value.

Left Tail Hypothesis Testing

Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non - directional hypothesis testing method. The two-tailed test is used when it needs to be determined if the population parameter is assumed to be different than some value. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the test statistic has a value that is not equal to the critical value.

Two Tail Hypothesis Testing

Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis by correctly identifying whether it is the left-tailed, right-tailed, or two-tailed hypothesis testing.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the correct significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the correct test statistic (z, t or \(\chi\)) and p-value.
  • Step 5: Compare the test statistic with the critical value or compare the p-value with \(\alpha\) to arrive at a conclusion. In other words, decide if the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean average weight of men is greater than 100kgs with a standard deviation of 15kgs. 30 men are chosen with an average weight of 112.5 Kgs. Using hypothesis testing, check if there is enough evidence to support the researcher's claim. The confidence interval is given as 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.

Step 4: Calculate the z test statistic. This is because the sample size is 30. Furthermore, the sample and population means are known along with the standard deviation.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. As 4.56 > 1.645 thus, the null hypothesis can be rejected.

Hypothesis Testing and Confidence Intervals

Confidence intervals form an important part of hypothesis testing. This is because the alpha level can be determined from a given confidence interval. Suppose a confidence interval is given as 95%. Subtract the confidence interval from 100%. This gives 100 - 95 = 5% or 0.05. This is the alpha value of a one-tailed hypothesis testing. To obtain the alpha value for a two-tailed hypothesis testing, divide this value by 2. This gives 0.05 / 2 = 0.025.

Related Articles:

  • Probability and Statistics
  • Data Handling

Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • There are three types of tests that can be conducted under hypothesis testing - z test, t test, and chi square test.
  • Hypothesis testing can be classified as right tail, left tail, and two tail tests.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells with an average weight of 110lbs and a standard deviation of 18lbs. Using hypothesis testing check if the physical trainer's claim can be supported for a 95% confidence level. Solution: As the sample size is lesser than 30, the t-test is used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90 \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18. \(\alpha\) = 0.05 Using the t-distribution table, the critical value is 2.132 t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) t = 2.484 As 2.484 > 2.132, the null hypothesis is rejected. Answer: The average weight of the dumbbells may be greater than 90lbs
  • Example 2: The average score on a test is 80 with a standard deviation of 10. With a new teaching curriculum introduced it is believed that this score will change. On random testing, the score of 38 students, the mean was found to be 88. With a 0.05 significance level, is there any evidence to support this claim? Solution: This is an example of two-tail hypothesis testing. The z test will be used. \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80 \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10. \(\alpha\) = 0.05 / 2 = 0.025 The critical value using the normal distribution table is 1.96 z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) z = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8 As 4.8 > 1.96, the null hypothesis is rejected. Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured. The mean was 82 with a standard deviation of 18. With a 0.05 significance level use hypothesis testing to check if this claim is true. Solution: The t test will be used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90 \(\overline{x}\) = 110, \(\mu\) = 90, n = 6, s = 18 The critical value from the t table is -2.015 t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) t = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) t = -1.088 As -1.088 > -2.015, we fail to reject the null hypothesis. Answer: There is not enough evidence to support the claim.

go to slide go to slide go to slide

z test hypothesis testing examples

Book a Free Trial Class

FAQs on Hypothesis Testing

What is hypothesis testing.

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z test in hypothesis testing is used to find the z test statistic for normally distributed data . The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t test in hypothesis testing is used when the data follows a student t distribution . It is used when the sample size is less than 30 and standard deviation of the population is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p value and the alpha level.

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

To get the alpha level in a two tail hypothesis testing divide \(\alpha\) by 2. This is done as there are two rejection regions in the curve.

IMAGES

  1. Z Test

    z test hypothesis testing examples

  2. Z-test- definition, formula, examples, uses, z-test vs t-test

    z test hypothesis testing examples

  3. Z Test

    z test hypothesis testing examples

  4. Hypothesis Testing using Z-test Statistics

    z test hypothesis testing examples

  5. Hypothesis Testing Problems

    z test hypothesis testing examples

  6. Two sample z test

    z test hypothesis testing examples

VIDEO

  1. Hypothesis test Z Test Part 5 Double Sample MBS First Semester Statistics TU Solution in Nepali

  2. Lec-39: Introduction to Z-Test

  3. Hypothesis Testing

  4. HYPOTHESIS TESTING PROBLEM-11 USING Z TEST VIDEO-14

  5. Z-test

  6. hypothesis testing |Population varience known & unknown| z test problems || (Bangla) #StatQuery

COMMENTS

  1. Z Test: Uses, Formula & Examples

    Related posts: Null Hypothesis: Definition, Rejecting & Examples and Understanding Significance Levels. Two-Sample Z Test Hypotheses. Null hypothesis (H 0): Two population means are equal (µ 1 = µ 2).; Alternative hypothesis (H A): Two population means are not equal (µ 1 ≠ µ 2).; Again, when the p-value is less than or equal to your significance level, reject the null hypothesis.

  2. Z Test

    The z test formula compares the z statistic with the z critical value to test whether there is a difference in the means of two populations. In hypothesis testing, the z critical value divides the distribution graph into the acceptance and the rejection regions.If the test statistic falls in the rejection region then the null hypothesis can be rejected otherwise it cannot be rejected.

  3. PDF The Z-test

    The Z-test January 9, 2021 Contents Example 1: (one tailed z-test) Example 2: (two tailed z-test) Questions Answers The z-test is a hypothesis test to determine if a single observed mean is signi cantly di erent (or greater or less than) the mean under the null hypothesis, hypwhen you know the standard deviation of the population.

  4. PDF Hypothesis Testing with z Tests

    Critical Values: Test statistic values beyond which we will reject the null hypothesis (cutoffs) p levels (α): Probabilities used to determine the critical value 5. Calculate test statistic (e.g., z statistic) 6. Make a decision Statistically Significant: Instructs us to reject the null hypothesis because the pattern in the data differs from

  5. Z-Test for Statistical Hypothesis Testing Explained

    A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution. The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations away a raw score or sample statistics is from the populations' mean. Z-tests are the most common statistical tests ...

  6. 10 Chapter 10: Hypothesis Testing with Z

    10. Chapter 10: Hypothesis Testing with Z. This chapter lays out the basic logic and process of hypothesis testing using a z. We will perform a test statistics using z, we use the z formula from chapter 8 and data from a sample mean to make an inference about a population.

  7. Z Test: Definition & Two Proportion Z-Test

    This tests for a difference in proportions. A two proportion z-test allows you to compare two proportions to see if they are the same. The null hypothesis (H 0) for the test is that the proportions are the same.; The alternate hypothesis (H 1) is that the proportions are not the same.; Example question: let's say you're testing two flu drugs A and B. Drug A works on 41 people out of a ...

  8. Z-test Calculator

    This Z-test calculator is a tool that helps you perform a one-sample Z-test on the population's mean. Two forms of this test - a two-tailed Z-test and a one-tailed Z-tests - exist, and can be used depending on your needs. You can also choose whether the calculator should determine the p-value from Z-test or you'd rather use the critical value ...

  9. Z-test

    A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution.Z-test tests the mean of a distribution. For each significance level in the confidence interval, the Z-test has a single critical value (for example, 1.96 for 5% two tailed) which makes it more convenient than the Student's t-test whose ...

  10. 6 Hypothesis Testing: the z-test

    In this chapter, we'll introduce hypothesis testing with examples from a 'z-test', when we're comparing a single mean to what we'd expect from a population with known mean and standard deviation. In this case, we can convert our observed mean into a z-score for the standard normal distribution. Hence the name z-test.

  11. Hypothesis Testing

    A sample of 30 patients who have tried the raw cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw cornstarch had an effect. Step 1: State the null hypothesis: H 0 :μ=100. Step 2: State the alternate hypothesis: H 1 :≠100. Step 3: State your alpha level. We'll use 0.05 for this example.

  12. One Sample Z-Test: Definition, Formula, and Example

    One Sample Z-Test: Formula. A one sample z-test will always use one of the following null and alternative hypotheses: 1. Two-Tailed Z-Test. H0: μ = μ0 (population mean is equal to some hypothesized value μ0) HA: μ ≠ μ0 (population mean is not equal to some hypothesized value μ0) 2. Left-Tailed Z-Test. H0: μ ≥ μ0 (population mean is ...

  13. Z-Test Definition: Its Uses in Statistics Simply Explained With Example

    Z-Test: A z-test is a statistical test used to determine whether two population means are different when the variances are known and the sample size is large. The test statistic is assumed to have ...

  14. Z-Test: Formula, Examples, Uses, Z-Test vs T-Test

    Z-test is the most commonly used statistical tool in research methodology, with it being used for studies where the sample size is large (n>30). In the case of the z-test, the variance is usually known. Z-test is more convenient than t-test as the critical value at each significance level in the confidence interval is the sample for all sample ...

  15. Difference between Z-Test and T-Test

    A. A z-test is used to test a Null Hypothesis if the population variance is known, or if the sample size is larger than 30, for an unknown population variance. A t-test is used when the sample size is less than 30 and the population variance is unknown. Q2.

  16. Hypothesis Testing: Z-Scores

    Figure 3. Two-tailed test with alpha = 5% | Image by author. Before continuing, let's formalize a few things: The null hypothesis is rejected when the sample mean is associated with a low probability of occurrence. The null hypothesis is retained when the sample mean is associated with a high probability of occurrence.. Such probability of occurrence is better known as p-value.

  17. Z-test

    Steps to perform Z-test: First, identify the null and alternate hypotheses. Determine the level of significance (∝). Find the critical value of z in the z-test using. Calculate the z-test statistics. Below is the formula for calculating the z-test statistics. where, : mean of the sample. : mean of the population.

  18. 9.E: Hypothesis Testing with One Sample (Exercises)

    An Introduction to Statistics class in Davies County, KY conducted a hypothesis test at the local high school (a medium sized-approximately 1,200 students-small city demographic) to determine if the local high school's percentage was lower. One hundred fifty students were chosen at random and surveyed.

  19. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  20. Z-tests for Hypothesis testing: Formula & Examples

    Z-tests are statistical hypothesis testing techniques that are used to determine whether the null hypothesis relating to comparing sample means or proportions with that of population at a given significance level can be rejected or otherwise based on the z-statistics or z-score. As a data scientist, you must get a good understanding of the z-tests and its applications to test the hypothesis ...

  21. S.3.3 Hypothesis Testing Examples

    If the engineer used the P -value approach to conduct his hypothesis test, he would determine the area under a tn - 1 = t24 curve and to the right of the test statistic t * = 1.22: In the output above, Minitab reports that the P -value is 0.117. Since the P -value, 0.117, is greater than α = 0.05, the engineer fails to reject the null hypothesis.

  22. Z-Test

    Z-test is a statistical tool that is used in hypothesis testing. It is the go-to method when the sample size is large. The test derives the difference between the two large population samples, provided the variance is known. Z-tests are similar to t-tests; the only difference is that t-tests are conducted for small sample sizes or when the ...

  23. Hypothesis Testing

    Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant. It involves the setting up of a null hypothesis and an alternate hypothesis. There are three types of tests that can be conducted under hypothesis testing - z test, t test, and chi square test.