Hypothesis testing for the mean is a statistical procedure used to assess whether there is enough evidence to support a claim about the population mean. It involves comparing sample data against a null hypothesis \((H_0)\) and an alternative hypothesis \((H_1\) or \(H_a)\) to determine whether the difference between the sample mean and a hypothesized population mean is significant.
The mean and standard deviation for IQ scores in the general population are \(\mu = 100\) and \(\sigma = 15\). Suppose we believe that, in general, Year 12 mathematics students score higher on IQ tests than members of the general population. To investigate, we select a random sample of 100 Year 12 mathematics students and determine their mean IQ to be \(103.6\). This is \(3.6\) points higher than the mean IQ of people in general.
Is it reasonable to conclude that Year \(12\) mathematics students score higher on IQ tests than the general public? We already know that sample means will vary from sample to sample, and we would not expect the mean of an individual sample to have exactly the same value as the mean of the population from which it is drawn.
One explanation is that Year 12 mathematics students perform no better on IQ tests than members of the general public, and the difference between the mean score of the sample, \(\bar{x} = 103.6\), and that of the general population, \(\mu = 100\), is due to sampling variability.
Another explanation is that Year 12 mathematics students actually do better than average on IQ tests, and a sample mean of \(\bar{x} = 103.6\) is consistent with this explanation.
Hypothesis testing is concerned with deciding which of the two explanations is more likely, which we do on the basis of probability.
A hypothesis test can be likened to a trial in a court of law. We begin with a hypothesis that we wish to find evidence to support. In a court, as a prosecutor, your intention is to show that the person is guilty. However, the starting point in the trial is that the person is innocent. It is up to the prosecutor to provide enough evidence to show that this assumption is untenable.
The assumption of innocence in hypothesis-testing terms is called the null hypothesis, denoted by \(H_0\). If we can collect evidence to show that the null hypothesis is untenable, we can conclude that there is support for an alternative hypothesis, denoted by \(H_1\).
In this IQ example, our hypothesis is that Year 12 mathematics students perform better than the general population on IQ tests. To test this with a hypothesis test, we start by assuming the opposite: we assume that Year 12 mathematics students perform no better on IQ tests than members of the general public. In statistical terms, we are saying that the distribution of IQ scores for these students is the same as for the general public.
For the general public, we know that IQ is normally distributed with a mean of \( \mu = 100 \) and a standard deviation of \( \sigma = 15 \). The null hypothesis is that the students are drawn from a population in which the mean is \( \mu = 100 \). We express this null hypothesis symbolically as \( H_0 : \mu = 100 \).
The null hypothesis, \( H_0 \), says that the sample is drawn from a population which has the same mean as before (i.e. the population mean has not changed). Under the null hypothesis, any difference between the values of a sample statistic and the population parameter is explained by sample-to-sample variation.
In this case, we are hypothesizing that the mean IQ of Year 12 mathematics students is higher than that of the general population – that the sample comes from a population with mean \( \mu > 100 \). We express this alternative hypothesis symbolically as \( H_1 : \mu > 100 \).
The alternative hypothesis, \( H_1 \), says that the population mean has changed. That is, while there will always be some sampling variability, the observed difference is so large that it is more likely that the sample has been drawn from a population with a different mean.
Note: Hypotheses are always expressed in terms of population parameters.
How do we decide between the two hypotheses? Both in a court of law and in statistical hypothesis testing, evidence is collected and then weighed so that a decision can be made. In the courtroom, the jury functions as the decision maker, weighing the evidence to reach a verdict of guilty (the alternative hypothesis) or not guilty (the null hypothesis). In hypothesis testing, the evidence is contained in the sample data.
To help us make our decision, we generally summarize the data into a single statistic, called the test statistic. There are many test statistics that can be used. If we are testing a hypothesis about a population mean \(\mu\), then the obvious test statistic is the sample mean \(\overline{x}\).
If we find that the sample mean observed is very unlikely to have been obtained from a sample drawn from the hypothesized population, this will cause us to doubt the credibility of that hypothesized population mean. The statistical tool we use to determine the likelihood of this value of a test statistic is the distribution of sample means.
The p-value is the probability of observing a value of the sample statistic as extreme as or more extreme than the one observed, assuming that the null hypothesis is true.
Consider again the hypothesis that the mean IQ of Year 12 mathematics students is higher than that of the general population.
We have hypotheses:

\[ H_0 : \mu = 100 \]
\[ H_1 : \mu > 100 \]

and the mean of a sample of size 100 is \( \overline{x} = 103.6 \). Thus we can write:

p-value \( = \text{Pr}(\bar{X} \geq 103.6 \mid \mu = 100) \)
To get a picture as to how much we could reasonably expect the sample mean to vary from sample to sample, we can use simulation. The following dotplot shows the values of \( \overline{x} \) obtained from 100 samples (each of size 100) taken from a normal distribution with mean \( \mu = 100 \) and standard deviation \( \sigma = 15 \).
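A simulation of this kind can be sketched in Python using only the standard library (the seed value is arbitrary and chosen here just for reproducibility):

```python
import random
import statistics

random.seed(1)  # arbitrary seed, so the run is reproducible

# Draw 100 samples, each of size 100, from a normal distribution with
# mean 100 and standard deviation 15, recording the mean of each sample.
sample_means = [
    statistics.mean(random.gauss(100, 15) for _ in range(100))
    for _ in range(100)
]

# Under the null hypothesis, the sample means cluster around 100 with
# standard deviation sigma/sqrt(n) = 15/10 = 1.5.
print(min(sample_means), max(sample_means))
print(statistics.mean(sample_means))
```

Plotting these values as a dotplot (or histogram) gives the picture described above: the simulated means concentrate near 100, with almost all falling within a few multiples of 1.5 on either side.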
If \( X \) is a normally distributed random variable with mean \( \mu \) and standard deviation \( \sigma \), then the distribution of the sample mean \( \bar{X} \) will also be normal, with mean \( E(\bar{X}) = \mu \) and standard deviation \( \text{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \), where \( n \) is the sample size.
Thus, if the null hypothesis is true, then \( \bar{X} \) is normally distributed with \( E(\bar{X}) = \mu = 100 \) and \( \text{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5 \)
Therefore
\( \text{p-value} = \text{Pr}(\bar{X} \geq 103.6 | \mu = 100) = \text{Pr}\left(Z \geq \frac{103.6 - 100}{1.5}\right) = \text{Pr}(Z \geq 2.4) = 0.0082 \)
Thus, the p-value tells us that, if the mean IQ of Year 12 mathematics students is 100, then the likelihood of observing a sample mean as high as or higher than 103.6 is extremely small, only 0.0082.
Consider again our IQ example. The more unlikely it is that the sample we observed could be drawn from a population with a mean IQ of 100, the more convinced we are that the sample must come from a population with a higher IQ.
In general, the smaller the p-value, the less plausible it is that the sample was drawn from a population with the mean specified by the null hypothesis, and thus the stronger the evidence against the null hypothesis.
How small does the p-value have to be to provide convincing evidence against the null hypothesis? The following table gives some conventions.
| p-value | Conclusion |
|---|---|
| p-value > 0.05 | insufficient evidence against \(H_0\) |
| p-value < 0.05 (5%) | good evidence against \(H_0\) |
| p-value < 0.01 (1%) | strong evidence against \(H_0\) |
| p-value < 0.001 (0.1%) | very strong evidence against \(H_0\) |
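These conventions can be encoded in a small helper function (the function name is ours, not a standard one):

```python
def evidence_against_h0(p):
    """Conventional wording for the strength of evidence given a p-value."""
    if p < 0.001:
        return "very strong evidence against H0"
    if p < 0.01:
        return "strong evidence against H0"
    if p < 0.05:
        return "good evidence against H0"
    return "insufficient evidence against H0"

print(evidence_against_h0(0.0082))  # strong evidence against H0
```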
For our IQ example, we interpret the p-value of 0.0082 as strong evidence against the null hypothesis and in support of our hypothesis that Year 12 mathematics students perform better than the general population on IQ tests.
The significance level of a test, \(\alpha\), specifies the condition for rejecting the null hypothesis: we reject \(H_0\) if the p-value is less than \(\alpha\).
The most commonly used value for the significance level is 0.05 (5%), although 0.01 (1%) and 0.001 (0.1%) are sometimes used.
This fixed significance level approach to hypothesis testing is commonly used.
The hypothesis test for a mean of a sample drawn from a normally distributed population with known standard deviation is called a z-test.
The central limit theorem tells us that, if the sample size is large enough, then the distribution of the sample mean of any random variable is approximately normal. Thus, a z-test can be used even when the distribution of the random variable is not known, provided the sample size is large enough. (For most distributions, a sample size of 30 is sufficient.)
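The central limit theorem can be illustrated by simulation. The sketch below (standard library only; the seed is arbitrary) draws samples of size 30 from a right-skewed exponential distribution with mean 1 and checks that the sample means behave approximately normally, with mean near 1 and standard deviation near \(1/\sqrt{30} \approx 0.18\):

```python
import random
import statistics

random.seed(2)  # arbitrary seed, so the run is reproducible

# Draw 2000 samples of size n = 30 from an exponential distribution
# with mean 1 (a clearly non-normal, right-skewed distribution).
n = 30
means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(2000)
]

# By the central limit theorem, the sample means are approximately
# normal: centred near 1, with sd close to 1/sqrt(30) ~ 0.18.
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```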
We have so far considered only situations where we had a good idea of the direction in which the mean might have changed. That is, we considered only that the mean IQ of Year 12 mathematics students might be higher than that of the general population, or that the fuel consumption of the new model car might be lower than that of the previous model. These are examples of directional hypotheses. When we translate these hypotheses into testable alternative hypotheses, we say that our sample has come from a population with mean more than 100 (for the IQ example) or less than 13.7 (for the fuel-consumption example).
The presence of a ‘less than’ sign (<) or a ‘greater than’ sign (>) in the alternative hypothesis indicates that we are dealing with a directional hypothesis. Only values of the sample mean more than 100 (for the IQ example) or less than 13.7 (for the fuel-consumption example) will lend support to the alternative hypothesis.
Now suppose that we do not know whether the fuel consumption of our new model car has increased or decreased. In this case, we would hypothesize that the fuel consumption is different for the new model (a non-directional hypothesis). We have to allow for the possibility of the sample mean being less than or greater than 13.7 litres per 100 km. We express this symbolically by using a ‘not equal to’ sign (≠) in the alternative hypothesis:
\(H_1 : \mu \neq 13.7\)
The presence of the ‘not equal to’ sign (≠) in the alternative hypothesis indicates that we are dealing with a non-directional hypothesis. A sample mean either greater than 13.7 or less than 13.7 could provide evidence to support this hypothesis.
The directionality of the alternative hypothesis H1 determines how the p-value is calculated. For the directional hypothesis
\( H_1 : \mu > 13.7 \)
only a sample mean considerably greater than 13.7 will lend support to this hypothesis. Thus, in calculating the p-value, we only consider values in the upper tail of the normal curve.
For the directional hypothesis
\( H_1 : \mu < 13.7 \)
only a sample mean considerably less than 13.7 will lend support to this hypothesis. Thus, in calculating the p-value, we only consider values in the lower tail of the normal curve.
Because the p-values for directional tests are given by an area in just one tail of the curve, these tests are commonly called one-tail tests.
For the non-directional hypothesis \( H_1 : \mu \neq 13.7 \), a sample mean that is either considerably less than 13.7 or considerably greater than 13.7 will lend support to this hypothesis. Thus, in calculating the p-value, we need to consider values in both tails of the normal curve.
Because the p-values for non-directional tests are given by an area in both tails of the curve, these tests are commonly called two-tail tests.
p-value (two-tail test) = 2 × p-value (one-tail test)
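This relationship can be checked directly in Python (the function names are ours, for illustration):

```python
from statistics import NormalDist

def one_tail_p(x_bar, mu0, sigma, n):
    """p-value for the directional hypothesis H1: mu > mu0 (upper tail)."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)

def two_tail_p(x_bar, mu0, sigma, n):
    """p-value for the non-directional hypothesis H1: mu != mu0 (both tails)."""
    z = abs(x_bar - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(z))

# IQ example values: the two-tail p-value is double the one-tail value
p1 = one_tail_p(103.6, 100, 15, 100)
p2 = two_tail_p(103.6, 100, 15, 100)
print(round(p1, 4), round(p2, 4))  # 0.0082 0.0164
```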
We established in Section 16A that a 95% confidence interval for the population mean \( \mu \) is given by
\[ \left( \overline{x} - 1.9600 \frac{\sigma}{\sqrt{n}}, \overline{x} + 1.9600 \frac{\sigma}{\sqrt{n}} \right) \]
There is a close relationship between confidence intervals and two-tail hypothesis tests. To explain this, we will use the following basic fact about intervals of the real number line:
\[ a \in (b - c, b + c) \Leftrightarrow |a - b| < c \Leftrightarrow b \in (a - c, a + c) \]
Now suppose that we are testing the hypotheses
\[ H_0 : \mu = \mu_0 \]
\[ H_1 : \mu \neq \mu_0 \]
Then we have
\[ \mu_0 \in \left( \overline{x} - 1.9600 \frac{\sigma}{\sqrt{n}}, \overline{x} + 1.9600 \frac{\sigma}{\sqrt{n}} \right) \Leftrightarrow \overline{x} \in \left( \mu_0 - 1.9600 \frac{\sigma}{\sqrt{n}}, \mu_0 + 1.9600 \frac{\sigma}{\sqrt{n}} \right) \]
Hence, the 95% confidence interval does not contain \( \mu_0 \) if and only if we should reject the null hypothesis at the 5% level of significance.
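This equivalence can be verified numerically. The sketch below (function names are ours, assuming the IQ example's parameters) checks that the two-tail test decision at the 5% level agrees with whether the 95% confidence interval excludes \(\mu_0\):

```python
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)  # ~1.9600, the 95% critical value

def rejects_at_5pct(x_bar, mu0, sigma, n):
    """Two-tail z-test decision: reject H0 when the p-value < 0.05."""
    z = abs(x_bar - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(z)) < 0.05

def ci_excludes(x_bar, mu0, sigma, n):
    """True when mu0 lies outside the 95% confidence interval for mu."""
    half = z95 * sigma / n ** 0.5
    return not (x_bar - half < mu0 < x_bar + half)

# The two decisions agree for any observed sample mean (IQ example setup)
for x_bar in [100.5, 102.0, 103.6, 97.0]:
    assert rejects_at_5pct(x_bar, 100, 15, 100) == ci_excludes(x_bar, 100, 15, 100)
print("confidence interval and two-tail test agree")
```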
The p-value for a two-tail test can be defined as:
\[ \text{p-value} = \text{Pr}(|\bar{X} - \mu_0| \geq |\overline{x} - \mu_0|) = \text{Pr}\left(|Z| \geq \frac{|\overline{x} - \mu_0|}{\sigma/\sqrt{n}}\right) \]
where \( \overline{x} \) is the observed value of the sample mean, \( \mu_0 \) is the population mean under the null hypothesis, \( \sigma \) is the population standard deviation, \( n \) is the sample size, and \( Z \) is the standard normal random variable.