AOS6 Topic 3: Confidence Intervals for Population Mean

In statistics, confidence intervals play a crucial role in estimating population parameters based on sample data.

When it comes to estimating the population mean, confidence intervals provide a range of values within which the true population mean is likely to fall. This statistical tool offers insights into the precision and reliability of our estimates, allowing us to make informed decisions and draw meaningful conclusions.

Definition:

A confidence interval for the population mean is a range of values, calculated from sample data, that is likely to contain the true population mean. The level of confidence, often denoted by \(1 - \alpha\), where \(\alpha\) is the significance level, is the long-run proportion of intervals constructed in this way that contain the true population parameter. Commonly used confidence levels are 90%, 95%, and 99%.

Point Estimates

Suppose, for example, we are interested in the mean IQ score of all Year 12 mathematics students in Australia. The value of the population mean \( \mu \) is unknown. Collecting information about the whole population is not feasible, and so a random sample must suffice.

What information can be obtained from a single sample? Certainly, the sample mean \( \bar{x} \) gives some indication of the value of the population mean \( \mu \), and can be used when we have no other information.

The value of the sample mean \( \bar{x} \) can be used to estimate the population mean \( \mu \). Since this is a single-valued estimate, it is called a point estimate of \( \mu \).

Thus, if we select a random sample of 100 Year 12 mathematics students and find that their mean IQ is 108.6, then the value \( \bar{x} = 108.6 \) serves as an estimate of the population mean \( \mu \).

Interval Estimates

The value of the sample mean \( \bar{x} \) varies from sample to sample: sometimes it will be close to the population mean \( \mu \), and at other times it will not. Using a single value to estimate \( \mu \) is therefore rather risky. What is required is an interval that we are reasonably sure contains the parameter value \( \mu \).

An interval estimate for the population mean \( \mu \) is called a confidence interval for \( \mu \).

95% Confidence Interval

An approximate 95% confidence interval for \( \mu \) is given by

\( \left[\bar{x} - 1.9600 \frac{\sigma}{\sqrt{n}}, \bar{x} + 1.9600 \frac{\sigma}{\sqrt{n}}\right] \)

where:

\( \mu \) is the population mean (unknown)
\( \bar{x} \) is a value of the sample mean
\( \sigma \) is the value of the population standard deviation
\( n \) is the size of the sample from which \( \bar{x} \) was calculated

Central Limit Theorem

Let \( X \) be any random variable, with mean \( \mu \) and standard deviation \( \sigma \). Then, provided that the sample size \( n \) is large enough, the distribution of the sample mean \( \bar{X} \) is approximately normal with mean \( E(\bar{X}) = \mu \) and standard deviation \( \text{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \).
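The statement above can be checked numerically. The following is a minimal sketch, assuming an exponential population with \( \mu = 1 \) and \( \sigma = 1 \) (a skewed distribution chosen only for illustration); the function name, the sample size of 50, the 5000 repetitions and the seed are all assumptions, not part of the original notes.

```python
import math
import random
import statistics


def simulate_sample_means(n, repetitions, seed=0):
    """Return `repetitions` sample means, each from n draws of an
    exponential population with mean 1 and standard deviation 1."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


n = 50
means = simulate_sample_means(n, repetitions=5000)

# The mean of the sample means should be close to mu = 1, and their
# standard deviation should be close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / math.sqrt(n), 3))
```

Even though the population is far from normal, the simulated sample means should cluster around \( \mu \) with a spread close to \( \frac{\sigma}{\sqrt{n}} \).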

Approximate 95% Confidence Intervals

The central limit theorem tells us that, whatever the underlying distribution of the random variable \( X \), if the sample size \( n \) is large, then the sampling distribution of \( \bar{X} \) is approximately normal with \( E(\bar{X}) = \mu \) and \( \text{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \).

For the standard normal random variable \( Z \), we have \( \text{Pr}(-1.9600 < Z < 1.9600) = 0.95 \).

So we can state that, for large \( n \):

\( \text{Pr}\left(-1.9600 < \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} < 1.9600\right) \approx 0.95 \)

Multiplying through gives:

\( \text{Pr}\left(-1.9600 \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.9600 \frac{\sigma}{\sqrt{n}}\right) \approx 0.95 \)

Further simplifying, we obtain:

\( \text{Pr}\left(\bar{X} - 1.9600 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.9600 \frac{\sigma}{\sqrt{n}}\right) \approx 0.95 \)

Before the sample is drawn, this final expression describes an interval which, with approximately 95% probability, will contain the value of the population mean \( \mu \) (which we do not know).

Note: Often when determining a confidence interval for the population mean, the population standard deviation \( \sigma \) is unknown. If the sample size is large (say \( n \geq 30 \)), then we can use the sample standard deviation \( s \) in this formula as an approximation to the population standard deviation \( \sigma \).
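As a rough illustration of this note, the sketch below computes an approximate 95% confidence interval directly from a list of observations, using the sample standard deviation \( s \) in place of \( \sigma \). The function name, the `min_n` parameter and the placeholder data are assumptions made for the example; the data are not real measurements.

```python
import math
import statistics


def approx_ci_from_sample(observations, z=1.9600, min_n=30):
    """Approximate confidence interval for the mean, with the sample
    standard deviation s standing in for the unknown sigma."""
    n = len(observations)
    if n < min_n:
        raise ValueError("sample too small for the large-sample approximation")
    x_bar = statistics.fmean(observations)
    s = statistics.stdev(observations)  # sample sd (divisor n - 1)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Placeholder data only: 40 made-up scores, not taken from any study.
scores = [float(v) for v in range(90, 130)]
low, high = approx_ci_from_sample(scores)
print(f"({low:.2f}, {high:.2f})")
```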

Changing the Level of Confidence

We can find an approximate confidence interval with a level of confidence other than 95% by using the same principles. For example, since we know that \( \text{Pr}(-1.6449 < Z < 1.6449) = 0.90 \), an approximate 90% confidence interval for \( \mu \) is given by

\( \left[\bar{x} - 1.6449 \frac{\sigma}{\sqrt{n}}, \bar{x} + 1.6449 \frac{\sigma}{\sqrt{n}}\right] \)

We can generalize these two examples as follows:

C% Confidence Interval

An approximate \( C\% \) confidence interval for \( \mu \) is given by

\( \left[\bar{x} - z \frac{\sigma}{\sqrt{n}}, \bar{x} + z \frac{\sigma}{\sqrt{n}}\right] \)

where:

\( z \) is such that \( \text{Pr}(-z < Z < z) = C\% \)
\( \mu \) is the population mean (unknown)
\( \bar{x} \) is a value of the sample mean
\( \sigma \) is the value of the population standard deviation
\( n \) is the size of the sample from which \( \bar{x} \) was calculated

Note:

The values of \( z \) (to four decimal places) for commonly used confidence levels are:

90% \( z = 1.6449 \)
95% \( z = 1.9600 \)
99% \( z = 2.5758 \)
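These values can be reproduced from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_for_confidence` is an assumption made for the example.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Return z such that Pr(-z < Z < z) = level for a standard normal Z."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # The two tails share 1 - level equally, so z is the
    # (1 + level) / 2 quantile of the standard normal distribution.
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.4f}")
```

The same helper gives the value of \( z \) for any other level of confidence, for example 98%.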

Interpretation of Confidence Intervals

The 95% confidence interval (105.66, 111.54) found in Example 1 below should not be interpreted as meaning that \( \text{Pr}(105.66 < \mu < 111.54) = 0.95 \). Since \( \mu \) is a constant, it either does or does not lie in the stated interval.

The correct interpretation of a 95% confidence interval is that we expect approximately 95% of such intervals to contain the population mean \( \mu \). Whether or not the particular confidence interval obtained contains the population mean \( \mu \) is generally not known.

If we were to repeat the process of taking a sample and calculating a confidence interval many times, the result would be something like that indicated in the diagram.

The diagram shows the confidence intervals obtained when 20 different samples were drawn from the same population.

The value of the population mean \( \mu \) is indicated by the vertical line, and it is, of course, constant.

It is quite easy to see from the diagram that none of the values of the sample estimate is exactly the same as the population mean, but that all the intervals except one (19 out of 20, or 95%) have captured the value of the population mean, as would be expected in the case of a 95% confidence interval.
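The repeated-sampling interpretation can also be explored by simulation. The sketch below is only illustrative: it assumes a normal population with \( \mu = 100 \) and \( \sigma = 15 \) (hypothetical values), draws many samples of size 100, and counts the proportion of 95% confidence intervals that capture \( \mu \). The function name, repetition count and seed are assumptions.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, repetitions, z=1.9600, seed=1):
    """Proportion of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n and compute its mean.
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / repetitions


print(coverage_rate(mu=100, sigma=15, n=100, repetitions=2000))
```

The printed proportion should be close to 0.95, although any single run will differ slightly from that value.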

Effect of Sample Size on Confidence Intervals

Example 2 below shows that increasing the level of confidence increases the width of the confidence interval. The width of a confidence interval is important, as for a confidence interval to be useful it should not be too wide. The distance between the sample mean and the endpoints of a confidence interval is called the margin of error. The smaller the margin of error, the better the estimate of the population mean.

Since the width of the confidence interval is inversely proportional to the square root of the sample size, the width can be decreased, without lowering the level of confidence, by increasing the sample size.

Example 1

Finding a 95% Confidence Interval

Find an approximate 95% confidence interval for the mean IQ of Year 12 mathematics students in Australia, if we select a random sample of 100 students and find the sample mean \( \bar{x} \) to be 108.6. Assume that the standard deviation for this population is 15.

Solution:

The interval is found by substituting \( \bar{x} = 108.6 \), \( n = 100 \), and \( \sigma = 15 \) into the expression for an approximate 95% confidence interval:

\( \left[\bar{x} - 1.9600 \frac{\sigma}{\sqrt{n}}, \bar{x} + 1.9600 \frac{\sigma}{\sqrt{n}}\right] \)

= \( \left[108.6 - 1.9600 \times \frac{15}{\sqrt{100}}, 108.6 + 1.9600 \times \frac{15}{\sqrt{100}}\right] \)

= (105.66, 111.54)

Thus, based on a sample of size 100 and a sample estimate of 108.6, an approximate 95% confidence interval for the population mean \( \mu \) is (105.66, 111.54).
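The substitution in Example 1 can be checked with a few lines of Python. This is a minimal sketch; the function name `confidence_interval_95` is an assumption, and the value 1.9600 is the one quoted in the text.

```python
import math


def confidence_interval_95(x_bar, sigma, n):
    """Approximate 95% confidence interval for the population mean."""
    margin = 1.9600 * sigma / math.sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin


low, high = confidence_interval_95(x_bar=108.6, sigma=15, n=100)
print(f"({low:.2f}, {high:.2f})")  # (105.66, 111.54)
```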

Example 2

Comparing Confidence Intervals

Calculate and compare 90%, 95%, and 99% confidence intervals for the mean IQ of Year 12 mathematics students in Australia, if we select a random sample of 100 students and find the sample mean \( \bar{x} \) to be 108.6. (Assume that \( \sigma = 15 \).)

Solution:

From Example 1, we know that the 95% confidence interval is (105.66, 111.54).

The 90% confidence interval is

\( \left[108.6 - 1.6449 \times \frac{15}{\sqrt{100}}, 108.6 + 1.6449 \times \frac{15}{\sqrt{100}}\right] \)

= (106.13, 111.07)

The 99% confidence interval is

\( \left[108.6 - 2.5758 \times \frac{15}{\sqrt{100}}, 108.6 + 2.5758 \times \frac{15}{\sqrt{100}}\right] \)

= (104.74, 112.46)

We see that increasing the level of confidence increases the width of the confidence interval.
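The comparison in Example 2 can be sketched in Python as follows. The constant and function names are assumptions made for the example; the value of \( z \) is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

# Values from Example 2.
X_BAR, SIGMA, N = 108.6, 15, 100


def interval_for_level(level):
    """Approximate confidence interval for the mean at the given level."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * SIGMA / math.sqrt(N)
    return X_BAR - margin, X_BAR + margin


for level in (0.90, 0.95, 0.99):
    low, high = interval_for_level(level)
    print(f"{level:.0%}: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

Printing the width alongside each interval makes the effect of the confidence level easy to see: the sample mean and sample size are fixed, so only \( z \) changes.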

Example 3

Expected Number of Intervals Containing Population Mean

Suppose that the process of taking a sample and determining a confidence interval based on the sample mean was repeated 200 times. How many of these intervals would be expected to contain the value of the population mean \( \mu \) if the level of confidence is:

a) 90%
b) 95%

Solution:

a) We expect \( 0.90 \times 200 = 180 \) of the 90% confidence intervals to contain \( \mu \).

b) We expect \( 0.95 \times 200 = 190 \) of the 95% confidence intervals to contain \( \mu \).

Example 4

Comparison of 95% Confidence Intervals

Calculate and compare 95% confidence intervals for the mean IQ of Year 12 mathematics students in Australia (assume that \( \sigma = 15 \)), if:

a) We select a random sample of 100 students and find the sample mean \( \bar{x} \) to be 108.6
b) We select a random sample of 400 students and find the sample mean \( \bar{x} \) to be 108.6

Solution:

From Example 1, the first 95% confidence interval is (105.66, 111.54).

The second 95% confidence interval is:

\( \left( 108.6 - 1.9600 \times \frac{15}{\sqrt{400}}, 108.6 + 1.9600 \times \frac{15}{\sqrt{400}} \right) = (107.13, 110.07) \)

Thus, the confidence interval based on a sample of size 400 is narrower than the confidence interval based on a sample of size 100.

In this example, by increasing the sample size, we obtained a narrower 95% confidence interval and therefore a better estimate for the population mean \( \mu \). In fact, since we have increased the sample size by a factor of 4 (from 100 to 400), we can readily verify that we have decreased the width of the confidence interval by a factor of 2.
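The factor of 2 can be verified directly from the width formula \( 2z \frac{\sigma}{\sqrt{n}} \). The short sketch below assumes \( z = 1.9600 \) and \( \sigma = 15 \) as in the example; the function name is hypothetical.

```python
import math


def ci_width(sigma, n, z=1.9600):
    """Width of the approximate confidence interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)


w100 = ci_width(15, 100)
w400 = ci_width(15, 400)
print(f"n = 100: width {w100:.2f}")
print(f"n = 400: width {w400:.2f}")
print(f"ratio of widths: {w100 / w400:.1f}")
```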

Example 5

Decreasing Width of Confidence Interval

A confidence interval is used to estimate the population mean \( \mu \) based on a sample mean \( \bar{x} \).

By what factor must the sample size be increased in order to decrease the width of the confidence interval by 80%?

Solution:

Let \( n_1 \) be the current sample size, and let \( n_2 \) be the new sample size.

Let \( W_1 \) be the width of the current confidence interval, and let \( W_2 \) be the width of the new confidence interval. Then

\( W_1 = 2z \frac{\sigma}{\sqrt{n_1}} \) and \( W_2 = 2z \frac{\sigma}{\sqrt{n_2}} \) where the value of \( z \) is determined by the level of confidence.

For the width to decrease by 80%, we require:

\( W_2 = 0.2 \times W_1 \)

\( 2z \frac{\sigma}{\sqrt{n_2}} = 0.2 \times 2z \frac{\sigma}{\sqrt{n_1}} \)

\( \sqrt{n_2} = 5\sqrt{n_1} \)

\( \therefore n_2 = 25n_1 \)

The sample size should be increased by a factor of 25.
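The same algebra works for any percentage reduction: if the width is to shrink by a fraction \( r \), then \( \sqrt{n_2} = \frac{\sqrt{n_1}}{1 - r} \), so the sample size must grow by a factor of \( \frac{1}{(1 - r)^2} \). A small sketch, with a hypothetical function name:

```python
def sample_size_factor(width_reduction):
    """Factor by which n must increase to cut the interval width by the
    given fraction (for example, 0.8 for an 80% reduction)."""
    if not 0 <= width_reduction < 1:
        raise ValueError("width_reduction must be in [0, 1)")
    remaining = 1 - width_reduction  # fraction of the original width left
    return 1 / remaining ** 2


print(round(sample_size_factor(0.8), 6))  # 80% narrower
print(round(sample_size_factor(0.5), 6))  # half the width
```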

Example 6

Sample Size Calculation for Confidence Interval

Consider again the problem of estimating the mean IQ of Year 12 mathematics students in Australia. What size sample is required in order to ensure that the difference between the sample mean and the population mean is 1.5 points or less at the 95% confidence level? (Assume that \( \sigma = 15 \)).

Solution:

The distance between the sample mean \( \bar{x} \) and the endpoints of the 95% confidence interval should be less than or equal to 1.5. Therefore, we require \( 1.9600 \times \frac{15}{\sqrt{n}} \leq 1.5 \).

Hence \( n \geq \left( \frac{1.9600 \times 15}{1.5} \right)^2 = 384.16 \).

We require a sample of at least 385 students.
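The calculation in Example 6 can be wrapped in a small function. This is a sketch under the same assumptions as the example (known \( \sigma \), large-sample approximation); the function and parameter names are hypothetical.

```python
import math


def required_sample_size(sigma, max_margin, z=1.9600):
    """Smallest n for which z * sigma / sqrt(n) <= max_margin."""
    # Rearranging the inequality gives n >= (z * sigma / max_margin) ** 2;
    # the result is rounded up because n must be a whole number.
    return math.ceil((z * sigma / max_margin) ** 2)


print(required_sample_size(sigma=15, max_margin=1.5))  # 385
```

Rounding up rather than to the nearest integer matters here: a sample of 384 would give a margin of error slightly greater than 1.5.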
