AOS6 Topic 2: Sample Mean Distribution
The sample mean
Let \( X \) be a normal random variable which represents a particular measure on a population (for example, IQ scores or rope lengths). The mean of \( X \) is \( \mu \) and the standard deviation is \( \sigma \). Samples of size \( n \) selected from this population can be described by independent random variables \( X_1, X_2, \ldots, X_n \) with identical distributions to \( X \).
The sample mean is defined as
\( \bar{X} = \frac{{X_1 + X_2 + \ldots + X_n}}{n} \)
Since \( \bar{X} \) is a linear combination of independent normal random variables, the random variable \( \bar{X} \) is also normally distributed.
The expected value of \( \bar{X} \) can be found using our general result for linear combinations:
\( E(\bar{X}) = E\left(\frac{1}{n} (X_1 + X_2 + \cdots + X_n)\right) \)
\( = \frac{1}{n} (E(X_1) + E(X_2) + \cdots + E(X_n)) \) where \( a_1 = a_2 = \cdots = a_n = \frac{1}{n} \)
\( = n \times \frac{1}{n} \times \mu \) since \( E(X_i) = E(X) = \mu \)
\( = \mu \)
Similarly, we can find the variance of \( \bar{X} \):
\( \text{Var}(\bar{X}) = \text{Var}\left(\frac{1}{n} (X_1 + X_2 + \cdots + X_n)\right) \)
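\( = \frac{1}{n^2} (\text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)) \) since \( X_1, X_2, \ldots, X_n \) are independent
\( = n \times \left(\frac{1}{n^2}\right) \times \sigma^2 \) since \( \text{Var}(X_i) = \text{Var}(X) = \sigma^2 \)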
\( = \frac{\sigma^2}{n} \)
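The two derivations above can be checked with exact arithmetic. The following is a minimal sketch, assuming the general result for linear combinations of independent random variables, namely that \( a_1 X_1 + \cdots + a_n X_n \) has mean \( \sum a_i E(X_i) \) and variance \( \sum a_i^2 \text{Var}(X_i) \). The function name `linear_combination_moments` and the values \( \mu = 100 \), \( \sigma^2 = 225 \) and \( n = 10 \) are illustrative choices only.

```python
from fractions import Fraction

def linear_combination_moments(coeffs, means, variances):
    """Mean and variance of sum(a_i * X_i), assuming the X_i are independent."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    variance = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean, variance

# Illustrative values (assumptions, not taken from the derivation itself).
n = 10
mu, sigma_sq = Fraction(100), Fraction(225)

# The sample mean uses the coefficients a_1 = a_2 = ... = a_n = 1/n.
coeffs = [Fraction(1, n)] * n
mean_xbar, var_xbar = linear_combination_moments(coeffs, [mu] * n, [sigma_sq] * n)

print(mean_xbar)  # 100, which equals mu
print(var_xbar)   # 45/2, which equals sigma_sq / n
```

Using `Fraction` rather than floating-point numbers keeps the comparison with \( \mu \) and \( \frac{\sigma^2}{n} \) exact.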
We can summarise our results as follows.
The Sample Mean of a Normal Random Variable
Let \( X \) be a normally distributed random variable with mean \( \mu \) and standard deviation \( \sigma \).
Let \( X_1, X_2, \ldots, X_n \) represent a sample of size \( n \) selected from this population. The sample mean is defined as
\( \bar{X} = \frac{{X_1 + X_2 + \ldots + X_n}}{n} \)
The sample mean \( \bar{X} \) is normally distributed with \( E(\bar{X}) = \mu \) and \( \text{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \).
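As a rough illustration of how this result might be used, the sketch below builds the distribution of \( \bar{X} \) with Python's standard-library `statistics.NormalDist`. The helper name `sample_mean_distribution`, the parameter values (\( \mu = 100 \), \( \sigma = 15 \), \( n = 10 \)) and the threshold 105 are assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Distribution of the sample mean of n independent N(mu, sigma^2) values."""
    return NormalDist(mu, sigma / sqrt(n))

# Illustrative parameters (assumed for this example).
xbar = sample_mean_distribution(mu=100, sigma=15, n=10)

print(xbar.mean)                 # E(X-bar) = mu
print(round(xbar.stdev, 3))      # sd(X-bar) = sigma / sqrt(n)
print(1 - xbar.cdf(105))         # P(X-bar > 105)
```

Note that the standard deviation of \( \bar{X} \) is smaller than \( \sigma \) whenever \( n > 1 \), so sample means are less spread out than individual observations.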
Investigating the distribution of the sample mean using simulation
In the previous section, we made assertions about the distribution of the sample mean \( \bar{X} \), when \( X \) is a normally distributed random variable. In this section, we use simulation to validate these assertions empirically.
Consider the random variable IQ, which we assume is normally distributed with a mean of \( \mu = 100 \) and a standard deviation of \( \sigma = 15 \) in a given population. We will begin by simulating the drawing of a random sample of size 10 from this population.
One random sample of 10 scores, obtained by simulation, is:
105, 109, 104, 86, 118, 100, 81, 94, 70, 88
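A sample like this could be generated along the following lines. This is only a sketch, assuming scores are rounded to whole numbers; the function name `draw_iq_sample` and the seed are hypothetical, and the output will not reproduce the particular scores listed above.

```python
import random

def draw_iq_sample(n, mu=100, sigma=15, rng=None):
    """Simulate n scores from a normal population, rounded to whole numbers."""
    if rng is None:
        rng = random.Random()
    return [round(rng.gauss(mu, sigma)) for _ in range(n)]

# An arbitrary seed, used only so that repeated runs give the same sample.
rng = random.Random(1)
sample = draw_iq_sample(10, rng=rng)
print(sample)
```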
Recall that the sample mean is denoted by \( \bar{x} \) and that \( \bar{x} = \frac{\sum x}{n} \) where \( \sum \) means ‘sum’ and \( n \) is the size of the sample.
Here the sample mean is:
\[ \bar{x} = \frac{105 + 109 + 104 + 86 + 118 + 100 + 81 + 94 + 70 + 88}{10} = 95.5 \]
A second sample, also obtained by simulation, is:
114, 124, 128, 133, 95, 107, 117, 91, 115, 104
with sample mean:
\[ \bar{x} = \frac{114 + 124 + 128 + 133 + 95 + 107 + 117 + 91 + 115 + 104}{10} = 112.8 \]
Since \( \bar{x} \) varies according to the contents of the random samples, we consider the sample means \( \bar{x} \) as being the values of a random variable, which we denote by \( \bar{X} \).
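The two sample means calculated above can be reproduced with a few lines of code. The variable names and the helper `sample_mean` are illustrative; the data are the two samples listed in this section.

```python
# The two simulated samples listed in this section.
sample_1 = [105, 109, 104, 86, 118, 100, 81, 94, 70, 88]
sample_2 = [114, 124, 128, 133, 95, 107, 117, 91, 115, 104]

def sample_mean(values):
    """Sample mean: the sum of the values divided by the sample size."""
    return sum(values) / len(values)

print(sample_mean(sample_1))  # 95.5
print(sample_mean(sample_2))  # 112.8
```

The two samples come from the same population, yet their means differ by more than 17 points, which is the variability that the random variable \( \bar{X} \) describes.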
Since \( \bar{x} \) is a statistic which is calculated from a sample, the probability distribution of the random variable \( \bar{X} \) is called a sampling distribution.
Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean is the probability distribution of \( \bar{X} \) over all possible random samples of a given size \( n \) drawn from a population. In practice, if we repeatedly draw samples of size \( n \) from the population and calculate the mean of each sample, the collection of these sample means approximates the sampling distribution of the sample mean.
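The repeated-sampling idea can be sketched as a short simulation. The code below assumes the IQ population used in this section (\( \mu = 100 \), \( \sigma = 15 \)) and samples of size 10; the function name `simulate_sample_means`, the number of samples (10 000) and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def simulate_sample_means(num_samples, n, mu=100, sigma=15, seed=0):
    """Draw num_samples samples of size n and return the mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(fmean(sample))
    return means

means = simulate_sample_means(num_samples=10_000, n=10)

# Compare the simulated sample means with the theoretical results.
print(round(fmean(means), 2))      # should be close to mu = 100
print(round(stdev(means), 2))      # should be close to sigma / sqrt(n)
print(round(15 / sqrt(10), 2))     # theoretical sd of the sample mean
```

With this many samples, the mean and standard deviation of the simulated sample means should sit close to \( \mu \) and \( \frac{\sigma}{\sqrt{n}} \), although the exact figures depend on the seed.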