Let \( X \) be a normal random variable which represents a particular measure on a population (for example, IQ scores or rope lengths). The mean of \( X \) is \( \mu \) and the standard deviation is \( \sigma \). Samples of size \( n \) selected from this population can be described by independent random variables \( X_1, X_2, \ldots, X_n \) with identical distributions to \( X \).
The sample mean is defined as
\( \bar{X} = \frac{{X_1 + X_2 + \ldots + X_n}}{n} \)
Since \( \bar{X} \) is a linear combination of independent normal random variables, the random variable \( \bar{X} \) is also normally distributed.
The expected value of \( \bar{X} \) can be found using our general result for linear combinations:
\( E(\bar{X}) = E\left(\frac{1}{n} (X_1 + X_2 + \cdots + X_n)\right) \)
\( = \frac{1}{n} (E(X_1) + E(X_2) + \cdots + E(X_n)) \) where \( a_1 = a_2 = \cdots = a_n = \frac{1}{n} \)
\( = n \times \frac{1}{n} \times \mu \) since \( E(X_i) = E(X) = \mu \)
\( = \mu \)
Similarly, we can find the variance of \( \bar{X} \):
\( \text{Var}(\bar{X}) = \text{Var}\left(\frac{1}{n} (X_1 + X_2 + \cdots + X_n)\right) \)
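\( = \frac{1}{n^2} (\text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)) \) since \( X_1, X_2, \ldots, X_n \) are independent
\( = n \times \left(\frac{1}{n^2}\right) \times \sigma^2 \) since \( \text{Var}(X_i) = \text{Var}(X) = \sigma^2 \)
\( = \frac{\sigma^2}{n} \)
Taking the square root gives the standard deviation of the sample mean: \( sd(\bar{X}) = \frac{\sigma}{\sqrt{n}} \).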
We can summarise our results as follows.
Let \( X \) be a normally distributed random variable with mean \( \mu \) and standard deviation \( \sigma \).
Let \( X_1, X_2, \ldots, X_n \) represent a sample of size \( n \) selected from this population. The sample mean is defined as
\( \bar{X} = \frac{{X_1 + X_2 + \ldots + X_n}}{n} \)
The sample mean \( \bar{X} \) is normally distributed with \( E(\bar{X}) = \mu \) and \( sd(\bar{X}) = \frac{\sigma}{\sqrt{n}} \).
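As a quick numerical illustration of this result, suppose (purely as assumed example values) that \( \mu = 100 \), \( \sigma = 15 \) and \( n = 10 \). Then:

\[ E(\bar{X}) = 100 \quad \text{and} \quad sd(\bar{X}) = \frac{15}{\sqrt{10}} \approx 4.74 \]

If the sample size is quadrupled to \( n = 40 \), the standard deviation of the sample mean is halved:

\[ sd(\bar{X}) = \frac{15}{\sqrt{40}} \approx 2.37 \]

So the sample mean is centred on the population mean, and its spread shrinks as the sample size grows.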
In the previous section, we made assertions about the distribution of the sample mean \( \bar{X} \), when \( X \) is a normally distributed random variable. In this section, we use simulation to validate these assertions empirically.
Consider the random variable IQ, which we assume is normally distributed with a mean of \( \mu = 100 \) and a standard deviation of \( \sigma = 15 \) in a given population. We will begin by simulating the drawing of a random sample of size 10 from this population.
One random sample of 10 scores, obtained by simulation, is:
105, 109, 104, 86, 118, 100, 81, 94, 70, 88
Recall that the sample mean is denoted by \( \bar{x} \) and that \( \bar{x} = \frac{\sum x}{n} \) where \( \sum \) means ‘sum’ and \( n \) is the size of the sample.
Here the sample mean is:
\[ \bar{x} = \frac{105 + 109 + 104 + 86 + 118 + 100 + 81 + 94 + 70 + 88}{10} = 95.5 \]
A second sample, also obtained by simulation, is:
114, 124, 128, 133, 95, 107, 117, 91, 115, 104
with sample mean:
\[ \bar{x} = \frac{114 + 124 + 128 + 133 + 95 + 107 + 117 + 91 + 115 + 104}{10} = 112.8 \]
Since \( \bar{x} \) varies according to the contents of the random samples, we consider the sample means \( \bar{x} \) as being the values of a random variable, which we denote by \( \bar{X} \).
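The way \( \bar{x} \) changes from one sample to the next can be reproduced with a short simulation. The following is a minimal sketch in Python, assuming the same population (\( \mu = 100 \), \( \sigma = 15 \)) and sample size 10 as above. The seed, the number of samples drawn and the helper name `draw_sample` are illustrative choices and do not come from the text, so the simulated scores will differ from the two samples listed above.

```python
import random
from statistics import fmean

MU, SIGMA, N = 100, 15, 10  # population mean, standard deviation, sample size


def draw_sample(rng, n=N):
    """Draw n simulated IQ scores from Normal(MU, SIGMA), rounded to whole scores."""
    return [round(rng.gauss(MU, SIGMA)) for _ in range(n)]


rng = random.Random(1)  # arbitrary seed, used only to make the run repeatable

sample_means = []
for _ in range(5):  # five samples is an arbitrary choice for illustration
    sample = draw_sample(rng)
    sample_means.append(fmean(sample))
    print(sample, "sample mean =", sample_means[-1])
```

Each printed row is one random sample together with its sample mean. The means differ from row to row, which is the sense in which \( \bar{x} \) is a value of the random variable \( \bar{X} \).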
Since \( \bar{x} \) is a statistic which is calculated from a sample, the probability distribution of the random variable \( \bar{X} \) is called a sampling distribution.
The sampling distribution of the sample mean is the distribution of the sample means obtained from random samples of the same size drawn from a population. If we repeatedly draw samples of the same size from the population and calculate the mean of each sample, the distribution of the resulting sample means approximates the sampling distribution of the sample mean.
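Repeated sampling of this kind is easy to carry out by simulation. The sketch below, again in Python and under the same assumptions (\( \mu = 100 \), \( \sigma = 15 \), \( n = 10 \)), draws many samples, records each sample mean, and compares the mean and standard deviation of those sample means with the theoretical values \( \mu \) and \( \frac{\sigma}{\sqrt{n}} \). The number of repetitions and the seed are arbitrary choices made for illustration.

```python
import math
import random
from statistics import fmean, stdev

MU, SIGMA, N = 100, 15, 10  # population mean, standard deviation, sample size
NUM_SAMPLES = 10_000        # number of repeated samples (arbitrary choice)

rng = random.Random(2024)   # arbitrary seed, used only to make the run repeatable


def sample_mean(rng, n=N):
    """Return the mean of one random sample of size n from Normal(MU, SIGMA)."""
    return fmean([rng.gauss(MU, SIGMA) for _ in range(n)])


# Simulated values of the sample mean, one per repeated sample.
means = [sample_mean(rng) for _ in range(NUM_SAMPLES)]

empirical_mean = fmean(means)
empirical_sd = stdev(means)
theoretical_sd = SIGMA / math.sqrt(N)

print("mean of sample means:", round(empirical_mean, 2), "(theory:", MU, ")")
print("sd of sample means:  ", round(empirical_sd, 2),
      "(theory:", round(theoretical_sd, 2), ")")
```

With this many repetitions, the simulated mean and standard deviation of the sample means should lie close to 100 and \( \frac{15}{\sqrt{10}} \approx 4.74 \) respectively. A histogram of `means` would also be expected to show the bell shape of a normal distribution.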