AOS4 Topic 10: Sampling & Estimation

Estimation and sampling are fundamental concepts in statistics, crucial for making inferences about a population based on a subset of data. These concepts are widely used in various fields, including economics, medicine, social sciences, and quality control, where it's often impractical or impossible to collect data from an entire population.

What is Sampling?

Sampling is the process of selecting a subset of individuals, items, or observations from a larger population. The goal is to gather data from this subset, called a sample, to make inferences about the population as a whole. Sampling is essential when it is too costly, time-consuming, or impossible to collect data from every member of the population.

Estimation

Estimation refers to the process of inferring the value of a population parameter based on sample data. Since it is often impractical to measure the entire population, estimation provides a means to approximate population parameters, such as the mean, variance, or proportion.

Populations and Samples

The set of all eligible members of a group which we intend to study is called a population. For example, if we are interested in the IQ scores of the Year 12 students at ABC Secondary College, then this group of students could be considered a population; we could collect and analyse all the IQ scores for these students. However, if we are interested in the IQ scores of all Year 12 students across Australia, then this becomes the population.

Often, dealing with an entire population is not practical:

  • The population may be too large – for example, all Year 12 students in Australia.
  • The population may be hard to access – for example, all blue whales in the Pacific Ocean.
  • The data collection process may be destructive – for example, testing every battery to see how long it lasts would mean that there were no batteries left to sell.

Nevertheless, we often wish to make statements about a property of a population when data about the entire population is unavailable. The solution is to select a subset of the population – called a sample – in the hope that what we find out about the sample is also true about the population it comes from. Dealing with a sample is generally quicker and cheaper than dealing with the whole population, and a well-chosen sample will give much useful information about this population. How to select the sample then becomes a very important issue.

Random Samples

Suppose we are interested in investigating the effect of sustained computer use on the eyesight of a group of university students. To do this we go into a lecture theatre containing the students and select all the students sitting in the front two rows as our sample. This sample may be quite inappropriate, as students who already have problems with their eyesight are more likely to be sitting at the front, and so the sample may not be typical of the population. To make valid conclusions about the population from the sample, we would like the sample to have a similar nature to the population.

While there are many sophisticated methods of selecting samples, the general principle of sample selection is that the method of choosing the sample should not favour or disfavour any subgroup of the population. Since it is not always obvious if the method of selection will favour a subgroup or not, we try to choose the sample so that every member of the population has an equal chance of being in the sample. In this way, all subgroups have a chance of being represented. The way we do this is to choose the sample at random.

Definition

A sample of size n is called a simple random sample if it is selected from the population in such a way that every subset of size n has an equal chance of being chosen as the sample. In particular, every member of the population must have an equal chance of being included in the sample.

To choose a sample from the group of university students, we could put the name of every student in a hat and then draw out, one at a time, the names of the students who will be in the sample.
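The "names in a hat" procedure can be sketched in a few lines of Python. This is only an illustration: the roster `students`, the helper name `draw_simple_random_sample`, and the sample size of 10 are assumptions made for the example, not part of the original scenario.

```python
import random

# Hypothetical roster standing in for the students in the lecture theatre.
students = [f"student_{i}" for i in range(1, 101)]

def draw_simple_random_sample(population, n, seed=None):
    """Return n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

sample = draw_simple_random_sample(students, 10, seed=1)
print(sample)
```

Because `random.sample` draws without replacement, no student can appear twice, which mirrors drawing names from a hat one at a time.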

Choosing the sample in an appropriate manner is critical in order to obtain usable results.

Note:

In order to make valid conclusions about a population from a sample, we would like the sample chosen to be representative of the population as a whole. This means that all the different subgroups present in the population appear in the sample in proportions similar to those in the population.

The Sample Proportion as a Random Variable

Suppose that our population of interest is the class of students from Example 2, and suppose further that we are particularly interested in the proportion of female students in the class. This is called the population proportion and is generally denoted by p. The population proportion p is constant for a particular population.

Population proportion (p) = number in population with attribute / population size

In this class, there are 10 females, so the proportion of female students in the class is:

p = 10 / 20 = 1 / 2

Now consider the proportion of female students in the sample chosen:
Sue, Georgia, Miller, Matt, Tom, David

The proportion of females in the sample may be calculated by dividing the number of females in the sample by the sample size. In this case, there are two females in the sample, so the proportion of female students in the sample is:

Sample proportion (p̂) = 2 / 6 = 1 / 3

This value is called the sample proportion and is denoted by p̂ (we say ‘p hat’).

Sample proportion (p̂) = number in sample with attribute / sample size

Note that different symbols are used for the sample proportion and the population proportion, so that we don’t confuse them. In this particular case:

p̂ = 1 / 3, which is not the same as the population proportion p = 1 / 2.

This does not mean there is a problem. In fact, each time a sample is selected, the number of females in the sample will vary. Sometimes the sample proportion will be 1 / 2, and sometimes it will not.

  • The population proportion p is a population parameter; its value is constant.
  • The sample proportion p̂ is a sample statistic; its value is not constant, but varies from sample to sample.
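The contrast between the fixed parameter p and the varying statistic p̂ can be illustrated with a short simulation. This is a sketch under assumptions: the set `females` is inferred from the names in the class list and is consistent with the 10 females stated in the text, and the helper name `sample_proportion` is chosen just for this example.

```python
import random

class_list = ["Denice", "Matt", "Teresa", "Sue", "Shanyn", "Mark", "Arnold",
              "Nick", "Miller", "William", "Lulu", "Darren", "Tom", "David",
              "Lacey", "Janelle", "Mike", "Jane", "Georgia", "Jaimie"]

# Assumption: which students are female is inferred from the names.
females = {"Denice", "Teresa", "Sue", "Shanyn", "Lulu",
           "Lacey", "Janelle", "Jane", "Georgia", "Jaimie"}

def sample_proportion(group, attribute_set):
    """Fraction of the group whose members have the attribute."""
    return sum(1 for name in group if name in attribute_set) / len(group)

# Population proportion: computed once from the whole class, so it is fixed.
p = sample_proportion(class_list, females)

# Sample proportions: recomputed for each new random sample of size 6.
rng = random.Random(0)
p_hats = [sample_proportion(rng.sample(class_list, 6), females)
          for _ in range(5)]

print("p =", p)
print("p-hat values:", p_hats)
```

Running this with different seeds gives different lists of p̂ values, while p stays at 0.5.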

The Exact Distribution of the Sample Proportion

We have seen that the sample proportion varies from sample to sample. We can use our knowledge of probability to further develop our understanding of the sample proportion.

Sampling from a Small Population

Suppose we have a bag containing six blue balls and four red balls, and from the bag, we take a sample of size 4. We are interested in the proportion of blue balls in the sample. We know that the population proportion is equal to:

Population proportion = 6/10 = 3/5

That is, p = 0.6

The probabilities associated with the possible values of the sample proportion p̂ can be calculated either by direct consideration of the sample outcomes or by using our knowledge of selections. Recall that:

C(n, x) = n! / (x! (n - x)!)

This is the number of different ways to select x objects from n objects.
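As a quick check of the selections formula, the sketch below evaluates it directly from factorials and compares it with Python's built-in `math.comb`. The function name `choose` is an assumption made for this example.

```python
import math

def choose(n, x):
    """Number of ways to select x objects from n: n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

# The factorial formula agrees with the standard-library implementation.
assert choose(10, 4) == math.comb(10, 4)

print(choose(10, 4))  # ways to select 4 balls from 10
print(choose(6, 1))   # ways to select 1 blue ball from 6
print(choose(4, 3))   # ways to select 3 red balls from 4
```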

Sample Proportion Table

The following table gives the probability of obtaining each possible sample proportion \(\hat{p}\) when selecting a random sample of size 4 from the bag.

Number of blue balls in the sample (x) | 0 | 1 | 2 | 3 | 4
Proportion of blue balls in the sample, \(\hat{p}\) | 0 | \(\frac{1}{4}\) | \(\frac{1}{2}\) | \(\frac{3}{4}\) | 1
Probability | \(\frac{1}{210}\) | \(\frac{24}{210}\) | \(\frac{90}{210}\) | \(\frac{80}{210}\) | \(\frac{15}{210}\)

We can see from the table that we can consider the sample proportion as a random variable, \(\hat{P}\), and we can write:

  • Pr(P̂ = 0) = 1/210
  • Pr(P̂ = 1/4) = 24/210
  • Pr(P̂ = 1/2) = 90/210
  • Pr(P̂ = 3/4) = 80/210
  • Pr(P̂ = 1) = 15/210

The possible values of p̂ and their associated probabilities together form a probability distribution for the random variable P̂, which can be summarised as follows:

p̂ | 0 | 1/4 | 1/2 | 3/4 | 1
Pr(P̂ = p̂) | 1/210 | 24/210 | 90/210 | 80/210 | 15/210

The distribution of a statistic which is calculated from a sample (such as the sample proportion) has a special name – it is called a sampling distribution.
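The sampling distribution for the bag of six blue and four red balls can be reproduced by counting selections. The sketch below is one possible way to do this; the function name `sampling_distribution` is an assumption, and exact fractions are used so the results can be compared with the table (note that `Fraction` reduces automatically, so 24/210 appears as 4/35).

```python
from fractions import Fraction
from math import comb

def sampling_distribution(blue, red, n):
    """Map each possible sample proportion of blue balls to its probability."""
    total = comb(blue + red, n)  # all ways to choose n balls
    dist = {}
    for x in range(n + 1):       # x = number of blue balls in the sample
        ways = comb(blue, x) * comb(red, n - x)
        dist[Fraction(x, n)] = Fraction(ways, total)
    return dist

dist = sampling_distribution(blue=6, red=4, n=4)
for p_hat, prob in dist.items():
    print(p_hat, prob)
```

The probabilities sum to 1, as they must for a probability distribution.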

Sampling from a Large Population

Generally, when we select a sample it is from a population which is too large or too difficult to enumerate or even count – populations such as all the people in Australia, or all the cows in Texas, or all the people who will ever have asthma. When the population is so large, we assume that the probability of observing the attribute we are interested in remains constant with each selection, irrespective of prior selections for the sample.

Suppose we know that 70% of all 17-year-olds in Australia attend school. That is, \( p = 0.7 \). We will assume that this probability remains constant for all selections for the sample.

Now consider selecting a random sample of size 4 from the population of all 17-year-olds in Australia. This time we can use our knowledge of binomial distributions to calculate the associated probability for each possible value of the sample proportion \( \hat{p} \), using the probability function:

\[ \text{Pr}(X = x) = \binom{4}{x} \cdot 0.7^x \cdot 0.3^{4-x} \quad \text{where} \quad x = 0, 1, 2, 3, 4 \]

The following table gives the probability of obtaining each possible sample proportion \( \hat{p} \) when selecting a random sample of four 17-year-olds.

Sampling Distribution Table

Number at school in the sample (x) | Proportion at school in the sample, \( \hat{p} \) | Probability
0 | 0 | 0.0081
1 | 0.25 | 0.0756
2 | 0.5 | 0.2646
3 | 0.75 | 0.4116
4 | 1 | 0.2401

Once again, we can summarise the sampling distribution of the sample proportion as follows:

\( \hat{p} \) | 0 | 0.25 | 0.5 | 0.75 | 1
Pr(\( \hat{P} = \hat{p} \)) | 0.0081 | 0.0756 | 0.2646 | 0.4116 | 0.2401

The population that the sample of size \( n = 4 \) is being taken from is such that each item selected has a probability \( p = 0.7 \) of success. Thus we can define the random variable \( \hat{P} = \frac{X}{4} \) where \( X \) is a binomial random variable with parameters \( n = 4 \) and \( p = 0.7 \). To emphasise this we can write:

x | 0 | 1 | 2 | 3 | 4
\( \hat{p} = \frac{x}{4} \) | 0 | 0.25 | 0.5 | 0.75 | 1
Pr(\( \hat{P} = \hat{p} \)) = Pr(X = x) | 0.0081 | 0.0756 | 0.2646 | 0.4116 | 0.2401

Note: The probabilities for the sample proportions, \( \hat{p} \), correspond to the probabilities for the numbers of successes, \( x \).
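The binomial calculation above can be sketched as follows. The function name `binomial_sampling_distribution` is an assumption made for this example; it simply evaluates the binomial probability function for each x and records the result against p̂ = x/n.

```python
from math import comb

def binomial_sampling_distribution(n, p):
    """Map each sample proportion x/n to Pr(X = x) for X ~ Bi(n, p)."""
    return {x / n: comb(n, x) * p**x * (1 - p)**(n - x)
            for x in range(n + 1)}

dist = binomial_sampling_distribution(n=4, p=0.7)
for p_hat, prob in dist.items():
    print(f"{p_hat:.2f}  {prob:.4f}")
```

Rounded to four decimal places, the values agree with the sampling distribution table for the 17-year-olds.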

Example 1

A researcher wishes to evaluate how well the local library is catering to the needs of a town’s residents. To do this, she hands out a questionnaire to each person entering the library over the course of a week. Will this method result in a random sample?

Solution

No. Every member of this sample already uses the library, so residents who do not use it have no chance of being selected, and those who are selected are possibly satisfied with the service available. Additional valuable information might well be obtained by finding out the opinion of those who do not use the library.

A better sample would be obtained by selecting at random from the town’s entire population, so the sample contains both people who use the library and people who do not.

Example 2

Use a random number generator to select a group of six students from the following class:

Student Name Assigned Number
Denice 1
Matt 2
Teresa 3
Sue 4
Shanyn 5
Mark 6
Arnold 7
Nick 8
Miller 9
William 10
Lulu 11
Darren 12
Tom 13
David 14
Lacey 15
Janelle 16
Mike 17
Jane 18
Georgia 19
Jaimie 20

Solution

First assign a number to each member of the class. Here the numbers 1 to 20 have been assigned as shown in the table above.

Generating six random integers from 1 to 20 gives on this occasion: 4, 19, 9, 2, 13, 14.

The sample chosen is thus:

Student Name Assigned Number
Sue 4
Georgia 19
Miller 9
Matt 2
Tom 13
David 14

Note: In this example, we want a list of six random integers without repeats. We do not add a randomly generated integer to our list if it is already in the list.
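The rule in the note (skip any integer that is already in the list) can be sketched as below. The function name `random_integers_without_repeats` and the seed value are assumptions for illustration; the numbers produced will generally differ from those shown in the example.

```python
import random

def random_integers_without_repeats(low, high, count, seed=None):
    """Generate `count` distinct random integers between low and high inclusive."""
    if count > high - low + 1:
        raise ValueError("not enough distinct integers in the range")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < count:
        k = rng.randint(low, high)   # both endpoints are included
        if k not in chosen:          # discard repeats, as in the note
            chosen.append(k)
    return chosen

numbers = random_integers_without_repeats(1, 20, 6, seed=3)
print(numbers)
```

In practice, `random.sample(range(1, 21), 6)` gives the same kind of result in a single call; the loop is written out here only to mirror the note.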

Example 3

Use a random number generator to select another group of six students from the same class, and determine the proportion of females in the sample.

Solution

Generating another six random integers from 1 to 20 gives: 19, 3, 11, 9, 15, 1.

The sample chosen is:

  • Georgia
  • Teresa
  • Lulu
  • Miller
  • Lacey
  • Denice

For this sample, we have:

Sample Proportion (p̂) = 5/6

Note: Since p̂ varies according to the contents of the random samples, we can consider the sample proportions p̂ as being the values of a random variable, which we will denote by P̂. We investigate this idea further in the next section.

Example 4

A bag contains six blue balls and four red balls. If we take a random sample of size 4, what is the probability that there is one blue ball in the sample, that is, \(\hat{p} = \frac{1}{4}\)?

Solution

Method 1

Consider selecting the sample by taking one ball from the bag at a time (without replacement). The favourable outcomes are RRRB, RRBR, RBRR, and BRRR, with:

\[ \text{Pr}\left(\{RRRB, RRBR, RBRR, BRRR\}\right) = \left( \frac{4}{10} \times \frac{3}{9} \times \frac{2}{8} \times \frac{6}{7} \right) + \left( \frac{4}{10} \times \frac{3}{9} \times \frac{6}{8} \times \frac{2}{7} \right) + \left( \frac{4}{10} \times \frac{6}{9} \times \frac{3}{8} \times \frac{2}{7} \right) + \left( \frac{6}{10} \times \frac{4}{9} \times \frac{3}{8} \times \frac{2}{7} \right) = \frac{4}{35} \]

Method 2

In total, there are \( \binom{10}{4} = 210 \) ways to select 4 balls from 10 balls.

There are \( \binom{4}{3} = 4 \) ways of choosing 3 red balls from 4 red balls, and \( \binom{6}{1} = 6 \) ways of choosing one blue ball from 6 blue balls.

Thus, the probability of obtaining 3 red balls and one blue ball is equal to:

\[ \frac{\binom{4}{3} \times \binom{6}{1}}{\binom{10}{4}} = \frac{24}{210} = \frac{4}{35} \]
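Both methods can be cross-checked by brute force: list every possible sample of 4 balls and count those containing exactly one blue ball. The sketch below does this; labelling the balls by position (indices 0 to 5 blue, 6 to 9 red) is an assumption made only so that individual balls can be told apart.

```python
from fractions import Fraction
from itertools import combinations

# Ten distinguishable balls: six blue followed by four red.
balls = ["B"] * 6 + ["R"] * 4

# Every unordered sample of 4 balls, identified by ball index.
samples = list(combinations(range(10), 4))

# Keep the samples that contain exactly one blue ball.
favourable = [s for s in samples
              if sum(balls[i] == "B" for i in s) == 1]

probability = Fraction(len(favourable), len(samples))
print(len(samples), len(favourable), probability)
```

Since every sample of size 4 is equally likely, the probability is the number of favourable samples divided by the total number of samples.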

Example 5

A bag contains six blue balls and four red balls. Use the sampling distribution given for this bag in the section Sampling from a Small Population to determine the probability that the proportion of blue balls in a sample of size 4 is more than \( \frac{1}{4} \).

Solution

\[ Pr\left( \hat{P} > \frac{1}{4} \right) = Pr\left( \hat{P} = \frac{1}{2} \right) + Pr\left( \hat{P} = \frac{3}{4} \right) + Pr\left( \hat{P} = 1 \right) \]

\[ = \frac{90}{210} + \frac{80}{210} + \frac{15}{210} \]

\[ = \frac{185}{210} = \frac{37}{42} \]
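The same sum can be read off the sampling distribution programmatically. In this sketch the dictionary `table` holds the probabilities from the small-population table, and the helper name `prob_greater_than` is an assumption for illustration.

```python
from fractions import Fraction

# Sampling distribution for the bag: sample proportion -> probability.
table = {
    Fraction(0):    Fraction(1, 210),
    Fraction(1, 4): Fraction(24, 210),
    Fraction(1, 2): Fraction(90, 210),
    Fraction(3, 4): Fraction(80, 210),
    Fraction(1):    Fraction(15, 210),
}

def prob_greater_than(dist, threshold):
    """Add the probabilities of all sample proportions above the threshold."""
    return sum((prob for p_hat, prob in dist.items() if p_hat > threshold),
               Fraction(0))

answer = prob_greater_than(table, Fraction(1, 4))
print(answer)  # 37/42
```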

Example 6

Use the sampling distribution given in the section Sampling from a Large Population to determine the probability that, in a random sample of four Australian 17-year-olds, the proportion attending school is less than 50%.

Solution

\[ \text{Pr}(\hat{P} < 0.5) = \text{Pr}(\hat{P} = 0) + \text{Pr}(\hat{P} = 0.25) \]

\[ = 0.0081 + 0.0756 \]

\[ = 0.0837 \]
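A simulation gives an independent, approximate check of this answer. The sketch below repeatedly draws samples of four, each person attending school with probability 0.7, and records how often the sample proportion falls below 0.5. The function name, the number of trials, and the seed are all assumptions; the estimate will be close to 0.0837 but not exactly equal to it.

```python
import random

def simulate_p_hat_below(threshold, n=4, p=0.7, trials=50_000, seed=0):
    """Estimate Pr(P-hat < threshold) by repeated random sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Count successes in one sample of size n.
        x = sum(rng.random() < p for _ in range(n))
        if x / n < threshold:
            hits += 1
    return hits / trials

estimate = simulate_p_hat_below(0.5)
print(round(estimate, 3))
```

Increasing `trials` generally brings the estimate closer to the exact value obtained from the binomial distribution.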

Exercise 1 (1 Question)

What is the population proportion (p) of blue balls in a bag that contains 6 blue balls and 4 red balls?

Exercise 2 (1 Question)

If you take a random sample of size 4 from a bag with 6 blue balls and 4 red balls, what is the probability that the proportion of blue balls in the sample is 0?

Exercise 3 (1 Question)

Given a bag with 6 blue balls and 4 red balls, if the sample size is 4, what is the probability that exactly 1 blue ball is selected?

Exercise 4 (1 Question)

What does the sample proportion \(\hat{P}\) represent in sampling distributions?
