
Sampling

Sampling is a branch of mathematics at the crossroads of statistics and probability. It is used when we want to study properties of a population that is too large to be observed entirely. Instead, we select a sample of this population to obtain information about the total population with a certain degree of precision.

The Concept of a Sample

Definition Random Sample
A random sample of size \(n\) consists of the results of \(n\) independent repetitions of the same random experiment.
Example
An opaque bag contains blue and red marbles. We draw a marble at random, record its color, and put it back in the bag. By repeating this process \(n\) times, we obtain a random sample of size \(n\).
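The drawing procedure in this example can be sketched in Python. The bag's contents (3 blue and 7 red marbles), the seed, and the helper name `draw_sample` are hypothetical choices for illustration. Because each marble is put back, every draw is made from the same bag, so the \(n\) draws are independent repetitions of the same experiment.

```python
import random

def draw_sample(bag, n, rng):
    """Draw n marbles with replacement and record their colors (hypothetical helper)."""
    # rng.choice picks one element uniformly at random; the bag list is never
    # modified, which models putting the marble back after each draw.
    return [rng.choice(bag) for _ in range(n)]

# Hypothetical bag: 3 blue and 7 red marbles.
bag = ["blue"] * 3 + ["red"] * 7
rng = random.Random(2024)  # fixed seed so the run is reproducible
sample = draw_sample(bag, 20, rng)
print(sample)
```

Changing the seed gives a different sample of the same size, since each draw is random.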

Sampling Fluctuation

Definition Sampling Fluctuation
In a population, we denote by \(p\) the proportion of individuals possessing a certain characteristic. When we take a sample and observe the frequency \(f\) of this characteristic, we notice that \(f\) varies from one sample to another. This phenomenon is called sampling fluctuation.
Example
In a high school, \(63\%\) of the students are girls (\(p=0.63\)). If we take 10 different samples of 50 students each, the observed percentage of girls might look like this:
$$
\begin{array}{l|cccccccccc}
\text{Sample} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline
\text{Percentage of girls} & 62\% & 68\% & 60\% & 68\% & 66\% & 68\% & 68\% & 54\% & 66\% & 70\%
\end{array}
$$
We observe that the frequencies fluctuate around the true population proportion of \(0.63\).
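Sampling fluctuation can be reproduced with a small simulation. The sketch below assumes the school is large enough that each selected student can be treated as an independent draw with probability \(p = 0.63\) of being a girl; the helper name `sample_frequency` and the seed are hypothetical, and the percentages it prints are not the ones in the table above.

```python
import random

def sample_frequency(p, n, rng):
    """Observed frequency of the characteristic in one simulated sample of size n."""
    # Assumption: each individual independently has the characteristic
    # with probability p (large population, draws treated as independent).
    count = sum(1 for _ in range(n) if rng.random() < p)
    return count / n

rng = random.Random(1)  # fixed seed for reproducibility
p, n = 0.63, 50
frequencies = [sample_frequency(p, n, rng) for _ in range(10)]
for i, f in enumerate(frequencies, start=1):
    print(f"Sample {i}: {f:.0%} girls")
```

Each run with a different seed produces a different list of percentages scattered around \(63\%\), which is exactly the sampling fluctuation described in the definition.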

The Law of Large Numbers

Proposition Law of Large Numbers
As the sample size \(n\) increases, the observed frequency \(f\) of a characteristic is more and more likely to be close to the true probability \(p\).
Example
When tossing a fair coin, the probability of Heads is \(p = 0.5\). A computer simulation with several sample sizes gives the following results:
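$$
\begin{array}{l|cccc}
\text{Sample size } n & 100 & 1{,}000 & 10{,}000 & 100{,}000 \\ \hline
\text{Number of Heads} & 54 & 490 & 5{,}010 & 49{,}942 \\
\text{Observed frequency } f & 0.54 & 0.49 & 0.501 & 0.49942
\end{array}
$$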
As \(n\) increases, the frequency \(f\) settles closer to \(0.5\).
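A simulation of this kind can be sketched as follows. Each toss is modeled as a uniform random number compared with \(0.5\); the helper name `heads_frequency` and the seed are arbitrary choices, so the printed frequencies will generally differ from the table above, although the distance \(|f - 0.5|\) should tend to shrink as \(n\) grows.

```python
import random

def heads_frequency(n, rng):
    """Toss a simulated fair coin n times and return the observed frequency of Heads."""
    # rng.random() is uniform on [0, 1), so "< 0.5" happens with probability 0.5.
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n

rng = random.Random(7)  # fixed seed; results will not match the table above
results = {}
for n in (100, 1_000, 10_000, 100_000):
    f = heads_frequency(n, rng)
    results[n] = f
    print(f"n = {n:>7}: f = {f:.5f}, |f - 0.5| = {abs(f - 0.5):.5f}")
```

Because the simulation is random, an individual larger \(n\) can occasionally give a worse frequency than a smaller one; the law only says that large deviations become less and less likely.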

Estimation

Definition Estimation
When the true proportion \(p\) of a characteristic in a population is unknown, we can use the frequency \(f\) observed in a random sample as an estimate of \(p\).
Example
Before a referendum, a survey of 1,200 people is conducted. If 636 of them say they will vote "Yes," the observed frequency is
$$ f = \frac{636}{1200} = 0.53. $$
Thus, an estimate of the proportion of "Yes" voters in the whole country is \(0.53\) (that is, \(53\%\)).
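Computing the estimate is a single division. The sketch below wraps it in a hypothetical helper `estimate_proportion` that also rejects impossible inputs, and applies it to the survey figures above.

```python
def estimate_proportion(yes_count, sample_size):
    """Use the observed frequency in a sample as an estimate of the population proportion."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= yes_count <= sample_size:
        raise ValueError("yes_count must be between 0 and sample_size")
    return yes_count / sample_size

f = estimate_proportion(636, 1200)
print(f"f = {f}, i.e. {f:.0%}")  # f = 0.53, i.e. 53%
```

Another survey of 1,200 people would, because of sampling fluctuation, usually give a slightly different value of \(f\).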