Usually, defining a random variable begins by establishing:
- a sample space, that is, the set of all possible outcomes,
- a probability defined on this sample space,
- a function \(X\) that assigns a number to each outcome in the sample space.
This is a fairly lengthy process. Often, however, we prefer to define a random variable \(X\) directly by giving its probability distribution, relying on the context of the situation being studied. For example, suppose we survey a class of 30 students about their siblings and obtain these results: 10 students have 0 siblings, 12 have 1 sibling, 5 have 2 siblings, and 3 have 3 siblings. We can then define the random variable \(X\) as the number of siblings of a student chosen uniformly at random from the class, with this probability distribution:
\(x\) | 0 | 1 | 2 | 3 |
\(P(X = x)\) | \(\frac{10}{30}\) | \(\frac{12}{30}\) | \(\frac{5}{30}\) | \(\frac{3}{30}\) |
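For this particular example, the three ingredients listed at the start can be written out explicitly. The sketch below is only an illustration under stated assumptions: the students are labelled 0 to 29 (hypothetical labels, not part of the survey), every student is taken to be equally likely, and the names `siblings_of`, `prob` and `X` are chosen here for convenience.

```python
from fractions import Fraction

# Survey counts from the text: number of siblings -> number of students.
survey_counts = {0: 10, 1: 12, 2: 5, 3: 3}

# Sample space: one outcome per student, labelled 0..29 (hypothetical labels).
# siblings_of[i] is the number of siblings of student i.
siblings_of = [k for k, n in survey_counts.items() for _ in range(n)]
sample_space = range(len(siblings_of))

# Probability: assumed uniform, so P(event) = |event| / 30.
def prob(event):
    return Fraction(len(set(event)), len(sample_space))

# Random variable X: assigns a number to each outcome (each student).
def X(student):
    return siblings_of[student]

# Distribution of X: P(X = x) is the probability of the event {s : X(s) = x}.
distribution = {
    x: prob([s for s in sample_space if X(s) == x])
    for x in sorted(set(siblings_of))
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

`Fraction` prints values in lowest terms, so the output reads 1/3, 2/5, 1/6 and 1/10, which are the same numbers as in the table above.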
The theorem below shows that it is always possible to construct a sample space, a probability, and a function \(X\) to obtain a random variable with this probability distribution.