AOS4 Topic 7: Cumulative Distribution Functions

In probability and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X takes a value less than or equal to x. A random variable is a variable whose value is the numerical outcome of a random phenomenon. The CDF is defined for both discrete and continuous random variables, and it is also used to specify the distribution of multivariate random variables. The probability that the random variable exceeds a particular level is given by the tail distribution, also called the complementary cumulative distribution function (CCDF). This article covers what the cumulative distribution function is, along with its properties, formulas, applications, and examples.

What is a Cumulative Distribution Function?

The Cumulative Distribution Function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X will take a value less than or equal to x. It describes the probability distribution of a random variable, and its values can be listed in a table. From such a table, a CDF plot is easy to create, for example in an Excel sheet.

In other words, the CDF gives the cumulative probability up to a given value. It is used to find the probability that a random variable falls at or below a given value, or within a given interval, and to compare such probabilities. For discrete distributions, the CDF sums the probabilities of all values up to the specified value, and for continuous distributions, it gives the area under the probability density function up to the specified value.

Cumulative Distribution Function Formula

The CDF defined for a discrete random variable is given as:

Fx(x) = P(X ≤ x)

Here, Fx(x) is the probability that X takes a value less than or equal to x. Now consider the semi-closed interval (a, b], where a < b. X lies in this interval exactly when a < X ≤ b.

Therefore, the probability within the interval is written as:

P(a < X ≤ b) = Fx(b) – Fx(a)

The CDF defined for a continuous random variable is given as:

\[ F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt \]

Here, Fx is expressed as the integral of the probability density function fx of X.

If the distribution of the random variable X has a discrete component at the value b, then the probability of that value is the size of the jump in the CDF at b:

\[ P(X = b) = F_X(b) - \lim_{x \to b^-} F_X(x) \]
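The interval and point-mass formulas above can be checked on a small discrete distribution. The following is a minimal sketch, assuming a made-up PMF (the values 0, 1, 3 and their probabilities are hypothetical and chosen only for illustration); the helper names `cdf` and `left_limit` are likewise not from any library.

```python
from fractions import Fraction

# Hypothetical PMF chosen only for illustration: value -> probability.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over all support points <= x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def left_limit(x):
    """Limit of F(t) as t approaches x from below, i.e. P(X < x)."""
    return sum((p for v, p in pmf.items() if v < x), Fraction(0))

# P(a < X <= b) = F(b) - F(a)
print(cdf(3) - cdf(0))          # P(0 < X <= 3) -> 3/4

# P(X = b) = F(b) - F(b-), the size of the jump at b
print(cdf(1) - left_limit(1))   # P(X = 1) -> 1/2
```

Exact fractions are used instead of floats so that the differences of CDF values match the PMF entries without rounding error.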

Cumulative Distribution Function Properties

The cumulative distribution function Fx(x) of a random variable has the following important properties:

  • Every CDF Fx is non-decreasing and right-continuous.
  • \( \lim_{x \to -\infty} F_X(x) = 0 \) and \( \lim_{x \to +\infty} F_X(x) = 1 \).
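These properties can be spot-checked numerically for a concrete continuous distribution. The sketch below assumes an exponential distribution with rate 1, which is not part of this article and is used only because its CDF has a simple closed form; the function names are illustrative.

```python
import math

def exp_cdf(x, rate=1.0):
    """CDF of an exponential distribution: 1 - exp(-rate * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def exp_ccdf(x, rate=1.0):
    """Complementary CDF (tail probability): P(X > x) = 1 - F(x)."""
    return 1.0 - exp_cdf(x, rate)

grid = [i / 10 for i in range(-50, 501)]   # points from -5.0 to 50.0
values = [exp_cdf(x) for x in grid]

# Non-decreasing: each value is at least as large as the previous one.
print(all(a <= b for a, b in zip(values, values[1:])))

# Limits: exactly 0 far to the left, approaching 1 far to the right.
print(exp_cdf(-1e9), exp_cdf(50.0))

# The tail probability and the CDF always add up to 1.
print(exp_cdf(2.0) + exp_ccdf(2.0))
```

A finite grid cannot prove monotonicity or the limits; it only illustrates them for this one distribution.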

For a continuous random variable X and all real numbers a and b with a < b, the function fx is equal to the derivative of Fx, such that:

\[ F_X(b) - F_X(a) = P(a < X \leq b) = \int_{a}^{b} f_X(x) \, dx \]

If X is a purely discrete random variable that takes the values x1, x2, x3, … with probabilities pi = p(xi), then the CDF of X is discontinuous at the points xi:

\[ F_X(x) = P(X \leq x) = \sum_{x_i \leq x} p(x_i) \]

This function is defined for all real values of x; sometimes it is defined implicitly rather than explicitly. For a continuous random variable, the CDF is the integral of the PDF (probability density function).

Example of CDF

Consider a simple example of CDF, which is given by rolling a fair six-sided die, where X is the random variable.

Each face comes up with probability 1 / 6, so the cumulative probabilities are:

  • Probability of getting at most 1 = P(X ≤ 1) = 1 / 6
  • Probability of getting at most 2 = P(X ≤ 2) = 2 / 6
  • Probability of getting at most 3 = P(X ≤ 3) = 3 / 6
  • Probability of getting at most 4 = P(X ≤ 4) = 4 / 6
  • Probability of getting at most 5 = P(X ≤ 5) = 5 / 6
  • Probability of getting at most 6 = P(X ≤ 6) = 6 / 6 = 1

From this, note that the value of the CDF always lies between 0 and 1, and that the CDF is non-decreasing and right-continuous.
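The die example can be written as a step function that accepts any real x, not only the integers 1 to 6. This is a minimal sketch; the function name `die_cdf` is an illustrative choice.

```python
from fractions import Fraction
from itertools import accumulate
import math

# PMF of a fair six-sided die and its running totals 1/6, 2/6, ..., 6/6.
pmf = [Fraction(1, 6)] * 6
cumulative = list(accumulate(pmf))

def die_cdf(x):
    """CDF of a fair six-sided die, valid for any real x."""
    k = math.floor(x)            # largest integer that is <= x
    if k < 1:
        return Fraction(0)       # no face is <= x
    return cumulative[min(k, 6) - 1]

for x in (0.5, 1, 2.999, 3, 3.5, 6, 7.2):
    print(x, die_cdf(x))
```

Evaluating just below and exactly at 3 shows the jump: the value at 2.999 is 1/3, while the value at 3 is already 1/2, which is what right-continuity means for a step function.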

Cumulative Distribution Function Applications

The most important application of the cumulative distribution function is in statistical analysis. In statistical analysis, the concept of the CDF is used in two ways:

1. Finding the frequency of occurrence of values for the given phenomena using cumulative frequency analysis.

2. Deriving some simple statistical properties by using an empirical distribution function, which provides a formal direct estimate of CDFs.
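The empirical distribution function mentioned in the second item estimates the CDF at x as the fraction of observations that are less than or equal to x. The following is a minimal sketch using only the standard library; the helper name `make_ecdf` and the sample values are made up for illustration.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return the empirical CDF of a sample: F_n(x) = (# observations <= x) / n."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(x):
        # bisect_right returns how many of the sorted values are <= x.
        return bisect_right(data, x) / n

    return ecdf

# Made-up observations, used only to exercise the function.
observations = [2.1, 3.4, 1.9, 3.4, 5.0]
F_n = make_ecdf(observations)

print(F_n(1.0))   # 0.0, no observation is <= 1.0
print(F_n(3.4))   # 0.8, four of the five observations are <= 3.4
print(F_n(10))    # 1.0, every observation is <= 10
```

Sorting once and using binary search keeps each evaluation fast, which matters when the estimate is evaluated at many points, for example to draw a plot.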

Example 1

A random variable X has a PDF given by:

\[ f(x) = \begin{cases} k(x^2 + x) & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases} \]

Find the cumulative distribution function (CDF).

Solution:

Step 1: Find the Constant \( k \)

The constant \( k \) is determined by ensuring that the total area under the PDF curve equals 1:

\[ \int_{-\infty}^{\infty} f(x) \, dx = 1 \]

Since the PDF is zero outside the interval \([0, 1]\), the integral reduces to:

\[ \int_{0}^{1} k(x^2 + x) \, dx = 1 \]

Calculate the integral: \[ \int_{0}^{1} k(x^2 + x) \, dx = k \left[ \frac{x^3}{3} + \frac{x^2}{2} \right]_{0}^{1} \]

Simplify: \[ k \left( \frac{1^3}{3} + \frac{1^2}{2} \right) = k \left( \frac{1}{3} + \frac{1}{2} \right) = k \left( \frac{5}{6} \right) = 1 \]

Solving for \( k \): \[ k = \frac{6}{5} \]

Step 2: Express the PDF with the Constant \( k \)

Substituting the value of \( k \) into the PDF:

\[ f(x) = \begin{cases} \frac{6}{5}(x^2 + x) & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases} \]

Step 3: Find the Cumulative Distribution Function (CDF)

The CDF \( F(x) \) is defined as the integral of the PDF from \(-\infty\) to \( x \):

\[ F(x) = \int_{-\infty}^{x} f(t) \, dt \]

For different ranges of \( x \), the CDF is as follows:

For \( x < 0 \):
\[ F(x) = 0 \]

For \( 0 \leq x \leq 1 \):
\[ F(x) = \int_{0}^{x} \frac{6}{5}(t^2 + t) \, dt \] \[ F(x) = \frac{6}{5} \left[ \frac{t^3}{3} + \frac{t^2}{2} \right]_{0}^{x} \] \[ F(x) = \frac{6}{5} \left( \frac{x^3}{3} + \frac{x^2}{2} \right) \]

For \( x > 1 \):
\[ F(x) = 1 \]
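The result of Example 1 can be sanity-checked by comparing the closed-form CDF with a numerical integral of the PDF. This is a rough sketch, assuming a simple midpoint rule with a hypothetical step count of 10,000; it is a check, not a general-purpose integrator.

```python
from fractions import Fraction

def pdf(x):
    """f(x) = (6/5)(x^2 + x) on [0, 1], zero elsewhere."""
    return 6 / 5 * (x * x + x) if 0 <= x <= 1 else 0.0

def cdf(x):
    """Closed-form CDF derived in Example 1."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return 6 / 5 * (x ** 3 / 3 + x ** 2 / 2)

def cdf_numeric(x, steps=10_000):
    """Midpoint-rule approximation of the integral of pdf from 0 to min(x, 1)."""
    if x <= 0:
        return 0.0
    upper = min(x, 1.0)
    h = upper / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h

# Exact check that k = 6/5 makes the total probability equal to 1.
print(Fraction(6, 5) * (Fraction(1, 3) + Fraction(1, 2)))   # 1

# The closed form and the numerical integral should agree closely.
for x in (0.25, 0.5, 1.0):
    print(x, cdf(x), cdf_numeric(x))
```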

Example 2

Let \( X \) be a discrete random variable with range \( R_X = \{1, 2, 3, \ldots\} \). Suppose the PMF of \( X \) is given by

\[ P_X(k) = \frac{1}{2^k} \text{ for } k = 1, 2, 3, \ldots \]

Find the CDF of \( X \), \( F_X(x) \).

Find \( P(2 < X \leq 5) \).

Find \( P(X > 4) \).

Solution:

Verify the PMF:

First, note that this is a valid PMF. In particular,

\[ \sum_{k=1}^{\infty} P_X(k) = \sum_{k=1}^{\infty} \frac{1}{2^k} = 1 \text{ (geometric sum)} \]

Find the Cumulative Distribution Function (CDF):

To find the CDF \( F_X(x) \), note the following:

For \( x < 1 \), \[ F_X(x) = 0 \]

For \( 1 \leq x < 2 \), \[ F_X(x) = P_X(1) = \frac{1}{2} \]

For \( 2 \leq x < 3 \), \[ F_X(x) = P_X(1) + P_X(2) = \frac{1}{2} + \frac{1}{4} = \frac{3}{4} \]

In general, for a positive integer \( k \) and \( k \leq x < k + 1 \), \[ F_X(x) = P_X(1) + P_X(2) + \cdots + P_X(k) = \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^k} = 1 - \frac{1}{2^k} \]

Find \( P(2 < X \leq 5) \):

We can write \[ P(2 < X \leq 5) = F_X(5) - F_X(2) = \frac{31}{32} - \frac{3}{4} = \frac{7}{32} \]

Or equivalently, \[ P(2 < X \leq 5) = P_X(3) + P_X(4) + P_X(5) = \frac{1}{8} + \frac{1}{16} + \frac{1}{32} = \frac{7}{32} \] which gives the same answer.

Find \( P(X > 4) \):

We can write \[ P(X > 4) = 1 - P(X \leq 4) = 1 - F_X(4) = 1 - \frac{15}{16} = \frac{1}{16} \]
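The three answers in Example 2 can be reproduced with exact arithmetic. The sketch below implements the closed form \( F_X(x) = 1 - 1/2^{\lfloor x \rfloor} \) for \( x \geq 1 \); the function names are illustrative.

```python
from fractions import Fraction
import math

def pmf(k):
    """P_X(k) = 1 / 2^k for k = 1, 2, 3, ..."""
    return Fraction(1, 2 ** k)

def cdf(x):
    """F_X(x) = 1 - 1/2^floor(x) for x >= 1, and 0 for x < 1."""
    k = math.floor(x)
    if k < 1:
        return Fraction(0)
    return 1 - Fraction(1, 2 ** k)

print(cdf(5) - cdf(2))                   # P(2 < X <= 5) -> 7/32
print(sum(pmf(k) for k in (3, 4, 5)))    # same probability from the PMF -> 7/32
print(1 - cdf(4))                        # P(X > 4) -> 1/16
```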

Exercise 1

What is the value of the CDF \( F_X(x) \) for a discrete random variable \( X \) with PMF \( P_X(k) = \frac{1}{2^k} \) for \( k = 1, 2, 3, \ldots \), at \( x = 2 \)?


Exercise 2

For a discrete random variable \( Y \) with PMF given by \( P_Y(k) = \frac{1}{6} \) for \( k = 1, 2, 3, 4, 5, 6 \), what is the value of the CDF \( F_Y(4) \)?
