Expected Values, Linearity of Expectation, and Indicator Functions

Section 4.3 Expected Values, Linearity of Expectation, and Indicator Functions

We next define the expected value of a random variable, which is our main tool for analyzing random variables.

Definition 4.3.1.

Given a random variable \(X:\Omega\mapsto \mathbb{R}\) on a finite probability space \((\Omega,P)\text{,}\) the expected value of \(X\) is defined to be

\begin{equation*} E[X]=\sum_{x\in \Omega} P(x)X(x) \, . \end{equation*}

Intuitively, the expected value of \(X\) is the average value of \(X\) if we select an event from \(\Omega\) according to the probability distribution \(P\text{.}\)

🔗

Example 4.3.2.

Consider again the set \(\Omega=\{00,10,01,11\}\) of binary strings of length two, with the uniform distribution and the random variable \(X:\Omega\mapsto \mathbb{R}\) by

\begin{equation*} X(00)=0, X(10)=1, X(01)=1, X(11)=2\text{.} \end{equation*}

The expected value of \(X\) is

\begin{equation*} E[X]=\sum_{w\in \Omega} P(w)X(w)=\frac{1}{4}(0)+\frac{1}{4}(1)+\frac{1}{4}(1)+\frac{1}{4}(2)=1 \, . \end{equation*}

The interpretation of this expected value is that if we select a binary string of length two uniformly at random, we expect it to contain one \(1\) on average.

🔗

Example 4.3.3.

Consider again the set of outcomes from rolling a die, which is the values appearing on the six sides: \(\Omega=\{1,2,3,4,5,6\}\text{,}\) with the uniform distribution and the random variable \(Y:\Omega\mapsto \mathbb{R}\) by

\begin{equation*} Y(i)=i\text{.} \end{equation*}

The expected value of \(Y\) is

\begin{align*} E[Y]\amp =\sum_{i=1}^6 P(i)Y(i) \\ \amp = \frac{1}{6}(1)+\frac{1}{6}(2)+\frac{1}{6}(3)+\frac{1}{6}(4)+\frac{1}{6}(5)+\frac{1}{6}(6)\\ \amp =3.5 \, . \end{align*}

The interpretation of this expected value is that if we roll a die uniformly at random, we expect the value appearing on the top face to be \(3.5\) on average.

🔗

Note that the expected value of a random variable need not be a value that the random variable can actually take on. For example, in the die-rolling example above, the expected value of \(Y\) is \(3.5\text{,}\) but \(Y\) never takes on the value \(3.5\text{.}\)

🔗

Checkpoint 4.3.4.

Consider the set of permutations \(\Omega=\ss_3\) with the uniform distribution.

For the random variable

\begin{equation*} X(\pi)=\text{number of fixed points of }\pi \, , \end{equation*}

what is \(E[X]\text{?}\)

🔗

🔗
For the random variable

\begin{equation*} X(\pi)=\text{number of inversions in }\pi \, , \end{equation*}

what is \(E[X]\text{?}\)

🔗

🔗

🔗

Checkpoint 4.3.5.

Consider the set \(\Bin_3\) of binary strings of length three with the binomial distribution

\begin{equation*} P_{p}(w)=(p)^{\text{number of }0\text{'s in }w}(1-p)^{\text{number of }1\text{'s in }w} \, . \end{equation*}

with parameter \(p=1/4\text{.}\)

For the random variable

\begin{equation*} X(w)=\text{number of }1\text{'s in }w \, , \end{equation*}

what is \(E[X]\text{?}\)

🔗

🔗
For the random variable

\begin{equation*} Y(w)=\text{number of }1\text{'s in first two positions of }w \, , \end{equation*}

what is \(E[Y]\text{?}\)

🔗

🔗

🔗

Here is a helpful observation to make about expected values that is specific to the uniform distribution. Suppose that \(|\Omega|=n\text{.}\) Then for any random variable \(X:\Omega\mapsto \mathbb{R}\text{,}\) we have that

\begin{equation*} E[X]=\sum_{x\in \Omega} P(x)X(x)=\sum_{x\in \Omega} \frac{1}{n}X(x)=\frac{1}{n}\sum_{x\in \Omega} X(x) \, . \end{equation*}

In other words, when the probability distribution is uniform, the expected value of a random variable is simply the average of the values of the random variable.

🔗

It is important to remember that this observation only holds for the uniform distribution. It does not hold for other probability distributions!

🔗

In general, it is difficult to compute expected values directly from the definition. However, if we can express a random variable as a sum of simpler random variables, then it often becomes easier to compute the expected value.

🔗

Example 4.3.6.

Consider \(\ss_n\) with the uniform distribution and the random variable \(X:\ss_n\mapsto \mathbb{R}\) defined by

\begin{equation*} X(\pi)=\text{number of fixed points of }\pi \, . \end{equation*}

We can express \(X\) as a sum of simpler random variables as follows. For \(i=1,2,\ldots,n\text{,}\) let \(X_i:\ss_n\mapsto \mathbb{R}\) be the random variable defined by

\begin{equation*} X_i(\pi)= \begin{cases} 1 \amp \text{if } \pi(i)=i \\ 0 \amp \text{if } \pi(i)\neq i \, . \end{cases} \end{equation*}

Then \(X_i\) is the indicator function for the event that \(i\) is a fixed point of \(\pi\text{,}\) and we have that

\begin{equation*} X(\pi)=\sum_{i=1}^n X_i(\pi) \, . \end{equation*}

Thus, we can express \(X\) as a sum of the simpler random variables \(X_1,X_2,\ldots,X_n\text{.}\)

🔗

Checkpoint 4.3.7.

Consider \(\ss_n\) with the uniform distribution and the random variable \(Y:\ss_n\mapsto \mathbb{R}\) defined by

\begin{equation*} Y(\pi)=\text{number of descents in }\pi \, . \end{equation*}

Express \(Y\) as a sum of simpler random variables.

🔗

Checkpoint 4.3.8.

Consider \(\Bin_n\) with the uniform distribution and the random variable \(Z:\Bin_n\mapsto \mathbb{R}\) defined by

\begin{equation*} Z(w)=\text{number of }1\text{'s in }w \, . \end{equation*}

Express \(Z\) as a sum of simpler random variables.

🔗

Given that we can express a random variable as a sum of simpler random variables, we want to be able to compute the expected value of the sum in terms of the expected values of the simpler random variables. The following theorem allows us to do this.

🔗

Theorem 4.3.9. Linearity of Expectation.

Let \(\Omega\) be a finite set with a probability distribution \(P\text{,}\) let \(X,Y:\Omega\mapsto \mathbb{R}\) be random variables, and let \(a,b\in \mathbb{R}\text{.}\) Then \(aX+bY\) is a random variable with

\begin{equation*} E[aX+bY]=aE[X]+bE[Y] \, . \end{equation*}

More generally, if \(X_1,X_2,\ldots,X_n:\Omega\mapsto \mathbb{R}\) are random variables and \(a_1,a_2,\ldots,a_n\in \mathbb{R}\text{,}\) then

\begin{equation*} E\left[\sum_{i=1}^n a_i X_i\right]=\sum_{i=1}^n a_i E[X_i] \, . \end{equation*}

🔗

Proof.

We prove the first part; the second part follows by induction. Using the definition of expected value, we have that

\begin{align*} E[aX+bY] \amp =\sum_{x\in \Omega} P(x)(aX(x)+bY(x)) \\ \amp = a\sum_{x\in \Omega} P(x)X(x)+b\sum_{x\in \Omega} P(x)Y(x) \\ \amp = aE[X]+bE[Y] \, . \end{align*}

🔗

Checkpoint 4.3.10.

Discuss the proof above. Does it make sense? Why or why not?

🔗

A particularly useful type of random variable is an indicator function; we have already seen several examples of these in our earlier Checkpoints and Examples. We will focus on indicator functions for events, which are subsets of the sample space, and we will use this tool to derive multiple results about subsets, permutations, and graphs.

🔗

Definition 4.3.11.

An indicator function for an event \(A\subseteq \Omega\) is the random variable \(I_A:\Omega\mapsto \mathbb{R}\) defined by

\begin{equation*} I_A(x)= \begin{cases} 1 \amp \text{if } x\in A \\ 0 \amp \text{if } x\notin A \, . \end{cases} \end{equation*}

Intuitively, the indicator function \(I_A\) indicates whether or not the event \(A\) has occurred. Note that \(I_A\) only takes on the values \(0\) and \(1\text{.}\) We often refer to \(I_A(x)\) as the indicator of the event \(A\) on the event \(x\text{.}\)

🔗

Observe that

\begin{equation*} \sum_{x\in \Omega}I_A(x)=|A| \, . \end{equation*}

Thus, indicator functions can be used to count the size of sets. This is one of the reasons that indicator functions are useful, especially in the context of computing expectation under the uniform distribution.

🔗

Example 4.3.12.

Note that for \(\ss_n\) with the uniform distribution and the random variable

\begin{equation*} X_i(\pi)= \begin{cases} 1 \amp \text{if } \pi(i)=i \\ 0 \amp \text{if } \pi(i)\neq i \, , \end{cases} \end{equation*}

we have that \(X_i\) is the indicator function for the event that \(i\) is a fixed point of \(\pi\text{.}\) In other words, let \(A_i=\{\pi\in \ss_n : \pi(i)=i\}\text{.}\) Then \(X_i=I_{A_i}\text{.}\)

🔗

Checkpoint 4.3.13.

Consider the Checkpoint you completed earlier where you considered \(\ss_n\) with the uniform distribution and you wrote the random variable \(Y:\ss_n\mapsto \mathbb{R}\) defined by

\begin{equation*} Y(\pi)=\text{number of descents in }\pi \end{equation*}

as a sum of indicator functions. What were those indicator functions?

🔗

Checkpoint 4.3.14.

Consider again the set \(\Bin_3\) of binary strings of length three with the binomial distribution

\begin{equation*} P_{p}(w)=(p)^{\text{number of }0\text{'s in }w}(1-p)^{\text{number of }1\text{'s in }w} \, . \end{equation*}

with parameter \(p=1/4\text{.}\)

For the random variable

\begin{equation*} X(w)=\text{number of }1\text{'s in }w \, , \end{equation*}

write \(X\) as a sum of indicator functions. Does this help you compute \(E[X]\) in this case?

🔗

🔗
For the random variable

\begin{equation*} Y(w)=\text{number of }1\text{'s in first two positions of }w \, , \end{equation*}

write \(Y\) as a sum of indicator functions. Does this help you compute \(E[Y]\) in this case?

🔗

🔗

🔗

There is a useful way to rewrite the expected value for a random variable taking only finite many values.

🔗

Lemma 4.3.15.

Suppose that \(X:\Omega\mapsto \mathbb{R}\) is a random variable that takes on the values \(x_1,x_2,\ldots,x_k\text{;}\) in other words, the range of \(X\) is the finite set \(\{x_1,x_2,\ldots,x_k\}\text{.}\) Then we can rewrite the expected value of \(X\) as

\begin{equation*} E[X]=\sum_{i=1}^k x_i P(X=x_i) \, , \end{equation*}

where \(P(X=x_i)\) is the probability that \(X\) takes on the value \(x_i\text{,}\) i.e.,

\begin{equation*} P(X=x_i):=\sum_{\substack{x\in \Omega \\ X(x)=x_i}} P(x) \, . \end{equation*}

🔗

Proof.

Using the definition of expected value, we have that

\begin{align*} E[X] \amp =\sum_{x\in \Omega} P(x)X(x) \\ \amp =\sum_{i=1}^k \sum_{\substack{x\in \Omega \\ X(x)=x_i}} P(x)X(x) \\ \amp =\sum_{i=1}^k \sum_{\substack{x\in \Omega \\ X(x)=x_i}} P(x)x_i \\ \amp =\sum_{i=1}^k x_i \sum_{\substack{x\in \Omega \\ X(x)=x_i}} P(x) \\ \amp =\sum_{i=1}^k x_i P(X=x_i) \, . \end{align*}

🔗

This formulation is often useful when the number of values taken on by \(X\) is much smaller than the size of the sample space \(\Omega\text{.}\)

🔗

Example 4.3.16.

Consider again the case of binary strings of length two with the uniform distribution and the random variable \(X:\Omega\mapsto \mathbb{R}\) defined by

\begin{equation*} X(00)=0, X(10)=1, X(01)=1, X(11)=2\text{.} \end{equation*}

The random variable \(X\) takes on the values \(0,1,2\text{.}\) Using the lemma above, we can rewrite the expected value of \(X\) as

\begin{align*} E[X] \amp =\sum_{i=1}^3 x_i P(X=x_i) \\ \amp = 0\cdot P(X=0)+1\cdot P(X=1)+2\cdot P(X=2) \\ \amp = 0\cdot \frac{1}{4}+1\cdot \frac{2}{4}+2\cdot \frac{1}{4} \\ \amp =1 \, . \end{align*}

This matches the expected value we computed earlier.

🔗

In the case of indicator functions, this gives a particularly simple expression for the expected value.

🔗

Corollary 4.3.17.

Given an indicator function \(I_A:\Omega\mapsto \mathbb{R}\) for an event \(A\subseteq \Omega\text{,}\) we have that

\begin{equation*} E[I_A]=P(A) \, . \end{equation*}

In other words, the expected value of an indicator function is simply the probability of the event it indicates.

🔗

Proof.

We have that

\begin{align*} E[I_A] \amp =\sum_{i=1}^2 x_i P(I_A=x_i) \\ \amp = 0\cdot P(I_A=0)+1\cdot P(I_A=1) \\ \amp = P(A) \, . \end{align*}

🔗

Checkpoint 4.3.18.

Discuss the result and proof above. Does it make sense? Why or why not?

🔗

Prev Top Next