Something that doesn't quite have a direct analogue in our discussions of set theory is the notion of conditioning one event on another.
Let $A$ and $B$ be events with $\Pr(B)\neq 0$. The conditional probability of $A$ given $B$ is defined to be $$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$
Note that by this definition, $\Pr(A \cap B) = \Pr(A \mid B) \cdot \Pr(B)$.
One way of looking at this is to restrict our attention to the event $B$ and ask, what is the probability of $A$ within this smaller space? The outcomes of $A$ that can still occur are exactly those in $A \cap B$, and dividing by $\Pr(B)$ rescales the probabilities so that $B$ itself has probability $1$.
Another is to look at the multiplied-out form below the definition. If we want to compute $\Pr(A \cap B)$, the probability that two events both occur, then one natural way to do so is to find the probability that the first event occurs ($\Pr(B)$) and multiply it by the probability that the second occurs given that the first has ($\Pr(A \mid B)$).
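For those who like to see definitions in code, here is a minimal Python sketch of how this plays out on a finite sample space with equally likely outcomes; the sample space, events, and helper names (`prob`, `cond_prob`) are illustrative choices of our own.

```python
from fractions import Fraction

# A minimal sketch (the helpers prob and cond_prob are our own names):
# on a finite sample space with equally likely outcomes, the definition
# Pr(A | B) = Pr(A ∩ B) / Pr(B) reduces to counting outcomes.

omega = set(range(1, 7))          # one roll of a fair six-sided die

def prob(event):
    """Pr(event) when every outcome in omega is equally likely."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """Pr(A | B) = Pr(A ∩ B) / Pr(B); only defined when Pr(B) > 0."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}        # the roll is even
B = {3, 4, 5, 6}     # the roll is at least 3

print(cond_prob(A, B))                            # 1/2
# The multiplied-out form Pr(A ∩ B) = Pr(A | B) · Pr(B) holds as well.
assert prob(A & B) == cond_prob(A, B) * prob(B)
```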
Consider a roll of two fair dice, and let $A$ be the event that the sum of the dice is $4$ and $B$ the event that the first die shows $1$. What are $\Pr(A \mid B)$ and $\Pr(B \mid A)$?
The set of outcomes when rolling two dice is $\Omega = \{1,\ldots,6\}^2 = \{(1,1), (1,2), \ldots, (6,6)\}$. We have $|\Omega| = 36$.
We can work this problem more intuitively by observing that when we condition on $B$, it means that one of its six outcomes happens with equal probability. But only one of those six has a sum of four, so the probability is $1/6$. This type of thinking will often work, but you should be very cautious about errors in intuition. The value of the definition is that we could work the problem without appealing to intuition and be confident in our answer.
The formal calculation is \begin{align*} \Pr(A \mid B) & = \frac{\Pr(A \cap B)}{\Pr(B)} \\ &= \frac{|A\cap B|/|\Omega|}{|B|/|\Omega|} \\ &= \frac{1/36}{6/36} = \frac{1}{6}, \end{align*} confirming the intuitive approach. The same definition gives \begin{align*} \Pr(B \mid A) & = \frac{\Pr(A \cap B)}{\Pr(A)} \\ &= \frac{|A\cap B|/|\Omega|}{|A|/|\Omega|} \\ &= \frac{1/36}{3/36} = \frac{1}{3}. \end{align*}
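As a sanity check, a brute-force enumeration in Python reproduces both answers (the events are the ones defined above; the helper `prob` is our own):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely rolls and count outcomes directly.
omega = list(product(range(1, 7), repeat=2))

A = [w for w in omega if w[0] + w[1] == 4]    # the sum is 4
B = [w for w in omega if w[0] == 1]           # the first die shows 1

def prob(event):
    return Fraction(len(event), len(omega))

both = [w for w in A if w in B]               # A ∩ B = {(1, 3)}
print(prob(both) / prob(B))    # Pr(A | B) = 1/6
print(prob(both) / prob(A))    # Pr(B | A) = 1/3
```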
The prior example shows that in general $\Pr(A\mid B) \neq \Pr(B\mid A)$, although the two can coincide (for instance, when $\Pr(A) = \Pr(B)$).
Here is an example of why caution is required until you have developed a very sharp intuition about conditional probability.
Suppose we flip two fair coins. What is the probability that both coins came up Heads, given that at least one of them did? What is it given that the first coin did? Let $A$ be the event that both coins come up Heads, $B$ the event that at least one coin comes up Heads, and $C$ the event that the first coin comes up Heads.
We want $\Pr(A\mid B)$ for the first part and $\Pr(A\mid C)$ for the second. We calculate \begin{align*} \Pr(A \mid B) &= \frac{\Pr(A \cap B)}{\Pr(B)} \\ &= \frac{\Pr(\{HH\})}{\Pr(B)} \\ &= \frac{1/4}{3/4} = 1/3. \end{align*} Next, \begin{align*} \Pr(A \mid C) &= \frac{\Pr(A \cap C)}{\Pr(C)} \\ &= \frac{\Pr(\{HH\})}{\Pr(C)} \\ &= \frac{1/4}{1/2} = 1/2. \end{align*} So these conditional probabilities are not equal. If you find this result counter-intuitive, you're not alone. This happens because conditioning on $B$ leaves three possibilities open, while conditioning on $C$ leaves only two.
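Again, a short enumeration over the four equally likely outcomes confirms both answers (the helper `cond_prob` is our own):

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes HH, HT, TH, TT as tuples of characters.
omega = list(product("HT", repeat=2))

A = [w for w in omega if w == ("H", "H")]     # both coins are Heads
B = [w for w in omega if "H" in w]            # at least one coin is Heads
C = [w for w in omega if w[0] == "H"]         # the first coin is Heads

def cond_prob(a, b):
    inter = [w for w in a if w in b]
    return Fraction(len(inter), len(b))       # the |omega| factors cancel

print(cond_prob(A, B))   # 1/3
print(cond_prob(A, C))   # 1/2
```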
The definition of independence is wonderfully simple.
Events $A$ and $B$ are independent if $\Pr(A \cap B) = \Pr(A) \cdot \Pr(B)$.
In our problem-solving, there will be two ways to establish that events are independent. First, by calculation: we can show that the necessary equality holds under some pre-established model. The next example shows how this can happen. The second way is by fiat: we look at the experiment we're modeling and jump straight to the conclusion that some events should be independent (to the point where we would reject any model in which they weren't). The second example gives a case where we would do this.
From the definitions, it is easy to see that if $A$ and $B$ are independent and $\Pr(B) \neq 0$, then $\Pr(A \mid B) = \frac{\Pr(A) \cdot \Pr(B)}{\Pr(B)} = \Pr(A)$, which should fit our intuition about conditional probability and independence.
Consider a roll of two fair six-sided dice and the events $F_i$ ("the first die shows $i$"), $S_j$ ("the second die shows $j$"), and $T_k$ ("the total of the two dice is $k$").
Each $F_i$ should be independent of each $S_j$: the second roll doesn't depend on the outcome of the first, or vice-versa. Mathematically, for any $i$ and $j$, $$\Pr(F_i \cap S_j) = \frac{1}{36} = \frac{1}{6} \cdot \frac{1}{6} = \Pr(F_i) \cdot \Pr(S_j).$$
Equally clearly, most $T_k$ are correlated with the $F_i$: you can't get a total of $11$ unless the first die comes up $5$ or $6$. The one exception is $T_7$: $\Pr(T_7 \mid F_i) = \frac{1}{6} = \Pr(T_7)$ for every $i$, and likewise for the $S_j$. However, $T_7$ obviously isn't independent of the $F_i$ and $S_j$ taken together: $\Pr(T_7 \mid F_4 \cap S_3) = 1$ whereas $\Pr(T_7 \mid F_4 \cap S_4) = 0$. (We tend to write this as $\Pr(T_7 \mid F_4, S_4) = 0$.) This leads us to want to distinguish between the independence of two events in isolation and independence within a larger group of events.
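These claims about $T_7$ are easy to verify by enumeration. The Python sketch below (with our own helper `prob`) checks the pairwise independence for every $i$ and then the two conditional probabilities above:

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # the 36 equally likely rolls

def prob(event):
    return Fraction(len(event), len(omega))

T7 = {w for w in omega if w[0] + w[1] == 7}
F = {i: {w for w in omega if w[0] == i} for i in range(1, 7)}   # F_i events
S = {j: {w for w in omega if w[1] == j} for j in range(1, 7)}   # S_j events

# T_7 is independent of each F_i (and each S_j) taken in isolation...
for i in range(1, 7):
    assert prob(T7 & F[i]) == prob(T7) * prob(F[i])
    assert prob(T7 & S[i]) == prob(T7) * prob(S[i])

# ...but not of the F_i and S_j jointly:
print(prob(T7 & F[4] & S[3]) / prob(F[4] & S[3]))   # Pr(T_7 | F_4, S_3) = 1
print(prob(T7 & F[4] & S[4]) / prob(F[4] & S[4]))   # Pr(T_7 | F_4, S_4) = 0
```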
Consider a coin biased to show Heads with probability $3/4$, and suppose we flip it twice. What is the probability that we get two Heads? By fiat, we declare that the events $H_1,H_2$ of getting heads on the first and second toss respectively should be independent. So we have $$ \Pr(H_1 \cap H_2) = \Pr(H_1)\Pr(H_2) = \left(\frac{3}{4}\right)^2 = \frac{9}{16}. $$
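Because the independence here is imposed by fiat, a simulation builds it in by construction: the two calls to the random number generator in each trial are independent of one another. This is only an illustrative sketch; the trial count and function name are our own choices.

```python
import random

# Illustrative simulation of the biased-coin model.
random.seed(0)

def biased_flip():
    """One flip of a coin that shows Heads with probability 3/4."""
    return "H" if random.random() < 0.75 else "T"

trials = 100_000
count = 0
for _ in range(trials):
    first, second = biased_flip(), biased_flip()   # independent by construction
    if first == "H" and second == "H":
        count += 1

print(count / trials)   # should be close to 9/16 = 0.5625
```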
The last topic in this lecture is just a bit of algebraic manipulation away from what we've already seen today in class, but has a surprisingly wide array of applications.
For two events $A$ and $B$ with $\Pr(A) \neq 0$ and $\Pr(B) \neq 0$, $$\Pr(A|B) = \frac{\Pr(B|A) \cdot \Pr(A)}{\Pr(B)}.$$
We found earlier in this lecture that the definition of conditional probability gives us $\Pr(A \cap B) = \Pr(A \mid B) \cdot \Pr(B)$. Because the event $A \cap B$ is the same as the event $B \cap A$, we know $\Pr(A \cap B) = \Pr(B \mid A) \cdot \Pr(A)$. Substituting, we find $$\Pr(A \mid B) \cdot \Pr(B) = \Pr(B \mid A) \cdot \Pr(A),$$ so $$\Pr(A|B) = \frac{\Pr(B|A) \cdot \Pr(A)}{\Pr(B)}.$$
One interpretation of this theorem that is particularly useful is thinking of $A$ as a hypothesis (e.g. this medicine cures the disease) and $B$ as an observation (e.g. this patient who took the medicine is now better). We'd like to know whether $A$ is true, so we run a test and get observation $B$. Does that mean we should conclude that $A$ is true? How should it affect our belief in $A$? We should think that it increases the probability of $A$ being true, but by how much?
Bayes' Theorem gives us a nice structure for talking about this. It tells us that $\Pr(A|B)$ can be described as an interaction between $\Pr(A)$ (our current belief in $A$), $\Pr(B)$ (the probability of a patient getting better at all), and $\Pr(B|A)$ (the probability of the patient getting better given that the medicine does cure the disease). Each of these values is at least easier to grapple with: we can estimate them from data we already have. This approach to hypothesis testing is known as Bayesian statistics, and it has become popular as a method for testing the strength of models, an endeavor that is increasingly relevant as computers carry out ever larger predictive computations.
One canonical example application of Bayesian statistics is disease testing. Say I am being tested for a disease that affects only $1\%$ of the population. The test has a $10\%$ false negative rate: a person with the disease will test negative $10\%$ of the time. It has a $30\%$ false positive rate: a person without the disease will test positive $30\%$ of the time. I am tested for the disease, and the test comes back positive. What is the probability that I have the disease?
Here, we invoke Bayes' Theorem. Define $A$ to be the event that I have the disease and $B$ to be the event that I test positive. To use Bayes' Theorem, I need to compute $\Pr(B|A)$, $\Pr(A)$, and $\Pr(B)$.
First, we know that the false negative rate is $10\%$, so $\Pr(B|A) = 0.9$. Next, we know that $\Pr(A) = 0.01$ because only $1\%$ of the population is affected.
To determine $\Pr(B)$, we use the fact that $$\Pr(B) = \Pr(A \cap B) + \Pr(\neg A \cap B),$$ since the events $A \cap B$ and $\neg A \cap B$ are disjoint and together make up $B$.
An application of the definition of conditional probability then tells us that $$\Pr(B) = \Pr(A) \cdot \Pr(B|A) + \Pr(\neg A) \cdot \Pr(B|\neg A).$$
Plugging in the values given, we note that $$\Pr(B) = 0.01 \cdot 0.9 + 0.99 \cdot 0.3 = 0.306.$$
With this, we invoke Bayes' Theorem to find that $$\Pr(A|B) = \frac{\Pr(B|A) \cdot \Pr(A)}{\Pr(B)} = \frac{0.9 \cdot 0.01}{0.306} \approx 0.029,$$ so there is actually less than a 3% chance that I even have the disease based on this test!
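The whole calculation fits in a few lines of Python; the variable names are our own, and the numbers are exactly the ones given in the problem.

```python
from fractions import Fraction

# Reproducing the disease-testing arithmetic with exact fractions.
p_A = Fraction(1, 100)            # Pr(A): 1% of the population has the disease
p_B_given_A = Fraction(9, 10)     # Pr(B | A): 10% false negative rate
p_B_given_notA = Fraction(3, 10)  # Pr(B | not A): 30% false positive rate

# Pr(B) = Pr(A)·Pr(B|A) + Pr(not A)·Pr(B|not A)
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA
print(float(p_B))                           # 0.306

# Bayes' Theorem: Pr(A|B) = Pr(B|A)·Pr(A) / Pr(B)
print(float(p_B_given_A * p_A / p_B))       # ≈ 0.0294
```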