Last lecture, we introduced combinatorics, the study of counting the sizes of sets. To be frank, while combinatorics certainly has its applications in computer science, especially in algorithm analysis, it's not nearly as prevalent as probability, which has applications throughout the field. Cryptography is probabilistic, machine learning is probabilistic, and even modern data structures rely heavily on probabilistic guarantees to make them efficient. For that reason, before we move further with counting, I want to take a brief detour to tie combinatorics to probability to help motivate the study of combinatorics and to allow us to ask questions about probability using the tools we've already developed. Next week, we'll dive much further into the study of probability.
Probability models all sorts of phenomena related to unpredictability and uncertainty. We are going to start with a simple class of random experiments: Those where every outcome is equally likely. We'll call this Naive Probability and later come back to think about situations where outcomes aren't equally likely.
To model a random experiment, one first defines a set of possible outcomes.
A sample space for a random experiment is a set $\Omega$, the members of which correspond to the possible outcomes.
Once we have a sample space, we could stop there and just consider individual outcomes. However, much of the richness of probability comes from using sets of outcomes, which correspond to English-language descriptions of features of outcomes. This motivates the following definition.
Let $\Omega$ be the sample space for a random experiment. An event is a subset of $\Omega$.
These examples use $\Omega_1,\ldots,\Omega_4$ from the previous example.
Events can be combined via unions, intersections, and complements to give new events. These operations have a very intuitive interpretation: If $E_1,E_2 \subseteq \Omega$ are events on the same sample space, then $E_1 \cup E_2$ represents "$E_1$ or $E_2$", $E_1 \cap E_2$ represents "$E_1$ and $E_2$", and $E_1^c$ represents "not $E_1$". (The complement is taken with respect to the universe $\Omega$.)
(This example is from [BH].) Suppose we flip $10$ coins. The sample space will be all length-$10$ strings consisting of $H$ and $T$ letters. That is, $\Omega$ contains strings like $HTTHTTTHHT$.
Finally, we define the probability of an event.
Let $\Omega$ be the sample space for a random experiment and $E\subseteq \Omega$ be an event. The (naive) probability of $E$ is defined to be $$ \mathrm{Pr}(E) = \frac{|E|}{|\Omega|}. $$
Of course, for this to make sense, $\Omega$ must be finite. You can work through some of the small examples above to calculate probabilities by hand, but to do the larger ones you'll need to infer the sizes of the events and sample spaces indirectly. This explains why counting is the main technical challenge in naive probability: everything depends on your ability to determine the sizes of sets!
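To make this definition concrete, here's a minimal Python sketch (the two-dice experiment is just an illustrative choice, not one of our examples): enumerate a finite sample space, collect the outcomes belonging to an event, and take the ratio of the sizes.

```python
from fractions import Fraction
from itertools import product

# Sample space: all outcomes of rolling two six-sided dice.
omega = list(product(range(1, 7), repeat=2))

# Event: the two dice sum to 7.
event = [outcome for outcome in omega if sum(outcome) == 7]

# Naive probability: |E| / |Omega|.
print(Fraction(len(event), len(omega)))  # 1/6
```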
As mentioned last lecture, we really don't have many ways to determine the sizes of sets. If we can enumerate them then we can count them, or if we can describe them as the sum or product of different sets whose sizes we know then we have some footing to work with, but most sets are going to be more complex than that.
Given that we don't have ways to directly count these more complex sets, our task then is to change the way we view the problem. As mentioned last lecture, this is the truly important skill in combinatorics. We only have a few techniques to count sets, so the task is instead to reimagine the problem, to shift our perspective, to use some creativity to twist the complex problem into one that our techniques actually do work for. This is the crux of combinatorics, and it is an incredibly valuable problem-solving skill in general as well!
The rest of this lecture will be focused on introducing a number of well-known tricks to help us transform a given problem into one that's easier for us to work with. The tricks we introduce here will show up over and over again in counting problems, but it's not enough to see and understand them on their own - it's really important to be able to recognize when they can be applied as well. This is something that takes a lot of dedicated practice and reflection, so I encourage you to think really carefully about each problem we list here and each problem on the homework, especially thinking about why a trick worked for a particular problem or what is common between problems where the trick could be applied.
Let's start by thinking about a problem that won't go the way we expect if we just naively approach it with the addition and multiplication rules from last lecture. Consider the following problem: I flip three coins. What is the probability that I get at least one heads?
One approach to this problem is the following. Denote by $H$ the event that I get at least one heads, and consider the following events:

- $X$: the first coin comes up heads,
- $Y$: the second coin comes up heads,
- $Z$: the third coin comes up heads.
Then, by definition, $H = X \cup Y \cup Z$. After all, $H$ can only occur if one of $X$, $Y$, or $Z$ occurs! The addition rule then tells us that $$P(X \cup Y \cup Z) = P(X) + P(Y) + P(Z) = \frac{1}{2} + \frac{1}{2} +\frac{1}{2} = \frac{3}{2}. $$ So the probability then is 150%?!
This is obviously bad: nothing can have more than 100% chance to occur! So our really naive approach doesn't quite work out as well as we'd hope - let's look at some tricks we have to sharpen our calculation.
The reason the prior computation fails is probably pretty clear to you as well - the three events as defined are overlapping! If we write out the outcomes counted by each event, we have $$X = \{HHH, HHT, HTH, HTT\}, \quad Y = \{HHH, HHT, THH, THT\}, \quad Z = \{HHH, HTH, THH, TTH\}.$$
Looking at it this way, the size of the union is clearly not the sum of the individual sizes - after all, all three sets overlap with one another (for instance, $HHH$ appears in every one).
To figure out how to address this, let's take a look at the following Venn diagram of two sets:
One thing we could do here is to simply count each section separately: $$|X \cup Y| = |X \cap Y^c| + |X \cap Y| + |X^c \cap Y|.$$ This approach works, but it ends up getting pretty unwieldy with all these intersections, especially if $X$ and $Y$ are complex events.
A better approach would simply be to determine the size of the overlap and account for it. Note that if we had approached it naively we would have added $|X| + |Y|$, but then the central region ($|X \cap Y|$) is counted twice. In that case, to account for this mistake, we can simply subtract that region's size once to make sure we're no longer overcounting, telling us that $$|X \cup Y| = |X| + |Y| - |X \cap Y|.$$
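As a quick sanity check of this identity, here's a small Python snippet using the first two coin events from above (the built-in `set` operations do all the work):

```python
# Outcomes of three coin flips in which the first (X) or second (Y) coin is heads.
X = {"HHH", "HHT", "HTH", "HTT"}
Y = {"HHH", "HHT", "THH", "THT"}

# |X ∪ Y| = |X| + |Y| - |X ∩ Y|: here 6 = 4 + 4 - 2.
assert len(X | Y) == len(X) + len(Y) - len(X & Y)
```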
Let's think about the three-set case now.
A similar approach to last time will get us almost there: if we try $$|X| + |Y| + |Z|,$$ we find that the outer regions are counted once, the regions in two circles are counted twice, and the center region is counted three times. Subtracting the intersections as follows $$|X| + |Y| + |Z|- |X \cap Y| - |X \cap Z| - |Y \cap Z|,$$ we have removed the overcounting problem, but note that we have now subtracted the central region three times as well! In terms of our example problem, the outcome HHH is in every single one of $X, Y, Z, X \cap Y, X \cap Z,$ and $Y \cap Z,$ so our current expression actually adds it three times and then subtracts it three times, which means we're not counting it at all in the end. To address this problem, we have to add it back: $$|X \cup Y \cup Z| = |X| + |Y| + |Z|- |X \cap Y| - |X \cap Z| - |Y \cap Z| + |X \cap Y \cap Z|.$$
You may notice a pattern here: we add the intersections of an odd number of sets and subtract the intersections of an even number. This is no coincidence - the general pattern is known as the Principle of Inclusion-Exclusion (PIE).
The size of the union of sets can be computed as follows: $$\left|\bigcup_{i=1}^n A_i\right| = \sum_{j=1}^n (-1)^{j+1}\left(\sum_{S \subseteq \{1,\ldots,n\}, \: |S| = j} \left|\bigcap_{s \in S}A_s\right|\right).$$ For two sets, this evaluates to $$|X \cup Y| = |X| + |Y| - |X \cap Y|.$$ For three, it evaluates to $$|X \cup Y \cup Z| = |X| + |Y| + |Z|- |X \cap Y| - |X \cap Z| - |Y \cap Z| + |X \cap Y \cap Z|.$$
This is provable by induction on $n$ if you are interested, though for the purposes of this class you only really need to know that it follows this pattern.
It's also worth noting that if the sets are disjoint, this summation actually resolves back down to the original addition rule!
To wrap up our problem: there are four outcomes in $X$ (and likewise in each of the other individual sets), two outcomes in $X \cap Y$ (because the last coin can be H or T), and one outcome in $X \cap Y \cap Z$ (only the outcome HHH), leaving us with, by PIE, $$|X \cup Y \cup Z| = 4 + 4 + 4 - 2 - 2 - 2 + 1 = 7.$$ There are $8$ possibilities in total for three coin flips, giving us a probability of $7/8$.
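If you'd like to see PIE in code, here's a quick Python sketch that implements the general formula and double-checks our hand count (the helper name `union_size_by_pie` is just illustrative):

```python
from itertools import combinations, product

# The three events from the coin problem, as sets of outcome strings.
coins = {"".join(flips) for flips in product("HT", repeat=3)}
X = {s for s in coins if s[0] == "H"}  # first coin is heads
Y = {s for s in coins if s[1] == "H"}  # second coin is heads
Z = {s for s in coins if s[2] == "H"}  # third coin is heads

def union_size_by_pie(sets):
    """Compute |A_1 ∪ ... ∪ A_n| by inclusion-exclusion."""
    total = 0
    for j in range(1, len(sets) + 1):
        sign = (-1) ** (j + 1)
        for subset in combinations(sets, j):
            total += sign * len(set.intersection(*subset))
    return total

print(union_size_by_pie([X, Y, Z]))  # 7, matching the hand count
print(len(X | Y | Z))                # 7, computed directly
```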
Here's a different trick we could've employed to make solving this problem even faster. Something you may have noticed in the prior strategy is that the entire sample space is covered by four events - the three we defined, plus the event where I flip zero heads. Specifically, if we define $D$ to be the event where we flip zero heads, then the total sample space is $\Omega = X \cup Y \cup Z \cup D,$ or substituting we find $\Omega = H \cup D$. Because $H$ and $D$ are disjoint, we also get that $|\Omega| = |H| + |D|$, or more helpfully for this problem, $|H| = |\Omega| - |D|$. Instead of dealing with all the possibilities where we flip at least one heads, we can save ourselves a lot of trouble by considering the complement, which in this case is flipping no heads!
The benefit is that, for this problem in particular, the complement is super easy to compute. After all, for any $n$ coins, there is only one outcome that results in flipping zero heads: all tails. For our three coins, there are $8$ possible outcomes, leaving us with $|H| = |\Omega| - |D| = 8 - 1 = 7$, in a much sleeker approach than either of the previous ones.
This trick of counting the complement won't apply to every problem, but it can be incredibly powerful when it does.
If $U, A$ are finite sets, and $A \subseteq U$, then $$|U \setminus A| = |U| - |A|.$$ In the language of complements, this is $$|A^c| = |U|-|A|$$ or, equivalently, $$|A| = |U|-|A^c|,$$ where $U$ is the universe under consideration.
I also consider this self-evident, but one can give a proof from the Addition Rule.
Let us count the number of $6$-character passwords consisting of lowercase letters and digits that have at least one digit.
First, let's try to use a standard decision process. The first character has $36$ choices. How many options does the second choice have? Well, it's still $36$. But continue on to the sixth and final choice, and your number of choices will depend on the past: If you picked a digit in one or more of the first five spots, then you have $36$ options. If not, then you must pick a digit, and you only have $10$ options.
The result is a tree that is not regularly shaped: At the last level, some nodes will have $36$ branches and some will have only $10$. This means we can't apply the Multiplication Rule, at least not directly.
Instead, let's do this by counting the complement: How many passwords are there that do not contain a digit? That's easy: those are just the passwords made up entirely of letters, and there are $26^6$ of them. The set of all passwords consisting of letters and digits has size $36^6$, as there are $36$ options for each of the $6$ characters. Putting this together gives $36^6-26^6$.
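In code, the complement count is a one-liner; here's a quick Python sketch that also verifies the same reasoning by brute force at length $3$, where full enumeration (only $36^3 = 46{,}656$ strings) is feasible:

```python
from itertools import product
from string import ascii_lowercase, digits

alphabet = ascii_lowercase + digits  # 36 characters

# The actual problem: length-6 passwords with at least one digit.
print(36**6 - 26**6)  # 1867866560

# Brute-force check of the complement argument at length 3.
by_formula = 36**3 - 26**3
by_enumeration = sum(
    1
    for password in product(alphabet, repeat=3)
    if any(ch in digits for ch in password)
)
assert by_formula == by_enumeration  # both 29080
```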
| | Balls are distinguishable | Balls are indistinguishable |
|---|---|---|
| At most one ball per bin | Example 6.10 | Example 6.12 |
| Any # of balls per bin | Example 6.11 | ???? |
Here, when we say "distinguishable," one way to interpret it is that swapping the locations of two balls results in a different outcome. For example, in a problem about putting people in line, swapping two people results in a different outcome, but in a problem about forming a committee, swapping two people within a committee does not change the committee. Let's work through these one at a time. To demonstrate the flexibility of this diagram, each problem will be introduced in a different context.
There are $10$ people auditioning for a play, and there are $7$ total roles. How many ways are there to assign people to roles, assuming any person can play any role?
Let's first map this to the balls and bins analogy. Here, we can consider the roles to be balls, and people to be bins. We're trying to determine the number of ways to assign balls to bins, where balls are distinguishable and at most one ball can fit in each bin.
Our decision process is simple: for each ball (role), we select a previously unselected bin (person). Then the number of possibilities for the first decision is $10$, then $9$, and so on until $4$ for the last role. In general, with $k$ balls and $n$ bins, the number of possibilities is $$n \cdot (n-1) \cdot (n-2) \cdots (n-k+1) = \frac{n!}{(n-k)!}.$$
There are $8$ people at a restaurant. Each one of them is selecting a dinner entree from a menu with $13$ selections. How many possibilities are there for what everyone orders?
Again, let's first map this to the balls and bins analogy. Here, we can consider the bins to be the entrees and the balls to be the people. Each ball (person) is different, and each bin (entree) can be taken by more than one ball (person), so we know we are in the bottom left square of the table above.
A decision process will help us here. Each ball (person) has $13$ bins (entrees) possible, so there are $13^8$ possibilities in total. More generally, there are $n^k$ possible outcomes.
For your major, you have to take $3$ elective courses. There are a total of $20$ possible elective courses to choose from. How many different combinations of three electives can you take?
In this case, we have $3$ balls (courses taken) that we are distributing among $20$ bins (possible courses). You can't take a course more than once (at least, not in this scenario), so we are in the top right section of our table.
To actually count the possibilities, we note that this is a straightforward case of us choosing subsets of size $3$ from a set of size $20$. Thus, the number of options is $\binom{20}{3}$, or more generally $\binom{n}{k}$.
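All three of the filled squares reduce to one-line computations; here's a small sketch checking the three examples above with Python's standard `math` module:

```python
import math

# Roles example: 7 distinct roles among 10 people, at most one each: n!/(n-k)!
print(math.perm(10, 7))  # 604800

# Entrees example: 8 people each choosing one of 13 entrees: n^k.
print(13**8)             # 815730721

# Electives example: 3 electives chosen from 20 courses: n choose k.
print(math.comb(20, 3))  # 1140
```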
Through our examples, we have seen that a wide variety of situations can actually be represented as balls in bins situations. Furthermore, our examples have given us nice representations for three of the squares. Throwing $k$ balls into $n$ bins, we have:
| | Balls are distinguishable | Balls are indistinguishable |
|---|---|---|
| At most one ball per bin | $\frac{n!}{(n-k)!}$ | $\binom{n}{k}$ |
| Any # of balls per bin | $n^k$ | ???? |
What about the last square? We'll see now that the last square will require a more sophisticated approach.
It'll help for us to ground our discussion in an example.
Suppose you're ordering six scoops of ice cream and there is a choice of four types, say cookies & cream, pralines & cream, salted caramel, and gold medal ribbon. How many different combinations can you make, with repetition? For an ordinary combination, we would only choose one of each flavor, but because we're concerned about classes of objects rather than single objects, we're able to choose multiples of a particular type.
We can model this situation naturally as balls (scoops) in bins (flavors), but any decision process we try to make quickly falls apart. If our decision process goes one scoop at a time, we end up overcounting by an amount that depends on the decision we've made, so it's not easy to correct for it. If our decision process goes one flavor at a time instead, we end up with an unclear number of choices for each step after the first one. Either way, the strategies we have developed so far won't quite work for us.
Let's try something different. Let's consider one possible selection, $C,P,G,C,C,G$ (three cookie, one praline, two gold), assuming this is the order in which we chose our scoops. However, since this is a combination and some of the elements are indistinguishable anyway, the order doesn't really matter, so let's group them together into $CCCPGG$. Now, let's separate these so they don't touch each other and cross-contaminate the flavors or something, and we have something that looks like $CCC|P|GG$.
We can play with this analogy further and suppose we have a cup for each flavor, regardless of the number that we end up choosing, so we have something like $CCC|P||GG$. Finally, we note that since each cup contains a specific flavor, we don't need to specifically denote the flavor, and we can represent our choice by $***|*||**$.
Let's consider another possible choice: $*||*|****$, which is one cookies & cream, one salted caramel, and four gold medal ribbon. What we observe is that each choice of six items from four classes can be represented by an arrangement of six stars representing the items and three bars representing the division of classes of items.
But this is something we've already seen before: it's just a string problem over the alphabet $\{*,|\}$. Since we have six objects and four classes, we can view our possible selections as a string of length 9 with 6 $*$s and 3 $|$s and ask how many such strings there are. There are $$\binom 9 6 = \binom 9 3 = \frac{9!}{6!3!} = \frac{9 \cdot 8 \cdot 7}{3 \cdot 2 \cdot 1} = 3 \cdot 4 \cdot 7 = 84$$ such strings.
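If you want to double-check this count, `itertools.combinations_with_replacement` enumerates exactly these order-ignored selections; a quick Python sketch:

```python
import math
from itertools import combinations_with_replacement

flavors = ["C", "P", "S", "G"]

# Every way to choose 6 scoops from 4 flavors, ignoring order.
selections = list(combinations_with_replacement(flavors, 6))

print(len(selections))   # 84
print(math.comb(9, 3))   # 84, the stars-and-bars count

# The same agreement holds for any n flavors and k scoops:
# the enumeration has math.comb(n + k - 1, k) entries.
```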
This method of using stars and bars to denote the objects and categories was popularized by William Feller's An Introduction to Probability Theory and its Applications in 1950.
There are $\binom{n+k-1}{k} = \binom{k+n-1}{n-1} = \frac{(n+k-1)!}{k!(n-1)!}$ outcomes when throwing $k$ indistinguishable balls into $n$ bins, allowing for multiple balls per bin.
We can view each possible selection as a string of length $k+n-1$ over the alphabet $\{\star,|\}$. We know that each string contains exactly $k$ $\star$'s and $n-1$ $|$'s. Then there are $\binom{k+n-1}{k}$ possible ways to choose spots for the $k$ $\star$s. Since all remaining spots must be occupied by $|$s, this is the same as choosing spots for $n-1$ $|$s, and there are $\binom{n+k-1}{n-1}$ ways to do so.
The Stars and Bars strategy also has a wide range of applications; many situations that look nothing like ice cream can be counted with it.
One more surprising application is shown below.
How many solutions to $x+y+z = 13$ are there, for non-negative integers $x,y,z$? Here, we can think of $x,y,z$ as our types of objects, of which we want to choose 13 in some combination. For instance, one solution would be to choose 6 $x$s, 4 $y$s, and 3 $z$s, which would give us $6+4+3 = 13$. Then the number of solutions is just $$\binom{13+3-1}{13} = \frac{15!}{13!2!} = \frac{15 \cdot 14}{2} = 105.$$
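A quick brute-force check in Python: for each choice of $x$ and $y$, the value $z = 13 - x - y$ is forced, so we only need $z \geq 0$.

```python
import math

# Count non-negative integer solutions to x + y + z = 13 directly:
# y ranges over 0..13-x so that z = 13 - x - y stays non-negative.
count = sum(1 for x in range(14) for y in range(14 - x))

print(count)              # 105
print(math.comb(15, 13))  # 105, the stars-and-bars answer
```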
As a final recap, here is our completed table. Throwing $k$ balls into $n$ labeled bins, we found:
| | Balls are distinguishable | Balls are indistinguishable |
|---|---|---|
| At most one ball per bin | $\frac{n!}{(n-k)!}$ | $\binom{n}{k}$ |
| Any # of balls per bin | $n^k$ | $\binom{n+k-1}{k}$ |
I really want to emphasize that the beauty and the difficulty of combinatorics is that almost every problem is about twisting the situation (through changed perspectives, the tricks we discussed earlier, or just creative analogies) into one that consists of a tractable decision process and/or a situation in our table above. It takes a ton of practice to learn when different analogies and different strategies are reasonable, and it is completely natural to try a bunch of wrong ones before stumbling upon one that works. Nobody ever said combinatorics was easy, but getting good at this will translate really well to becoming an adept problem-solver in any context!