21 September 2013

De Méré's Problem

Often cited as the problem that began the theory of probability is the so-called chevalier de Méré's problem. Chevalier de Méré was the nickname of Antoine Gombaud, a 17th-century French writer. He would be quite upset to know that textbooks today often call him a gambler, because not only is that false, but he disliked gamblers in general. In a letter to his friend Mitton, he writes: "But I confess also that on my part I deplore you who are confined to gambling, longing for nothing but luck, without eyes for anything but this artificial world".

De Méré was a sociable man with some significant presence in the royal court. In fact, he was so sociable that he could befriend a nerd like Pascal. This friendship turned out to be a beneficial one both for them and for the rest of the world. De Méré brought a couple of interesting probability problems to Pascal's attention, and Pascal proceeded to lay the foundations of modern probability theory by solving them together with Fermat. De Méré still gets quite a bit of credit for asking the right questions (life strategy: befriend smart people and ask them hard questions) and for having those problems named after him.

There are two problems called de Méré's problems, though neither was truly an original problem that de Méré came up with. In fact, one of them is more commonly known as the problem of points and had been around for at least a couple hundred years by then. The other problem also likely predates de Méré, but there is no other good name for it, so we exclusively call it de Méré's problem.

De Méré's problem goes like this: how many times do you have to throw a pair of dice so that the chance of throwing a pair of six at least once is greater than one-half?

The problem isn't too difficult for anyone who has studied some probability (showing how much we have progressed since then). The chance of throwing a double six at least once is one minus the chance of never throwing a double six. The chance of the latter in a single throw is 35/36, so given that the dice are thrown n times, the chance of having at least one double six is 1 - (35/36)^n. At n = 24, this probability is about 0.4914, and at n = 25, it is about 0.5055. So the answer is 25 times.
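These numbers are easy to check with a few lines of Python (the function name is mine):

```python
from fractions import Fraction

def p_at_least_one_double_six(n):
    """Chance of throwing at least one double six in n throws of a pair of dice."""
    return 1 - Fraction(35, 36) ** n

# The probability crosses one-half between 24 and 25 throws.
print(float(p_at_least_one_double_six(24)))  # ~0.4914
print(float(p_at_least_one_double_six(25)))  # ~0.5055
```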

So why did this problem confuse de Méré? My lecture notes say he falsely thought that the chance of getting at least one six in four throws is (1/6) times 4 and consequently computed the probability wrong. Well, it turns out that he wasn't so stupid. He in fact claims that the chance of rolling at least one six is greater than half when a die is rolled four times and correctly gives the odds as 671 to 625. What really troubled de Méré was that this result doesn't add up with the so-called Old Gambler's Rule.

The critical value is the least number of trials you need so that the chance of at least one success is at least one-half (essentially the median of a geometric random variable). So we can rephrase the problem as finding the critical value of rolling a double six. Suppose an experiment has a one in N chance of success and critical value C, and another experiment has a one in M chance of success and critical value D. The Old Gambler's Rule then states that the ratio of N to M is equal to the ratio of C to D.

According to this rule, since rolling at least one six (which has a chance of one in 6) has a critical value of 4, rolling at least one double six (which has a chance of one in 36) should have a critical value of 24. Of course, as we have shown, this is not true, and de Méré keenly noticed this (supposedly through experience).

So the real problem is not so much finding the critical value of rolling a double six as finding what's wrong with the Old Gambler's Rule. To investigate, let's be a little more modern and denote the probability of success as p rather than 1/N. As we have shown, the chance of having at least one success in C trials is 1 - (1 - p)^C. Now consider the problem of finding a real (rather than integer) number x such that 1 - (1 - p)^x = 1/2. Formulated this way, there is a unique value of x for each value of p, and we can still easily recover the critical value by taking the ceiling of x.

Solving for x, we get x = -log(2)/log(1 - p). For small values of p, log(1 - p) is close to -p, so we can approximate x as log(2)/p. In fact, this is the estimate given by de Moivre in The Doctrine of Chances.
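A short Python sketch (function names are mine) comparing the exact solution with de Moivre's approximation, for a single die and for a pair of dice:

```python
import math

def exact_x(p):
    """Real-valued solution of 1 - (1 - p)**x = 1/2."""
    return -math.log(2) / math.log(1 - p)

def de_moivre_x(p):
    """de Moivre's approximation log(2)/p, valid for small p."""
    return math.log(2) / p

print(exact_x(1/6), de_moivre_x(1/6))    # ~3.80 vs ~4.16; ceiling gives 4
print(exact_x(1/36), de_moivre_x(1/36))  # ~24.60 vs ~24.95; ceiling gives 25
```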

Going back to the Old Gambler's Rule, let experiment 1 have success probability p with critical value equal to the ceiling of x. Similarly, let experiment 2 have success probability q with critical value equal to the ceiling of y. Then we have the relationship x/y = log(1 - q)/log(1 - p). If p and q are close to 0, we get the approximate relationship x/y = q/p. And finally, letting p = 1/N and q = 1/M, we get x/y = N/M, which looks just like the Old Gambler's Rule (which stated C/D = N/M).

So we see there is a grain of truth in the Old Gambler's Rule, but also why it failed de Méré. First, the rule approximates log(1 - p) as -p, but this approximation has large error for larger values of p. Second, the rule is stated in terms of integers rather than real numbers, adding another layer of approximation error.

This post relied heavily on Oystein Ore's paper Pascal and the Invention of Probability Theory and Prakash Gorroochurn's book Classic Problems of Probability.

01 August 2013

Revisiting 'Guess My Number'

Introduction

Three years ago, I posted about the 'Guess My Number' game, which involves one player (call her the chooser) choosing an integer from 1 to 100 and the other player (call her the guesser) trying to guess that number. Whenever the guesser names a number, the chooser tells her whether her number is higher or lower than the guess. In that post, I asked whether the bisection method, in which the guesser names the median of the current range of numbers, is the guesser's best strategy. This post is a response to that question.

Thinking continuously

As is often the case, the game is easier to analyze over the real numbers than over the integers. So suppose the chooser can pick any real number in (0,1). The problem with this simplification is that if the chooser's distribution has no point mass, then the guesser has zero probability of making the exact right guess. Therefore, we cannot measure the expected number of guesses until the correct guess as we would in the integer case.

One way to get around this problem is to think about how fast the guesser can narrow down the range to which the correct number belongs. To formalize, let Sn be the search space after the n-th guess, where S0 = (0,1). Then we are interested in the average ratio of the length of Sn+1 to the length of Sn, given a choice pdf and a guessing strategy. To save some writing, let P[(x,y)] be the probability that the chooser's number lies between x and y.

Suppose the choice pdf is uniform over (0,1). It is rather easy to show that bisection is the optimal guessing strategy in this case. Suppose Sn = (a,b) and that the guessed number is x (which of course lies between a and b). Then the expected length of Sn+1 is (x-a)P[(a,x)|Sn=(a,b)] + (b-x)P[(x,b)|Sn=(a,b)] = ((x - a)^2 + (b - x)^2)/(b-a). Since there is only one critical point and the second derivative is positive everywhere, it is easy to find that the minimum is at x = (a+b)/2. Given this optimal choice, the expected length of Sn+1 is (b-a)/2, or 1/2 as a ratio to the previous length, which is expected from the bisection method. This confirms that bisection is optimal given the uniform pdf.
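A quick numeric check of the minimization (a sketch; the function name is mine), scanning guesses over a grid:

```python
# Expected length of the next search space under the uniform pdf,
# as a function of the guess x in (a, b).
def expected_next_length(x, a, b):
    return ((x - a) ** 2 + (b - x) ** 2) / (b - a)

a, b = 0.0, 1.0
xs = [i / 1000 for i in range(1, 1000)]  # grid of candidate guesses in (0, 1)
best = min(xs, key=lambda x: expected_next_length(x, a, b))

print(best)                              # 0.5, the midpoint
print(expected_next_length(best, a, b))  # 0.5 = (b - a)/2, a ratio of one-half
```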

It remains to ask whether it is optimal for the chooser to pick a number uniformly. The answer follows from the observation that when the guesser uses the bisection method, the expected length of the next search space is always one-half of the current one, regardless of the choice pdf. Therefore, no matter what distribution the chooser picks, the guesser can guarantee to halve the search space in expectation. Hence the uniform distribution, against which the bisection method is the best possible strategy, is optimal for the chooser.

I have shown that the chooser picking a number uniformly and the guesser using the bisection method form an equilibrium. I have not shown, however, that this equilibrium is unique. Given that the guesser does not mix her strategies, I believe it is. Heuristically, suppose the chooser picks any other distribution. Then at some point the guesser will reach an interval S = (a,b) with midpoint m = (a+b)/2 such that P[(a,m)|S=(a,b)] does not equal P[(m,b)|S=(a,b)]. Suppose the probability of the lower half is greater than that of the upper half. Then there is some t < m such that P[(a,t)|S=(a,b)] = P[(t,b)|S=(a,b)] = 1/2. Let the guess be x = (t+m)/2. Then the expected length of the next search space is (x-a)P[(a,x)|S=(a,b)] + (b-x)(1-P[(a,x)|S=(a,b)]), which is less than (b-a)/2 since (x-a) < (b-a)/2 and P[(a,x)|S=(a,b)] > 1/2. Therefore, guessing x performs better than guessing m, and thus the bisection method fails to be the best response to a non-uniform distribution.
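To make the heuristic concrete, here is a small numeric check with one particular non-uniform pdf on (0,1); the specific density values are my own choice for illustration:

```python
# Density 1.4 on (0, 0.5) and 0.6 on (0.5, 1): the lower half is heavier.
def prob_below(x):
    """P[(0, x)] under this choice pdf."""
    return 1.4 * x if x <= 0.5 else 0.7 + 0.6 * (x - 0.5)

def expected_next_length(x):
    """Expected length of the next search space when guessing x in (0, 1)."""
    return x * prob_below(x) + (1 - x) * (1 - prob_below(x))

m = 0.5          # midpoint, i.e. the bisection guess
t = 0.5 / 1.4    # chosen so that P[(0, t)] = 1/2
x = (t + m) / 2  # the improved guess from the heuristic above

print(expected_next_length(m))  # 0.5 (up to floating point)
print(expected_next_length(x))  # about 0.486, strictly better
```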

Enter epsilon

Another way of getting around the problem of zero-probability of making the correct guess is to allow some range of error, ε. Instead of requiring the guesser to make the precisely correct guess, we allow her to guess within the epsilon neighborhood of the chooser's number. In other words, given the choice number c, the guess x is correct if x is in between c-ε and c+ε where ε > 0. This time, we can consider the expected number of guesses until the guess hits the target region.

Again, let us start the analysis by making the choice pdf uniform. Given the uniform distribution and search space S = (a,b), the probability of a correct guess, as long as the guess is within (a+ε, b-ε), is 2ε/(b-a). Of course, guessing outside of that region only reduces the probability of a correct guess. Note that due to the uniformity, the probability depends not on the actual guess but only on the length of the search space. Therefore, the problem is essentially identical to the continuous case in which the guesser tries to minimize the search space length, and hence the guesser's best response is again the bisection method.

The problem becomes more complex, however, when we consider the chooser's strategy. If the chooser knows that the guesser will use the bisection method, then she can easily choose a number that will take the largest number of guesses. For example, suppose the initial search space is (0, 10) and epsilon is 1. If the guesser uses bisection, then a number in (4, 6) will be guessed in a single turn, a number in (1, 3) or (7, 9) will be guessed in two turns, and a number outside of those regions (i.e. the union of (0,1], [3,4], [6,7], and [9,10)) will be guessed in three turns. Assuming a uniform choice pdf, the expected number of turns until a within-epsilon guess is 1(0.2) + 2(0.4) + 3(0.4) = 2.2. If the chooser knows that the guesser is using bisection, however, then she can choose a number from the 3-turn region, in which case the expected number of turns becomes 3.
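As a sanity check on the 2.2 figure, here is a sketch in Python (names are mine; I assume that after a wrong guess the guesser can also exclude the ε-ball around the guess, which is what produces the region breakdown above):

```python
def turns_bisection(c, lo=0.0, hi=10.0, eps=1.0):
    """Number of bisection guesses until the guess lands within eps of c."""
    turns = 0
    while True:
        turns += 1
        mid = (lo + hi) / 2
        if abs(mid - c) < eps:
            return turns
        if c > mid:
            lo = mid + eps  # c must lie at least eps above the wrong guess
        else:
            hi = mid - eps  # symmetrically, at least eps below

# Deterministic average over a fine uniform grid on (0, 10);
# the grid points avoid the region boundaries 1, 3, 4, 6, 7, 9.
grid = [(i + 0.5) / 100 for i in range(1000)]
avg = sum(turns_bisection(c) for c in grid) / len(grid)
print(avg)  # 2.2
```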

More generally, the chooser can force the greatest number of guesses by picking a number from the region found by consecutively removing middle 2ε-intervals. Given S0, we take out (m-ε, m+ε), where m is the median of S0. Then S1 consists of two disjoint intervals, so we take out a 2ε-interval from the middle of each. Then S2 consists of four disjoint intervals, and from each we take out a 2ε-interval to get eight disjoint intervals, and so forth. We repeat this process until the remaining intervals have size less than 2ε. (The process is somewhat similar to the way the Cantor set is constructed.) If the chooser picks a number from these intervals, the guesser will take the greatest number of guesses the bisection method can take. More precisely, given an initial search space of size ℓ and tolerance ε, the greatest number of guesses the bisection method will take is the smallest integer n such that 2ε(1 + 2 + 2^2 + ⋯ + 2^(n-1)) ≥ ℓ. Using the geometric series formula, we can write n = ceiling(log2(ℓ/(2ε) + 1)).
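The closed form can be checked against the running example (function name mine):

```python
import math

def max_bisection_turns(length, eps):
    """Smallest n with 2*eps*(2**n - 1) >= length, via the closed form."""
    return math.ceil(math.log2(length / (2 * eps) + 1))

print(max_bisection_turns(10, 1))  # 3, matching the (0, 10), eps = 1 example
```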

We have found the chooser's best response to the guesser's bisection method. Now, knowing that the chooser is picking a number uniformly from the worst-case region, can the guesser do better than bisection? The answer is yes. In the above example, in which the worst-case region consists of (0,1], [3,4], [6,7], and [9,10), the bisection search takes 3 turns. However, if the guesser picks 6.5 first and then 3.5 if the first guess is too high, then the expected number of turns until the correct guess is 2, which is not only lower than 3 but also lower than 2.2, the expected performance of bisection against the uniform distribution over (0, 10). Thus the guesser can do better than bisection when the chooser picks from the worst-case region.

Back to the discrete

Note that the worst-case region consists of 2^(n-1) disjoint intervals (where n is the number of guesses found above), each of size less than 2ε, distributed evenly across the initial search space with a distance of exactly 2ε between neighboring intervals. When the chooser picks a number from this set of intervals, the game essentially transforms into the original 'Guess My Number' game, where the choice set consists of integers rather than real numbers. The above example is essentially identical to guessing from {1,2,3,4}, where (0,1] maps to 1, [3,4] maps to 2, and so forth. Under this mapping, the original bisection method is equivalent to initially guessing 2.5, which is clearly a poor choice compared to picking either 2 or 3 as the first guess (which has a positive probability of being correct and gives the same or better information if incorrect).

Now that we have reduced the chooser's pick from the worst-case region to the integer guessing problem, what is the optimal strategy for the guesser? Assuming a uniform choice distribution, the bisection method is still optimal. The guesser only has to make a slight modification by choosing a nearest integer when the mid value is not an integer. That this modified bisection is optimal follows directly from the real-number analogue, and it is easy to see that, given a uniform choice distribution, it does not matter for the expected number of turns whether the guesser rounds the mid value up or down.

But once again, the chooser does not have to pick uniformly. Since the guesser responds to a uniform pick by bisection, the chooser can again pick the worst-case integers. But the guesser can respond to this choice by applying bisection on the worst-case integers, and this process can repeat until only the end points remain as the worst-case integers.

For example, if the search space is the integers from 1 to 7, then the bisection method will guess 4, then 2 or 6. The remaining "regions" {1,3,5,7} would require 3 guesses, so the chooser would pick one of those four numbers if the guesser is using bisection. But the process doesn't stop here. If the guesser knows that the chooser knows that the guesser is using bisection, so that the chooser is picking from {1,3,5,7}, then the guesser can apply the bisection method to this subset and start with 3 or 5. The chooser can respond again and reduce her choice set further down to {1,7}. The guesser can respond to this by initially guessing 1, and then 7 if incorrect. But if the guesser takes this strategy, then the chooser can simply use the uniform distribution, against which the end-point guesses are far from optimal.

To approach this problem more formally, let us consider the simplest case in which the initial search space is {1,2,3}. The chooser has two strategies: (A) choose any one of the numbers with 1/3 chance; (B) choose 1 or 3 with 1/2 chance each. The guesser also has two strategies: (a) guess 2 first, and (b) guess 1 or 3 first, and 2 last (note that guessing 2 second is dominated). The expected number of guesses with each strategy pair is summarized in the following table:

                    Guesser
Chooser         a (bisection)   b (end points)
A (uniform)         5/3              2
B (extreme)          2              3/2

It is easy to check that there is no pure Nash equilibrium in this game. Thanks to Nash, we know a mixed equilibrium must exist, and simple algebra shows that (0.6, 0.4) is an equilibrium strategy for both players (that is, 60% A and 40% B for the chooser, and 60% a and 40% b for the guesser). Of course, the chooser's "mixed strategy" can be summarized as a distribution of 0.4, 0.2, and 0.4 for picking 1, 2, and 3, respectively. Thus, the equilibrium can be summarized as the chooser picking a number with the distribution (0.4, 0.2, 0.4) and the guesser guessing 2 first 60% of the time and 2 last 40% of the time. At this equilibrium, the expected number of guesses is 9/5.
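The "simple algebra" can be spelled out with the standard indifference condition for a 2×2 game with no saddle point: each player mixes so that the opponent is indifferent between her two pure strategies. A sketch, with the matrix entries taken from the table above:

```python
from fractions import Fraction as F

# Expected number of guesses: maximized by the chooser, minimized by the
# guesser. Rows = chooser (A, B), columns = guesser (a, b).
M = [[F(5, 3), F(2)],
     [F(2),    F(3, 2)]]

D = M[0][0] - M[0][1] - M[1][0] + M[1][1]
p = (M[1][1] - M[1][0]) / D  # chooser's probability of playing A
q = (M[1][1] - M[0][1]) / D  # guesser's probability of playing a

value = (M[0][0] * p * q + M[0][1] * p * (1 - q)
         + M[1][0] * (1 - p) * q + M[1][1] * (1 - p) * (1 - q))

print(p, q, value)  # 3/5 3/5 9/5
```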

Conclusion

The next natural step would be to consider the case in which the initial search space is {1,⋯,n} for an arbitrary n. The generalization seems nontrivial, so that analysis will have to wait until next time. I think I have enough material in this post, however, to conclude that the number-guessing game is probably not as trivial as it may appear at first. I have shown that in the case of simply reducing the search space, the bisection method is the best strategy against the uniform distribution, which is also the best the chooser can do. When we introduce some tolerance so that a guess can be "correct", we found that the best response to the bisection method quickly brought the game down to the discrete case. With a discrete search space, the problem becomes a bit like a convoluted rock-paper-scissors: the chooser, in equilibrium, picks a number non-uniformly, against which the guesser mixes her strategies.