SOBER, LIKELIHOOD, AND SIMPLICITY
There is another way that simplicity might be introduced into Bayes' theorem, viz. through the “likelihood” factor, p(e/h), the probability of the evidence e, given the hypothesis h.
Can the simplicity of a hypothesis h somehow increase likelihood? Not if h entails the evidence e, whether h is simple or not. In that case p(e/h) = 1, and no increase is possible. So let us take a case in which p(e/h) is less than one. Elliott Sober provides an example, where likelihoods are empirically determined.[83] Susan, who goes to the lake each day for a week, sees a red sailboat each day. Let e be that the sailboat she sees each day is red. There are two hypotheses, h1 and h7; h1 says that the same sailboat was on the lake each day that Susan was there and no other boat was; h7 says that 7 different sailboats were on the lake during this week, one each day. Now, Sober makes the assumption that 10% of the boats that use that lake are red, so that the probability that a boat is red, given that it uses the lake, is 1/10. He also makes the assumption that if a boat is on the lake, then the probability is close to 1 (I will say it is 1) that Susan sees it and its color.
Now, let us determine the likelihood of each hypothesis, h1 and h7: p(e/h1) = 1/10, since if the same boat were there each day (h1), then (assuming that boats don't change color overnight) the probability that the boat she sees each day will be red is just the probability that a boat using the lake is red, viz. 1/10. But the likelihood of h7, i.e., p(e/h7), will not be equal to 1/10. This is because the events in this case, unlike the former one, are probabilistically independent. Given that it is the same boat each day, the probability that it will have a different color the next day is zero. But if there are seven different boats, then the probability of whatever boat is there each day being red is just the probability of a boat using the lake being red multiplied by itself 7 times, i.e., (1/10)^7, a pretty tiny number.
But since h1 postulates one boat producing the sightings, while h7 postulates 7 different ones, one each day, h1 is much simpler than h7 (at least according to Sober's argument). Here, then, we have a case in which we express this simplicity by means of the likelihood, rather than the prior.
Taking this even further than Sober does, suppose we can assign equal priors to h1 and h7, so that it is just as probable that the same boat was on the lake each day as that 7 different boats were on the lake, one per day. (Imagine that in half the weeks the same boat is present each day, and in half the weeks 7 boats are present, one per day.) Then, from Bayes' theorem, and the likelihoods just mentioned, we can conclude that the posterior probability p(h1/e) is much greater than the posterior probability p(h7/e), since p(e) is the same in both calculations. In such a case, it might be concluded, a difference in simplicity of the hypotheses makes a big difference to the (posterior) probabilities of these hypotheses, given the evidence e.
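The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything Sober provides: the names P_RED, DAYS, and posterior are hypothetical, and the code assumes, as in the imagined scenario, that the one-boat and seven-boat hypotheses are the only two possibilities and have equal priors of 1/2.

```python
from fractions import Fraction

# Assumption taken from the example: 10% of the boats on the lake are red.
P_RED = Fraction(1, 10)
DAYS = 7

# Likelihoods p(e/h): one color "draw" for a single boat,
# seven independent draws for seven different boats.
likelihoods = {
    "h1": P_RED ** 1,
    "h7": P_RED ** DAYS,
}

# Assumed equal priors (half the weeks one boat, half the weeks seven boats).
priors = {"h1": Fraction(1, 2), "h7": Fraction(1, 2)}


def posterior(priors, likelihoods):
    """Bayes' theorem over a set of mutually exclusive, exhaustive hypotheses."""
    # p(e) is the same normalizing constant for every hypothesis.
    p_e = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / p_e for h in priors}


post = posterior(priors, likelihoods)
print(post["h1"])               # 1000000/1000001
print(post["h7"])               # 1/1000001
print(post["h1"] / post["h7"])  # 1000000
```

With equal priors the ratio of the posteriors is simply the ratio of the likelihoods, (1/10) to (1/10)^7, so the one-boat hypothesis comes out a million times more probable than the seven-boat hypothesis.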
Let us agree that, in these cases, the simpler hypothesis has the higher likelihood and posterior probability. But is simplicity really doing the work here? No, what is doing the work is the fact that one hypothesis, h7, makes the probability of getting a red sighting on a given day independent of the probability of a red sighting on another day. Given h7, the probability of e is the probability of a red sighting on Monday times the probability of a red sighting on Tuesday, etc. By contrast, this is not so if we assume h1: given h1, that the boat is the same each day, the probability of a red sighting on any or all days is just the probability of that boat's being red.
To bring this out, let us change the example a bit. We keep h1, understanding it to be that one and the same boat was sailing each day and that boat had the same color each day. Now consider h2: two different boats with the same color were sailing during the 7-day period, each one on alternate days.
Now by the reasoning above, p(e/h1) = p(e/h2) = 1/10, since with both h1 and h2, the boat sailing will be the same color each day, and, given this, the probability that it will be red is 1/10, by hypothesis. So the likelihoods of these two conflicting hypotheses are the same. Now, since h1 postulates one boat and h2 postulates two boats, h1 is simpler than h2 (in accordance with Sober's idea). But this fact does not change the likelihoods at all. Even if h1 is simpler than h2, that fact does no epistemic work here. Indeed, if we suppose in addition that in three-quarters of all the weeks there are two boats of the same color on the lake and in one-quarter there is one boat and it has the same color, then we get these priors: p(h1) = 1/4 and p(h2) = 3/4. In such a case, since the likelihoods are equal, the posterior probability of h2 will be three times that of h1, even though h1 is simpler than h2.
A different idea Sober introduces is based on a statistical theorem proved by Hirotugu Akaike, which gives a general formula for determining an unbiased estimate of a model's predictive accuracy.[84] The general idea might be put like this: You have obtained some data points from some unknown source of data and you want to connect these points by a curve. A model gives you a way to do so. The question is how to select an unbiased model with the most predictive accuracy. Akaike's theorem yields a way to do so based in part on simplicity: the fewer the number of adjustable parameters in the model, the better the model. The problem is that the theorem is of very limited use. First, it is restricted to cases in which the data are produced by the same unknown source. In the real world, we may not know whether the source is the same. (If lots of bullet holes are produced in a target and our measuring device gives us the positions of some of them, we may or may not know whether they were produced by the same gun or same shooter.)
Second, it is restricted because it is not applicable to cases involving predictions about future data outside the range of the data obtained.
Third, it is restricted to cases in which models are being compared, one of which is true. These are major restrictions for the usual tasks of science.
Finally, Sober makes the following claim: “there are three parsimony paradigms that explain how the simplicity of a theory can be relevant to saying what the world is like.”[85] The first consists of the claim that “sometimes simpler theories have higher [prior] probabilities.” The second is the claim that “sometimes simpler theories are better supported by the observations.” The third is that “sometimes the simplicity of a model is relevant to estimating its predictive accuracy.” The first, he tells us, is illustrated by the advice to young doctors that “when you hear hoof beats, think horses not zebras,” since horses are more common. The second is illustrated by the case in which a single cause for a group of similar events is inferred rather than a more complex group of independent separate causes. The third is illustrated by the use of Akaike's theorem.
Since I have noted the very restricted applicability of the latter in the previous paragraph, let me comment here on Sober's first two “paradigms.” In both cases the epistemic work is being done not by simplicity but by empirical evidence. It is because horses are much more common than zebras, not because of simplicity, that it is more probable that the hoof beats are those of a horse than that they are those of a zebra. (In African grasslands, the reverse is the case.) Similarly, to infer a common cause, rather than multiple independent causes, in a group of people with broken legs in an orthopedic surgeon's waiting room would be empirically rash, even if one common cause is simpler than many. We infer a common cause in such a case only if we have evidence that points in that direction (e.g., these people were involved in the same accident). In both the horse and the broken leg examples, it is empirical information, not simplicity, that justifies an inference to the “simpler” hypothesis. Sober is right in saying that sometimes simpler theories have higher prior and posterior probabilities. But what he needs to demonstrate—which his examples do not—is that these higher probabilities are due to simplicity.[86]