## "Abnormal" Events -- Droughts and Perfect Games

Most folks, and I would include myself in this, have terrible intuitions about probabilities, in particular about the frequency and patterns of occurrence in the tails of the normal distribution, what we might call "abnormal" events. This strikes me as a particularly relevant topic because the severity of the current drought and high temperatures in the US is being used as absolute evidence of catastrophic global warming.

I am not going to get into the global warming bits in this post (though a longer post is coming). Suffice it to say that if it is hard to measure shifts in the mean of climate patterns directly and accurately, given all the natural variability and noise in the weather system, it is virtually impossible to infer shifts in the mean from individual occurrences of unusual events. Events in the tails of the normal distribution are infrequent, but not impossible, or even unexpected, given enough samples.

What got me to thinking about this was the third perfect game pitched this year in the MLB. Until this year, only 20 perfect games had been pitched in over 130 years of history, meaning that one is expected every 7 years or so (we would actually expect them more frequently today given that there are more teams and more games, but even correcting for this we might have an expected value of one every 3-4 years). Yet three perfect games happened, without any evidence or even any theoretical basis for arguing that the mean is somehow shifting. In rigorous statistical parlance, sometimes shit happens. Were baseball more of a political issue, I have no doubt that writers from Paul Krugman on down would be writing about how three perfect games this year is such an unlikely statistical fluke that it can't be natural, and must have been caused by [fill in behavior of which author disapproves]. If only the Republican Congress had passed the second stimulus, we wouldn't be faced with all these perfect games....
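To put rough numbers on the point that tail events stop being surprising over enough samples, here is a minimal Python sketch. It assumes, purely for illustration, that perfect games arrive as a Poisson process at a constant rate (using the one-every-7-years and one-every-3.5-years figures above), which ignores the growth in teams and games over baseball history. The helper names `poisson_at_least` and `chance_over_seasons` are hypothetical.

```python
import math


def poisson_at_least(k: int, rate: float) -> float:
    """P(N >= k) for N ~ Poisson(rate), i.e. k or more events in one season."""
    # 1 minus the probability of seeing 0, 1, ..., k-1 events
    below = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def chance_over_seasons(p_season: float, seasons: int) -> float:
    """Probability that at least one of `seasons` independent seasons hits the tail event."""
    return 1.0 - (1.0 - p_season) ** seasons


# Assumed constant rates, taken from the rough figures in the text
for years_per_game in (7.0, 3.5):
    rate = 1.0 / years_per_game
    p = poisson_at_least(3, rate)
    print(f"one every {years_per_game} yrs: P(3+ in a season) = {p:.4f}, "
          f"P(at least one such season in 130 yrs) = {chance_over_seasons(p, 130):.2f}")
```

The second function is the "over enough samples" part of the argument: a per-season chance that looks tiny compounds into something quite ordinary across a long history.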

**Postscript:** We like to think that perfect games are the ultimate measure of a great pitcher. This is half right. In fact, we should expect entirely average pitchers to get perfect games every so often. A perfect game is when the pitcher faces 27 hitters and none of them get on base. So let's take the average hitter facing the average pitcher. The league-average on-base percentage this year is about .320, or 32%. This means that in any given plate appearance, the average pitcher has a 68% chance of keeping the average batter off base. All the average pitcher has to do is roll these dice correctly 27 times in a row.

The probability of that is .68^27, or about one in 33,000. This means that once in every 33,000 pitcher starts (there are two pitcher starts per game played in MLB), the average pitcher should get a perfect game. Since there are about 4,860 regular-season starts per year (30 teams x 162 games), the average pitcher should get a perfect game every 7 years or so. Through history, there have been about 364,000 starts in MLB, so this would point to about 11 perfect games by average pitchers. That is about half the actual total.
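The arithmetic above can be reproduced in a few lines of Python. This is a sketch of the same back-of-the-envelope model, assuming each of the 27 plate appearances is an independent trial with a .320 chance of the batter reaching base; the constants are the post's own figures and `perfect_game_prob` is a hypothetical helper name.

```python
# Figures from the post: league OBP of .320, 27 batters, 30 teams x 162 games
# of starts per season, and roughly 364,000 starts in MLB history.
LEAGUE_OBP = 0.320
BATTERS = 27
STARTS_PER_SEASON = 30 * 162
HISTORICAL_STARTS = 364_000


def perfect_game_prob(obp: float, batters: int = BATTERS) -> float:
    """Chance that every one of `batters` hitters is retired, each independently."""
    return (1.0 - obp) ** batters


p = perfect_game_prob(LEAGUE_OBP)
starts_per_perfect = 1.0 / p
print(f"one perfect game per {starts_per_perfect:,.0f} starts")
print(f"one every {starts_per_perfect / STARTS_PER_SEASON:.1f} seasons")
print(f"expected perfect games by average pitchers in history: {HISTORICAL_STARTS * p:.1f}")
```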

Now, there is a powerful statistical argument that great pitchers should be over-represented in perfect-game stats: the probabilities are VERY sensitive to small changes in on-base percentage. Let's assume a really good pitcher has an on-base percentage against him that is 30 points (.030) below the league average, and a bad pitcher has one 30 points above it. The better pitcher would then expect a perfect game every 10,000 starts, while the worse pitcher would expect a perfect game every 113,000 starts. I can't find the stats on individual pitchers, but my guess is that the spread in on-base percentage against between the best and worst pitchers is more than 60 points, since the team batting-average-against stats (team averages rather than individual ones, which should be less variable) show a 60-point spread from best to worst. [update: a reader points to this, which says there is actually a 125-point spread from best to worst. That is a difference in expected perfect games from one in 2,000 for Jered Weaver to one in 300,000 for Derek Lowe. Thanks Jonathan]
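The sensitivity claim is easy to check with the same independent-trials model. This minimal sketch assumes a .320 league baseline and shifts it by 30 points (.030) in each direction; `starts_per_perfect_game` is a hypothetical helper, not anything from a real stats library.

```python
LEAGUE_OBP = 0.320
BATTERS = 27


def starts_per_perfect_game(obp: float, batters: int = BATTERS) -> float:
    """Expected starts per perfect game, assuming independent plate appearances."""
    return 1.0 / (1.0 - obp) ** batters


# A 30-point swing in OBP against, in either direction
for label, delta in (("good", -0.030), ("average", 0.0), ("bad", 0.030)):
    obp = LEAGUE_OBP + delta
    print(f"{label:>7} pitcher (OBP against {obp:.3f}): "
          f"one per {starts_per_perfect_game(obp):,.0f} starts")
```

Because the per-batter chance is raised to the 27th power, a few points of OBP compound into order-of-magnitude differences in the expected wait.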

**Update:** There have been 278 no-hitters in MLB history, or 12 times the number of perfect games. The odds of getting through 27 batters based on a .320 on-base percentage are one in 33,000. The odds of getting through the same batters based on a .255 batting average (which counts hits but not other ways of reaching base, exactly parallel with the definition of a no-hitter) are just one in 2,830. The ratio between these odds is 11.7 to one, nearly perfectly explaining the ratio of no-hitters to perfect games on pure stochastics.
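The no-hitter comparison can be sketched the same way, assuming a .320 rate of reaching base by any means for perfect games and a .255 rate of hits only for no-hitters, with 27 independent batters in both cases. The observed count of 23 perfect games is the 20 historical ones plus this year's three; `odds_against` is a hypothetical helper name.

```python
BATTERS = 27
LEAGUE_OBP = 0.320   # reaching base by any means (perfect game)
LEAGUE_AVG = 0.255   # hits only (no-hitter)
NO_HITTERS = 278
PERFECT_GAMES = 20 + 3  # historical total plus this year's three


def odds_against(rate_on: float, batters: int = BATTERS) -> float:
    """Return N such that the event (no batter succeeding) has probability 1/N."""
    return 1.0 / (1.0 - rate_on) ** batters


perfect = odds_against(LEAGUE_OBP)
no_hit = odds_against(LEAGUE_AVG)
print(f"perfect game: 1 in {perfect:,.0f}")
print(f"no-hitter:    1 in {no_hit:,.0f}")
print(f"model ratio {perfect / no_hit:.1f} to 1 vs observed {NO_HITTERS / PERFECT_GAMES:.1f} to 1")
```

Using unrounded odds rather than 33,000 and 2,830 nudges the model ratio up slightly, but it stays close to the observed ratio of about 12.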