There has been much discussion about the recent run of perfect and almost-perfect games. A variety of hypotheses have been floated, including
pitching dominance (including a higher strikeout rate), improved defense, and the confluence of expansion, better player evaluation, and a drug-free world.
Perfect games are a rare event, so we run the risk of mistaking a random cluster for a trend. There have now been 20 perfect games -- 18 in the "modern era" (since 1900), 14 since the expansion era began in 1961, and two so far in the 2010 season. How can we tell if this "streak" of two perfect games in a single season is simply a random fluctuation?
Calculating the probability of a perfect game: allowing runners
One approach is to calculate a theoretical probability based on on-base percentage (OBP). Tango has a blog entry "Perfect Game calculation" that presents one approach. His estimate was 1 perfect game per 15,000.
Another example of this appears in Mathletics by Wayne L. Winston, who calculated a probability of 0.0000489883, or 1 game in just over 20,400. Winston noted that at the time the book went to press (before the 2009 season) there had been nearly 173,000 regular-season games since 1900, and each game provides two opportunities for a perfect game (giving us 346,000 "team games"). Winston then goes on to note that we would therefore expect 16.95 perfect games over that period -- almost perfectly matching the observed total of 17 to that point in time.
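Winston's expected count is just the per-team-game probability multiplied by the number of opportunities. A minimal sketch of that arithmetic, using the figures quoted above:

```python
# Winston's expected-count arithmetic (figures as quoted in the text).
p_perfect = 0.0000489883   # Winston's per-team-game probability of a perfect game
team_games = 173_000 * 2   # ~173,000 games since 1900, two opportunities each

expected = p_perfect * team_games
print(f"Expected perfect games since 1900: {expected:.2f}")  # -> 16.95
```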
A side note: after Mark Buehrle's perfect game in 2009, Sky Andrecheck took a similar approach for individual players. He worked out the individual chances for the 16 modern-era players who had tossed a perfect game, based on the sum of the on-base percentage and reached-on-error percentage they allowed over their careers.
Calculating the probability of a perfect game: observed rate
A second approach to calculating the probability is to compare the observed number of perfect games to the number of opportunities. I decided to use 1961 as year one. This was a natural point to begin -- it was the first year of baseball's expansion, and it falls mid-way between Don Larsen's 1956 World Series perfecto (which had been the first in 22 years) and Jim Bunning's 90-pitch masterpiece in 1964. Between 1961 and 2009 inclusive, there were 12 perfect games -- and there were 201,506 regular season "team games". This gives us a probability of 0.00005955, or 1 perfect game every 16,790 team games played.
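The observed-rate calculation above is a single division, using the counts from the text:

```python
# Observed rate of perfect games, 1961-2009 inclusive (counts from the text).
perfect_games = 12     # perfect games thrown over the span
team_games = 201_506   # regular-season team games over the same span

p = perfect_games / team_games
print(f"p = {p:.8f}, or about 1 in {team_games / perfect_games:,.0f} team games")
```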
This method yields a result roughly midway between Tango's and Winston's estimates.
What are the odds of two perfect games in one season?
While much statistical analysis assumes that events are normally distributed, rare discrete events do not resemble the normal distribution. The most common distribution used to model such events is the Poisson distribution.
At a probability of 1 in 16,790, and with seasons of 4,860 "team games" (the current number per season -- based on 2,430 games and therefore 4,860 perfect-game opportunities) or 4,112 (the average number per season since 1961), the probabilities, expected frequencies, and observed frequencies are as follows:
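As a check on those figures, here is a minimal sketch of the Poisson calculation, using the 1-in-16,790 rate and the two season sizes from the text:

```python
import math

# Poisson probabilities of 0, 1, or 2 perfect games in a season,
# at the observed rate of 1 perfect game per 16,790 team games.
p = 1 / 16_790

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

for team_games in (4_860, 4_112):   # current schedule vs. 1961-2009 average
    lam = team_games * p            # expected perfect games per season
    for k in (0, 1, 2):
        prob = poisson_pmf(k, lam)
        print(f"{team_games} team games: P({k} perfect games) = {prob:.4f}, "
              f"expected seasons out of 50 = {50 * prob:.1f}")
```

Multiplying each probability by 50 seasons gives the expected number of seasons with that many perfect games, which is where the ranges in the next paragraph come from.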
So over 50 seasons, we would predict that there would be between 1 and 2 seasons with 2 perfect games, and between 9 and 11 seasons with 1 perfect game.
To answer the question posed in the title: yes, two perfect games in one season is well within the expected distribution. The fact that 2010 is the first season with two perfect games in the 50 years since 1961 fits perfectly with the expected distribution.
In future posts I will repeat the calculation of probabilities and frequencies, with modified probabilities (once the dust settles on the "correct" way to calculate the probabilities...)
Comments and questions are always welcome.