Cardinals' second baseman Bo Hart
Over on 3-D Baseball, Kincaid has a nice explanation of regression to the mean in a post titled "On Correlation, Regression, and Bo Hart". The blog entry starts with the story of Bo Hart, who got called up to the Cardinals in June 2003 and promptly hit .412 over his first 75 at-bats. Since Kincaid's article is about regression to the mean, you can guess where Hart's season went -- he finished with 286 at-bats and a .277 average.
But Kincaid flirts with a few notions that I think are worth following in a bit more detail.
First up, what are the odds that a .277 hitter will break .400 across a string of 75 at-bats?
The answer is roughly 1 in 200.
This calculation relies on the fact that the normal distribution approximates the binomial distribution -- in English, if you repeat a set of binomial trials many times, the histogram of the success counts (or success rates) across those sets will look like the normal curve. That lets us use the area under the normal curve to state the probability that a value (in this case, a batting average of .412) falls at or beyond a certain point, given the mean value (.277).
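The normal-curve claim can be checked with a quick simulation. The sketch below is only an illustration, not anything from Kincaid's post: it assumes each at-bat is an independent trial with a fixed .277 chance of a hit, and the function name, the 10,000-stretch count, and the seed are all made up for the example.

```python
import random
import statistics


def simulate_averages(true_avg, at_bats, trials, seed=2003):
    """Simulate `trials` stretches of `at_bats` at-bats each.

    Assumes every at-bat is an independent trial with hit
    probability `true_avg` (a simplification of real baseball).
    Returns the batting average observed in each stretch.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = []
    for _ in range(trials):
        hits = sum(1 for _ in range(at_bats) if rng.random() < true_avg)
        averages.append(hits / at_bats)
    return averages


averages = simulate_averages(true_avg=0.277, at_bats=75, trials=10_000)
sim_mean = statistics.mean(averages)
sim_sd = statistics.pstdev(averages)

print(f"mean of simulated averages: {sim_mean:.3f}")
print(f"standard deviation:         {sim_sd:.3f}")
```

If the approximation holds, the simulated averages should cluster around .277 with a spread close to the 0.052 standard deviation used in the calculation, and a histogram of `averages` should look roughly bell-shaped.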
Using Bo Hart's season batting average of .277 as his "true talent" (or "population mean") across 75 at-bats, we can calculate the standard deviation of the distribution as the square root of .277 x (1 - .277) / 75, which is 0.052. We then determine that .412 lies 2.60 standard deviations above the mean (2.60=[.412-.277]/.052). The probability of landing 2.60 or more standard deviations above the mean is about 0.5% -- or roughly 1 in 200.
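For anyone who wants to retrace the arithmetic, here is a minimal sketch of the same calculation in Python. It assumes the normal approximation described above and treats .277 as the true talent level; the variable and function names are mine. Because the code does not round the standard deviation to 0.052 before dividing, its intermediate figures differ slightly from the ones quoted above, but they land in the same neighborhood.

```python
import math


def normal_upper_tail(z):
    """Probability that a standard normal value is greater than z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


true_avg = 0.277      # season average, treated as "true talent"
at_bats = 75          # length of the hot streak
observed_avg = 0.412  # average over those at-bats

# Standard deviation of a batting average over `at_bats` binomial trials.
sd = math.sqrt(true_avg * (1 - true_avg) / at_bats)

# Distance of the observed average from the mean, in standard deviations.
z = (observed_avg - true_avg) / sd

# Chance of an average at least this far above the mean.
tail = normal_upper_tail(z)

print(f"standard deviation: {sd:.3f}")
print(f"z-score:            {z:.2f}")
print(f"upper-tail chance:  {tail:.4f} (about 1 in {1 / tail:.0f})")
```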
What was unusual about Bo Hart is that his 1 in 200 string of successful at-bats occurred at the beginning of his Major League career. Calculating that probability is a task for another day.
In my next post I will explore Kincaid's statements about evaluating "true talent" based on a number of observations. Specifically, I'll delve into the following questions: "At what point can we be relatively certain about our inferences of true talent based on observed performance? 75 PAs is not enough, and one million is plenty, but what about 1000?"