December 22, 2010

The ERA distribution curve

NOTE: Tango and MGL at The Book, through the "Lemonade" thread, have critiqued the analysis below and pointed out errors in my assumptions. These errors mean that my closing conclusions are wrong -- good math, but bad statistics on my part. Throughout this post, you will find notes, each marked NOTE, describing my errors.
Revised January 5, 2011

J.C. Bradbury's recent blog postings (here and here) have included histograms showing the distribution of ERA across major league pitchers for the 2009 season.  For his analysis, Bradbury omitted those pitchers with fewer than 100 batters faced -- in both his blog and his book Hot Stove Economics, he justifies this by pointing to the wide variance in ERA, much of which is due to the small number of "samples" for each pitcher.  (As we saw in my earlier post about Bo Hart, it's possible for an average player to do very well over the short term; the inverse applies too.)

But a few comments on Bradbury's blog from readers ask about the impact that those "missing cases", who account for nearly a third (28%) of all individuals who pitched in MLB in 2009, would have on the curve.

Here's the answer: 

Figure 1: MLB Pitching, 2009 -- Number of Pitchers by ERA, by Number of Batters Faced

Incorporating the <100 BFP pitchers (the black chunks of each bar) adds pitchers across the whole range, although they are skewed to the right (i.e. higher ERAs).  While there is a stack on the left with very low ERAs, there's a bigger group of players with an ERA greater than 10. (The highest ERA of this group was 135.00.)

NOTE 1: ERA is a poor measure to use for this type of evaluation -- for pitchers with a low number of batters faced or innings pitched, it's easy for huge numbers to appear. That 135.00 ERA is the equivalent of 5 earned runs with only a single recorded out (one-third of an inning).  These exaggerated values then lead to an upward distortion of the mean for the group.  A better measure would be wOBA, or another measure that resembles a probability between 0 and 1.
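To see how quickly ERA blows up for a short outing, here is a minimal sketch of the standard formula (earned runs per nine innings, i.e. per 27 outs). The function name `era` and the sample lines are mine, chosen only for illustration.

```python
from fractions import Fraction


def era(earned_runs, outs_recorded):
    """Earned run average: earned runs allowed per 27 outs (nine innings)."""
    if outs_recorded == 0:
        raise ValueError("ERA is undefined when no outs were recorded")
    # Fractions avoid rounding trouble with thirds of an inning.
    return float(Fraction(27 * earned_runs, outs_recorded))


# Illustrative pitching lines (earned runs, outs recorded).
print(era(5, 1))     # five earned runs, one out
print(era(1, 1))     # one earned run, one out
print(era(90, 600))  # 90 earned runs over 200 innings
```

A single bad third of an inning produces a value more than thirty times that of the full-season line, which is why a handful of such outings can drag a group mean upward.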

The table below shows the average ERA of this group and three other groups based on the number of batters faced.  What we see is that the <100 BFP pitchers have a higher ERA than those who pitched more frequently.  (This difference is statistically significant.)  In spite of the variation in their ERAs, this group is, on average, less skilled than the other three groupings of pitchers.

NOTE 2: This is where I went wrong. The math is correct, but there is bias in the sample that I ignored. We can be fairly confident that pitchers who get off to a poor start won't get many opportunities to pitch -- and therefore won't get the opportunity to regress to the mean. Pitchers who do better at the start of their season will continue to pitch, and regress to the mean.  This process may take them some time, which may push them over the arbitrary line of 100 batters faced.  Thus the statistical significance is an artifact of the bias.

Figure 2: MLB Pitching, 2009 -- Average ERA, by Number of Batters Faced

In a thread on The Book blog that covered this same topic, I made a similar statement (reply #8): "What I’m trying to say is that our best estimate of the “true talent” of this group is an ERA of 8.11 [in the current case, 8.72], and that estimate is quite accurate". That statement got a response from Tango (reply #9) of "That is not accurate. If you look at how those pitchers who faced fewer than 100 batters did in the season preceding or the season following, THAT will give you a much better indicator of the true talent level."

So let me clarify.  The average level of skill of the pitchers who faced fewer than 100 batters in 2009 is an ERA of 8.72. Although Tango is correct in his assertion that the poorest performers would regress toward the mean (that is, their ERAs would come down), by the same token the best pitchers (some of whom managed a 0.00 ERA in their short stint) would get worse. But if we were to let all 188 of them continue to pitch, we could be 95% certain that the "true" ERA of the group would end up somewhere between 6.92 and 10.52.
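The arithmetic behind an interval like this can be unpacked with a short sketch. It assumes a normal-approximation interval (mean plus or minus 1.96 standard errors); the post does not say which method was actually used, so the standard error and standard deviation below are back-of-envelope implications, not reported figures.

```python
import math

# Figures quoted in the text.
mean_era = 8.72
lower, upper = 6.92, 10.52
n_pitchers = 188

# Assumption: a normal-approximation 95% interval, mean +/- 1.96 * SE.
Z_95 = 1.96

half_width = (upper - lower) / 2          # distance from the mean to either bound
implied_se = half_width / Z_95            # standard error implied by that width
implied_sd = implied_se * math.sqrt(n_pitchers)  # spread of individual ERAs

print(f"half-width: {half_width:.2f}")
print(f"implied standard error: {implied_se:.2f}")
print(f"implied standard deviation of individual ERAs: {implied_sd:.1f}")
```

Under that assumption, the individual ERAs in this group would have a standard deviation of more than 12 runs, which squares with the point in NOTE 1 about extreme values in tiny samples.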

Even the lower bound (i.e. the lowest score we would expect with our more rigorous testing) is higher than the highest range from the other groups.

NOTE 3:  My statement above would be correct, if it were not for the bias in the sample.  My belief had been that this group would regress not to the MLB average, but to the average of the <100 BFP pitchers.  But because of the selection bias, this does not hold true.

Here's a simple example to demonstrate how this works. Think of the probability professor's favourite tool, the coin toss. If we have a penny and toss it repeatedly -- say, 10,000 times -- and record the result each time, the proportion of heads would very accurately reflect the true probability of heads for that individual penny. And we'd need plenty of tosses to get an accurate measure of the single penny.

But what if instead of one penny we had 188 pennies, and we varied the number of tosses each penny got? Although the average number of tosses would be 50, some pennies might get only one toss, while others would get as many as 100 tosses. Some of those short sequences might come up all heads, while others would heavily favour the tails. On average, though, across the 188 pennies, we would find that the group average was a close reflection of "true average" of the group.

NOTE 4: The error in the initial assumption causes my coin flipping analogy to fall apart.  If “success” is a head, then a coin that comes up heads more than half the time will keep being flipped, possibly with enough flips to no longer be part of the “low flip” group (over that arbitrary threshold).  Meanwhile, a coin that runs tails more often will get pulled from the trials quickly, and end up with a heads rate below 0.5 and few flips.  Thus, as a group, the coins with a smaller number of flips will end up looking worse than those that keep getting flipped.  Selection bias causes an apparent difference, where none really exists.
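The selection effect described in this note is easy to reproduce in a small simulation. This is a sketch under assumptions of my own: every coin is perfectly fair, a coin is pulled as soon as its running heads rate drops below 0.5 (checked from the fifth flip on), no coin gets more than 100 flips, and the "low flip" group is any coin with fewer than 50 flips. None of those numbers come from the pitching data.

```python
import random


def flip_until_pulled(rng, max_flips=100, min_flips=5):
    """Flip one fair coin until it is pulled or reaches max_flips.

    The coin is pulled the first time (at or after min_flips) that its
    running proportion of heads is below 0.5. Returns (heads, flips).
    """
    heads = 0
    for flips in range(1, max_flips + 1):
        heads += rng.random() < 0.5  # fair coin: True counts as 1
        if flips >= min_flips and heads / flips < 0.5:
            return heads, flips
    return heads, max_flips


def simulate(n_coins=20000, threshold=50, seed=2009):
    """Return pooled heads rates for the low-flip and high-flip groups."""
    rng = random.Random(seed)
    low = [0, 0]   # [heads, flips] for coins with fewer than `threshold` flips
    high = [0, 0]  # [heads, flips] for coins with at least `threshold` flips
    for _ in range(n_coins):
        heads, flips = flip_until_pulled(rng)
        group = low if flips < threshold else high
        group[0] += heads
        group[1] += flips
    return low[0] / low[1], high[0] / high[1]


low_rate, high_rate = simulate()
print(f"heads rate, low-flip coins:  {low_rate:.3f}")
print(f"heads rate, high-flip coins: {high_rate:.3f}")
```

Every coin here has the same true probability of heads, yet the low-flip group comes out below 0.5 and the high-flip group above it, purely because of the rule that decides which coins keep getting flipped.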

And so it is with the pitchers in question. If they were like the other pitchers in MLB, we would expect that some of the <100 batters faced pitchers would have ERAs above the league average, while others would fall below. What we see, however, is that while there is a wide variation, the average is substantially higher than the other groups of pitchers.

NOTE 5:  ...because of selection bias!  The lesson:  selection bias can crop up anywhere, even if you are not the one doing the selecting.


December 20, 2010

Agreeing with Bill James

In 1988, the Bill James Abstract included "A Bill James Primer", with 15 statements expressing what he deemed to be useful knowledge. On that list was:
2. Talent in baseball is not normally distributed. It is a pyramid. For every player who is 10 percent above the average player, there are probably twenty players who are 10 percent below average.

I agree. (Others don't; for further discussion also see here.)

But what is this thing called "talent"? Talent is a combination of a high level of skill and sustained, consistent performance. Skill in baseball is measured through metrics such as ERA (earned run average) and OPS (on-base average plus slugging percentage) -- measures that turn counting stats into an efficiency or rate measure. While measures of this type are important, they fail to account for the fact that some players have lengthy careers, while others have a very short MLB career. Teams will sign long-term contracts with aging superstars because the player's skill is still above average, even though it may have diminished with age.

In short, career length becomes a valid proxy for talent.

The charts below plot the number of pitchers over the period 1996-2009, by both the number of games played (which favours the relief pitchers) and innings pitched (which favours the starters). During this period a total of 2,134 individuals pitched in MLB -- but the chart shows that very few of them stuck around for any length of time.

At the head of the "games" list at 898 is the still-active Mariano Rivera, while the pitcher with the most innings over this period was Greg Maddux (2887.67 innings; and Maddux threw more than 2,100 innings before 1996, as well). These two individuals, and other Hall of Fame calibre pitchers, are out at the far right of the long tail. Close to the origin at the left are pitchers whose entire career lasted but 1/3 of an inning -- a single out.

Figure 1: Number of Pitchers, by Career Innings Pitched (1996-2009)

Figure 2: Number of Pitchers, by Career Games (1996-2009)

But what of the average skill level of those pitchers? Pitchers who get a small amount of MLB experience (fewer than 27 innings) have a higher ERA than those who get more opportunities to pitch. This group -- 27% of all MLB pitchers -- recorded an average ERA of 8.08, compared to 5.15 for the 42% who pitched between 27 and 269 innings, and 4.45 for the 27% who threw between 270 and 1349 innings. The elite, those who pitched 1350 innings and above, recorded the lowest ERA of all, 4.17.

In spite of the wide variance in the ERAs of the coffee drinkers, the differences in the mean scores are statistically significant.

Figure 3: MLB Pitchers, average ERA, by number of innings pitched (1996-2009)

In summary: there is an abundance of players who are less talented than the major league average, while at the same time the number of above-average talents is low. The distribution, at the major league level, is not normal. Just like Bill James said 22 years ago.


December 16, 2010

Angell turns comic

Roger Angell, writing on the New Yorker site, cracks me up with his article "Stats".

Yes, Cliff Lee had a high UPUBB (Unexpectedly Passing Up Big Bucks), but where does it rank in the history of the game? Greg Maddux did the same in 1993, when the Yankees made him a better (well, more financially lucrative) offer than the Braves, but has anyone done a similar analysis of non-monetary influences on player signing decisions?


December 10, 2010

Slugging regression II

Building on my previous post, this time around we'll look at a bigger group of hitters, those with at least 75 at-bats in both 2007 and 2008. This is a total of 360 players.

Theoretically, with fewer at-bats we would see a greater number of very high SLG values and also a larger number of below-average SLG values. But we've already seen hints that player talent gets evaluated early on (in the previous post, I noted that the worst SLG in the 400+ group was not as far below average as the best hitters were above it).

How to read the charts below: in both cases, there are 25 players plotted. Those that fall between the 100% and zero lines are regressing to the league mean. And the closer they are to the zero line, the bigger the regression. As shown in Figure 1, 22 of the top sluggers regressed toward the mean in 2008, 3 improved (led by Albert Pujols) and none fell below the league average.

For these players, 66% of their 2008 SLG score was accounted for by their 2007 SLG (and therefore the league average accounted for 34%).

An interesting observation is that these players are by and large the same as the 400+ AB group I dealt with in the previous post. Of the 25, 19 had 400+ ABs in both years. And of the remaining 6, 4 of the players had below 400 in 2007 and then over 400 in 2008. This group includes familiar names -- Josh Hamilton, David Murphy, and Cody Ross. All of them are young sluggers who did well in a short stint in 2007, and were given the opportunity to continue to play in 2008.

Figure 1: Top 25 SLG (2007), minimum 75 at-bats

For hitters at the bottom of the slugging table, we see a similar pattern of regression. Figure 2 shows SLG "improvement" in the opposite direction: the closer the bar gets to the bottom of the chart, the bigger the improvement. Thus of the 25 players, 17 regressed toward the mean without achieving it, and 3 others exceeded the league average (the ones who fell "below zero"). The remaining 5, on the other hand, started out below average in 2007 and fared worse in 2008.

For this group, the previous year's SLG accounted for only 55% of their 2008 SLG.

And these 25 are decidedly not the same players as the least sluggerly of the 400+ AB group. Only 1 -- Jason Kendall -- appears in both lists.

In short, the "survivor bias" that keeps good players active with opportunities to hit has an inverse impact on the players at the bottom of the stack. Unless they show a huge improvement that brings them much closer to the league average, these players seem to be destined to part-time roles.

Figure 2: Bottom 25 SLG (2007), minimum 75 at-bats

This analysis is best described as "proof of concept". Certainly any conclusions drawn should be tentatively stated, and a more robust analysis over a greater number of seasons is warranted.


Slugging regression

Tango issued a multi-part challenge, of which the first part is:
1. Take the top 10 in SLG in each of the last 10 years, and tell me what the overall average SLG of these 100 players was in the following year.

The point of the challenge is to demonstrate that top performing players will regress toward the mean in subsequent seasons, and that the year under consideration accounts for, as a rule of thumb, 70% of the next season's performance, and the league average (to which their performance regresses) the other 30%.

In algebraic terms, X is predicted to be 70% when
SLG2 = (X * SLG1) + ((1 - X) * LSLG)
or, solving for X,
X = (SLG2 - LSLG) / (SLG1 - LSLG)
where
SLG1 is season 1 slugging average,
SLG2 is season 2 slugging average,
LSLG is the average league slugging average (from season 1)
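As a rough sketch, the rule of thumb can be read as a weighted average: SLG2 = X * SLG1 + (1 - X) * LSLG, so that X = (SLG2 - LSLG) / (SLG1 - LSLG). That reading is inferred from the 70%/30% description above rather than taken from Tango's post, and the function names and the .600 slugger below are hypothetical.

```python
def predict_next_slg(slg1, league_slg, x=0.70):
    """Weighted-average prediction: x of last season, (1 - x) of the league mean."""
    return x * slg1 + (1 - x) * league_slg


def share_from_prior(slg1, slg2, league_slg):
    """Solve SLG2 = X*SLG1 + (1 - X)*LSLG for X."""
    if slg1 == league_slg:
        raise ValueError("X is undefined when season-1 SLG equals the league SLG")
    return (slg2 - league_slg) / (slg1 - league_slg)


# Hypothetical slugger at .600 in a league slugging .423.
predicted = predict_next_slg(0.600, 0.423)
print(f"predicted season-2 SLG: {predicted:.3f}")
print(f"share recovered: {share_from_prior(0.600, predicted, 0.423):.2f}")
```

Note the guard clause: as a player's season-1 SLG approaches the league average the denominator shrinks toward zero, so X becomes unstable for hitters near the middle of the rankings.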

Leo quickly responded (comment #1 to Tango's post), with his calculation that for slugging, 73.3% was accounted for by the player's average in the first season. To my way of thinking, the challenge has been met -- job well done, Leo!

But as I started to think about it further, I began to wonder how far through the rankings this rule of thumb holds -- as we approach the league average, the player's SLG and the league SLG become one and the same number. And at the opposite end of the scale -- the non-sluggers -- do they regress upwards towards the mean?

So my first step was to simplify the challenge, and only look at two consecutive seasons, 2007 and 2008. Using only those players who had a minimum of 400 at bats each season, I pruned the list down to 139 players across the NL and AL. Simplifying matters further is the fact that the 2007 SLG for the NL was the same as the AL -- .423. So for my "top sluggers" I then looked at the top 25 across both leagues.

The result: for these 25 players, on average, 66% of their 2008 SLG was accounted for through their 2007 score. A few percentage points from Tango's rule of thumb, but close enough.

Charting the results shows that all but two of the top 25 sluggers regressed downwards towards the mean. And of the two, only one improved dramatically: Albert Pujols (who inched up still further in 2009, before regressing ever-so-slightly in 2010). Were Pujols not in the mix, the 2007 SLG would account for only 62% of the 2008 scores.

Another interesting observation is that of these top performers, not one fell so far in 2008 to end up with a SLG below the league average. That's not to say that it wouldn't happen, but it suggests that at the extreme end of the performance curve, as determined over the course of a full season, top performers really are above average. (NOTE: further testing required!)

But what of the other end of the ranking? I looked at the lowest performing players that I had selected, and the rule of thumb does not work. From the bottom up, the percentage explained was 87%, 84%, -4.4%, -28%, ...

At this point, I started to wonder -- why minus values? A quick check of the numbers, and I saw that these players regressed up, and to a point above the league average.
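A small worked example shows where the minus sign comes from. It assumes the share is computed as (SLG2 - LSLG) / (SLG1 - LSLG). The two players below are hypothetical and only the .423 league figure comes from the post.

```python
LEAGUE_SLG = 0.423  # 2007 league slugging average quoted in the post


def share_from_prior(slg1, slg2, league_slg=LEAGUE_SLG):
    """Fraction of the season-1 gap from league average that survives in season 2."""
    return (slg2 - league_slg) / (slg1 - league_slg)


# Hypothetical weak slugger who improves but stays below the league average.
partial = share_from_prior(0.350, 0.400)

# Hypothetical weak slugger who improves past the league average.
overshoot = share_from_prior(0.350, 0.440)

print(f"stays below average: {partial:.1%}")
print(f"ends above average:  {overshoot:.1%}")
```

In the second case the season-1 gap is negative while the season-2 gap is positive, so the ratio flips sign: the player did not merely regress to the mean but passed it.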

So what's different about the bottom of the range? It's simple: survivorship bias. My "sample" of 139 players who had 400+ ABs in each of 2007 and 2008, while ensuring I found the top hitters, automatically excluded those weak-slugging players who don't get many plate appearances but who collectively drag down the league average. Thus the "worst" players of the 139 with lots of ABs were not (by and large) far from the league average. The bottom of the list was Jason Kendall, who slugged .309 in 2007 for the A's and the Cubs while catching. Perform much worse than that, and you'll end up playing Triple A. Or in Kendall's case, for the Royals.

On deck: regression toward the mean, SLG with 75+ ABs.


December 9, 2010

The fundamentals

Tango continues to provide the essential ingredients. First it was the run expectancy matrix, now it's bases & outs by events. These two tables are fundamentals of sabermetrics -- understanding baseball (and in particular, the costs and benefits of different strategies) starts here.

[Right: Agent Smith ponders the matrix.]


December 7, 2010

Graphing run expectancy

Baseball Prospectus provided the world with the 2010 situational run expectancies, and Joshua Maciel has provided a graphic display. The graph was first posted to Tango's blog and then evolved as a result of feedback from readers there -- a fascinating process in and of itself. The graph is still a bit busy to my mind (though it's certainly not a Tufte-ian duck...), but it's a fine piece of work displaying what is arguably one of the most important pieces of baseball data.

And remember, it was George Lindsey who first published these data, back in 1963.

[Right: Joshua's run expectancy chart. Click to see it full-size, and visit his blog for all the details.]