December 22, 2010

The ERA distribution curve

NOTE: Tango and MGL at The Book, through the "Lemonade" thread, have critiqued the analysis below and pointed out errors in my assumptions. These errors mean that my closing conclusions are wrong -- good math, but bad statistics on my part. Through this post, you will find italicized text describing my errors.
Revised January 5, 2011

J.C. Bradbury's recent blog postings (here and here) have included histograms showing the distribution of ERA across major league pitchers for the 2009 season.  For his analysis, Bradbury omitted those pitchers with fewer than 100 batters faced -- in both his blog and book Hot Stove Economics, he justifies this due to the wide variance in ERA scores, much of which will be due to the small number of "samples" for each pitcher.  (As we saw in my earlier post about Bo Hart, it's possible for an average player to do very well over the short term; the inverse applies too.)

But a few comments on Bradbury's blog from readers ask about the impact that those "missing cases", who account for more than a quarter (28%) of all individuals who pitched in MLB in 2009, would have on the curve.

Here's the answer: 

Figure 1: MLB Pitching, 2009 -- Number of Pitchers by ERA, by Number of Batters Faced

Incorporating the <100 BFP pitchers (the black chunks of each bar) adds pitchers across the whole range, although they are skewed to the right (i.e. higher ERAs).  While there is a stack on the left with very low ERAs, there's a bigger group of players with an ERA greater than 10. (The highest ERA of this group was 135.00.)

NOTE 1: ERA is a poor measure to use for this type of evaluation -- for pitchers with a low number of batters faced or innings pitched, it's easy for huge numbers to appear. That 135.00 ERA is the equivalent of 15 earned runs per inning, or 5 earned runs with only a single recorded out.  These exaggerated values then lead to an upward distortion of the mean for the group.  A better measure would be wOBA, or another measure that resembles a probability between 0 and 1.
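A quick sketch of the ERA formula (nine times earned runs, divided by innings pitched) shows how few outs it takes to produce a number like 135.00. The function name and its inputs are my own, for illustration only.

```python
def era(earned_runs, outs_recorded):
    """Earned run average: earned runs allowed per nine innings pitched."""
    if outs_recorded == 0:
        # With no outs recorded, innings pitched is zero and ERA is undefined.
        raise ValueError("ERA is undefined when no outs have been recorded")
    innings_pitched = outs_recorded / 3  # three outs per inning
    return 9 * earned_runs / innings_pitched


# Two hypothetical pitching lines that produce the same ERA:
one_out_line = era(earned_runs=5, outs_recorded=1)       # a third of an inning
full_inning_line = era(earned_runs=15, outs_recorded=3)  # one full inning
print(round(one_out_line, 2), round(full_inning_line, 2))
```

Because the denominator can be as small as a third of an inning, a single bad outing swamps the group mean in a way that a bounded rate (something between 0 and 1) cannot.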

The table below shows the average ERA of this group and three other groups based on the number of batters faced.  What we see is that the <100 BFP pitchers have a higher ERA than those who pitched more frequently.  (This difference is statistically significant.)  In spite of the variation in their ERAs, this group is, on average, less skilled than the other three groupings of pitchers.

NOTE 2: This is where I went wrong. The math is correct, but there is bias in the sample that I ignored. We can be fairly confident that pitchers who get off to a poor start won't get many opportunities to pitch -- and therefore won't get the opportunity to regress to the mean. Pitchers who do better at the start of their season will continue to pitch, and regress to the mean.  This process may take them some time, which may push them over the arbitrary line of 100 batters faced.  Thus the statistical significance is an artifact of the bias.

Figure 2: MLB Pitching, 2009 -- Average ERA, by Number of Batters Faced

In a thread on The Book blog that covered this same topic, I made a similar statement (reply #8): "What I’m trying to say is that our best estimate of the “true talent” of this group is an ERA of 8.11 [in the current case, 8.72], and that estimate is quite accurate". That statement got a response from Tango (reply #9) of "That is not accurate. If you look at how those pitchers who faced fewer than 100 batters did in the season preceding or the season following, THAT will give you a much better indicator of the true talent level."

So let me clarify.  The average level of skill of the pitchers who faced fewer than 100 batters in 2009, as measured by ERA, is 8.72. Although Tango is correct in his assertion that the poorest performers would regress upwards, by the same token the best pitchers (some of whom managed a 0.00 ERA in their short stint) would get worse. But if we were to let all 188 of them continue to pitch, we can be 95% certain that the "true" ERA of the group would end up somewhere between 6.92 and 10.52.

Even the lower bound (i.e. the lowest score we would expect with our more rigorous testing) is higher than the highest range from the other groups.

NOTE 3:  My statement above would be correct, if it were not for the bias in the sample.  My belief had been that this group would regress not to the MLB average, but to the average of the <100 BFP pitchers.  But because of the selection bias, this does not hold true.

Here's a simple example to demonstrate how this works. Think of the probability professor's favourite tool, the coin toss. If we have a penny and toss it repeatedly -- say, 10,000 times -- and record the result each time, the proportion of heads would very accurately reflect the true probability of heads for that individual penny. And we'd need plenty of tosses to get an accurate measure of the single penny.

But what if instead of one penny we had 188 pennies, and we varied the number of tosses each penny got? Although the average number of tosses would be 50, some pennies might get only one toss, while others would get as many as 100 tosses. Some of those short sequences might come up all heads, while others would heavily favour the tails. On average, though, across the 188 pennies, we would find that the group average was a close reflection of "true average" of the group.

NOTE 4: The error in the initial assumption causes my coin flipping analogy to fall apart.  If “success” is a head, then a coin that comes up heads more than half the time will keep being flipped, possibly with enough flips to no longer be part of the “low flip” group (over that arbitrary threshold).  Meanwhile, a coin that runs tails more often will get pulled from the trials quickly, and end up with a heads rate below 0.5 and few flips.  Thus, as a group, the coins with a smaller number of flips will end up looking worse than those that keep getting flipped.  Selection bias causes an apparent difference, where none really exists.
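The selection effect is easy to reproduce with a small simulation. This is a minimal sketch under assumptions of my own: every coin is perfectly fair, a coin gets pulled once its running heads rate drops below a cutoff, and the "low flip" group is defined by an arbitrary threshold. The cutoff, the minimum number of flips, and the threshold are all made-up values, not anything taken from the pitching data.

```python
import random


def flip_until_pulled(rng, max_flips=100, min_flips=5, cutoff=0.4):
    """Flip a fair coin, pulling it early if its running heads rate sags.

    Returns (heads, flips). The coin itself never changes: P(heads) = 0.5.
    """
    heads = 0
    for n in range(1, max_flips + 1):
        heads += rng.random() < 0.5
        # The "manager" loses patience once the running rate is too low.
        if n >= min_flips and heads / n < cutoff:
            return heads, n
    return heads, max_flips


def simulate(n_coins=5000, threshold=50, seed=1):
    """Average heads rate for coins below and at-or-above the flip threshold."""
    rng = random.Random(seed)
    low, high = [], []
    for _ in range(n_coins):
        heads, flips = flip_until_pulled(rng)
        (low if flips < threshold else high).append(heads / flips)
    return sum(low) / len(low), sum(high) / len(high), len(low), len(high)


low_mean, high_mean, n_low, n_high = simulate()
print(f"fewer than 50 flips: {n_low} coins, mean heads rate {low_mean:.3f}")
print(f"50 flips or more:    {n_high} coins, mean heads rate {high_mean:.3f}")
```

Every coin in the simulation has identical "true talent", yet the low-flip group looks clearly worse, simply because a poor early run is what put a coin in that group in the first place.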

And so it is with the pitchers in question. If they were like the other pitchers in MLB, we would expect that some of the <100 batters faced pitchers would have ERAs above the league average, while others would fall below. What we see, however, is that while there is a wide variation, the average is substantially higher than the other groups of pitchers.

NOTE 5:  ...because of selection bias!  The lesson:  selection bias can crop up anywhere, even if you are not the one doing the selecting.


December 20, 2010

Agreeing with Bill James

In 1988, the Bill James Abstract included "A Bill James Primer", with 15 statements expressing what he deemed to be useful knowledge. On that list was:
2. Talent in baseball is not normally distributed. It is a pyramid. For every player who is 10 percent above the average player, there are probably twenty players who are 10 percent below average.

I agree. (Others don't; for further discussion also see here.)

But what is this thing called "talent"? Talent is a combination of a high level of skill and sustained, consistent performance. Skill in baseball is measured through metrics such as ERA (earned run average) and OPS (on-base average plus slugging percentage) -- measures that turn counting stats into an efficiency or rate measure. While measures of this type are important, they fail to account for the fact that some players have lengthy careers, while other players have a very short MLB career. Teams will sign long-term contracts with aging superstars because the player's skill is still above average, even though it may have diminished with age.

In short, career length becomes a valid proxy for talent.

The charts below plot the number of pitchers over the period 1996-2009, by both the number of games played (which favours the relief pitchers) and innings pitched (which favours the starters). During this period a total of 2,134 individuals pitched in MLB -- but the chart shows that very few of them stuck around for any length of time.

At the head of the "games" list at 898 is the still-active Mariano Rivera, while the pitcher with the most innings over this period was Greg Maddux (2887.67 innings; and Maddux threw more than 2,100 innings before 1996, as well). These two individuals, and other Hall of Fame calibre pitchers, are out at the far right of the long tail. Close to the origin at the left are pitchers whose entire career lasted but 1/3 of an inning -- a single out.

Figure 1: Number of Pitchers, by Career Innings Pitched (1996-2009)

Figure 2: Number of Pitchers, by Career Games (1996-2009)

But what of the average skill level of those pitchers? Pitchers who get a small amount of MLB experience (fewer than 27 innings) have a higher ERA than those who get more opportunities to pitch. This group -- 27% of all MLB pitchers -- recorded an average ERA of 8.08, compared to 5.15 for the 42% who pitched between 27 and 269 innings, and 4.45 for the 27% who threw between 270 and 1349 innings. The elite, those who pitched 1350 innings and above, recorded the lowest ERA of all, 4.17.

In spite of the wide variance in the ERAs of the coffee drinkers, the differences in the mean scores are statistically significant.

Figure 3: MLB Pitchers, average ERA, by number of innings pitched (1996-2009)

In summary: there is an abundance of players who are less talented than the major league average, while at the same time the number of above-average talents is low. The distribution, at the major league level, is not normal. Just like Bill James said 22 years ago.


December 16, 2010

Angell turns comic

Roger Angell, writing on the New Yorker site, cracks me up with his article "Stats".

Yes, Cliff Lee had a high UPUBB (Unexpectedly Passing Up Big Bucks), but where does it rank in the history of the game? Greg Maddux did the same in 1993, when the Yankees made him a better (well, more financially lucrative) offer than the Braves, but has anyone done a similar analysis of non-monetary influences on player signing decisions?


December 10, 2010

Slugging regression II

Building on my previous post, this time around we'll look at a bigger group of hitters, those with at least 75 at-bats in both 2007 and 2008. This is a total of 360 players.

Theoretically, with fewer at-bats, we would see a greater number of very high SLG values and also a larger number of below-average SLG values. But we've already seen hints that player talent gets evaluated early on (in the previous post, I identified the fact that the worst SLG in the 400+ group wasn't awful to the same degree that the best hitters are good).

How to read the charts below: in both cases, there are 25 players plotted. Those that fall between the 100% and zero lines are regressing to the league mean. And the closer they are to the zero line, the bigger the regression. As shown in Figure 1, 22 of the top sluggers regressed toward the mean in 2008, 3 improved (led by Albert Pujols) and none fell below the league average.

For these players, 66% of their 2008 SLG score was accounted for by their 2007 SLG (and therefore the league average accounted for 34%).

An interesting observation is that these players are by and large the same as the 400+ AB group I dealt with in the previous post. Of the 25, 19 had 400+ ABs in both years. And of the remaining 6, 4 of the players had below 400 in 2007 and then over 400 in 2008. This group includes familiar names -- Josh Hamilton, David Murphy, and Cody Ross. All of them are young sluggers who did well in a short stint in 2007, and were given the opportunity to continue to play in 2008.

Figure 1: Top 25 SLG (2007), minimum 75 at-bats

For hitters at the bottom of the slugging table, we see a similar pattern of regression. Figure 2 shows SLG "improvement" in the opposite direction: the closer the bar gets to the bottom of the chart, the bigger the improvement. Thus of the 25 players, 17 regressed toward the mean without achieving it, and 3 others exceeded the league average (the ones who fell "below zero"). The remaining 5, on the other hand, started out below average in 2007 and fared worse in 2008.

For this group, the previous year's SLG accounted for only 55% of their 2008 SLG.

And this group of 25 is decidedly not the same set of players as the least sluggerly of the 400+ AB group. Only 1 -- Jason Kendall -- appears in both lists.

In short, the "survivor bias" that keeps good players active with opportunities to hit has an inverse impact on the players at the bottom of the stack. Unless they show a huge improvement that brings them much closer to the league average, these players seem to be destined to part-time roles.

Figure 2: Bottom 25 SLG (2007), minimum 75 at-bats

This analysis is best described as "proof of concept". Certainly any conclusions drawn should be tentatively stated, and a more robust analysis over a greater number of seasons is warranted.


Slugging regression

Tango issued a multi-part challenge, of which the first part is:
1. Take the top 10 in SLG in each of the last 10 years, and tell me what the overall average SLG of these 100 players was in the following year.

The point of the challenge is to demonstrate that top performing players will regress toward the mean in subsequent seasons, and that the year under consideration accounts for, as a rule of thumb, 70% of the next season's performance, and the league average (to which their performance regresses) the other 30%.

In algebraic terms,
SLG2 = X × SLG1 + (1 - X) × LSLG
and X is predicted to be 70%, where
SLG1 is season 1 slugging average,
SLG2 is season 2 slugging average,
LSLG is the average league slugging average (from season 1)

Leo quickly responded (comment #1 to Tango's post), with his calculation that for slugging, 73.3% was accounted for by the player's average in the first season. To my way of thinking, the challenge has been met -- job well done, Leo!

But as I started to think about it further, I began to wonder how far through the rankings this rule of thumb holds -- as we approach the league average, the player's SLG and the league SLG become one and the same number. And at the opposite end of the scale -- the non-sluggers -- do they regress upwards towards the mean?

So my first step was to simplify the challenge, and only look at two consecutive seasons, 2007 and 2008. Using only those players who had a minimum of 400 at bats each season, I pruned the list down to 129 players in both the NL and AL. Simplifying matters further is the fact that the 2007 SLG for the NL was the same as the AL -- .423. So for my "top sluggers" I then looked at the top 25 across both leagues.

The result: for these 25 players, on average, 66% of their 2008 SLG was accounted for through their 2007 score. A few percentage points from Tango's rule of thumb, but close enough.

Charting the results shows that all but two of the top 25 sluggers regressed downwards towards the mean. And of the two, only one improved dramatically: Albert Pujols (who inched up still further in 2009, before regressing ever-so-slightly in 2010). Were Pujols not in the mix, the 2007 SLG would account for only 62% of the 2008 scores.

Another interesting observation is that of these top performers, not one fell so far in 2008 to end up with a SLG below the league average. That's not to say that it wouldn't happen, but it suggests that at the extreme end of the performance curve, as determined over the course of a full season, top performers really are above average. (NOTE: further testing required!)

But what of the other end of the ranking? I looked at the lowest performing players that I had selected, and the rule of thumb does not work. From the bottom up, the percentage explained was 87%, 84%, -4.4%, -28%, ...

At this point, I started to wonder -- why minus values? A quick check of the numbers, and I saw that these players regressed up, and to a point above the league average.
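The minus values fall straight out of the rule-of-thumb formula. If season 2 SLG is modelled as X times season 1 SLG plus (1 - X) times the league SLG, then solving for X gives (SLG2 - LSLG) / (SLG1 - LSLG). Here is a minimal sketch; the function name is mine, and the two player lines are hypothetical numbers chosen for illustration (only the .423 league average comes from the 2007 data).

```python
def share_from_prior_season(slg1, slg2, league_slg):
    """Solve SLG2 = X * SLG1 + (1 - X) * LSLG for X."""
    if slg1 == league_slg:
        # A player exactly at the league average has nothing to regress from.
        raise ValueError("X is undefined when SLG1 equals the league SLG")
    return (slg2 - league_slg) / (slg1 - league_slg)


LEAGUE_SLG_2007 = 0.423  # from the post: the same in both leagues

# Hypothetical top slugger: .600 one year, .547 the next.
top = share_from_prior_season(0.600, 0.547, LEAGUE_SLG_2007)

# Hypothetical weak hitter: .330 one year, then .440 (above league average).
weak = share_from_prior_season(0.330, 0.440, LEAGUE_SLG_2007)

print(f"top slugger X = {top:.2f}")
print(f"weak hitter X = {weak:.2f}")
```

The sign of X flips whenever a player starts on one side of the league average and finishes on the other, which is exactly what happened with the weak hitters who regressed up past the mean.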

So what's different about the bottom of the range? It's simple: survivorship bias. My "sample" of 139 players who had 400+ ABs in each of 2007 and 2008, while ensuring I found the top hitters, automatically excluded those weak-slugging players who don't get many plate appearances but who collectively drag down the league average. Thus the "worst" players of the 139 with lots of ABs were not (by and large) far from the league average. The bottom of the list was Jason Kendall, who slugged .309 in 2007 for the A's and the Cubs while catching. Perform much worse than that, and you'll end up playing Triple A. Or in Kendall's case, for the Royals.

On deck: regression toward the mean, SLG with 75+ ABs.


December 9, 2010

The fundamentals

Tango continues to provide the essential ingredients. First it was the run expectancy matrix, now it's bases & outs by events. These two tables are fundamentals of sabermetrics -- understanding baseball (and in particular, the costs and benefits of different strategies) starts here.

[Right: Agent Smith ponders the matrix.]


December 7, 2010

Graphing run expectancy

Baseball Prospectus provided the world with the 2010 situational run expectancies, and Joshua Maciel has provided a graphic display. The graph was first posted to and then evolved as a result of feedback from readers at Tango's blog -- a fascinating process in and of itself. The graph is still a bit busy to my mind (but it's certainly not a Tufte-ian duck...), but it's a fine piece of work displaying what is arguably one of the most important pieces of baseball data.

And remember, it was George Lindsey who first published these data, back in 1963.

[Right: Joshua's run expectancy chart. Click to see it full-size, and visit his blog for all the details.]


November 29, 2010

Good math, bad statistics

In the past few days, a pair of posts on other blogs caught my attention -- they seem to be coming at the same issue from different directions.
First, William R. Briggs posted "Statistics Is Not Math" (November 16, 2010). Then, Tango over at The Book posted "Detrending: when statisticians attack!" (November 24, 2010). I responded to the Tango post (comment #4), but I would like to elaborate further here.

One of the things that jumped out at me from Briggs' post was the statement that "Statistics rightly belongs to epistemology, the philosophy of how we know what we know. Probability and statistics can even be called quantitative epistemology." In other words, statistics is useful only if we have some understanding of the subject matter at hand. No amount of fancy math will help our understanding if we do not start our research with some knowledge of the topic.

In the "Detrending" post, Tango links to an unpublished (in the academic sense that it's not been published in a peer-reviewed journal) paper by three physicists, Alexander M. Petersen, Orion Penner, and H. Eugene Stanley, entitled "Detrending career statistics in professional baseball: Accounting for the steroids era and beyond". I may offer a longer critique of this paper at a later date, but the first thing that jumps out is an apparent ignorance of The Literature (i.e. what's been written earlier about the topic -- baseball -- from a statistical basis). This leads the authors to present conclusions that have already been established elsewhere (for example, that pitcher wins are not a good measure of pitcher performance, or that standardizing allows for inter-season comparisons).

There's lots of fancy maths (some of which isn't as fancy or new-fangled as the authors seem to think) and plenty of Greek letters, but in the end it doesn't add a great deal to our understanding of baseball.

This article serves as a reminder that when we are assessing the quality of any sabermetric writing, we need to consider two factors:
1. Is the author using the appropriate statistical tools and interpreting the mathematical results correctly?
2. Does the author understand the game, including how baseball has evolved and the analytic literature that has been written over the past 50 years?


November 22, 2010

Bo knows probability

Cardinals' second baseman Bo Hart

Over on 3-D Baseball, Kincaid has a nice explanation of regression to the mean in a post titled "On Correlation, Regression, and Bo Hart". The blog entry starts with the story of Bo Hart, who got called up to the Cardinals in June 2003, and promptly hit .412 over his first 75 at-bats. Since Kincaid wrote a regression to the mean article, you can guess where Hart's season went -- he finished with 286 at-bats and a .277 average.

But Kincaid flirts with a few notions that I think are worth following in a bit more detail.

First up, what are the odds that a .277 hitter will break .400 across a string of 75 at-bats?

The answer is roughly 1 in 200.

This is calculated using the fact that the normal distribution approximates the binomial distribution -- in English, if you repeat a set of binomial trials, the histogram of the success rates across those sets will look like the normal curve. This leads us to the normal curve's cumulative distribution function, which allows us to state the probability that a value (in this case, a batting average of .412) falls at or beyond a certain point given the mean value (.277).

Using Bo Hart's season batting average of .277 as his "true talent" (or "population mean") across 75 at-bats, we can calculate the standard deviation of the distribution (0.052). We then determine that .412 lies at 2.60 standard deviations from the mean (2.60=[.412-.277]/.052). As a probability, a result 2.60 or more standard deviations above the mean has a likelihood of about 0.5% -- or 1 in 200.
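The calculation can be reproduced in a few lines. This is a sketch using only Python's standard library; the helper names are mine. It also adds an exact binomial tail as a cross-check on the normal approximation, treating each at-bat as an independent trial with a fixed .277 chance of a hit, which is a simplifying assumption.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def binomial_upper_tail(n, p, k):
    """P(X >= k) when X counts successes in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


TRUE_AVG = 0.277  # season average, used as "true talent"
AT_BATS = 75
HOT_AVG = 0.412

# Standard deviation of a batting average observed over 75 at-bats.
sd = math.sqrt(TRUE_AVG * (1 - TRUE_AVG) / AT_BATS)
z = (HOT_AVG - TRUE_AVG) / sd
p_normal = normal_upper_tail(z)

# Smallest whole number of hits that gives an average of .412 or better.
hits_needed = math.ceil(HOT_AVG * AT_BATS)
p_exact = binomial_upper_tail(AT_BATS, TRUE_AVG, hits_needed)

print(f"sd = {sd:.3f}, z = {z:.2f}")
print(f"normal approximation: about 1 in {1 / p_normal:.0f}")
print(f"exact binomial:       about 1 in {1 / p_exact:.0f}")
```

The two answers will not match exactly, since the binomial is discrete and slightly skewed at this sample size, but both land in the same "rare but hardly impossible" territory.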

What was unusual about Bo Hart is that his 1 in 200 string of successful at-bats occurred at the beginning of his Major League career. Calculating that probability is a task for another day.

In my next post I will explore Kincaid's statements about evaluating "true talent" based on a number of observations. Specifically, I'll delve into the following questions: "At what point can we be relatively certain about our inferences of true talent based on observed performance? 75 PAs is not enough, and one million is plenty, but what about 1000?"


November 14, 2010

Sealing the exits?

The Victoria Seals of the Golden Baseball League announced on Wednesday (November 10, 2010) that they were "ceasing operations". While acknowledging that the league has some serious challenges, the club was pointing more fingers at the City of Victoria, who owns and operates the Seals' home, Royal Athletic Park (RAP).

Like practically every other pro sports franchise that plays in a publicly owned facility, the Seals wanted a better deal. In the news release and press conference, the Seals stated they had asked for a larger share of the gate and concession revenue, and solutions to a variety of issues relating to the playing field, including the ability to leave the outfield fence up all summer. The City's position is that they are unwilling to have taxpayers subsidize the club, and since RAP is a multi-purpose facility available for many users their hands are tied on the field issues.

I don't know enough about the details on either side to offer an informed opinion. But I can say both sides seem to have entrenched positions, based on their specific operating requirements (for the City, that includes the political reality as well as the business side) that seem reasonable enough. In short, there may not be a middle ground that is satisfactory to both parties.

The local press has played up the distance between the Seals and the City (the news article is here, but the local daily also weighed in with this opinion piece and this more resigned article). But in so doing, the press has missed a key element that the Seals acknowledge: the Golden Baseball League is itself in a shambles. The team's press release describes the league as being in an "unstable state", but I suspect that understates the troubles.

To my way of thinking, the biggest problem is the distribution of the teams in the league. To expand the league off the mainland of North America to Victoria (on Vancouver Island) was one thing -- it guarantees a higher-per-mile travel cost (those ferries aren't cheap) and perhaps an extra hotel night. But what was the league thinking, given the evidence that the league is in a tenuous state to start with, adding clubs in Mexico (Tijuana -- not far from the mothballed San Diego club but across an international border) and in particular Hawaii (Maui)?

I have to wonder if, with their "cease operations" announcement, the management of the Seals is trying to press both the City of Victoria and the management of the Golden Baseball League. Perhaps it is just wishful thinking on my part, but if the Seals can get a more satisfactory arrangement with the City of Victoria (or another municipality in the greater Victoria area -- we have 13, after all), while pressuring the league to make some sensible choices about where the franchises are located, then perhaps we haven't seen the end of this round of professional baseball in Victoria.

Update (2010-11-15): The Globe & Mail also chimes in.

The Bayes Ball Bookshelf, #1

The Numbers Game: Baseball's Lifelong Fascination with Statistics, by Alan Schwarz. 2004, St. Martin's Press.

In The Numbers Game, Alan Schwarz presents a well-written and tidy history of the development and evolution of the statistics that record the history of the game. Or more accurately, it's a history of baseball, and its evolution over the past century and a half, from the perspective of the numerical record and analysis of the game.
Thus Schwarz begins in the mid-nineteenth century, with Henry Chadwick's influence on the information that got recorded. But more importantly, Schwarz points out (and this becomes a recurring theme) that how the game was played was an influence on what got recorded. In the early days, the ball was "pitched" to the batter in a way that facilitated batting it -- and because pitching was secondary to hitting and fielding, there was no record of pitching performance. And as the game evolved, so did the numbers that recorded the game and got used to evaluate the players.
A second recurring theme is the weaving of the technical aspects of the statistics with the personal characters of those who developed and promoted various measures. This is very much a character-driven book -- we hear not only about the "why" of the statistics that were recorded, but the people who developed them and the means of recording them. So we hear about Al Munro Elias, Allan Roth's career with the Dodgers, Hal Richman's development of Strat-O-Matic, and George Lindsey's articles that appeared in academic journals beginning in the late 1950s. We also get an entire chapter devoted to the publication in 1969 of The Baseball Encyclopedia, and another to Bill James.
One of the things that jumps out at me is the impact that computers -- particularly the personal computer -- have had on the volume of statistics available, and the precision of the analysis that is now available. (And what is perhaps a topic for another day, the proliferation of analysts of varying quality.)
In The Numbers Game, Schwarz has written what may well be the single best introduction to sabermetrics. But it's not a technical manual that will tell you how to calculate any one statistic, or how another measure should be interpreted. Instead it's a lively history of major league baseball, and the numerical record and analysis that accompanies it.

Assessment: home run.

2010 in retrospect

The 2010 MLB season has come to a close, and so begins a time of reflection and resolutions.

a) I started this blog, and rarely posted. I started a few posts, often in response to other blogs, but finished fewer still. I am confounded by the traffic on the blogosphere -- I thought I could respond thoughtfully and add something of value, but I find myself either repeating what gets said elsewhere, or sounding like a condescending pedant. Or both.

So I'll start off on a different tack, starting now.

b) I went to two MLB games in 2010. (There's nothing like living across an international border from the closest team, and 4,300 km from the "national" club.) Two shutouts! Fangraphs has the results here and here.

c) The local pro team just announced they are closing up shop. More on that later.

October 27, 2010

World Series predictions

The 2010 World Series starts this evening, and many pundits are making their predictions on who will win (a sample: The Baseball Analysts, New York Times, and Sports Illustrated).

But perhaps the best way to think about evaluating the pundits is contained in this article "Slick talkers and bad forecasters" by Dan Gardner. Gardner's article is about economic forecasting, but the point is relevant -- when it comes to predicting the outcome, a nuanced understanding of all of the influencing factors produces the best forecast. Or as Gardner puts it, "experts who gathered information from many sources, who were comfortable with complexity and uncertainty, and were more prepared to admit mistakes and adjust conclusions accordingly -- these were the experts worth listening to."

August 12, 2010

Skill and luck on the links

Differentiating skill and luck has been a hot topic of late -- here's an article at Slate by Michael Agger titled "Dead Solid Lucky" looking at the topic in the context of golf.

The article draws heavily on an analysis by Robert A. Connolly and Richard J. Rendleman Jr., both from the Kenan-Flagler Business School, University of North Carolina. (Their article, from the Journal of the American Statistical Association, can be found in PDF format here.)

A tidy summary, quoted directly from Agger's article: "How big a deal is luck on the golf course? On average, tournament winners are the beneficiaries of 9.6 strokes of good luck. Tiger Woods' superior putting, you'll recall, gives him a three-stroke advantage per tournament. Good luck is potentially three times more important. When Connolly and Rendleman looked at the tournament results, they found that (with extremely few exceptions) the top 20 finishers benefitted from some degree of luck. They played better than predicted. So, in order for a golfer to win, he has to both play well and get lucky."

Sounds like real life. And baseball.

July 27, 2010

Skill, luck, and more than a little style

Pictured: "Mr. May", Dave Winfield, comes through in the clutch in the 1992 World Series with, in his words, "One stinkin' little hit." The 11th-inning double drove in two runs and sealed the World Series win for the Blue Jays.

The BBC has posted an article and calculator ("Can chance make you a killer?") that is used to demonstrate the challenges in differentiating luck from skill. In this case, a simple scenario with fixed parameters is linked to a calculator that generates the range of possibilities.

While I'm not sure how this could be used in a baseball setting, it is a very good tool for demonstrating that it can be difficult -- particularly if you just look at "the numbers" in a selective way -- to make definitive statements about a player's ability. Such as, say, clutch hitting.

(Acknowledgement: The Book.)

July 26, 2010

Baseball imitates real life

How understanding luck in baseball can help understanding real life, or at least your investment portfolio: "Untangling skill and luck" by Michael J. Mauboussin.

Mauboussin uses a variety of sabermetric analyses, including Jim Albert's 2004 paper “A Batting Average: Does It Represent Ability or Luck?” and Tango's True Talent Level analysis.

July 18, 2010

Probability of winning the division

The Cool Standings website presents the probabilities that any Major League Baseball team will win the division or wild card. They also have this available for the NHL, NFL, and NBA.

If I am interpreting their methodology correctly, they are using a Pythagorean basis in a Monte Carlo simulation. (To read more, here's an ESPN article on the method.)
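To make the idea concrete, here is a rough sketch of how a Pythagorean win expectation could feed a Monte Carlo simulation. This is not Cool Standings' actual method, which I have not seen in detail. It is a toy version under assumptions of my own: the exponent is fixed at 2, each team's remaining games are simulated independently against generic opposition rather than an actual schedule, ties for first are split evenly, and the teams and numbers in the example are invented.

```python
import random


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored**exponent
    ra = runs_allowed**exponent
    return rs / (rs + ra)


def division_odds(teams, n_sims=10000, seed=42):
    """Estimate each team's chance of finishing first.

    teams maps a name to (wins, games_left, runs_scored, runs_allowed).
    """
    rng = random.Random(seed)
    titles = {name: 0.0 for name in teams}
    for _ in range(n_sims):
        finals = {}
        for name, (wins, games_left, rs, ra) in teams.items():
            p = pythagorean_win_pct(rs, ra)
            # Simulate the rest of the season one game at a time.
            finals[name] = wins + sum(rng.random() < p for _ in range(games_left))
        best = max(finals.values())
        leaders = [name for name, total in finals.items() if total == best]
        for name in leaders:
            titles[name] += 1 / len(leaders)  # split ties evenly
    return {name: count / n_sims for name, count in titles.items()}


# Invented mid-season standings for a three-team race.
example = {
    "Team A": (52, 70, 450, 380),
    "Team B": (50, 72, 420, 400),
    "Team C": (47, 73, 390, 410),
}
for name, odds in division_odds(example, n_sims=2000).items():
    print(f"{name}: {odds:.1%}")
```

A fuller version would simulate the actual remaining schedule head to head, so that one contender's win is another's loss, and would add the wild card on top of the division race.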


July 14, 2010

Behind in the count

Time to get caught up with the goings-on elsewhere...

First up, Tom at Heureusement, ici, c'est le Blog! cleverly adapted the same Poisson method I used for perfect games to examine the plethora of no-hitters this season.

And no surprise, the number fits nicely with the outer range of the "expected" frequencies.

June 24, 2010

Tireless tennis

After a bit of water-cooler chat today at work, I had planned to spend part of my evening working out the probability of the crazy Wimbledon match in which John Isner outlasted Nicolas Mahut, after the fifth set went 138 games and Isner prevailed 70-68.

I got home to find not one but two well-presented analyses that tackle the question, so my work would be redundant. Thus I present the following links for your reading pleasure:

1. Carl Bialik, "Isner Fitting Winner of Marathon Wimbledon Match", Wall Street Journal

2. Phil Birnbaum, "What were the odds of the 70-68 score at Wimbledon?", Sabermetric Research. (Birnbaum graciously acknowledges Bialik's article in an appended post-post foreword.)

The final summary: this was a one-in-a-million (give or take, depending on some of the assumptions) event.

June 6, 2010

Perfectly random?

There has been much discussion about the recent run of perfect and almost-perfect games. A variety of hypotheses have been floated, including pitching dominance (including a higher strikeout ratio), improved defense, and the confluence of expansion, better player evaluation, and a drug-free world.

Perfect games are a rare event, so we run the risk of seeing a random cluster as a trend. There have now been 20 perfect games -- 18 in the "modern era" (since 1900), 14 since the expansion era began in 1961, and two so far in the 2010 season. How can we tell if this "streak" of two perfect games in a single season is simply a random fluctuation?

Calculating the probability of a perfect game: allowing runners

One approach is to calculate a theoretical probability based on on-base percentage (OBP). Tango has a blog entry "Perfect Game calculation" that presents one approach. His estimate was 1 perfect game per 15,000.

Another example of this appears in Mathletics by Wayne L. Winston, who calculated a probability of 0.0000489883, or 1 game in just over 20,400. Winston noted at the time the book went to press (before the 2009 season) there had been nearly 173,000 regular season games since 1900 and each game provides 2 opportunities for a perfect game (so we have 346,000 "team games"). Winston then goes on to note that we would therefore expect there to be 16.95 perfect games over that period -- almost perfectly matching the observed total of 17 to that point in time.

A side note: after Mark Buehrle's perfect game in 2009, Sky Andrecheck took a similar approach for individual players. He worked out the individual chances for the 16 modern-era players who had tossed a perfect game, based on the sum of the on-base percentage and reached-on-error percentage they allowed over their careers.

Calculating the probability of a perfect game: observed rate

A second approach to calculating the probability is to compare the observed number of perfect games to the number of opportunities. I decided to use 1961 as year one. This was a natural point to begin -- this was the first year of baseball's expansion, and it falls mid-way between Don Larsen's 1956 World Series perfecto (which had been the first in 22 years) and Jim Bunning's 90-pitch masterpiece in 1964. Between 1961 and 2009 inclusive, there were 12 perfect games -- and there were 201,506 regular season "team games". This gives us a probability of 0.00005955, or 1 perfect game every 16,790 team games played.

This method yields a result that is roughly the mid-point between Tango's and Winston's approaches.
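The arithmetic behind the three estimates is easy to check. The sketch below uses only the figures quoted above; the one assumption is that Tango's "1 per 15,000" is read as a rate per team game, which the text does not spell out.

```python
# Three estimates of the chance of a perfect game per "team game".
TANGO_P = 1 / 15000          # assumed to be per team game
WINSTON_P = 0.0000489883     # from Mathletics

# Observed rate, 1961-2009 inclusive.
PERFECT_GAMES_1961_2009 = 12
TEAM_GAMES_1961_2009 = 201506
observed_p = PERFECT_GAMES_1961_2009 / TEAM_GAMES_1961_2009

# Winston's expected count over roughly 346,000 team games since 1900.
TEAM_GAMES_SINCE_1900 = 346000
expected_winston = TEAM_GAMES_SINCE_1900 * WINSTON_P

if __name__ == "__main__":
    print(f"Tango:    1 in {1 / TANGO_P:,.0f}")
    print(f"Observed: 1 in {1 / observed_p:,.0f}")
    print(f"Winston:  1 in {1 / WINSTON_P:,.0f}")
    print(f"Winston's expected perfect games since 1900: {expected_winston:.2f}")
```

The observed rate works out to about 1 in 16,792, which the post rounds to 16,790.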

What are the odds of two perfect games in one season?

While most statistical analysis makes the assumption that the distribution of the events is "normal", when we are dealing with rare discrete events the distribution does not resemble the normal distribution. The most common distribution used for this is the Poisson distribution.

At a probability of 1 in 16,790, and for seasons of both 4,860 "team games" (the current number per season -- based on 2,430 games and therefore 4,860 perfect game opportunities) and 4,112 (the average number per season since 1961), the probabilities, expected frequencies, and observed frequencies are as follows:

So over 50 seasons, we would predict that there would be between 1 and 2 seasons with 2 perfect games, and between 9 and 11 seasons with 1 perfect game.
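These expected frequencies can be reproduced with a few lines of code. This is a minimal sketch that assumes the 1-in-16,790 rate and the two season lengths given above, and treats the number of perfect games in a season as Poisson with mean equal to team games times that rate.

```python
from math import exp, factorial

P_PERFECT = 1 / 16790   # observed rate per team game, 1961-2009
SEASONS = 50

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * exp(-lam) / factorial(k)

def expected_seasons(team_games, seasons=SEASONS, max_k=2):
    """Expected number of seasons (out of `seasons`) with k perfect games."""
    lam = team_games * P_PERFECT
    return {k: seasons * poisson_pmf(k, lam) for k in range(max_k + 1)}

if __name__ == "__main__":
    for team_games in (4860, 4112):
        print(f"{team_games} team games per season:")
        for k, n_seasons in expected_seasons(team_games).items():
            print(f"  {k} perfect games: {n_seasons:.1f} of {SEASONS} seasons")
```

The two season lengths bracket the ranges quoted: the shorter average season gives the low end of each range, and the current 4,860-game season gives the high end.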

So to answer the question posed in the title, the answer is "Yes -- two perfect games in one season is well within the expected distribution." The fact that 2010 has been the first season with 2 perfect games in the 50 years since 1961 fits perfectly with the expected distribution.

In future posts I will repeat the calculation of probabilities and frequencies, with modified probabilities (once the dust settles on the "correct" way to calculate the probabilities...)

Comments and questions are always welcome.

June 4, 2010

A closer look at payroll and performance

A recent post on Hawkonomics presented a regression analysis of Major League Baseball team performance as a function of payroll. This post has generated some chatter in the sabermetric blogs (Sabermetric Research and The Book). If I may be so bold, the original post wasn't very well articulated, which has led to some critiques. Herein I aim to repeat the original analysis, and provide some elaboration that will aid interpretation.

It is clear that the calibre of the players on the team influences the number of wins. What is less clear is the relationship between team calibre and the total amount the team pays in salaries. We have all heard it said that rich teams "buy a championship" by loading up on highly paid free agents, but how true is it?

This relationship has been analyzed in the past. One such analysis can be found in the book The Wages of Wins by Berri, Schmidt, & Brook, and there are plenty of other sources around the sabermetric blogs. (One interesting visualization tool can be found on Ben Fry’s site.)

One of the most common ways to test a relationship between two variables is through a regression analysis. This is the approach taken by Stacey Brook over at Hawkonomics. (Note: Brook is one of the co-authors of The Wages of Wins.)

I have re-run the regression using the data supplied on his blog. I changed two things to make the results more readily comprehensible. First, I changed the salary figures to be represented as millions; thus the Yankees’ salary is expressed not as $206,333,389 but $206.3. More dramatically, I used each team’s current winning percentage and projected it out over 162 games – essentially a forecast of where the teams will end up at the end of the 2010 season if they continue at the pace established over the first ~50 games of the season.

NOTE: These transformations alter neither the “goodness of fit” of the model nor the statistical significance.

Let’s look in detail at the model that results.

1. The Correlation: The strength of the relationship between team salaries and wins

The correlation coefficient (often identified as the Pearson correlation coefficient, and represented as "R") is a unitless measure that simply tells us how much the two variables vary together in a linear manner. If they both move up in lockstep, the correlation coefficient will be 1 (the temperature outside and the amount of electricity used to run air conditioners); if one moves up while the other moves down in lockstep, the R value will be -1 (the temperature outside and the amount of natural gas burned keeping your house warm). If there is no relationship at all, then R will equal zero (the temperature outside and the amount of energy used to heat the gallons of hot water used by teenagers in the shower).

For MLB salaries and wins so far this season, the R value is 0.224. This is interpreted as being a weak linear relationship. In plainer language, the data do not really follow a linear pattern.

But there is another value -- R² or R-squared -- that gives us some language to work with. In this case, R-squared is 0.0503 (0.224 squared, allowing for rounding). From this, we can say that salaries account for 5.03% of the variation in team wins -- not a very big share at all.

This is easily seen in the X-Y chart below. Salary is plotted across the bottom, with the forecast wins up the side (I converted the team win percentages to forecast season wins -- more on this later). Each team is represented by one of the dark blue diamonds scattered about the chart. The predicted values derived from the regression model are shown in the form of the red dots joined by a nice straight line.

From this, it is easy to see that the model is not a very good predictor of actual wins. While there are some blue dots that fall close to the line, there are others that are well above or below the line. If there isn’t much difference between the actual and predicted values, we have a good model. That clearly is not the case here.

The model tells us little about what makes a winning team, because a lot of the difference in team success cannot be explained by salaries. In short, this model has no oomph.

(Back to the side note from earlier: the correlation coefficient will remain the same regardless of how we express our variables. We can convert the dollars to a percentage of the average for the season (thus the low-spending Pirates would be said to have a salary that is 39% of the average, while the Yankees are at 228% of it), and the correlation coefficient remains at 0.224. Or we could convert the winning percentage to actual wins, with no change in the R value.)
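A quick way to convince yourself of this is to compute R on the same data expressed in different units. The sketch below uses five made-up payroll and win totals (not the actual 2010 figures) and a hand-rolled Pearson calculation; the function and variable names are my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical teams: payroll in $ millions and forecast wins (invented numbers).
payroll_millions = [35.0, 60.0, 90.0, 120.0, 206.0]
forecast_wins = [70, 95, 75, 88, 92]

# The same data in other units.
payroll_dollars = [p * 1_000_000 for p in payroll_millions]
avg_payroll = sum(payroll_millions) / len(payroll_millions)
payroll_pct_of_avg = [100 * p / avg_payroll for p in payroll_millions]
win_pct = [w / 162 for w in forecast_wins]

r_base = pearson_r(payroll_millions, forecast_wins)
r_dollars = pearson_r(payroll_dollars, forecast_wins)
r_pct = pearson_r(payroll_pct_of_avg, win_pct)

if __name__ == "__main__":
    print(f"R (millions vs wins):        {r_base:.4f}")
    print(f"R (dollars vs wins):         {r_dollars:.4f}")
    print(f"R (% of average vs win pct): {r_pct:.4f}")
    print(f"R-squared:                   {r_base ** 2:.4f}")
```

All three R values agree, because multiplying either variable by a positive constant changes its scale but not how the two vary together.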

2a. The regression equation -- how much does a change in salaries influence wins?

The regression equation is expressed as

Y = constant + (beta × X)

where Y is the predicted number of wins, and X is the salary. The constant is the point on the Y axis where X is equal to zero, also known as the Y intercept. The beta value is the amount by which the predicted Y value changes for an increase of one in X. The constant and the beta value are calculated in the model.

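In this model, with salary expressed in millions of dollars, it becomes approximately

Y = 73 + (0.087 × X)

The interpretation: each extra $1 million spent yields an increase of about 0.087 of a win, starting at a base of 73 wins, so one additional win costs roughly $11.5 million. A team that spends an average amount on salaries ($90.6 million) will get an average number of wins (81).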
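As a rough check on that interpretation, the line can be rebuilt from the two points quoted in the text: 73 wins at zero payroll and 81 wins at the average payroll of $90.6 million. This is only a sketch from rounded figures, so the slope it implies will differ slightly from the fitted coefficient, and the function names are my own.

```python
# Two points on the regression line, as quoted in the text (rounded).
INTERCEPT_WINS = 73.0          # predicted wins at a payroll of zero
AVG_PAYROLL_MILLIONS = 90.6    # average payroll, in $ millions
AVG_WINS = 81.0                # predicted wins at the average payroll

# Slope implied by those two points: wins gained per extra $1 million.
slope = (AVG_WINS - INTERCEPT_WINS) / AVG_PAYROLL_MILLIONS

def predicted_wins(payroll_millions):
    """Wins predicted by the line for a given payroll in $ millions."""
    return INTERCEPT_WINS + slope * payroll_millions

def residual(actual_wins, payroll_millions):
    """Actual wins minus predicted wins; positive means over-performing."""
    return actual_wins - predicted_wins(payroll_millions)

if __name__ == "__main__":
    print(f"Implied slope: {slope:.3f} wins per $1 million")
    print(f"Implied cost of one extra win: ${1 / slope:.1f} million")
    print(f"Predicted wins at $206.3 million: {predicted_wins(206.3):.1f}")
```

On these figures an extra win costs on the order of $11 million, and even the highest payroll in the league is predicted to buy only about 18 wins above the base.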

When we start to think about this equation, it’s easy to see why the model isn’t very robust. There are some teams that are going to end the season with fewer than 73 wins if they keep on the way they have been. To end up below 73 wins, the model says the players should be paying the team!

2b. This year’s Moneyball teams

The model does give us a way to see which teams are getting the most production (i.e. wins) for every dollar spent – the gap between the team’s actual performance and the number of wins predicted in the model is the “residual”, and it ranges from a high of 29 wins above what the model predicts for Tampa Bay and 21 for San Diego, to a low of -34 for Baltimore and -23 for Houston.

3. Statistical significance

This model is NOT statistically significant.

So what? All this means is that if we were to use another group of 30 team wins-team salary pairs, we would likely get a different R value. We could improve the significance of the model with more team salary and wins data pairs.

But if the data points are still as dispersed as they are in this case, more data points might yield a “statistically significant” model that (and this is the important part…) has the same correlation coefficient – the model would still have no oomph. All we have then is a model that we can be confident tells us that team salary has a small relationship with the number of wins earned.
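To put numbers on this, one common check is the t statistic for a correlation coefficient, t = R × sqrt(n - 2) / sqrt(1 - R²), compared against the two-tailed 5% critical value. That choice of test is my assumption here rather than something taken from the original regression output; the R of 0.224 and the 30 teams come from the text, and the 120-pair case is a hypothetical four seasons of data with the same R.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

R = 0.224

# This season: 30 teams, so 28 degrees of freedom.
t_30 = t_statistic(R, 30)
T_CRIT_DF28 = 2.048    # two-tailed 5% critical value for 28 degrees of freedom

# Hypothetical: the same R from 120 salary-wins pairs.
t_120 = t_statistic(R, 120)
T_CRIT_DF118 = 1.98    # approximate two-tailed 5% critical value for 118 df

if __name__ == "__main__":
    print(f"n = 30:  t = {t_30:.2f}, significant: {t_30 > T_CRIT_DF28}")
    print(f"n = 120: t = {t_120:.2f}, significant: {t_120 > T_CRIT_DF118}")
```

With 30 teams the t value of about 1.2 falls well short of the threshold, while the same R from 120 pairs would clear it. The relationship is no stronger in the second case; we are simply more confident that it is not zero.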

Some parting thoughts

So we have arrived at the inescapable conclusion that this model does not tell us much about what influences wins, since there is little relationship between salaries and wins at this point in the 2010 season. The model is both weak and statistically insignificant.

The fact that the model is so weak runs counter to earlier research, which tended to find a stronger relationship. Is 2010 different, or is it just too early in the season to tell?

Comments and questions are always welcome.