October 18, 2011

World Series prediction: the Bill James method

Bill James developed a method for predicting playoff series winners, last updated in the 1984 edition of Baseball Abstract in an essay titled "The World Series Prediction System, Revisited".  At that point, it had a pretty good track record -- 73% success in predicting the winner of all the postseason series in the 20th century.

Mike Lynch over at seamheads.com used the method (without any adjustments, updates, or other tweaks) to predict the 2010 World Series, and it correctly identified the Giants as the winner.

This year, Lynch has again used the tool and tabulated the Rangers and the Cardinals according to the Bill James method. 

The result:  the Rangers come out as solid favorites.

(A couple of other older references to previous uses of the method are here and here. Other than that, I haven't found anything on the web that uses or updates the method.)


October 17, 2011

World Series prediction

The 2011 World Series starts in a couple of days, and it's time for the pundits to come out and make their predictions.  Over on coolstandings.com they've posted their prediction for the World Series.  Here's a screenshot of their "smart" prediction:

(The "dumb" prediction is 50/50 for either team, so there's no point talking about that. And I've posted a screenshot, since their predictions are live and will change upon the outcome of the first game of the World Series. An example of the Monty Hall problem, in real life.)

To summarize:  Texas shows as having a 68.2% probability of winning the World Series.

I'm not sure of the details of their methodology, but we can use each team's regular season win/loss record to employ the "log5" approach to come up with our own prediction.  So I did that, and my first prediction is for a Texas victory (58% probability) -- and if pressed to predict the series length, it would be Texas in 6 games (17% of the outcomes are Texas 4-2).  Both probabilities are substantially lower than the coolstandings prediction.
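For anyone who wants to check the arithmetic, here is a minimal sketch of the log5 step using the regular season records. The function and variable names are mine, chosen for illustration.

```python
def log5(a: float, b: float) -> float:
    """Chance that a team with winning percentage a beats a team with
    winning percentage b in a single game (Bill James' log5 formula)."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


# Regular season records: Texas 96-66, St. Louis 90-72
texas_pct = 96 / 162
st_louis_pct = 90 / 162

p_game = log5(texas_pct, st_louis_pct)
print(f"Texas single-game win probability: {p_game:.3f}")  # 0.538
```

Two evenly matched teams come out at exactly 0.5, which is a handy sanity check on the formula.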

But we can be a bit more sophisticated in our approach, using an adjusted win/loss percentage that employs a Bayesian adjustment to each team's final result.  (This is the same method I used back in May for the early season results -- after 162 games the impact of the prior is much reduced.) This changes Texas' winning percentage to 0.571, and St. Louis to 0.543.  (Google doc spreadsheet here.)  Using the log5 formula on these adjusted percentages gives the Rangers a 0.528 probability of beating the Cardinals in any single game.
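The details of the adjustment live in the spreadsheet rather than in this post, so what follows is only a sketch of one common way to do it: treat the prior as a batch of phantom games played at .500 and add them to the real record. The prior weight of 48 games is my assumption, picked because it lands on the two overall figures quoted above; it is not necessarily what the spreadsheet uses.

```python
def shrunk_pct(wins: int, losses: int,
               prior_games: float = 48.0,   # assumed prior weight (hypothetical)
               prior_pct: float = 0.5) -> float:
    """Posterior mean winning percentage under a Beta prior that acts like
    prior_games extra games played at prior_pct."""
    prior_wins = prior_games * prior_pct
    return (wins + prior_wins) / (wins + losses + prior_games)


texas_post = shrunk_pct(96, 66)
st_louis_post = shrunk_pct(90, 72)
print(f"Texas {texas_post:.3f}, St. Louis {st_louis_post:.3f}")
```

The heavier the prior, the harder every record gets pulled toward .500; with 162 real games in the books the pull is modest.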

Working through the 7 game series, Texas' probability of winning the World Series is 56%.
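"Working through" the series is a short calculation once the single-game probability is in hand: for each possible series length, count the ways the winner can take exactly three of the earlier games and then the clincher. A sketch, with names of my own choosing and the adjusted percentages from above as inputs:

```python
from math import comb


def log5(a: float, b: float) -> float:
    """Single-game win probability for a team with winning pct a against pct b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


def series_outcomes(p: float, wins_needed: int = 4) -> dict[int, float]:
    """Map series length -> probability that the team with single-game
    win probability p clinches in exactly that many games."""
    q = 1 - p
    return {
        wins_needed + k: comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    }


p_game = log5(0.571, 0.543)          # adjusted percentages from the text
texas = series_outcomes(p_game)
p_series = sum(texas.values())

for length, prob in texas.items():
    print(f"Texas in {length}: {prob:.3f}")
print(f"Texas wins the series: {p_series:.3f}")
```

This treats every game as having the same win probability, which is exactly the simplification the home/road splits try to relax.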

And we can be still more clever, by considering the road/home splits of each team.

Team      W-L     %   posterior
-------- -----  ----  ---------
Texas    96-66  .593   .571
- home   52-29  .642   .591
- road   44-37  .543   .527

St Louis 90-72  .556   .543
- home   45-36  .556   .535
- road   45-36  .556   .535

The home-road splits improve things for the Cardinals, since they had a better home record than Texas' road record and thus become more likely to win a home game. As well, the Cardinals have home field advantage (but only if the series reaches game 7 -- if it ends in five games, the Rangers will have hosted three of them. But I digress.)  After using the home-road splits, Texas remains the favorite, but the probability is down to 54%.
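One way to fold the splits into the series calculation is to give each game its own probability, depending on who hosts. The sketch below assumes the 2-3-2 schedule with St. Louis hosting games 1, 2, 6 and 7, and assumes log5 is applied to the home and road posteriors from the table; it may differ in detail from the spreadsheet.

```python
from itertools import product


def log5(a: float, b: float) -> float:
    """Single-game win probability for a team with winning pct a against pct b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


def series_win_prob(game_probs: list[float]) -> float:
    """Probability of winning a majority of the games, given a separate win
    probability for each one. Playing out all seven games on paper gives the
    same answer as stopping when one team reaches four wins."""
    total = 0.0
    for results in product((0, 1), repeat=len(game_probs)):
        if sum(results) * 2 > len(game_probs):
            prob = 1.0
            for won, p in zip(results, game_probs):
                prob *= p if won else 1 - p
            total += prob
    return total


p_tex_home = log5(0.591, 0.535)   # Texas home posterior vs St. Louis road posterior
p_tex_road = log5(0.527, 0.535)   # Texas road posterior vs St. Louis home posterior

# Games 1, 2, 6, 7 in St. Louis; games 3, 4, 5 in Texas
schedule = [p_tex_road] * 2 + [p_tex_home] * 3 + [p_tex_road] * 2
p_series = series_win_prob(schedule)
print(f"Texas wins the series: {p_series:.3f}")
```

With these inputs Texas is a slight underdog in each game played in St. Louis and a clear favorite in each game at home, and the series figure lands at roughly 54%.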

While my approaches still give Texas the biggest likelihood of victory, my estimates are less emphatic than the probabilities over at coolstandings.  Based on the characterizations used at coolstandings, my methods lie somewhere between "dumb" and "smart".  "Average intelligence", perhaps.

Tow Mater says "Rangers in 6. But I had the Phillies and the Brewers beating the Cardinals, too".


October 7, 2011

WPA contribution infographic

I like these WPA word cloud graphics from SB Nation by Kevin Dame, describing the player contributions in last night's Tiger-Yankee ALDS game 5.

One of the things I like is that they emphasize that WPA (Win Probability Added) is circumstantial.

For the Tiger pitching staff, the starter Fister gave up only one run over five innings (that is, four scoreless innings), but gets a smaller font than the closer Valverde, who worked only the scoreless ninth inning.  For example, Fister worked a 1-2-3 1st inning with a 2-0 lead, which was worth 0.052 WPA.  By contrast, Valverde's 1-2-3 9th inning with a one-run lead (3-2 score) was worth 0.222 WPA.  Being later in the game and with a tighter score yielded a higher WPA.

And for the Yankee hitters, ARod's strikeout to end the game (the end of Valverde's 1-2-3 ninth) was only one-third as important to the Yankee defeat (-0.053 WPA) as Swisher's strikeout to end the 7th inning, when the bases were loaded (-0.154).  Of course, on Swisher's strikeout Tiger pitcher Joaquin Benoit set himself up for the big 0.154 WPA by coming in with a runner on 1st, then giving up two singles to load the bases, followed by a walk to close the lead to one run.

(The Fangraphs box score has the details that were used to make the word clouds, while the individual play log, with the WPA for each at-bat, is here.)


October 4, 2011

Actuarial baseball

Been off the grid for a while...

A couple of weeks ago, Josh Hamilton of the Rangers hit a grand slam that got an above-average amount of attention, since it was tied into a promotion being run by a flooring company.  The title of this article describing the homer could instead be "Josh Hamilton's grand slam yields big insurance payout":

Somebody, somewhere, in some insurance company, sold coverage for this promotion.  And that same somebody (we hope) must have sat down and calculated the probability of Hamilton hitting a grand slam over a one month period, and set the premium based on that probability.
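The back-of-the-envelope version of that calculation might look something like the sketch below. Every number in it is a made-up placeholder, not an estimate for Hamilton or for the actual promotion, and a real actuary would account for much more (how often he bats with the bases loaded, for a start).

```python
def prob_at_least_one(p_per_chance: float, chances: int) -> float:
    """Probability that an event with per-chance probability p_per_chance
    happens at least once in a number of independent chances."""
    return 1 - (1 - p_per_chance) ** chances


def premium(payout: float, p_event: float, loading: float = 0.0) -> float:
    """Expected payout, marked up by a loading factor for costs and profit."""
    return payout * p_event * (1 + loading)


# Placeholder inputs, purely for illustration
p_slam_per_pa = 0.001        # hypothetical chance of a grand slam per plate appearance
plate_appearances = 100      # hypothetical plate appearances in the promotion month
payout = 100_000             # hypothetical cost of the promotion if it hits

p_event = prob_at_least_one(p_slam_per_pa, plate_appearances)
print(f"P(at least one grand slam): {p_event:.3f}")
print(f"Premium with 25% loading:   {premium(payout, p_event, loading=0.25):,.0f}")
```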

Summary:  insurance is gambling.