December 14, 2011

OT: Unemployment Rates in the U.S.A.

A great example of a truly awful graph was posted on Flowing Data, starting a conversation on The Book.

I posted the following comment there:

Wexler/27 beat me to the BLS data sets... I will note that the "discouraged" numbers can be found in the "characteristics of the unemployed" tables.  The different ways of parsing what constitutes "unemployment" are ways to try to get to the nuance in why people aren't working.  The narrow "actively seeking work" definition of who is in the labour market is a way to cut through demographic changes (e.g. in the post-WWII period, when most women were June Cleavering and not seeking work), increases in post-secondary enrollments, etc.
With that said, the increase in people who have thrown in the towel is, to me, one of the most disturbing parts of the current recession.
Wexler/7 and MGL/8 raise the question of attribution -- how much is the President (in the current circumstance, Obama) responsible for unemployment rates?
I dug up some historical U.S. unemployment rate data going back to 1948, and I have posted a chart of it to my own blog (because I have no idea how to do that directly).
A summary:  the current increase in unemployment started in the last year of G.W. Bush's 2nd term (rising from 5.0% in January 2008 to 6.8% by November, the month of the election).  Going back, there was a peak in unemployment during the first G.W. presidency, and another that straddled G.H.W. Bush and Clinton.  And the worst unemployment rates since the Great Depression (higher rates and with a longer peak than the current phase) were during the first term of Ronald Reagan's presidency.


Here's a chart of U.S. unemployment rates from January 1948 to November 2011:

And for those of you wanting to focus on the more recent period, the past 20 years (less a month):





I obtained the U.S. Department of Labor data set from the Economic Research pages of the Federal Reserve Bank of St. Louis here, and have converted that text file to an Excel file.

More Bureau of Labor Statistics (BLS) unemployment data can be found here, here and here.

-30-

November 21, 2011

Farm system success

Flip Flop Flyball has had a number of good infographics (and humour items) in the past, but their recent "Wins and losses throughout each team's system" chart is particularly interesting.  One thing that caught my eye is that no team in the Houston Astros system managed to break the .500 mark in 2011.


This raises a question in my mind. Can the current performance of minor league affiliates be used to predict MLB team performance at some future date?  (Economists would call this a leading indicator.)  All of the research on minor league performance that I'm aware of is in service of forecasting individual player performance. For a good review of that work, see "The Projection Rundown" at Fangraphs.

But I'm wondering if the fluid nature of the minor leagues will yield any sort of meaningful result at the team level.  Not only are players constantly moving up and down between the levels, it also seems to me that they are every bit as likely (if not more so) to move mid-season from one organization's farm system to another (resource: Baseball America's listing of minor league players).  And the farm teams themselves are prone to shifting from one organization to another, and moving up and down the levels.  As one example, the Vancouver Canadians of the single-A Northwest League were affiliates of the Oakland A's for 11 seasons, but in 2011 came under the Toronto Blue Jays umbrella (they finished with a 0.513 record, second in their division).

Which then leads me back to the infographic: other than 2011 results, does it tell us anything?

-30-

November 14, 2011

Lewis & Beane interview

Moneyball (the movie) opens in the U.K. on November 25, and as part of the publicity, The Financial Times features an in-depth interview with both Michael Lewis and Billy Beane by Simon Kuper.

It's quite a revealing interview that digs into the relationship between Lewis and Beane -- why Lewis was interested in finding the story, and why Beane let Lewis hang around.

But to whet your appetite, here are a couple of highlights. First, a quote from Michael Lewis:
“Baseball is a stupid-making enterprise in that nobody wants to be singled out or say something dumb. You wander in the clubhouse and it’s amazing how incurious the players are. One reason I was attracted to Scott Hatteberg [the former A’s player] as a character: he was just curious: ‘What the hell are you doing here, man?’”

On the criticisms of Moneyball:

There are two silly objections often made to Lewis’s book. The first is that if Moneyball works so well, then why haven’t the A’s had a winning season since 2006? We meet on a sunny October morning, mid-playoffs, a perfect day for baseball, but the team’s season has long since ended.

However, the people who make this objection don’t seem to grasp the basic principles of imitation and catch-up. Once all teams are playing Moneyball, then playing Moneyball no longer gives you an edge. Indeed, the richer clubs have the means to play it smarter. The New York Yankees recently hired 21 statisticians, Beane marvels.

The other common snipe is that Beane should never have spilled his secrets to Lewis. That ruined the A’s, the critics say. But Lewis dismisses the charge. First, he notes, Beane had never imagined their conversations would spiral into a book. Lewis says, “I was going to do something little. By the time I thought I was going to do something big I’d hung around so much it would have been socially awkward to ask me to leave.”

Second, notes Lewis, by 2002 Moneyball was already spreading. The book ends with the Red Sox offering Beane the highest GM’s salary in baseball history. Only when Beane turned them down, having decided after Stanford that he’d never do anything just for money again, did the Red Sox hire Epstein. “The market was moving already,” says Lewis. “The teams that wanted to do it were going to do it anyway, so no book was going to make any difference. My view is the only effect of the book was to give them [the A’s] the credit. If no book had been written, Theo would have been branded the man who reinvented baseball.”

Of course, Epstein's stuff worked in the playoffs.

-30-

November 7, 2011

The Bayes Ball Bookshelf, #2

Baseball Analyst, 1982-1989 (Bill James, publisher and editor)

SABR is now hosting -- with the blessing of Bill James, and through the work of Phil Birnbaum -- the complete Baseball Analyst.  Between 1982 and 1989, Bill James published 40 issues of Baseball Analyst, which in retrospect is now recognized as the launch pad for some fundamental thinking about using quantitative approaches to understand baseball.

The initial issue got off to a great start, with an article about fielding by Paul Schwarzenbart. In his introduction to the issue, James writes that the article "demonstrates that fielding statistics, like batting and pitching but apparently even more so, are the products in part of circumstances as well as men." This is a topic that, 30 years later, continues to provide plenty of fodder for analysis (e.g. this blog post from a month ago by Tangotiger, "Not all fielding opportunities are created the same").

In later issues, there are articles covering the usual parade of topics: clutch hitting, ballpark effects, how much young pitchers should work, ageing of ball players, and of course movie reviews.

There are also familiar names: Pete Palmer, Phil Birnbaum, and Bill James himself.

All in all, Baseball Analyst is an interesting time capsule. The tools the sabermetric community uses to communicate have shifted -- when was the last time you subscribed to a magazine produced on a typewriter and mimeograph? But more importantly, it demonstrates how thinking about these topics has shifted. This shift is both because of further research (we know more than we used to) and because of the proliferation of data and cheap computing power.

But it also shows that in spite of 30 years of analysis, there are still many questions unresolved.

-30-

October 18, 2011

World Series prediction: the Bill James method

Bill James developed a method for predicting playoff series winners, last updated in the 1984 edition of Baseball Abstract in an essay titled "The World Series Prediction System, Revisited".  At that point, it had a pretty good track record -- 73% success in predicting the winner of all the postseason series in the 20th century.

Mike Lynch over at seamheads.com used the method (without any adjustments, updates, or other tweaks) to predict the 2010 World Series -- and it correctly identified the Giants.

This year, Lynch has again used the tool and tabulated the Rangers and the Cardinals according to the Bill James method. 

The result:  the Rangers come out as solid favorites.

(A couple of other older references to previous use of the method are here and here. Other than that, I haven't found anything on the web that uses or updates the method.)

-30-

October 17, 2011

World Series prediction

The 2011 World Series starts in a couple of days, and it's time for the pundits to come out and make their predictions.  Over on coolstandings.com they've posted their prediction for the World Series.  Here's a screenshot of their "smart" prediction:



(The "dumb" prediction is 50/50 for either team, so there's no point talking about that. And I've posted a screenshot, since their predictions are live and will change upon the outcome of the first game of the World Series. An example of the Monty Hall problem, in real life.)

To summarize:  Texas shows as having a 68.2% probability of winning the World Series.

I'm not sure of the details of their methodology, but we can use each team's regular season win/loss record to employ the "log5" approach to come up with our own prediction.  So I did that, and my first prediction is for a Texas victory (58% probability) -- and if pressed to predict the series length, it would be Texas in 6 games (17% of the outcomes are Texas 4-2).  Both probabilities are substantially lower than the coolstandings prediction.
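For anyone who wants to follow along, here is a minimal sketch of that calculation. It assumes every game is independent and that the single-game probability from log5 stays the same for all seven games (no home field, no pitching matchups). The function names are my own for illustration, not from any library.

```python
from math import comb


def log5(a: float, b: float) -> float:
    """Chance that a team with winning percentage a beats a team with winning percentage b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


def series_outcomes(p: float, wins_needed: int = 4) -> dict[int, float]:
    """Probability that the team with single-game probability p wins the series in exactly n games."""
    q = 1 - p
    # To win in wins_needed + k games, the team loses k of the first
    # (wins_needed - 1 + k) games and then wins the final one.
    return {
        wins_needed + k: comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    }


# Regular season records: Texas 96-66, St. Louis 90-72
texas = 96 / 162
st_louis = 90 / 162

p_game = log5(texas, st_louis)
outcomes = series_outcomes(p_game)
p_series = sum(outcomes.values())

print(f"Texas wins a single game: {p_game:.3f}")
print(f"Texas wins the series:    {p_series:.3f}")
for n, prob in outcomes.items():
    print(f"  Texas in {n}: {prob:.3f}")
```

Summing the four ways Texas can win (in 4, 5, 6 or 7 games) gives the overall series probability.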

But we can be a bit more sophisticated in our approach, using an adjusted win/loss percentage that employs a Bayesian adjustment to each team's final result.  (This is the same method I used back in May for the early season results -- after 162 games the impact of the prior is much reduced.) This changes Texas' winning percentage to 0.571, and St. Louis to 0.543.  (Google doc spreadsheet here.)  Using the log5 formula, this gives the Rangers a 0.528 probability of beating the Cardinals in any single game.

Working through the 7 game series, Texas' probability of winning the World Series is 56%.
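A rough sketch of the adjusted version follows. The prior here is an assumption on my part for illustration: a Beta prior centred on .500 and worth 24 pseudo-wins plus 24 pseudo-losses, picked only because it lands close to the adjusted percentages quoted above. The spreadsheet may well use something different, and the function names are hypothetical.

```python
from math import comb


def posterior_pct(wins: int, losses: int,
                  prior_wins: float = 24.0, prior_losses: float = 24.0) -> float:
    """Posterior mean winning percentage under a Beta prior.

    The default prior (24 wins, 24 losses) is a hypothetical choice,
    not a value taken from the original spreadsheet.
    """
    return (wins + prior_wins) / (wins + losses + prior_wins + prior_losses)


def log5(a: float, b: float) -> float:
    """Chance that a team with winning percentage a beats a team with winning percentage b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


def series_win_prob(p: float, wins_needed: int = 4) -> float:
    """Chance of taking the series when every game is won with probability p."""
    q = 1 - p
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


texas_post = posterior_pct(96, 66)
st_louis_post = posterior_pct(90, 72)
p_game_adj = log5(texas_post, st_louis_post)
p_series_adj = series_win_prob(p_game_adj)

print(f"Texas posterior:     {texas_post:.3f}")
print(f"St. Louis posterior: {st_louis_post:.3f}")
print(f"Texas single game:   {p_game_adj:.3f}")
print(f"Texas wins series:   {p_series_adj:.3f}")
```

The prior pulls both teams toward .500, which narrows the gap between them and so trims the favourite's series probability.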

And we can be still more clever, by considering the road/home splits of each team.

Team        W-L    Pct   Posterior
---------  -----  ----  ---------
Texas      96-66  .593    .571
- home     52-29  .642    .591
- road     44-37  .543    .527

St. Louis  90-72  .556    .543
- home     45-36  .556    .535
- road     45-36  .556    .535


The home-road splits improve things for the Cardinals, since they had a better home record than Texas' road record and thus become more likely to win a home game. As well, the Cardinals have home field advantage (but only on game 7 -- the Rangers have home field advantage in a 5-game series. But I digress.)  After using the home-road splits, Texas still remains the favorite, but the probability is down to 54%.
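Here is one way to sketch the home-road version. It takes the posterior home and road percentages from the table above, applies log5 separately for games in each park, and assumes the usual 2-3-2 format with St. Louis hosting games 1, 2, 6 and 7. It also assumes the games are independent. The function names are mine, for illustration only.

```python
def log5(a: float, b: float) -> float:
    """Chance that a team with winning percentage a beats a team with winning percentage b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


def series_win_prob(game_probs: list[float], wins_needed: int = 4) -> float:
    """Chance of winning a series where game i is won with probability game_probs[i].

    Pretends all scheduled games are played; the team with at least
    wins_needed wins out of the full schedule is the same team that
    reaches wins_needed first, so the answer is unchanged.
    """
    dist = [1.0]  # dist[w] = probability of exactly w wins so far
    for p in game_probs:
        new = [0.0] * (len(dist) + 1)
        for w, prob in enumerate(dist):
            new[w] += prob * (1 - p)   # lose this game
            new[w + 1] += prob * p     # win this game
        dist = new
    return sum(dist[wins_needed:])


# Posterior percentages from the table above
texas_home, texas_road = 0.591, 0.527
st_louis_home, st_louis_road = 0.535, 0.535

p_in_st_louis = log5(texas_road, st_louis_home)  # Texas as the visitor
p_in_texas = log5(texas_home, st_louis_road)     # Texas as the host

# 2-3-2 format: games 1, 2, 6, 7 in St. Louis; games 3, 4, 5 in Texas
schedule = [p_in_st_louis] * 2 + [p_in_texas] * 3 + [p_in_st_louis] * 2
p_series_split = series_win_prob(schedule)

print(f"Texas wins a game in St. Louis: {p_in_st_louis:.3f}")
print(f"Texas wins a game in Texas:     {p_in_texas:.3f}")
print(f"Texas wins the series:          {p_series_split:.3f}")
```

Texas comes out as a slight underdog in the four St. Louis games and a clearer favourite in the three Texas games, which is why the series probability slips compared with the version that ignores venue.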

While my approaches still give Texas the biggest likelihood of victory, my estimates are less emphatic than the probabilities over at coolstandings.  Based on the characterizations used at coolstandings, my methods lie somewhere between "dumb" and "smart".  "Average intelligence", perhaps.

Tow Mater says "Rangers in 6. But I had the Phillies and the Brewers beating the Cardinals, too".

-30-