February 17, 2013

Run production, one team at a time

In a previous post, I used R to process data from the Lahman database to calculate index values that compare each team's run production to the league average for that year. For that exercise I started the sequence at 1947, but for what follows I re-ran the code for the period 1901-2012.
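
To make the index concrete, here is a minimal Python sketch (not the R code from the gist). It assumes the index is 100 × (team runs per game) / (league runs per game), with the league rate pooled over every team in the same league and season. The previous post's exact definition may differ in detail. The keys `yearID`, `lgID`, `teamID`, `R` and `G` follow the Lahman `Teams` table; `run_indices` is a hypothetical helper name.

```python
from collections import defaultdict


def run_indices(teams):
    """Run-production index for each team-season; 100 = league average.

    `teams` is a list of dicts with Lahman-style keys: yearID, lgID,
    teamID, R (runs scored) and G (games played).
    Assumed definition: 100 * (team runs per game) / (league runs per game),
    where the league rate pools all teams in the same league and season.
    """
    league_runs = defaultdict(int)
    league_games = defaultdict(int)
    for t in teams:
        key = (t["yearID"], t["lgID"])
        league_runs[key] += t["R"]
        league_games[key] += t["G"]

    indices = {}
    for t in teams:
        key = (t["yearID"], t["lgID"])
        league_rpg = league_runs[key] / league_games[key]
        team_rpg = t["R"] / t["G"]
        indices[(t["yearID"], t["teamID"])] = 100.0 * team_rpg / league_rpg
    return indices
```

With a made-up two-team league where team "AAA" scores 900 runs and team "BBB" scores 700, each in 162 games, the league averages 800 runs per team, so the indices come out at about 112.5 and 87.5.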

The R code I used can be found at this GitHub gist. Rather than bore you here with the ins and outs of what the code is doing, I've documented it with comments in the gist itself. The code assumes that you already have a data frame called "Teams.merge" in your workspace. You can create it by running the code from the previous post; if you've done that before, you'll also have saved a csv file named "Teams.merge.csv", which you can read back in as the data frame "Teams.merge".
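
For readers following along outside R, a rough Python equivalent of reading the saved file back in might look like this. It is a sketch under assumptions: the file is plain comma-separated text with a header row, and the function names are hypothetical. Note that R's `write.csv` adds an unnamed row-name column by default, which shows up here under the key `""`.

```python
import csv


def _convert(value):
    """Turn numeric-looking CSV fields into int or float; leave the rest (e.g. "NA") as strings."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


def parse_teams_merge(lines):
    """Parse CSV text (any iterable of lines, including an open file) into row dicts."""
    return [{key: _convert(val) for key, val in row.items()} for row in csv.DictReader(lines)]


def load_teams_merge(path="Teams.merge.csv"):
    """Read a previously saved Teams.merge.csv from disk."""
    with open(path, newline="") as f:
        return parse_teams_merge(f)
```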

The first step is to choose one of the current teams and create a data frame that contains just that club's history. The code then fits trend lines (using the LOESS method, as I did with the leagues in previous posts) and plots them.
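
The two steps can be sketched in Python as follows. This is a simplified stand-in for what R's `loess()` does, not the gist's code: it assumes a Lahman-style `franchID` field for the subsetting, and it fits a tricube-weighted local *linear* regression, whereas R's `loess()` defaults to `span = 0.75` with local quadratic fits (`degree = 2`). The helper names are hypothetical.

```python
import math


def team_history(rows, franch_id):
    """Seasons for one franchise, sorted by year (assumes a Lahman-style franchID field)."""
    return sorted((r for r in rows if r["franchID"] == franch_id),
                  key=lambda r: r["yearID"])


def loess_linear(xs, ys, span=0.75):
    """Simplified LOESS: tricube-weighted local linear fit evaluated at each x.

    The neighbourhood is the ceil(span * n) nearest points.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1]  # bandwidth: distance to k-th nearest point
        if h == 0:
            w = [1.0 if x == x0 else 0.0 for x in xs]
        else:
            w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
                 for x in xs]
        sw = sum(w)  # always >= 1, because x0 itself gets weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        if sxx == 0:
            fitted.append(my)  # only one x carries weight: fall back to the weighted mean
        else:
            sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
            fitted.append(my + sxy / sxx * (x0 - mx))
    return fitted
```

Given a team's history with an index value per season, `loess_linear(years, indices)` returns one smoothed value per season, ready to plot alongside the raw index.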

For starters, let's look at the Seattle Mariners. The first thing that jumps out is that in the middle of the chart the Mariners were scoring runs at above-average rates: the streak started in the 1995 season and lasted until 2003. In 2007 the team was right at the league average (100.0), but in each of the other eight seasons since the 1995-2003 peak, the Mariners have produced runs at a rate below the league average. What's giving fans hope lately is an encouraging uptick since the nadir in 2010 (which, as we discovered, was a historically dismal performance).
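
Reading streaks like 1995-2003 off a chart is easy by eye, but it can also be done directly from the index values. A minimal Python sketch, assuming a dict that maps season to index (100 = league average) and using strict inequalities, so a season at exactly 100.0, like the Mariners' 2007, counts as neither above nor below. The function names are hypothetical.

```python
def above_average_streaks(index_by_year, threshold=100.0):
    """Return (first_year, last_year) for each run of consecutive seasons
    whose index is strictly above `threshold` (100 = league average)."""
    streaks = []
    start = prev = None
    for year in sorted(index_by_year):
        above = index_by_year[year] > threshold
        if above and start is not None and year == prev + 1:
            prev = year  # extend the current streak
        elif above:
            if start is not None:  # a gap in years ends the previous streak
                streaks.append((start, prev))
            start = prev = year
        else:
            if start is not None:
                streaks.append((start, prev))
            start = prev = None
    if start is not None:
        streaks.append((start, prev))
    return streaks


def seasons_below(index_by_year, threshold=100.0):
    """Count seasons whose index is strictly below `threshold`."""
    return sum(1 for v in index_by_year.values() if v < threshold)
```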

It must be noted that since Safeco Field opened midway through the 1999 season, it has consistently played as a pitcher's park, and this analysis makes no correction for park effects. But it's worth pointing out that playing in a pitcher's park didn't stop the 2001 Mariners from producing runs at 117.7% of the league rate. What an awesome team that was.

Next up, the Toronto Blue Jays, the Mariners' sibling in the 1977 American League expansion. The Jays crossed the threshold into above-average run scoring 12 years before the Mariners did, scoring runs at an above-average rate from 1983 to 1993 with the single exception of 1991, a season in which they made the playoffs and the year before they won the first of two consecutive World Series. After a post-World Series/strike swoon, they've been zig-zagging on both sides of the league average since 1998.

And finally, the New York Yankees. This brings up an interesting quirk of how the Lahman database is coded: my chart shows the Yankees going back to 1901, when the American League was founded and they were one of its original franchises. But for the first two years of the league, the franchise was known as the Baltimore Orioles.
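
The quirk comes from the difference between a franchise identifier and a team identifier. As I understand Lahman's coding (check your copy), the 1901-1902 Baltimore Orioles have `teamID` "BLA" and the Yankees "NYA", but both carry `franchID` "NYY", so subsetting on `franchID` pulls in the Baltimore seasons. A small Python sketch, with a hypothetical helper, that lists which team IDs a franchise has used:

```python
def team_ids_for_franchise(rows, franch_id):
    """Map each teamID used by a franchise to its (first, last) season."""
    spans = {}
    for r in rows:
        if r["franchID"] != franch_id:
            continue
        first, last = spans.get(r["teamID"], (r["yearID"], r["yearID"]))
        spans[r["teamID"]] = (min(first, r["yearID"]), max(last, r["yearID"]))
    return spans


# Illustrative rows containing only the ID columns.
sample = [
    {"yearID": 1901, "teamID": "BLA", "franchID": "NYY"},
    {"yearID": 1902, "teamID": "BLA", "franchID": "NYY"},
    {"yearID": 1903, "teamID": "NYA", "franchID": "NYY"},
]
print(team_ids_for_franchise(sample, "NYY"))  # {'BLA': (1901, 1902), 'NYA': (1903, 1903)}
```

Filtering on `teamID == "NYA"` instead would start the Yankees' history in 1903.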

So what we see is the well-established historical dominance of the Yankees: in only 22 of the 112 seasons has their run production fallen below the league average. One might attribute some of this to Yankee Stadium (in its various incarnations), which has a reputation as a hitter's park. But just looking at the ESPN Park Factors back to 2001, that reputation is entirely unwarranted. And what's with the crazy fluctuations? In 2004 the original Stadium was ranked #30, with a park factor of 0.694, only to show up at the top of the chart in 2005 with a park factor of 1.403. This deserves further research, a topic for a future post, perhaps.
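
For context on what numbers like 0.694 and 1.403 mean, here is the commonly used basic run park factor: total runs per game (both teams) in a club's home games, divided by the same rate in its road games. I'm assuming ESPN's figures are computed along these lines; their exact method may differ. Under this formula, 0.694 would mean roughly 69% as many runs per game at home as on the road. The function name is hypothetical.

```python
def basic_park_factor(home_runs_scored, home_runs_allowed, home_games,
                      road_runs_scored, road_runs_allowed, road_games):
    """Runs per game in a team's home games divided by runs per game in its road games.

    1.0 is neutral; above 1.0 favours hitters, below 1.0 favours pitchers.
    """
    home_rate = (home_runs_scored + home_runs_allowed) / home_games
    road_rate = (road_runs_scored + road_runs_allowed) / road_games
    return home_rate / road_rate
```

Because a single season gives only about 81 games on each side of the ratio, year-to-year swings in this kind of factor can be large, which is one reason it's often averaged over several seasons.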
