The great baseball writer Roger Angell is the recipient of the 2014 J.G. Taylor Spink Award, the first time the award has been given to someone who is not a BBWAA member. Angell will be presented with the award at the Baseball Hall of Fame in Cooperstown during induction weekend, July 25-28, 2014.
Angell is, for my money, the best writer about baseball. His accounts of the game are written from a fan's perspective, rather than as the typical listing of the game's dramatic moments. Indeed, many of his greatest observations are about the fans and their experiences. One such essay is "The Interior Stadium", required reading for anyone interested in sports and people's responses to the games.
Much of Angell's writing was published by his employer, the New Yorker, which has recently compiled two summaries of his writing. The first was offered by David Remnick, whose piece "Roger Angell Heads to Cooperstown" was published when the Spink award was announced in December 2013 and has links to a variety of Angell's best. More recently, "Hall of Fame Weekend: Roger Angell's Baseball Writing" (by Sky Dylan-Robbins) provides a different list of great essays.
One of the best things the New Yorker has made available is Angell's scorecard for Game 6 of the 2011 World Series, when the Cardinals were twice down to their final strike but managed to come back and win the game (and then, in an anti-climactic Game 7, the Series).
-30-
July 26, 2014
July 23, 2014
Left-handed catchers
Benny Distefano – 1985 Donruss #166 (source: baseball-almanac.com)
Jack Moore, writing on the site Sports on Earth in 2013 (“Why no left-handed catchers?”), points out that the lack of left-handed catchers goes back a long way. One interesting piece of evidence is a 1948 Ripley’s “Believe It or Not” item featuring a left-handed catcher, Dick Bernard (you can read more about Bernard’s signing in the July 1, 1948 edition of the Tuscaloosa News). Bernard didn’t make the majors, and he doesn’t appear in any of the minor league records that are available online either.
Dick Bernard in Ripley’s “Believe It or Not”, 1948-12-30 (source: sportsonearth.com)
There are a variety of hypotheses for why there are no left-handed catchers, all of which are summarized in John Walsh’s “Top 10 Left-Handed Catchers for 2006” (a tongue-in-cheek title if ever there was one) at The Hardball Times. A compelling explanation, and one supported by both Bill James and J.C. Bradbury (in his book The Baseball Economist), is natural selection: a left-handed Little League player who can throw well will be groomed as a pitcher.
Throwing hand by fielding position as an example of a categorical variable
I was looking for some examples of categorical variables to display visually, and the lack of left-handed throwing catchers, compared to other positions, came to mind. The following uses R and the Lahman database package.
The analysis requires merging the Master and Fielding tables in the Lahman database: the Master table gives each player's name and throwing hand, and Fielding tells us how many games he played at each position. For the purpose of this analysis, we’ll look at the seasons from 1954 (the first year in the Lahman database that has the outfield positions split into left, centre, and right) through 2012.
You may note that for the merging of the two tables, I used the new dplyr package. I used system.time to compare base R’s “merge” function with dplyr’s “inner_join” for combining the two tables, and the latter is substantially faster: my aging computer ran “merge” in about 5.5 seconds, compared to 0.17 seconds with dplyr.
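To try the same kind of timing comparison without the Lahman data, here is a minimal sketch on simulated tables. The names `master_sim` and `fielding_sim`, the table sizes, and the handedness mix are all hypothetical. It times base R's `merge` and, only if dplyr happens to be installed, `dplyr::inner_join`; the elapsed times depend on your machine and package versions, so none are shown here.

```r
# Simulated stand-ins for the Master and Fielding tables (hypothetical data)
set.seed(2014)
n_players <- 20000
n_rows <- 100000
master_sim <- data.frame(
  playerID = sprintf("p%05d", seq_len(n_players)),
  throws = sample(c("L", "R"), n_players, replace = TRUE, prob = c(0.2, 0.8)),
  stringsAsFactors = FALSE
)
fielding_sim <- data.frame(
  playerID = sample(master_sim$playerID, n_rows, replace = TRUE),
  G = sample(1:162, n_rows, replace = TRUE),
  stringsAsFactors = FALSE
)

# Base R: merge() performs an inner join by default (all = FALSE)
t_merge <- system.time(
  joined_merge <- merge(fielding_sim, master_sim, by = "playerID")
)
print(t_merge[["elapsed"]])

# dplyr: only timed if the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  t_join <- system.time(
    joined_dplyr <- dplyr::inner_join(fielding_sim, master_sim, by = "playerID")
  )
  print(t_join[["elapsed"]])
  # both approaches should return the same number of rows
  stopifnot(nrow(joined_merge) == nrow(joined_dplyr))
}
```

Because every simulated fielding row has a matching player, both joins keep all 100,000 rows; only the speed differs.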
# load the required packages
library(Lahman)
library(dplyr)
#
The first step is to create a new data table that merges the Fielding and Master tables, based on the common variable “playerID”. This new table has one row for each player, by season and position (and by stint, for players who changed teams mid-season); we use the dim function to show the dimensions of the table.
# join the Fielding and Master tables on playerID
# (in later releases of the Lahman package, the Master table is named People)
MasterFielding <- inner_join(Fielding, Master, by = "playerID")
dim(MasterFielding)
## [1] 164903 52
Then, select only the seasons since 1954 and omit the records for Designated Hitter (DH) and the summary of outfield positions (OF), i.e. keep LF, CF, and RF.
# keep 1954 onward (yearID is numeric), dropping the OF summary rows and DH
MasterFielding <- filter(MasterFielding, POS != "OF" & POS != "DH" & yearID > 1953)
dim(MasterFielding)
## [1] 91214 52
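The join-then-filter logic can be checked on a tiny example. This is a sketch with hypothetical toy tables (`fielding_toy`, `master_toy`), using base R's `merge` (an inner join by default) and `subset` in place of dplyr's `inner_join` and `filter`.

```r
# Hypothetical toy versions of the Fielding and Master tables
fielding_toy <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01", "bbb01", "ccc01", "ddd01"),
  yearID   = c(1950, 1960, 1960, 1960, 1975, 1980),
  POS      = c("SS", "SS", "OF", "LF", "DH", "C"),
  G        = c(100, 120, 140, 140, 80, 90),
  stringsAsFactors = FALSE
)
master_toy <- data.frame(
  playerID = c("aaa01", "bbb01", "ccc01"),
  throws   = c("R", "L", "R"),
  stringsAsFactors = FALSE
)

# Inner join: "ddd01" has no match in master_toy, so its row is dropped
joined_toy <- merge(fielding_toy, master_toy, by = "playerID")

# Keep seasons after 1953, and drop the OF summary rows and DH rows
filtered_toy <- subset(joined_toy, !(POS %in% c("OF", "DH")) & yearID > 1953)
print(filtered_toy)
```

Only the 1960 SS row for "aaa01" and the LF row for "bbb01" survive: the 1950 season, the OF summary row, and the DH row are all filtered out.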
This table needs to be summarized one step further: a single row for each player at each position, with the total number of games he played there across all the seasons.
# %.% was dplyr's original chaining operator; current versions use %>%
Player_games <- MasterFielding %>%
  group_by(playerID, nameFirst, nameLast, POS, throws) %>%
  summarise(gamecount = sum(G)) %>%
  arrange(desc(gamecount))
dim(Player_games)
## [1] 19501 6
head(Player_games)
## Source: local data frame [6 x 6]
## Groups: playerID, nameFirst, nameLast, POS
##
## playerID nameFirst nameLast POS throws gamecount
## 1 robinbr01 Brooks Robinson 3B R 2870
## 2 bondsba01 Barry Bonds LF L 2715
## 3 vizquom01 Omar Vizquel SS R 2709
## 4 mayswi01 Willie Mays CF R 2677
## 5 aparilu01 Luis Aparicio SS R 2583
## 6 jeterde01 Derek Jeter SS R 2531
The top of this table shows the players with the most games played at a single position over 1954-2012. Brooks Robinson leads the way with 2,870 games at third base, and Derek Jeter, at the end of the 2012 season, was closing in on Omar Vizquel’s career record for games played at shortstop.
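For readers without dplyr, the same summing step can be sketched with base R's `aggregate`. The season-level rows below (`season_rows`) are hypothetical, not taken from the Lahman database.

```r
# Hypothetical season-level rows: one per player, position, and season
season_rows <- data.frame(
  playerID = c("aaa01", "aaa01", "aaa01", "bbb01", "bbb01"),
  POS      = c("SS", "SS", "3B", "LF", "LF"),
  throws   = c("R", "R", "R", "L", "L"),
  G        = c(150, 140, 10, 120, 130),
  stringsAsFactors = FALSE
)

# Sum games for each player-position combination
player_games_toy <- aggregate(G ~ playerID + POS + throws, data = season_rows, FUN = sum)
names(player_games_toy)[names(player_games_toy) == "G"] <- "gamecount"

# Sort from most to fewest games, like arrange(desc(gamecount))
player_games_toy <- player_games_toy[order(-player_games_toy$gamecount), ]
print(player_games_toy)
```

The result has three rows: 290 games at SS and 10 at 3B for "aaa01", and 250 games in LF for "bbb01".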
Cross-tab Tables
The next step is to prepare a simple cross-tab table (also known as a contingency table or pivot table) showing the number of players cross-tabulated by position (POS) and throwing hand (throws).
Here, I’ll demonstrate two ways to do this: first with dplyr’s “group_by” and “summarise” (with a bit of help from reshape2), and then with base R’s “table” function.
# first method - dplyr
Player_POS <- Player_games %>%
  group_by(POS, throws) %>%
  summarise(playercount = n())  # n() counts the player rows in each group
Player_POS
## Source: local data frame [17 x 3]
## Groups: POS
##
## POS throws playercount
## 1 1B L 411
## 2 1B R 1515
## 3 2B L 4
## 4 2B R 1560
## 5 3B L 4
## 6 3B R 1889
## 7 C L 4
## 8 C R 980
## 9 CF L 393
## 10 CF R 1252
## 11 LF L 544
## 12 LF R 2161
## 13 P L 1452
## 14 P R 3623
## 15 RF L 520
## 16 RF R 1893
## 17 SS R 1296
To transform this long-form table into a traditional cross-tab shape we can use the “dcast” function in reshape2.
library(reshape2)
## Loading required package: reshape2
dcast(Player_POS, POS ~ throws, value.var = "playercount")
## POS L R
## 1 1B 411 1515
## 2 2B 4 1560
## 3 3B 4 1889
## 4 C 4 980
## 5 CF 393 1252
## 6 LF 544 2161
## 7 P 1452 3623
## 8 RF 520 1893
## 9 SS NA 1296
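The NA in the SS row is simply a combination that never occurs in the long-form table (dcast's `fill = 0` argument would replace it). As a base R alternative, `xtabs` sums a count column into a cross-tab and fills absent combinations with 0. This sketch types in the 17 counts printed above rather than recomputing them from the Lahman data.

```r
# Long-form counts, copied from the Player_POS output above
player_pos_long <- data.frame(
  POS = c("1B", "1B", "2B", "2B", "3B", "3B", "C", "C", "CF", "CF",
          "LF", "LF", "P", "P", "RF", "RF", "SS"),
  throws = c(rep(c("L", "R"), 8), "R"),
  playercount = c(411, 1515, 4, 1560, 4, 1889, 4, 980, 393, 1252,
                  544, 2161, 1452, 3623, 520, 1893, 1296),
  stringsAsFactors = FALSE
)

# Long to wide: sum playercount into a POS x throws table (missing cells become 0)
wide <- xtabs(playercount ~ POS + throws, data = player_pos_long)
print(wide)
```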
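A second method to get essentially the same result is to use base R’s “table” function; the only difference is that the empty cell for left-handed shortstops appears as 0 rather than NA. I also load the gmodels package at this point, since its CrossTable function is used below.
library(gmodels)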
## Loading required package: gmodels
throwPOS <- with(Player_games, table(POS, throws))
throwPOS
## throws
## POS L R
## 1B 411 1515
## 2B 4 1560
## 3B 4 1889
## C 4 980
## CF 393 1252
## LF 544 2161
## P 1452 3623
## RF 520 1893
## SS 0 1296
A more elaborate table can be created using the gmodels package. In this case, we’ll use the CrossTable function to generate a table with row percentages. Note that the format is set to "SPSS", so the output resembles that software’s display style.
CrossTable(Player_games$POS, Player_games$throws,
digits=2, format="SPSS",
prop.r=TRUE, prop.c=FALSE, prop.t=FALSE, prop.chisq=FALSE, # keeping the row proportions
chisq=TRUE) # adding Pearson's chi-squared test
##
## Cell Contents
## |-------------------------|
## | Count |
## | Row Percent |
## |-------------------------|
##
## Total Observations in Table: 19501
##
## | Player_games$throws
## Player_games$POS | L | R | Row Total |
## -----------------|-----------|-----------|-----------|
## 1B | 411 | 1515 | 1926 |
## | 21.34% | 78.66% | 9.88% |
## -----------------|-----------|-----------|-----------|
## 2B | 4 | 1560 | 1564 |
## | 0.26% | 99.74% | 8.02% |
## -----------------|-----------|-----------|-----------|
## 3B | 4 | 1889 | 1893 |
## | 0.21% | 99.79% | 9.71% |
## -----------------|-----------|-----------|-----------|
## C | 4 | 980 | 984 |
## | 0.41% | 99.59% | 5.05% |
## -----------------|-----------|-----------|-----------|
## CF | 393 | 1252 | 1645 |
## | 23.89% | 76.11% | 8.44% |
## -----------------|-----------|-----------|-----------|
## LF | 544 | 2161 | 2705 |
## | 20.11% | 79.89% | 13.87% |
## -----------------|-----------|-----------|-----------|
## P | 1452 | 3623 | 5075 |
## | 28.61% | 71.39% | 26.02% |
## -----------------|-----------|-----------|-----------|
## RF | 520 | 1893 | 2413 |
## | 21.55% | 78.45% | 12.37% |
## -----------------|-----------|-----------|-----------|
## SS | 0 | 1296 | 1296 |
## | 0.00% | 100.00% | 6.65% |
## -----------------|-----------|-----------|-----------|
## Column Total | 3332 | 16169 | 19501 |
## -----------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 1759 d.f. = 8 p = 0
##
##
##
## Minimum expected frequency: 168.1
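The row percentages and the chi-squared test can also be reproduced in base R with `prop.table` and `chisq.test`. This sketch enters the counts from the table above directly, so it needs neither the Lahman, dplyr, nor gmodels packages; the statistic should land close to the 1759 (with 8 degrees of freedom) that CrossTable reports, and the smallest expected count should be about 168.1.

```r
# Counts by position (rows) and throwing hand (columns), from the table above
throw_counts <- matrix(
  c( 411, 1515,   # 1B
       4, 1560,   # 2B
       4, 1889,   # 3B
       4,  980,   # C
     393, 1252,   # CF
     544, 2161,   # LF
    1452, 3623,   # P
     520, 1893,   # RF
       0, 1296),  # SS
  ncol = 2, byrow = TRUE,
  dimnames = list(POS = c("1B", "2B", "3B", "C", "CF", "LF", "P", "RF", "SS"),
                  throws = c("L", "R"))
)
throw_tab <- as.table(throw_counts)

# Row percentages, as in CrossTable's prop.r = TRUE
row_pct <- round(100 * prop.table(throw_tab, margin = 1), 2)
print(row_pct)

# Pearson's chi-squared test of independence (no continuity correction for a 9 x 2 table)
chi <- chisq.test(throw_tab)
print(chi)
print(min(chi$expected))  # smallest expected cell count
```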
Mosaic Plot
A mosaic plot is an effective way to represent the contents of the summary tables graphically. Note that the width (left to right) of each bar is constant, so the bars compare proportions, while the height of each bar (top to bottom) varies with the absolute number of cases. The mosaic function is in the vcd package.
library(vcd)
## Loading required package: vcd
## Loading required package: grid
mosaic(throwPOS, highlighting = "throws", highlighting_fill=c("darkgrey", "white"))
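If vcd isn't available, base R's `mosaicplot` draws a similar chart. This is a sketch using the counts from the tables above; the default arrangement of its tiles may differ from vcd's, so the layout can appear rotated. The `pdf(NULL)` call sends the plot to a null device so the code also runs without a display; drop it (and `dev.off()`) to see the plot.

```r
# Counts by position and throwing hand, from the tables above
throw_tab <- as.table(matrix(
  c(411, 1515, 4, 1560, 4, 1889, 4, 980, 393, 1252,
    544, 2161, 1452, 3623, 520, 1893, 0, 1296),
  ncol = 2, byrow = TRUE,
  dimnames = list(POS = c("1B", "2B", "3B", "C", "CF", "LF", "P", "RF", "SS"),
                  throws = c("L", "R"))
))

pdf(NULL)  # null graphics device; remove for interactive use
# color is applied to the levels of the last variable (throws: L, R)
mosaicplot(throw_tab, color = c("darkgrey", "white"), las = 2,
           main = "Throwing hand by fielding position, 1954-2012")
invisible(dev.off())
```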
Conclusion
The clear result is that it’s not just catchers who are overwhelmingly right-handed throwers; it’s also infielders (except at first base). There have been very few southpaws playing second and third base, and there have been no left-handed throwing shortstops at all in this period. As J.G. Preston puts it in the blog post “Left-handed throwing second basemen, shortstops and third basemen”,
While right-handed throwers can be found at any of the nine positions on a baseball field, left-handers are, in practice, restricted to five of them.
So who are these left-handed oddities? Using the filter function, it’s easy to find out:
# catchers
filter(Player_games, POS == "C", throws == "L")
## Source: local data frame [4 x 6]
## Groups: playerID, nameFirst, nameLast, POS
##
## playerID nameFirst nameLast POS throws gamecount
## 1 distebe01 Benny Distefano C L 3
## 2 longda02 Dale Long C L 2
## 3 squirmi01 Mike Squires C L 2
## 4 shortch02 Chris Short C L 1
# second base
filter(Player_games, POS == "2B", throws == "L")
## Source: local data frame [4 x 6]
## Groups: playerID, nameFirst, nameLast, POS
##
## playerID nameFirst nameLast POS throws gamecount
## 1 marqugo01 Gonzalo Marquez 2B L 2
## 2 crowege01 George Crowe 2B L 1
## 3 mattido01 Don Mattingly 2B L 1
## 4 mcdowsa01 Sam McDowell 2B L 1
# third base
filter(Player_games, POS == "3B", throws == "L")
## Source: local data frame [4 x 6]
## Groups: playerID, nameFirst, nameLast, POS
##
## playerID nameFirst nameLast POS throws gamecount
## 1 squirmi01 Mike Squires 3B L 14
## 2 mattido01 Don Mattingly 3B L 3
## 3 francte01 Terry Francona 3B L 1
## 4 valdema02 Mario Valdez 3B L 1
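The three lookups can also be done in one step. This sketch uses base R's `subset` on a small data frame (`listed`, a hypothetical name) typed in from the rows printed in this post, so it is not the full Player_games table. It keeps left-handed throwers at C, 2B, 3B, and SS, and then flags the players who turn up at more than one of those positions.

```r
# Rows copied from the outputs shown in this post (not the full table)
listed <- data.frame(
  playerID = c("robinbr01", "bondsba01", "vizquom01", "mayswi01", "aparilu01", "jeterde01",
               "distebe01", "longda02", "squirmi01", "shortch02",
               "marqugo01", "crowege01", "mattido01", "mcdowsa01",
               "squirmi01", "mattido01", "francte01", "valdema02"),
  POS = c("3B", "LF", "SS", "CF", "SS", "SS",
          rep("C", 4), rep("2B", 4), rep("3B", 4)),
  throws = c("R", "L", "R", "R", "R", "R", rep("L", 12)),
  gamecount = c(2870, 2715, 2709, 2677, 2583, 2531,
                3, 2, 2, 1, 2, 1, 1, 1, 14, 3, 1, 1),
  stringsAsFactors = FALSE
)

# Left-handed throwers at the "right-handed" positions
lefty_infield <- subset(listed, POS %in% c("C", "2B", "3B", "SS") & throws == "L")
lefty_infield <- lefty_infield[order(lefty_infield$POS, -lefty_infield$gamecount), ]
print(lefty_infield)

# Players who appear at more than one of these positions
multi_pos <- names(which(table(lefty_infield$playerID) > 1))
print(multi_pos)
```

In this excerpt, Don Mattingly and Mike Squires each appear at two of the positions.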
The GitHub Markdown file for this entry is here: [https://github.com/MonkmanMH/Bayesball/blob/master/LeftHandedCatchers.md]
-30-
Labels:
Benny Distefano,
handedness,
Lahman database,
using R