November 29, 2010

Good math, bad statistics

In the past few days, a pair of posts on other blogs caught my attention -- they seem to be coming at the same issue from different directions.
First, William R. Briggs posted "Statistics Is Not Math" (November 16, 2010). Then, Tango over at The Book posted "Detrending: when statisticians attack!" (November 24, 2010). I responded to the Tango post (comment #4), but I would like to elaborate further here.
One of the things that jumped out at me from Briggs' post was the statement that "Statistics rightly belongs to epistemology, the philosophy of how we know what we know. Probability and statistics can even be called quantitative epistemology." In other words, statistics is useful only if we have some understanding of the subject matter at hand. No amount of fancy math will help our understanding if we do not start our research with some knowledge of the topic.
In the "Detrending" post, Tango links to an unpublished (in the academic sense that it's not been published in a peer-reviewed journal) paper by three physicists, Alexander M. Petersen, Orion Penner, and H. Eugene Stanley, entitled "Detrending career statistics in professional baseball: Accounting for the steroids era and beyond". I may offer a longer critique of this paper at a later date, but the first thing that jumps out is an apparent ignorance of The Literature (i.e., what's been written earlier about the topic -- baseball -- from a statistical perspective). This leads the authors to present as new some conclusions that have already been established elsewhere (for example, that pitcher wins are not a good measure of pitcher performance, or that standardizing allows for inter-season comparisons).
There's lots of fancy maths (some of which isn't as fancy or new-fangled as the authors seem to think) and plenty of Greek letters, but in the end it doesn't add a great deal to our understanding of baseball.
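For readers who haven't seen the standardizing idea mentioned above, here is a minimal sketch in Python. To be clear, this is not the paper's detrending method; it is just the plain z-score approach, where each player's total is expressed as the number of standard deviations from that season's average. The home run totals are made up for illustration, and the function name `standardize` is my own.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: each value's distance from the season mean,
    measured in (population) standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every player has the same total, so nobody stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical home run totals for two made-up seasons (not real data).
low_offense_season = [10, 15, 20, 25, 30]
high_offense_season = [20, 30, 40, 50, 60]

z_low = standardize(low_offense_season)
z_high = standardize(high_offense_season)

# Compare the league leader of each season on the standardized scale.
print(round(z_low[-1], 3), round(z_high[-1], 3))
```

In this toy example the two league leaders hit 30 and 60 home runs, yet they end up with the same z-score, because each stands equally far above the rest of his own league. That is all "standardizing allows for inter-season comparisons" means in its simplest form.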
This article serves as a reminder that when we are assessing the quality of any sabermetric writing, we need to consider two factors:
1. Is the author using the appropriate statistical tools and interpreting the mathematical results correctly?
2. Does the author understand the game, including how baseball has evolved and the analytic literature that has been written over the past 50 years?

