tag:blogger.com,1999:blog-684119683005319088.comments2016-11-30T08:00:54.484-08:00Bayes BallMartin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comBlogger27125tag:blogger.com,1999:blog-684119683005319088.post-6699542157029261622016-11-30T08:00:54.484-08:002016-11-30T08:00:54.484-08:00Hi Dan, thanks for this. Are you willing and able...Hi Dan, thanks for this. Are you willing and able to share your code, perhaps on GitHub?<br /><br />I'm a regular visitor to the R Graph Catalog linked in the body of the blog post, but more resources are always a welcome addition.Martin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-6013775293080571312016-11-24T06:32:12.043-08:002016-11-24T06:32:12.043-08:00This is such a useful feature - thanks!This is such a useful feature - thanks!Cathhttp://www.blogger.com/profile/18320529194217135458noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-60643043823776637992016-11-17T19:32:43.329-08:002016-11-17T19:32:43.329-08:00I loved her book and have been trying to implement...I loved her book and have been trying to implement some of her graphs in R. Sometimes it proves harder than you would think, but still fun to try. Danhttp://www.blogger.com/profile/11784444140424071558noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-74070415283756589752016-03-14T21:27:45.411-07:002016-03-14T21:27:45.411-07:00I was trying to do the same, after having seen Bob&...I was trying to do the same, after having seen Bob's email, but missed out on adding the gtable dependency. Thanks for showing how it is done. 
<br />burhan haqhttp://www.blogger.com/profile/12607531148119768244noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-26474575671032255712016-03-14T21:10:08.903-07:002016-03-14T21:10:08.903-07:00I was trying to do the same, after having seen Bob&...I was trying to do the same, after having seen Bob's email, but missed out on adding the gtable dependency. Thanks for showing how it is done. <br />Unknownhttp://www.blogger.com/profile/12607531148119768244noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-34302148338326456322014-10-18T11:27:53.964-07:002014-10-18T11:27:53.964-07:00Hello Martin, I just finished reading your excelle...Hello Martin, I just finished reading your excellent book review about Marchi and Albert’s Analyzing Baseball Data with R (2013). Even for a stats novice like myself, it was a joy to read, and more so now that I’ve recently decided to relearn all of the knowledge I gleaned (but thereafter lost over the past several years) from a graduate quantitative methods & statistics course. Simultaneously, I’d like to start working w/ R, as SPSS is now out of my price range, which is fine b/c I doubt I’d recall any of it anyway. Since that graduate course was a painful experience (even though I scraped by w/ an A), I want to re-educate myself on R and statistics via baseball to hopefully make this more enjoyable. . . . Anyway, enough intro.: for someone at this blank slate stage, which baseball, stats., & R book would you recommend as the best starter title: Analyzing Baseball Data with R, Albert’s earlier R by Example, or perhaps some other even more rudimentary baseball & stats book? (Mind you, I can hardly remember what standard deviation means anymore.) 
Thanks for any advice!pulpephemerahttp://pulpephemera.wordpress.com/noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-34534603258888103272013-11-30T12:59:45.498-08:002013-11-30T12:59:45.498-08:00Hi John, I've put copies of the National and A...Hi John, I've put copies of the National and American League spreadsheets (as .csv files) on Google Docs.<br />National League: https://drive.google.com/file/d/0B7t4wpcrwqkBdEt2UG4xNDJTdms/edit?usp=sharing<br />American League: https://drive.google.com/file/d/0B7t4wpcrwqkBdEt2UG4xNDJTdms/edit?usp=sharing<br /><br />To create the files I used the Lahman package in R, which is only current up to the 2012 season. I've updated the files manually with the 2013 data points.Martin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-88516530794199535472013-11-24T12:27:25.556-08:002013-11-24T12:27:25.556-08:00Hello, I was wondering if you had a spreadsheet by...Hello, I was wondering if you had a spreadsheet by league of average number of runs per year.<br />thanks<br />JohnThe Wizardhttp://www.blogger.com/profile/06498063512838483016noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-74362334548077376582013-10-04T07:15:06.432-07:002013-10-04T07:15:06.432-07:00You are such a wonderful statistical geek...I can ...You are such a wonderful statistical geek...I can learn so much from you :)<br />Cathy Laddshttp://www.blogger.com/profile/16894011278395887994noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-17070575807554277692013-08-19T06:57:10.740-07:002013-08-19T06:57:10.740-07:00Martin, this infographic about predicting baseball...Martin, this infographic about predicting baseball may interest you. Feel free to share on your blog. The graphic is titled Predicting Baseball: Demystifying Bayes' Theorem. 
http://www.sports-management-degrees.com/baseball/Pamhttp://www.blogger.com/profile/08038925571562350698noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-20724904616771848082013-07-21T06:46:00.835-07:002013-07-21T06:46:00.835-07:00r2evans, thanks for this ... much appreciated.
I&...r2evans, thanks for this ... much appreciated.<br /><br />I've taken the liberty of including your code into the gist for this blog post, which is now here:<br />https://gist.github.com/MonkmanMH/6048590Martin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-74599146636617192672013-07-19T10:10:04.854-07:002013-07-19T10:10:04.854-07:00I've been using similar plots recently, and ad...I've been using similar plots recently, and adapted panel.cor() to include Spearman's non-parametric test and the number of non-NA samples for each pair-wise comparison. The variably-sized text is good (and I use it periodically), but I chose against it in this adapted version since I was stacking more information.<br /><br />Additionally, since several of the datasets I've been depicting contain lots of samples, I opt to jitter() them and/or use a color of "#00000055" for transparency, as it makes coincident data points much more apparent.<br /><br />I like this kind of visualization technique. Thanks!<br /><br />panel.cor <- function(x, y, digits=3, ..., text.cex, text.col='black') {<br /> par(usr = c(0, 1, 0, 1))<br /> numsamples <- sum(! is.na(x) & ! 
is.na(y))<br /> r <- cor(x, y, use='complete.obs')<br /> spearman <- cor.test(x, y, method='spearman', continuity=TRUE, exact=FALSE)<br /><br /> if (require(RColorBrewer, quietly=TRUE)) {<br /> colbrew <- 'YlOrRd' ## 9 available colors<br /> ndiv <- 5 ## can be up to 9+1=10 since first cut has no color<br /> colors <- c(NA, brewer.pal(ndiv-1, 'YlOrRd'))<br /> } else {<br /> ## if RColorBrewer is not available, need to define 'colors' manually<br /> ndiv <- 4<br /> colors <- c(NA, 'yellow', 'orange', 'red') ## for ndiv=4<br /> }<br /> ## Could use c(0:ndiv/ndiv), but cut() looks at (0,0.2] so a<br /> ## p-value of 0, though highly unlikely, would break things.<br /> ## Using anything less than 0 side-steps this problem.<br /> cuts <- c(-1, 1:ndiv/ndiv)<br /> if (spearman$p.value <= 0.05)<br /> polygon( c(-2,2,2,-2,-2), c(-2,-2,2,2,-2),<br /> col=colors[ cut(abs(r), breaks=cuts, labels=FALSE) ])<br /> mindig <- max(0.001, 1/10^digits)<br /> if (spearman$p.value < mindig) {<br /> spearman$p.value <- mindig<br /> leq <- '<'<br /> } else leq <- '='<br /> ## Can "arbitrarily" add other info to this list for stacked display.<br /> labels <- list(sprintf('n = %d', numsamples),<br /> sprintf(paste0('%0.', digits, 'f'), r),<br /> sprintf(paste0('p %s %0.', digits, 'f'), leq, spearman$p.value))<br /> nn <- length(labels)<br /> ## Ensure the text isn't too big for the square in height or width.<br /> ## 0.9 is just a factor to give a little bit of buffer.<br /> if (missing(text.cex))<br /> text.cex <- min(0.9/((nn+1) * strheight(labels[[1]]) * 1.3),<br /> 0.9/max(strwidth(labels)))<br /> text(0.5, (nn:1)/(nn+1), labels, cex=text.cex, col=text.col, adj=0.5)<br />}r2evanshttp://www.blogger.com/profile/14241809734844683570noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-32747035662660454772013-03-30T12:30:08.206-07:002013-03-30T12:30:08.206-07:00Right you are. Now corrected.Right you are. 
Now corrected.Martin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-2671551594073731382013-03-30T10:34:21.602-07:002013-03-30T10:34:21.602-07:00*less aggressive.*less aggressive.eddie spageggihttp://www.blogger.com/profile/02794508928792524824noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-43962256565105368952013-02-14T07:53:15.828-08:002013-02-14T07:53:15.828-08:00Peter, thanks -- very elegant indeed. I'll ed...Peter, thanks -- very elegant indeed. I'll edit the Gist to reflect this improvement.Martin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-69227578025528127632013-02-11T16:30:42.829-08:002013-02-11T16:30:42.829-08:00On the R code,
You asked for a more elegant way....On the R code, <br /><br />You asked for a more elegant way. Lines 25 - 41 of your code could be replaced with simply:<br /><br />LG_RPG <- aggregate(cbind(R, RA, G) ~ yearID + lgID, data = Teams, sum)<br /><br />And then you don't even have to clean up the variable names!Peterhttp://www.blogger.com/profile/10238296465381975352noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-77066903174326656022013-02-03T21:42:45.063-08:002013-02-03T21:42:45.063-08:00Excellent post! The 1969 Padres weren't that f...Excellent post! The 1969 Padres weren't that far behind the 2010 Mariners in offensive ineptitude -- 71.13 vs 71.25. At least the Padres had an excuse -- they were an expansion team.Gushttp://www.blogger.com/profile/15961930177781793610noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-41291769671071339912012-12-29T09:49:27.652-08:002012-12-29T09:49:27.652-08:00I don't know if there is an exact fit, but I d...I don't know if there is an exact fit, but I do like your take on baseball. I wonder if there is some overlap in your work and what we do over at Camden Depot. We are the Orioles' affiliate in ESPN's Sweetspot Network.<br /><br />If you have ideas and interest, send me a line at camdendepot at gmail.<br /><br />Cheers.Jon Shepherdhttp://www.blogger.com/profile/03521809778977098687noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-50904273578976217042012-08-13T10:35:54.158-07:002012-08-13T10:35:54.158-07:00I used Excel's "Data" "From Web"...I used Excel's "Data" "From Web" import wizard and the data came in perfectly. The only mod I had to do was remove the titles at the top of each page break.Ken Fullertonhttp://www.blogger.com/profile/10741797464332048107noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-72441591989107510592012-02-04T07:24:34.664-08:002012-02-04T07:24:34.664-08:00perhaps you'd be interested in my blog:
errors...perhaps you'd be interested in my blog:<br />errorstatistics.com<br /><br />although it doesn't talk about Bayes-ball or other sports, it does talk about some of those hackneyed criticisms of statistical significance tests.<br />MayoMAYO:ERRORSTAThttp://www.blogger.com/profile/02967648219914411407noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-43636675063822233322011-05-02T12:13:34.979-07:002011-05-02T12:13:34.979-07:00Phil, this analysis suggests that the Phillies wil...Phil, this analysis suggests that the Phillies will end up 93-69 for the season. The 18 wins they already have are included in the 93 total.<br /><br />The math behind the shortcut approach (which put them at 90 wins) is:<br />WINS: current wins + 69/2 = 18 + 34.5 = 52.5<br />GAMES: current games + 69 = 26 + 69 = 95<br /><br />This gives a W/G percentage of .553, or 90-72 over the 162 game season. (This is slightly different from the more complex approach, but close enough for this purpose).<br /><br />In both methods, as the sample size (number of games played) increases, the impact of the regression is reduced.<br /><br />If the Phillies go to 36-16 (the same winning percentage but with twice as many games played), their predicted record for the season will jump to 94-68 in the shortcut approach.<br /><br />And if their record goes to 54-24 (same % but three times the games), the predicted record ends up at 98-64.Martin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-89814878811850398512011-05-01T14:58:11.949-07:002011-05-01T14:58:11.949-07:00Do those take into account the games already playe...Do those take into account the games already played? 
That is, when you say Philadelphia will end up with 93 wins, is that the 18-8 they already have, plus 75-61 for the rest of the season?<br /><br />Or are you saying that their TALENT is 93 wins, and they should end up with more because of their 18-8 start?Phil Birnbaumhttp://www.blogger.com/profile/03800617749001032996noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-54607466200053979992011-04-07T12:24:25.159-07:002011-04-07T12:24:25.159-07:00da5etcetc:
D'oh! A typo on my part in the ori...da5etcetc:<br /><br />D'oh! A typo on my part in the original (now fixed). I intended to put .500, in large part because it's the easier calculation.<br /><br />The detail, for those following along, is that in cases like this the probability of a team winning against a .500 team is its "true talent". That is, a .400 team will beat a .500 team 40% of the time (i.e. .400), a .300 team will beat a .500 team 30% of the time, and a .750 team will beat a .500 team 75% of the time.<br /><br />The formula for other percentages is as follows, where:<br />Aw = Team A's winning percentage<br />Al = Team A's losing percentage (i.e. 1-Aw)<br />Bw = Team B's winning percentage<br />Bl = Team B's losing percentage (i.e. 1-Bw)<br /><br />Probability of A winning against B =<br />(Aw * Bl)/((Aw * Bl) + (Al * Bw))<br /><br />Using this formula, a .400 team will beat a .600 team 30.8% of the time (P=0.308). As you say, roughly 1/3.<br /><br />Being a bit more precise, the probability of the .400 team going 4-0 against a .600 team is 0.308^4 = 0.90%.Martin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-28591884468702467342011-04-07T07:04:01.946-07:002011-04-07T07:04:01.946-07:00I have one minor quibble - A true-talent .400 team...I have one minor quibble - A true-talent .400 team against a true-talent .600 team will go 4-0 slightly more than 1% of the time. I assume you took .400^4 and got 2.56% but a .400 team wins 40% of its games against average competition (.500). Raising the opponent's level to .600 means that a .400 team should actually be expected to win roughly 1/3 of the time. 
Over four games this equates to (1/3)^4 = 1/81 = 1.2%.da5d321a-611f-11e0-990c-000bcdcb2996https://openid.aol.com/opaque/da5d321a-611f-11e0-990c-000bcdcb2996noreply@blogger.comtag:blogger.com,1999:blog-684119683005319088.post-76786513042630287702010-06-24T22:24:19.788-07:002010-06-24T22:24:19.788-07:00Tom,
There's an interesting table in "The...Tom,<br />There's an interesting table in "The Wages of Wins" (Table 3.4, p.40) that shows the percentage of wins explained by relative payroll. The percentage was highest in the period immediately following the 1994 lock-out (32.5%), but lowest in the years previous (6.2%). Over the entire period 1988-2005, the explanation was in the 18% range. <br /><br />Somewhere else in the chapter they say something about the Yankees being an extreme outlier that has a profound influence on the model, and without the Yankees there is no relationship between payroll and wins at all.<br /><br />But with all that said, there have been some good critiques leveled at the methods used in "The Wages of Wins". I'd suggest starting here: http://sabermetricresearch.blogspot.com/2006/11/wages-of-wins-on-r-and-r-squared.htmlMartin Monkmanhttp://www.blogger.com/profile/05582544453619381290noreply@blogger.com
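For anyone who wants to check the arithmetic in the two 2011 replies above (the head-to-head formula from 2011-04-07 and the shortcut regression from 2011-05-02), here is a minimal R sketch. The function names `p_a_beats_b` and `shortcut_wins` and their argument names are hypothetical, chosen only for illustration; the formulas, the 69-game constant and the 162-game season are taken from the comments themselves.

```r
# Probability that team A beats team B, given each team's winning percentage.
# This is the formula quoted in the comment: (Aw * Bl) / ((Aw * Bl) + (Al * Bw)).
# (Function name is an assumption, not from the original post.)
p_a_beats_b <- function(aw, bw) {
  al <- 1 - aw  # Team A's losing percentage
  bl <- 1 - bw  # Team B's losing percentage
  (aw * bl) / ((aw * bl) + (al * bw))
}

# Shortcut regression to the mean as described in the comment:
# add 69 games of .500 ball (34.5 wins) to the current record,
# then scale the resulting percentage to a full season.
# (Function and argument names are assumptions.)
shortcut_wins <- function(wins, games, regress_games = 69, season = 162) {
  pct <- (wins + regress_games / 2) / (games + regress_games)
  round(pct * season)
}

# A .400 team against a .600 team: single game, then a 4-0 sweep
p <- p_a_beats_b(0.400, 0.600)
round(p, 3)            # 0.308
round(100 * p^4, 2)    # 0.9 (percent)

# The Phillies at 18-8, 36-16 and 54-24
shortcut_wins(18, 26)  # 90
shortcut_wins(36, 52)  # 94
shortcut_wins(54, 78)  # 98
```

Against a .500 opponent the first function simply returns the team's own winning percentage, which matches the "true talent" point made in the reply. The second function reproduces only the shortcut method, not the more complex approach that gave 93 wins.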