Bayes Ball

"Data science is the science and design of (1...

2019-06-03T08:54:33.132-07:00

"Data science is the science and design of (1) actively creating a question to investigate a hypothesis with data, (2) connecting that question with the collection of appropriate data and the application of appropriate methods, algorithms, computational tools or languages in a data analysis, and (3) communicating and making decisions based on new or already established knowledge derived from the data and data analysis. (p.2)"

This describes an applied statistician, and has done so for decades.

Thanks to both Charles and Unknown -- I have used ...

2017-03-27T18:30:46.089-07:00

Thanks to both Charles and Unknown -- I have used the req() function and the temporary error no longer appears. A simple and elegant solution.

Nice little Shiny app! As Charles says, this temp...

2017-03-27T03:59:51.825-07:00

Nice little Shiny app!

As Charles says, this temporary error state can be suppressed. A better solution than using is.null() would be to wrap the required input with the req() function, which was introduced into shiny specifically to address this issue.

You can see https://shiny.rstudio.com/articles/req.html for more info.
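
For readers who want to see the pattern, here is a minimal sketch of req() guarding a plot that depends on a selectInput built on the server side. The input ID "team", the output names, and the team choices are hypothetical placeholders, not taken from the app discussed here.

```r
library(shiny)

ui <- fluidPage(
  uiOutput("team_picker"),  # the selectInput is created server side
  plotOutput("team_plot")
)

server <- function(input, output, session) {
  # Hypothetical choices; a real app would build these from its data.
  output$team_picker <- renderUI({
    selectInput("team", "Team", choices = c("BOS", "NYA", "TOR"))
  })

  output$team_plot <- renderPlot({
    # Until input$team exists, req() halts this block silently,
    # so no temporary error message is shown.
    req(input$team)
    barplot(c(A = 1, B = 2), main = input$team)
  })
}

# shinyApp(ui, server)  # uncomment to run interactively
```

req() passes a "truthy" value straight through, so the rest of the render block runs as soon as the input is available.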

Hello In your "team plot" tab (where se...

2017-03-27T02:58:37.284-07:00

Hello

In your "team plot" tab (where selectInput is created server side), you have an error message that appears for a second or so, complaining about "incorrect length (0)". It is an error that I also have in my apps and that I have been able to suppress with:

if (is.null(input$Name_of_selectInput_server_side)) {
  NULL
} else {
  # plot script goes here
}

Hope this is helpful.

Cheers
Charles

Hi Dan, thanks for this. Are you willing and able...

2016-11-30T08:00:54.484-08:00

Hi Dan, thanks for this. Are you willing and able to share your code, perhaps on github?

I'm a regular visitor to the R Graph Catalog linked in the body of the blog post, but more resources are always a welcome addition.

This is such a useful feature - thanks!

2016-11-24T06:32:12.043-08:00

This is such a useful feature - thanks!

I loved her book and have been trying to implement...

2016-11-17T19:32:43.329-08:00

I loved her book and have been trying to implement some of her graphs in R. Sometimes it proves harder than you would think, but still fun to try.

I was trying to do the same,after having seen Bob...

2016-03-14T21:27:45.411-07:00

I was trying to do the same after having seen Bob's email, but missed out on adding the gtable dependency. Thanks for showing how it is done.

Hello Martin, I just finished reading your excelle...

2014-10-18T11:27:53.964-07:00

Hello Martin, I just finished reading your excellent book review about Marchi and Albert’s Analyzing Baseball Data with R (2013). Even for a stats novice like myself, it was a joy to read—and more so now that I’ve recently decided to relearn all of the knowledge I gleaned (but thereafter lost over the past several years) from a graduate quantitative methods & statistics course. Simultaneously, I’d like to start working w/ R—as SPSS is now out of my price range, which is fine b/c I doubt I’d recall any of it anyway. Since that graduate course was a painful experience (even though I scraped by w/ an A), I want to re-educate myself on R and statistics via baseball to hopefully make this more enjoyable. . . . Anyway, enough intro.: for someone at this blank slate stage, which baseball, stats., & R book would you recommend as the best starter title: Analyzing Baseball Data with R, Albert’s earlier R by Example, or perhaps some other even more rudimentary baseball & stats book? (Mind you, I can hardly remember what standard deviation means anymore.) Thanks for any advice!

Hi John, I've put copies of the National and A...

2013-11-30T12:59:45.498-08:00

Hi John, I've put copies of the National and American league spreadsheets (as .csv files) on Google Docs.
National League: https://drive.google.com/file/d/0B7t4wpcrwqkBdEt2UG4xNDJTdms/edit?usp=sharing
American League: https://drive.google.com/file/d/0B7t4wpcrwqkBdEt2UG4xNDJTdms/edit?usp=sharing

To create the files I used the Lahman package in R, which is only current up to the 2012 season. I've updated the files manually with the 2013 data points.

Hello, I was wondering if you had a spreadsheet by...

2013-11-24T12:27:25.556-08:00

Hello, I was wondering if you had a spreadsheet by league of average number of runs per year.
thanks
John

You are such a wonderful statistical geek...I can ...

2013-10-04T07:15:06.432-07:00

You are such a wonderful statistical geek...I can learn so much from you :)

Martin, this infographic about predicting baseball...

2013-08-19T06:57:10.740-07:00

Martin, this infographic about predicting baseball may interest you. Feel free to share on your blog. The graphic is titled Predicting Baseball: Demystifying Bayes' Theorem. http://www.sports-management-degrees.com/baseball/

r2evans, thanks for this ... much appreciated. I...

2013-07-21T06:46:00.835-07:00

r2evans, thanks for this ... much appreciated.

I've taken the liberty of including your code into the gist for this blog post, which is now here:
https://gist.github.com/MonkmanMH/6048590

I've been using similar plots recently, and ad...

2013-07-19T10:10:04.854-07:00

I've been using similar plots recently, and adapted panel.cor() to include Spearman's non-parametric test and the number of non-NA samples for each pair-wise comparison. The variably-sized text is good (and I use it periodically), but I chose against it in this adapted version since I was stacking more information.

Additionally, since several of the datasets I've been depicting contain lots of samples, I opt to jitter() them and/or use a color of "#00000055" for transparency, as it makes coincident data points much more apparent.

I like this kind of visualization technique. Thanks!

panel.cor <- function(x, y, digits=3, ..., text.cex, text.col='black') {
  par(usr = c(0, 1, 0, 1))
  numsamples <- sum(! is.na(x) & ! is.na(y))
  r <- cor(x, y, use='complete.obs')
  spearman <- cor.test(x, y, method='spearman', continuity=TRUE, exact=FALSE)

  if (require(RColorBrewer, quietly=TRUE)) {
    colbrew <- 'YlOrRd' ## 9 available colors
    ndiv <- 5 ## can be up to 9+1=10 since first cut has no color
    colors <- c(NA, brewer.pal(ndiv-1, colbrew))
  } else {
    ## if RColorBrewer is not available, need to define 'colors' manually
    ndiv <- 4
    colors <- c(NA, 'yellow', 'orange', 'red') ## for ndiv=4
  }
  ## Could use c(0:ndiv/ndiv), but cut() looks at (0,0.2] so an
  ## |r| of exactly 0, though highly unlikely, would fall outside the breaks.
  ## Using anything less than 0 side-steps this problem.
  cuts <- c(-1, 1:ndiv/ndiv)
  ## Shade the panel by |r| only when the Spearman test is significant.
  if (spearman$p.value <= 0.05)
    polygon( c(-2,2,2,-2,-2), c(-2,-2,2,2,-2),
             col=colors[ cut(abs(r), breaks=cuts, labels=FALSE) ])
  ## Smallest p-value that can be displayed with 'digits' decimals.
  mindig <- max(0.001, 1/10^digits)
  if (spearman$p.value < mindig) {
    spearman$p.value <- mindig
    leq <- '<'
  } else leq <- '='
  ## Can "arbitrarily" add other info to this list for stacked display.
  labels <- list(sprintf('n = %d', numsamples),
                 sprintf(paste0('%0.', digits, 'f'), r),
                 sprintf(paste0('p %s %0.', digits, 'f'), leq, spearman$p.value))
  nn <- length(labels)
  ## Ensure the text isn't too big for the square in height or width.
  ## 0.9 is just a factor to give a little bit of buffer.
  if (missing(text.cex))
    text.cex <- min(0.9/((nn+1) * strheight(labels[[1]]) * 1.3),
                    0.9/max(strwidth(labels)))
  text(0.5, (nn:1)/(nn+1), labels, cex=text.cex, col=text.col, adj=0.5)
}
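
The jitter-and-transparency idea mentioned in this comment might look something like the sketch below. The data frame `d` and the panel function name `panel.jitter` are made up for illustration; only jitter() and the "#00000055" colour come from the comment itself.

```r
# Hypothetical data: integer-valued columns, so many points coincide exactly.
set.seed(1)
d <- data.frame(
  a = sample(1:5, 500, replace = TRUE),
  b = sample(1:5, 500, replace = TRUE),
  c = sample(1:5, 500, replace = TRUE)
)

# "#00000055" is black with an alpha channel of 0x55 (85 out of 255),
# so overlapping points build up into darker patches.
transparent_black <- "#00000055"

# Panel function (illustrative name): jitter both coordinates, then draw
# semi-transparent filled points.
panel.jitter <- function(x, y, ...) {
  points(jitter(x), jitter(y), pch = 16, col = transparent_black)
}

pairs(d, panel = panel.jitter)
```

With the panel.cor() function above defined, the two could presumably be combined as pairs(d, lower.panel = panel.jitter, upper.panel = panel.cor).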

Right you are. Now corrected.

2013-03-30T12:30:08.206-07:00

Right you are. Now corrected.

*less aggressive.

2013-03-30T10:34:21.602-07:00

*less aggressive.

Peter, thanks -- very elegant indeed. I'll ed...

2013-02-14T07:53:15.828-08:00

Peter, thanks -- very elegant indeed. I'll edit the Gist to reflect this improvement.

On the R code, You asked for a more elegant way....

2013-02-11T16:30:42.829-08:00

On the R code,

You asked for a more elegant way. Lines 25 - 41 of your code could be replaced with simply:

LG_RPG <- aggregate(cbind(R, RA, G) ~ yearID + lgID, data = Teams, sum)

And then you don't even have to clean up the variable names!
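
To show what that one-liner returns, here is a small sketch run on a made-up stand-in for the Lahman Teams table. The numbers in `teams_demo` are invented for illustration, not real season totals, and the final runs-per-game column is an assumed follow-up step rather than something taken from the original gist.

```r
# Hypothetical stand-in for Lahman::Teams, with only the columns used here.
teams_demo <- data.frame(
  yearID = c(2012, 2012, 2012, 2012),
  lgID   = c("AL", "AL", "NL", "NL"),
  R      = c(700, 650, 640, 610),   # runs scored (invented values)
  RA     = c(660, 690, 600, 650),   # runs allowed (invented values)
  G      = c(162, 162, 162, 162)    # games played
)

# Sum R, RA and G within each year-and-league combination.
LG_RPG <- aggregate(cbind(R, RA, G) ~ yearID + lgID, data = teams_demo, sum)

# Assumed follow-up: league runs per team-game.
LG_RPG$RPG <- LG_RPG$R / LG_RPG$G

print(LG_RPG)
```

The result keeps the original column names (yearID, lgID, R, RA, G), which is why no renaming is needed afterwards.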

Excellent post! The 1969 Padres weren't that f...

2013-02-03T21:42:45.063-08:00

Excellent post! The 1969 Padres weren't that far behind the 2010 Mariners in offensive ineptitude -- 71.13 vs 71.25. At least the Padres had an excuse -- they were an expansion team.

I don't know if there is an exact fit, but I d...

2012-12-29T09:49:27.652-08:00

I don't know if there is an exact fit, but I do like your take on baseball. I wonder if there is some overlap in your work and what we do over at Camden Depot. We are the Orioles' affiliate in ESPN's Sweetspot Network.

If you have ideas and interest, send me a line at camdendepot at gmail.

Cheers.

I used Excels "Data" "From Web"...

2012-08-13T10:35:54.158-07:00

I used Excel's "Data" "From Web" import wizard and the data came in perfectly. The only mod I had to do was remove the titles at the top of each page break.

perhaps you'd be interested in my blog: errors...

2012-02-04T07:24:34.664-08:00

perhaps you'd be interested in my blog:
errorstatistics.com

although it doesn't talk about Bayes-ball or other sports, it does talk about some of those hackneyed criticisms of statistical significance tests.
Mayo

Phil, this analysis suggests that the Phillies wil...

2011-05-02T12:13:34.979-07:00

Phil, this analysis suggests that the Phillies will end up 93-69 for the season. The 18 wins they already have are included in the 93 total.

The math behind the shortcut approach (which put them at 90 wins) is:
WINS: current wins + 69/2 = 18 + 34.5 = 52.5
GAMES: current games + 69 = 26 + 69 = 95

This gives a W/G percentage of .553, or 90-72 over the 162-game season. (This is slightly different from the more complex approach, but close enough for this purpose.)

In both methods, as the sample size (number of games played) increases, the impact of the regression is reduced.

If the Phillies go to 36-16 (the same winning percentage but with twice as many games played), their predicted record for the season will jump to 94-68 in the shortcut approach.

And if their record goes to 54-24 (same % but three times the games), the predicted record ends up at 98-64.
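
The shortcut arithmetic above can be sketched as a small R function. The function and argument names are made up for illustration; the 69 added games of .500 ball and the 162-game season come from the comment.

```r
# Shortcut regression to the mean: add 'prior_games' of .500 ball to the
# current record, then apply the resulting percentage to a full season.
regress_record <- function(wins, games, prior_games = 69, season = 162) {
  pct <- (wins + prior_games / 2) / (games + prior_games)
  proj_wins <- round(pct * season)
  c(pct = round(pct, 3), wins = proj_wins, losses = season - proj_wins)
}

regress_record(18, 26)  # pct 0.553, 90-72
regress_record(36, 52)  # same winning percentage, twice the games: 94-68
regress_record(54, 78)  # three times the games: 98-64
```

Because the 69 added games are fixed, their weight shrinks as the number of games actually played grows, which is why the projection moves closer to the team's observed record.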