January 6, 2015

Run scoring trends: using Shiny to create dynamic charts and tables in R

Or, Retracing my steps

As I’ve been learning the functionality of Shiny, the web app for R, I have used the helpful tutorials available from the developers at RStudio. At some point, though, one needs to break out and develop one’s own application.  My Shiny app “MLB run scoring trends” can be found at (https://monkmanmh.shinyapps.io/MLBrunscoring_shiny/).

Note: this app is a work in progress! If you have any thoughts on how it might be improved, please leave a comment.
All of the files associated with this app, including the code, can be found on github.com, at MonkmanMH/MLBrunscoring_shiny.

This Shiny app is a return to my earlier analysis on run scoring trends in Major League Baseball, last seen in my blog post “Major League Baseball run scoring trends with R’s Lahman package”; see the “References” tab in the Shiny app for more). This project gave me the opportunity to update the underlying data, as well as to introduce some of the coding improvements I’ve learned along the way (notably the packages ggplot2 and dplyr.)

Some notable changes in the code:
  • In the original version (starting here), I treated each league separately, starting with subsetting (now, with dplr, filtering) the Lahman “Teams” table on the lgID variable into two separate data frames which were then used to separately generate the two charts. Now, with ggplot2, I have used the faceting to plot the two leagues, and given the reader the option of making that split or not. This is both more flexible from the reader’s point of view, and more efficient code.
  • In my original approach, the trend lines were generated using the loess function, embedded in a discrete object, and then added to the plot as a separately plotted line. By using ggplot2, a LOESS trendline can be quickly added to the plot call with the stat_smooth() option, a much more efficient approach.
  • The stat_smooth() makes it possible to adjust the degree of smoothing of the tend line through changes to the span specification. Originally this was hard-coded, but is now dynamic, controlled in the Shiny app through a slider widget.
  • The stat_smooth() also includes the option of showing a confidence interval. This is achieved through the level specification. For this, I used a set of radio buttons in the Shiny user interface. (I had initially tried a slider, but was not able to specify a set of pre-defined points for the confidence intervals.)
  • The start and end dates of the league plots are also user-controlled through a slider widget. You will notice that the date in the chart title changes along with the range of the plot.
Other things I learned:
  • Radio buttons return factors, even if they look numeric in the ui.r code. In order to get the values that are input by the user to work in the stat_smooth(), I wrapped them in as.numeric().
  • Tables made with dplyr don't render properly in the Shiny environment; the numbers are all there but the sort function generates an error. My solution to this was to wrap the table created by renderDataTable in the server file with as.data.frame.
  • I already knew that I was struggling to keep up with the changes in the R coding environment, but this exercise opened my eyes to even more potential opportunities. The latest version of Shiny ( as of 2015-01-06) has added a lot of new functionality, but I hadn’t realized the degree of integration with other visualization tools. This recent blog entry, “Goodbye static graphs, hello shiny, ggvis, rmarkdown” by Simon Jackman, gives some hints as to where an integrated analytic & reporting environment might go. Exciting stuff, indeed.