Goodreads Analysis

I just finished my 2017 Reading Challenge on Goodreads. My goal was to read 15 books this year. Poking around the site, I discovered that I could export my data. I decided to have a look at my reading habits, and since I was doing this for myself, I decided to look at my wife’s data too. Dataset library(dplyr) library(tidyr) library(ggplot2) library(lubridate) library(hrbrthemes) books <- read. [Read More]

Association between Years at a Private School and Academic Achievement

The kids brought home their yearbooks this past week, and it was time to settle a question that I have had for a long time. Do kids who enter their private school at kindergarten do better or worse than the kids who enter later? Methods At this school there are two honor rolls (Headmaster’s List and Honor Roll). To get Headmaster’s List, a student must achieve a 3.5 grade point average with no grade below a B-. [Read More]

King Tides Citizen Science Project

I read in the newspaper that the University of Hawaii was recruiting “citizen scientists” to help document the impact of this week’s “king tides” on our coastline. I uploaded some photos to the website but also found that we could download the dataset. I took this as an opportunity to learn something about geographic information systems (GIS) and R, and the end result was a pretty nice map of all the places that were photographed this week. [Read More]
GIS 

Pitchers with More Saves than Strikeouts

On my favorite fantasy baseball podcast, the host mentioned the phenomenon of pitchers who had more saves than strikeouts. This seemed likely to be fairly uncommon, since many closers are power pitchers (e.g., Aroldis Chapman). I decided to investigate further using the Lahman database. Methods I used the Lahman database again, which is perfect for answering this kind of question. It has a Pitching table that contains season-long stats for every pitcher from 1871 to 2015 (as of this post). [Read More]
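A minimal sketch of the core filter, assuming the Lahman `Pitching` table's `SV` (saves) and `SO` (strikeouts) columns. The helper name `more_saves_than_k` and the toy rows are hypothetical, for illustration only; with the Lahman package installed you would pass `Lahman::Pitching` instead of `toy`.

```r
library(dplyr)

# Hypothetical helper: keep pitcher-seasons where saves exceed strikeouts.
# Expects Lahman-style columns: playerID, yearID, SV (saves), SO (strikeouts).
more_saves_than_k <- function(pitching) {
  pitching %>%
    filter(SV > SO) %>%
    select(playerID, yearID, SV, SO) %>%
    arrange(desc(SV))
}

# Made-up rows standing in for Lahman::Pitching
toy <- data.frame(
  playerID = c("a01", "b02", "c03"),
  yearID   = c(1980L, 1995L, 2010L),
  SV       = c(20L, 35L, 5L),
  SO       = c(15L, 80L, 40L)
)

result <- more_saves_than_k(toy)
print(result)
```

Lahman splits a season into separate `stint` rows when a pitcher changes teams, so summing `SV` and `SO` by `playerID` and `yearID` before filtering may be closer to true season totals.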

Economical Iced Coffee

My wife and I like iced coffee but drinking it at Starbucks is expensive. I’ve been making it myself at home and it’s a reasonable facsimile at what I think is a much lower price. I decided to run the calculations to see how much we’re saving. Grounds I tried a bunch of different coffees to see what would work the best: whole bean Kirkland (Costco) brand, whole bean Kona coffee, ground Lion coffee (a local brand), ground Folgers, and ground Kirkland. [Read More]

Bank Shot Win Analysis

There’s a carnival game at the Iolani Fair called Bank Shot. You roll a ball up a ramp into a hole to win a prize. Get it in the green hole and you win tickets; get it in the red hole and you win a big inflatable prize. I wondered what the rates of hitting the red and green holes were. Here’s a picture of the game layout. [Read More]
games 

AAP Annual Leadership Forum Consent Calendar Resolutions Less Likely to Make Top 10

Last month I went to the American Academy of Pediatrics’ Annual Leadership Forum. This is a meeting of local chapter leaders from across the country where we make resolutions for the AAP to act on. Some resolutions were passed quickly as a group because they were not controversial. This process is called the “consent calendar”. The other resolutions were debated, sometimes at much length, before being voted on individually. At the end of the meeting we came up with the top 10 resolutions that we wanted the AAP to work on this year. [Read More]
AAP 

Effect of Runs and Other Factors on Team Saves in Major League Baseball

From listening to my favorite fantasy baseball podcast, there seems to be a belief that teams that score lots of runs are not likely to produce saves. The idea seems to be that such teams would blow out their competitors and so have fewer opportunities for games to be saved. While it made sense, I wondered what the data said. library(dplyr) library(ggplot2) library(Lahman) data("Teams") Teams <- tbl_df(Teams) Exploration of Saves over Time Saves did not become an official statistic until 1969 (source). [Read More]

Graphing Bean Sprout Respiration with the Hadleyverse

My wife’s high school biology students did a lab recently and had difficulty graphing the pooled data. One student allegedly took 2 hours to figure out how to do this in Excel. This seemed like it would be a nice exercise to try in the Hadleyverse. To replicate this plot, I had to take her students’ data in wide form, tidy it up, calculate means for each group and time, and then plot these. [Read More]
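The reshape-summarise-plot steps described above can be sketched like this, assuming each row is one student's readings with a `group` column and one column per time point (`t0`, `t5`, `t10`). These column names and values are hypothetical, not the class's actual data. `pivot_longer()` is the current tidyr verb for the wide-to-long step; older tidyr code uses `gather()` for the same reshape.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one row per student, one column per time point
wide <- data.frame(
  group = c("A", "A", "B", "B"),
  t0    = c(1.0, 2.0, 1.5, 2.5),
  t5    = c(2.0, 3.0, 2.5, 3.5),
  t10   = c(3.0, 4.0, 3.5, 4.5)
)

# Tidy: one row per (student, time) observation, time as a number
long <- wide %>%
  pivot_longer(starts_with("t"), names_to = "time", values_to = "value") %>%
  mutate(time = as.numeric(sub("t", "", time)))

# Mean reading for each group at each time point
means <- long %>%
  group_by(group, time) %>%
  summarise(mean_value = mean(value), .groups = "drop")

# Build the plot of group means over time; print(p) renders it
p <- ggplot(means, aes(x = time, y = mean_value, colour = group)) +
  geom_line() +
  geom_point()
```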

Baseball Team Season Save Leaders with the Lowest Totals since 2000

I was listening to my fantasy baseball podcast last week, and the conversation turned to the Phillies closer situation. One host suggested that the Phillies might not have any pitcher with more than 10 saves this season. The other host thought that this would be unlikely. They looked quickly at the data and found that this has happened a few times since 2000. I wondered if there was a way to look this up using the Lahman database in R. [Read More]
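One way to frame the lookup is to find each team-season's top individual save total and sort those ascending. The sketch below assumes Lahman `Pitching` columns (`playerID`, `yearID`, `teamID`, `SV`). The helper `lowest_save_leaders` and the toy rows are hypothetical; with the Lahman package installed you would pass `Lahman::Pitching`.

```r
library(dplyr)

# Hypothetical helper: for each team-season from `from_year` on, take the
# largest individual save total, then list the smallest of those first.
lowest_save_leaders <- function(pitching, from_year = 2000) {
  pitching %>%
    filter(yearID >= from_year) %>%
    group_by(yearID, teamID, playerID) %>%
    summarise(SV = sum(SV), .groups = "drop") %>%  # combine stints with one team
    group_by(yearID, teamID) %>%
    slice_max(SV, n = 1, with_ties = FALSE) %>%    # the team's save leader
    ungroup() %>%
    arrange(SV)
}

# Made-up rows standing in for Lahman::Pitching
toy <- data.frame(
  playerID = c("p1", "p2", "p3", "p4", "p5"),
  yearID   = c(1999L, 2001L, 2001L, 2005L, 2005L),
  teamID   = c("PHI", "PHI", "PHI", "NYA", "NYA"),
  SV       = c(40L, 8L, 6L, 45L, 2L)
)

result <- lowest_save_leaders(toy)
print(result)
```

Adding `filter(SV <= 10)` to the result would keep only team-seasons in which no pitcher recorded more than 10 saves, which is the podcast's question.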