I’ve been happy using Hugo to publish my blog, and I thought it would be cool to use it to revamp my personal webpage at the University of Hawaii. It turned out not to be as easy as I expected.
I created a new site using blogdown and the hugo_identity_theme.
new_site(theme = "aerohub/hugo-identity-theme")
After playing around with the config.toml, I was ready to push the page to the university server.
[Read More]
Birthday Problem Part 2
My family went on a cruise this month for 10 days, and in the dining room, it seemed that the waiters sang Happy Birthday to someone every night. The room was pretty full of people, but it struck me that some people might be lying to get the waiters to sing to them. This called for an analysis. What was the probability of a roomful of people having birthdays on each of 10 days in a row?
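One way to frame that question, as a rough sketch rather than the analysis in the full post: assume birthdays are independent and uniformly spread over 365 days, and use inclusion-exclusion to get the chance that all 10 dates are covered by at least one diner. The function name and the room sizes below are hypothetical, since the real head count is not given here.

```r
# Probability that every one of `n_days` specific dates is the birthday of at
# least one of `n_people` diners, assuming birthdays are independent and
# uniform over `year_days` days (inclusion-exclusion over the uncovered dates).
p_all_days_covered <- function(n_people, n_days = 10, year_days = 365) {
  k <- 0:n_days
  sum((-1)^k * choose(n_days, k) * (1 - k / year_days)^n_people)
}

# Hypothetical dining-room sizes; the real head count is not given here.
sapply(c(250, 500, 1000, 2000), p_all_days_covered)
```

The probability climbs quickly with the size of the room, so the answer depends heavily on how many people were actually dining.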
[Read More]
Exploring Disabled List Data from Major League Baseball
I was interested in the duration of certain injuries for baseball players. It’s difficult to get data on this, although I discovered a series of Google Sheets that documented disabled list (DL) stays over the years. I combined the sheets, cleaned them up manually, and then loaded the data into R to crunch the numbers.
Methods
I downloaded the data from Google and used the tidyverse tools to look at them.
[Read More]
Text Analysis of Security Now: tf-idf
In the Security Now podcast, the hosts Steve Gibson and Leo Laporte often cite episodes that explain certain basic topics. They can never remember which episode covered a certain topic (e.g., how TLS works). As a follow-up to my previous post about text mining with the tidytext package, I decided to see if I could make it easier to create an index for the series using term frequency and inverse document frequency analysis (tf-idf).
[Read More]
Interisland vs. Overseas Electric Car Parking
The last time I went to the mainland I noticed that it seemed like there were a lot more electric cars parked in the overseas terminal parking lot than the interisland one. I wondered if it was a fluke or if there was a real difference. I collected some data to investigate!
Methods
I had two trips back to back: my weekly interisland trip for work and my annual district meeting for my professional society.
[Read More]
Using Bookdown to create a Security Now Ebook
Recently I have been exploring the blogdown package to redo my website using Hugo, and it resurrected an old idea that I had. Security Now, one of my favorite podcasts, has transcriptions of each episode, and I thought it would be neat to put them into an e-book and read the old episodes that way. I tried using Calibre, but while it did work, I wasn’t able to figure out how to format it nicely.
[Read More]
AJ Mass’s Rule of Sevens
On the baseball podcast I listen to, guest host AJ Mass proposed that a rule of thumb for a top 20 pitcher would be someone with an ERA < 3.5 and a K/BB ratio of > 3.5 (3.5 + 3.5 = 7). I wondered if this was actually the case (i.e., how many pitchers achieve this mark each season).
Methods
I turned as usual to the Lahman database and the tidyverse packages.
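As a minimal sketch of the rule itself (not the code from the full post), the check below flags pitchers with an ERA under 3.5 and a K/BB ratio over 3.5. The data frame is hypothetical toy data, not real season statistics; the column names ERA, SO, and BB mirror those in the Lahman Pitching table, and the function name is made up for illustration.

```r
# Hypothetical toy data; these are not real season statistics.
pitchers <- data.frame(
  name = c("A", "B", "C", "D"),
  ERA  = c(2.90, 3.40, 3.60, 3.10),
  SO   = c(220, 180, 240, 150),
  BB   = c(50, 60, 40, 30)
)

# TRUE for rows with ERA below `era_max` and strikeout-to-walk ratio
# above `kbb_min`.
meets_rule_of_sevens <- function(df, era_max = 3.5, kbb_min = 3.5) {
  kbb <- df$SO / df$BB
  df$ERA < era_max & kbb > kbb_min
}

pitchers[meets_rule_of_sevens(pitchers), ]
```

Applied to real data, the same filter would usually be paired with a minimum innings threshold so that pitchers with very few appearances do not qualify.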
[Read More]
Converting from Jekyll to Hugo
I’ve decided to convert this website from GitHub Pages/Jekyll to Hugo. The inspiration for this came when I saw the website that Roger Peng put up for his live data analysis screencasts, and there was a big Hugo logo on the bottom. After learning more about this and discovering the blogdown package by Yihui Xie that is based on Hugo, I decided to take the leap in converting my Jekyll GitHub Pages site to Hugo (hosted on GitHub Pages).
[Read More]
Text Analysis of Security Now
I wanted to try my hand at some text mining using the tidytext package, and I was wondering what I could look at. I love the podcast Security Now! by Steve Gibson and Leo Laporte, and I’ve always wondered if there was a way to build a good index of their podcast.
I used the tidyverse packages and tidytext by David Robinson. He and Julia Silge wrote the book from which most of these methods came.
[Read More]
Justin Smoak and BABIP Trends
The podcast I listen to said something like, “I bet Justin Smoak must have one of the lowest BABIPs of anyone to this point in his career.” I thought it might be a good chance to explore this nebulous BABIP a little: what are the historical league and career trends in BABIP?
library(Lahman)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
library(ggplot2)
library(hrbrthemes)
BABIP is calculated as balls in play (hits minus home runs) divided by total at bats that caused these balls in play (at bats minus strikeouts minus home runs plus sacrifice flies).
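The definition above can be written as a small helper. This is a sketch under the formula as stated, with a hypothetical function name and a made-up stat line for illustration; the argument names follow the usual abbreviations (H, HR, AB, SO, SF).

```r
# BABIP as described above: (H - HR) / (AB - SO - HR + SF).
babip <- function(H, HR, AB, SO, SF) {
  (H - HR) / (AB - SO - HR + SF)
}

# Made-up stat line, for illustration only: 130 balls in play that became
# hits, out of 385 at bats that ended with a ball in play.
babip(H = 150, HR = 20, AB = 500, SO = 100, SF = 5)
```

Because the function is vectorized, it can be applied directly to whole columns of a batting table.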
[Read More]