Last summer I started running again after about 8 years off. I had run from my sophomore year of high school until the fall of 2008 when it seemed like I had a meniscus injury and I stopped running. I had run for 17 years, during which the longest break I had taken was maybe a few months. After I stopped, I always missed it, and when my son started training for a Scouting running achievement, I decided I would keep him company. Lo and behold my knee felt ok, so I figured I would keep going. I’ve been using the Runkeeper app to track my runs and decided to look at what the data showed.
suppressPackageStartupMessages(library(dplyr))
library(ggplot2)
suppressPackageStartupMessages(library(lubridate))
I downloaded the data from the Runkeeper website. It came as a zip file containing all my GPS tracks plus a processed csv file.
dat <- tbl_df(read.csv("../datasets/cardioActivities.csv"))
glimpse(dat)
## Observations: 89
## Variables: 14
## $ Activity.Id <fct> 4ff0c9dd-9713-4a29-98e2-8112c66e2d5c,...
## $ Date <fct> 2018-06-01 16:45:49, 2018-05-30 06:31...
## $ Type <fct> Running, Running, Rowing, Running, Cy...
## $ Route.Name <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ Distance..mi. <dbl> 10.00, 5.97, 3.82, 11.39, 8.61, 5.02,...
## $ Duration <fct> 1:32:29, 54:27, 30:08, 1:50:01, 36:10...
## $ Average.Pace <fct> 9:15, 9:08, 7:53, 9:40, 4:12, 9:30, 8...
## $ Average.Speed..mph. <dbl> 6.49, 6.57, 7.61, 6.21, 14.28, 6.32, ...
## $ Calories.Burned <dbl> 1334.0000, 801.0000, 369.6282, 1509.0...
## $ Climb..ft. <int> 142, 125, 0, 96, 0, 160, 59, 8, 160, ...
## $ Average.Heart.Rate..bpm. <int> NA, NA, NA, NA, NA, NA, 167, 152, 146...
## $ Friend.s.Tagged <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ Notes <fct> , , , , , , Time trial goal 10k in 54...
## $ GPX.File <fct> 2018-06-01-164549.gpx, 2018-05-30-063...
dat$Date <- ymd_hms(dat$Date)
Weekly Mileage
There was a lot of data there but I thought the first thing to start with would be my weekly mileage. I wanted to see how that changed over the last year.
dat %>%
# filter(Date < ymd(20180601)) %>%
filter(Type == "Running") %>%
mutate(Week_obs = week(Date), Year_obs = year(Date)) %>%
group_by(Year_obs, Week_obs) %>%
summarize(Mileage = sum(Distance..mi.)) %>%
ggplot(aes(x = Week_obs, y = Mileage)) +
facet_grid( ~ factor(Year_obs)) +
geom_point() +
labs(title = "Weekly Mileage",
x = "Week",
y = "Miles")
Monthly Mileage
In What I Talk about When I Talk about Running by Haruki Murakami, he talks about his monthly mileage, averaging 136-186 miles per month. My monthly averages weren’t as impressive as his.
dat %>%
filter(Date < ymd("20180601")) %>%
filter(Type == "Running") %>%
mutate(Month_obs = month(Date), Year_obs = year(Date)) %>%
group_by(Year_obs, Month_obs) %>%
summarize(Mileage = sum(Distance..mi.)) %>%
ggplot(aes(x = Month_obs, y = Mileage)) +
facet_grid( ~ factor(Year_obs)) +
geom_point() +
labs(title = "Monthly Mileage",
x = "Month",
y = "Miles")
Calories Burned
I tracked my weekly calories burned, expecting that these would be highly correlated with mileage.
dat %>%
filter(Type == "Running") %>%
mutate(Week_obs = week(Date), Year_obs = year(Date)) %>%
group_by(Year_obs, Week_obs) %>%
summarize(Calories = sum(Calories.Burned)) %>%
ggplot(aes(x = Week_obs, y = Calories)) +
facet_grid( ~ factor(Year_obs)) +
geom_point() +
labs(title = "Weekly Calories",
x = "Week",
y = "Total Calories")
Correlation between Calories and Miles
The app estimated calories per run. There was a linear correlation between calories and miles run, as displayed below.
dat %>% filter(Type == "Running") %>%
ggplot(aes(x = Distance..mi., y = Calories.Burned)) + geom_point() + geom_smooth(method = "lm") +
labs(title = "Correlation between Calories and Miles Run",
x = "Miles",
y = "Calories")
This was a very linear relationship, even at the very extremes of the distances run. I fit a linear model and determined that a rough estimate of the energy expenditure rate was 136 calories per mile.
x <- filter(dat, Type == "Running")
model1 <- lm(x$Calories.Burned ~ x$Distance..mi.)
summary(model1)
##
## Call:
## lm(formula = x$Calories.Burned ~ x$Distance..mi.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.844 -5.463 -1.716 4.585 44.856
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.6690 4.3352 -0.846 0.4
## x$Distance..mi. 136.1300 0.7617 178.715 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.91 on 78 degrees of freedom
## Multiple R-squared: 0.9976, Adjusted R-squared: 0.9975
## F-statistic: 3.194e+04 on 1 and 78 DF, p-value: < 2.2e-16