I was reviewing the coronavirus dashboard for the state of Hawaii. It has mobility data from anonymized cell phone records from Descartes Labs. The chart was so small that I couldn’t really read it, so I went to the source cited to see if I could see this better. At the source, they cite a Hawaii data project that looks at this data but something about the Hawaii data site isn’t working that great for the mobility visualization. I decided to see if I could write a script to visualize the Hawaii mobility data for myself.
library(tidyverse)
## ── Attaching packages ────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
I downloaded the mobility data directly from Github.
mobility <- readr::read_csv('https://raw.githubusercontent.com/descarteslabs/DL-COVID-19/master/DL-us-mobility-daterow.csv')
## Parsed with column specification:
## cols(
## date = col_date(format = ""),
## country_code = col_character(),
## admin_level = col_double(),
## admin1 = col_character(),
## admin2 = col_character(),
## fips = col_character(),
## samples = col_double(),
## m50 = col_double(),
## m50_index = col_double()
## )
timestamp()
## ##------ Mon Sep 7 14:10:12 2020 ------##
After a little finessing, I was able to create the following curve. There are a lot of little peaks and valleys in the curve.
mobility %>% filter(admin1 == "Hawaii") %>%
mutate(admin2 = replace_na(admin2, "State Overall")) %>%
ggplot(aes(x= date, y = m50_index, color = admin2)) +
geom_line()
To help make these a little smoother, I took the 7 day average which gives a bit more distinction between the lines.
mobility %>% filter(admin1 == "Hawaii") %>%
mutate(admin2 = replace_na(admin2, "State Overall")) %>%
group_by(admin2) %>%
mutate(rolling_mean_7d = rollmean(m50_index, 7, align = 'right', fill = NA)) %>%
ggplot(aes(x= date, y = rolling_mean_7d, color = admin2)) +
geom_line()
## Warning: Removed 30 row(s) containing missing values (geom_path).
For comparison to the number of cases per day, I pulled the NY Times data that I used in my first coronavirus post.
cases <- readr::read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
## Parsed with column specification:
## cols(
## date = col_date(format = ""),
## state = col_character(),
## fips = col_character(),
## cases = col_double(),
## deaths = col_double()
## )
I calculated the rolling 7 day mean as before.
cases %>% filter(state == "Hawaii") %>%
mutate(cases_yesterday = lag(cases)) %>%
mutate(new_cases = cases - cases_yesterday) %>%
mutate(rolling_mean_7d = rollmean(new_cases, 7, align = 'right', fill = NA))
## # A tibble: 185 x 8
## date state fips cases deaths cases_yesterday new_cases rolling_mean_7d
## <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2020-03-06 Hawa… 15 1 0 NA NA NA
## 2 2020-03-07 Hawa… 15 1 0 1 0 NA
## 3 2020-03-08 Hawa… 15 2 0 1 1 NA
## 4 2020-03-09 Hawa… 15 2 0 2 0 NA
## 5 2020-03-10 Hawa… 15 2 0 2 0 NA
## 6 2020-03-11 Hawa… 15 2 0 2 0 NA
## 7 2020-03-12 Hawa… 15 2 0 2 0 NA
## 8 2020-03-13 Hawa… 15 2 0 2 0 0.143
## 9 2020-03-14 Hawa… 15 4 0 2 2 0.429
## 10 2020-03-15 Hawa… 15 7 0 4 3 0.714
## # … with 175 more rows
Combining the Data
I was going to put these on the same graph, but after reading about why this can be misleading, I decided not to do it. To make it easier to compare, here are the two graphs in close proximity for visual comparison.
mobility %>% filter(fips == 15) %>%
mutate(rolling_mean_7d = rollmean(m50_index, 7, align = 'right', fill = NA)) %>%
ggplot(aes(x= date, y = rolling_mean_7d)) +
geom_line() +
ggtitle("Mobility as Percentage of Pre-COVID Baseline") +
ylab("Percentage of baseline, rolling 7 day mean")
## Warning: Removed 6 row(s) containing missing values (geom_path).
cases %>% filter(state == "Hawaii") %>%
mutate(cases_yesterday = lag(cases)) %>%
mutate(new_cases = cases - cases_yesterday) %>%
mutate(rolling_mean_7d = rollmean(new_cases, 7, align = 'right', fill = NA)) %>%
ggplot(aes(x = date, y = rolling_mean_7d)) + geom_line() +
ggtitle("New COVID-19 Cases, Rolling 7 Day Mean") +
ylab("New daily cases")
## Warning: Removed 7 row(s) containing missing values (geom_path).
It’s still a little hard to tell, but based on this, it seems that mobility is associated with an increase in infections, with the recent mobility declines being associated with a stable or falling new case count.
Licenses
Mobility data are licensed under a Creative Commons Attribution 4.0 International License, which requires attribution to “Descartes Labs.”
Cases data are from NY Times.