Coronavirus is all anyone can think about these days, and the New York Times has become a go-to repository for US case data. They are currently publishing the data on GitHub.
Data Loading
I loaded the data directly from the NY Times repository. The repository has both county-level and state-level files; because I was interested in the individual counties in Hawaii, I used the county file.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(scales)
covid <- readr::read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv')
## Parsed with column specification:
## cols(
## date = col_date(format = ""),
## county = col_character(),
## state = col_character(),
## fips = col_character(),
## cases = col_double(),
## deaths = col_double()
## )
timestamp()
## ##------ Tue Apr 14 19:25:55 2020 ------##
Data pull time
The timestamp above records when I pulled the data. Because the repository keeps being updated but this post reflects the situation on 2020-04-04, every query below keeps only data from 2020-04-04 or earlier.
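Rather than repeating the date condition in every query, the cutoff can also be applied once up front. This is a minimal sketch, not what the post's code does: the `cutoff` variable and the stand-in data frame `covid_demo` are hypothetical. Its rows reuse two Hawaii statewide totals quoted later in this post, plus one later date whose count is left as `NA` because it is not shown here.

```r
library(dplyr)

# Hypothetical cutoff variable; the post itself repeats the date in each query.
cutoff <- as.Date("2020-04-04")

# Tiny stand-in for `covid` (illustration only).
covid_demo <- data.frame(
  date  = as.Date(c("2020-04-03", "2020-04-04", "2020-04-05")),
  cases = c(317, 349, NA)   # NA: count after the cutoff is not shown in this post
)

# Filter once, then reuse the filtered data frame in later steps.
covid_cut <- covid_demo %>% filter(date <= cutoff)

max(covid_cut$date)   # latest date that survives the cutoff
nrow(covid_cut)       # number of rows kept
```

Comparing against a `Date` object rather than a string also makes the intent explicit, although R will coerce the string form used elsewhere in this post.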
glimpse(covid)
## Observations: 56,541
## Variables: 6
## $ date <date> 2020-01-21, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-24, 2…
## $ county <chr> "Snohomish", "Snohomish", "Snohomish", "Cook", "Snohomish", "O…
## $ state <chr> "Washington", "Washington", "Washington", "Illinois", "Washing…
## $ fips <chr> "53061", "53061", "53061", "17031", "53061", "06059", "17031",…
## $ cases <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
# Treat county and state as categorical variables
covid$county <- factor(covid$county)
covid$state <- factor(covid$state)
# Store FIPS codes as integers
covid$fips <- as.integer(covid$fips)
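One caveat on the `fips` conversion: FIPS codes are fixed-width identifiers, so turning them into integers drops leading zeros (the `06059` in the `glimpse()` output becomes `6059`). The post does not use the codes again, so nothing here is affected. The base-R sketch below shows the effect and one way to restore the five-digit form if the codes are ever needed for joins or maps. The variable names are only for illustration.

```r
# Three FIPS values taken from the glimpse() output above.
fips_chr <- c("53061", "17031", "06059")

# Converting to integer silently drops the leading zero.
fips_int <- as.integer(fips_chr)
fips_int
# 53061 17031 6059

# Zero-pad back to five characters.
fips_restored <- sprintf("%05d", fips_int)
identical(fips_restored, fips_chr)
# TRUE
```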
Total Cases in USA as of 2020-04-04
Here is the nationwide total number of cases as of 2020-04-04.
x <- covid %>% filter(date == "2020-04-04") %>% select(cases)
sum(x$cases)
## [1] 310842
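The single-date filter matters because the `cases` column in the NY Times files is a running (cumulative) count, so summing across several dates would count the same cases more than once. Here is a small sketch using three Hawaii statewide totals that appear later in this post. The data frame name `hawaii_demo` is just for illustration.

```r
library(dplyr)

# Cumulative Hawaii totals for three consecutive days (from this post).
hawaii_demo <- data.frame(
  date  = as.Date(c("2020-04-02", "2020-04-03", "2020-04-04")),
  total = c(283, 317, 349)
)

# Appropriate for cumulative data: read off the count on the date of interest.
latest <- hawaii_demo %>%
  filter(date == as.Date("2020-04-04")) %>%
  pull(total)
latest
# 349

# Not appropriate for cumulative data: adding up every date overcounts.
sum(hawaii_demo$total)
# 949
```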
Total Cases by State
covid %>% filter(date == "2020-04-04") %>%
group_by(state) %>%
summarize(total = sum(cases)) %>%
arrange(-total)
## # A tibble: 55 x 2
## state total
## <fct> <dbl>
## 1 New York 114996
## 2 New Jersey 34124
## 3 Michigan 14225
## 4 California 13796
## 5 Louisiana 12492
## 6 Massachusetts 11736
## 7 Florida 11537
## 8 Illinois 10358
## 9 Pennsylvania 10110
## 10 Washington 6788
## # … with 45 more rows
There are too many rows to read comfortably on one chart, so the next two bar charts split them at the median total (upper half first) and show the counts on a log scale.
covid %>% filter(date == "2020-04-04") %>%
group_by(state) %>%
summarize(total = sum(cases)) %>%
filter(total >= median(total)) %>% # >= so the row at the median is not dropped from both charts
arrange(-total) %>%
ggplot(aes(x = reorder(state, total), y = total)) +
geom_bar(stat = 'identity') +
scale_y_continuous(trans=log10_trans()) +
coord_flip()

covid %>% filter(date == "2020-04-04") %>%
group_by(state) %>%
summarize(total = sum(cases)) %>%
filter(total < median(total)) %>%
arrange(-total) %>%
ggplot(aes(x = reorder(state, total), y = total)) +
geom_bar(stat = 'identity') +
scale_y_continuous(trans=log10_trans()) +
coord_flip()
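When splitting at the median, it matters whether the comparisons are strict. With an odd number of rows (the state table has 55), the median is itself one of the values, so a row equal to it fails both `>` and `<`. The sketch below illustrates this with five state totals from the table above. Their median is only the median of these five rows, not of the full table, and the data frame names are hypothetical.

```r
library(dplyr)

# Five totals from the state table, used only to show the boundary case.
demo <- data.frame(
  state = c("New York", "New Jersey", "Michigan", "California", "Louisiana"),
  total = c(114996, 34124, 14225, 13796, 12492)
)

# Strict comparisons on both sides: the median row lands in neither half.
upper_strict <- demo %>% filter(total > median(total))
lower_strict <- demo %>% filter(total < median(total))
nrow(upper_strict) + nrow(lower_strict)
# 4

# Making one side inclusive keeps every row in exactly one half.
upper_inclusive <- demo %>% filter(total >= median(total))
nrow(upper_inclusive) + nrow(lower_strict)
# 5
```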

Total Cases in Hawaii
In Hawaii, the cumulative number of cases has been growing steadily, from 106 on 2020-03-26 to 349 on 2020-04-04.
covid %>% filter(state == "Hawaii", date <= "2020-04-04") %>%
group_by(date) %>%
summarize(total = sum(cases)) %>%
arrange(-total)
## # A tibble: 30 x 2
## date total
## <date> <dbl>
## 1 2020-04-04 349
## 2 2020-04-03 317
## 3 2020-04-02 283
## 4 2020-04-01 256
## 5 2020-03-31 224
## 6 2020-03-30 199
## 7 2020-03-29 173
## 8 2020-03-28 150
## 9 2020-03-27 120
## 10 2020-03-26 106
## # … with 20 more rows
covid %>% filter(state == "Hawaii", date <= "2020-04-04") %>%
group_by(date) %>%
summarize(total = sum(cases)) %>%
ggplot(aes(x = date, y = total)) +
geom_point() +
geom_line()

Cases by County in Hawaii
Honolulu County has by far the most cases, as expected given that it also has by far the largest population.
covid %>% filter(state == "Hawaii", date <= "2020-04-04") %>%
ggplot(aes(x = date, y = cases, color = county)) +
geom_point() +
geom_line()

Cases by County other than Honolulu
Honolulu's counts compress the other counties toward the bottom of the chart, so to see their trends it is best to remove Honolulu County.
covid %>% filter(state == "Hawaii" & county != "Honolulu", date <= "2020-04-04") %>%
ggplot(aes(x = date, y = cases, color = county)) +
geom_point() +
geom_line()

Change per Day
Here’s the day-over-day change in cumulative cases for our state. The daily increase in positive tests may still be rising.
covid %>% filter(state == "Hawaii", date <= "2020-04-04") %>%
group_by(date) %>%
summarize(total = sum(cases)) %>% mutate(day_before_total = lag(total)) %>%
mutate(change = total - day_before_total) %>%
filter(!is.na(change)) %>%
ggplot(aes(x = date, y = change)) +
geom_point() + geom_line() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
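To make the day-over-day calculation concrete, here is the same `lag()` step applied by hand to the ten most recent Hawaii totals listed earlier in this post. The vector names are only for this sketch, and the loess smoother from the plot is not reproduced.

```r
library(dplyr)

# Hawaii cumulative totals, 2020-03-26 through 2020-04-04, from the table above.
totals <- c(106, 120, 150, 173, 199, 224, 256, 283, 317, 349)

# Same arithmetic as the pipeline: each day's total minus the previous day's.
change <- totals - lag(totals)

# The first day has no previous value, so drop the resulting NA.
change <- change[!is.na(change)]
change
# 14 30 23 26 25 32 27 34 32

# Base-R equivalent of the same calculation.
all(diff(totals) == change)
# TRUE
```

Over these nine days the daily increase ranges from 14 to 34 new cases.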
