I’ve been following COVID cases, deaths, and hospitalizations in Michigan and the US closely since the beginning of the pandemic. There’s lots of data to work with, which provides lots of opportunities for data analysis and visualization. This particular visualization is quite simple: it plots cases and deaths as a function of time. What’s unique about this plot is that it shows how deaths usually follow cases by three or so weeks.
The link for the data is scraped using Python, the file is downloaded using R, the data is cleaned/massaged in Python, and the result is visualized with R’s Plotly interface.
# Get link for data
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Fetch the page and keep only the section between the two markers
URL = "https://www.michigan.gov/coronavirus/0,9753,7-406-98163_98173---,00.html"
HTML = urlopen(URL).read().decode("utf-8")
start_index = HTML.find("shortdesc")
end_index = HTML.find("footerArea")
data = HTML[start_index:end_index]
soup = BeautifulSoup(data, features="html.parser")
# Collect hrefs, skipping anchors that have none
links = [link.get("href") for link in soup.find_all("a") if link.get("href")]
# Use the first link whose path contains "by_Date"
finallink = "https://michigan.gov" + \
    [i for i in links if "by_Date" in i][0]
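The lookup above relies on every anchor having an href and on the matching href being root-relative. As a hedged illustration of a more defensive variant, here is a minimal sketch that uses only the standard library (`html.parser` in place of BeautifulSoup, and `urljoin` in place of string concatenation). The helper name `find_data_link`, the class `LinkCollector`, and the sample HTML and paths are all made up for illustration and are not taken from the Michigan site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with no href
                self.links.append(href)


def find_data_link(html, base_url, marker="by_Date"):
    """Return the first link containing `marker` as an absolute URL, or None."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.links:
        if marker in href:
            # urljoin handles both relative and absolute hrefs
            return urljoin(base_url, href)
    return None


# Made-up HTML for illustration only
sample_html = """
<div class="shortdesc">
  <a name="top">Top</a>
  <a href="/documents/Cases_by_County.xlsx">By county</a>
  <a href="/documents/Cases_and_Deaths_by_Date.xlsx">By date</a>
</div>
"""
link = find_data_link(sample_html, "https://michigan.gov/coronavirus/")
print(link)
```

Returning `None` when nothing matches makes a page redesign show up as an explicit check rather than an `IndexError` on an empty list.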
# Download data
temp <- tempfile()
# mode = "wb" keeps the binary Excel file intact on Windows
download.file(py$finallink, destfile = temp, mode = "wb")
mi_data <- readxl::read_excel(temp)
This data set contains cumulative cases, new cases, cumulative deaths, and new deaths for each county on each date over roughly the last two years. For my visualization, I simply want total cases and deaths in the state, so the data is grouped by date, and cases and deaths are summed.
# Clean data
mi_data = r.mi_data
# Sum the numeric columns across counties for each date
agg_data = mi_data.groupby(["Date"], as_index=False).sum(numeric_only=True)
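To make the group-and-sum step concrete, here is a small plain-Python sketch of what the aggregation does to county-level rows. It is not the pandas call itself, and the `County` column name and all values are invented for illustration; only `Date`, `Cases`, and `Deaths` appear in the code above.

```python
from collections import defaultdict

# Toy rows standing in for the county-level data (values are invented)
rows = [
    {"County": "A", "Date": "2021-01-01", "Cases": 10, "Deaths": 1},
    {"County": "B", "Date": "2021-01-01", "Cases": 5, "Deaths": 0},
    {"County": "A", "Date": "2021-01-02", "Cases": 7, "Deaths": 2},
    {"County": "B", "Date": "2021-01-02", "Cases": 3, "Deaths": 1},
]

# Accumulate statewide totals per date, ignoring the county label
totals = defaultdict(lambda: {"Cases": 0, "Deaths": 0})
for row in rows:
    day = totals[row["Date"]]
    day["Cases"] += row["Cases"]
    day["Deaths"] += row["Deaths"]

# One row per date, in date order
statewide = [{"Date": d, **totals[d]} for d in sorted(totals)]
print(statewide)
```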
library(plotly)
# Pull the aggregated data back into R and take a first look at daily cases
mi_cases_by_day <- py$agg_data
plot_ly(
  mi_cases_by_day,
  x = ~Date,
  y = ~Cases
)
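library(dplyr)  # %>% and mutate()
library(zoo)    # rollapply()
mi_cases_by_day <- mi_cases_by_day %>%
  mutate(
    # Centered 7-day moving averages; NA (not 0) where the window is incomplete
    cases_ma = rollapply(Cases, 7, mean, align = "center", fill = NA),
    deaths_ma = rollapply(Deaths, 7, mean, align = "center", fill = NA)
  )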
# Second y-axis (right side) for deaths
ay <- list(overlaying = "y", side = "right", title = "Deaths")
plot_ly(mi_cases_by_day, x = ~Date) %>%
  # Cases
  add_trace(y = ~Cases, alpha = .6, name = "Cases", type = "scatter",
            color = I("coral1"), mode = "markers") %>%
  # Cases MA (add_lines already draws lines, so no mode argument is needed)
  add_lines(y = ~cases_ma, alpha = .8, name = "Cases MA",
            color = I("coral1")) %>%
  # Deaths
  add_trace(name = "Deaths", yaxis = "y2", alpha = .15, y = ~Deaths, x = ~Date,
            color = I("darkorchid1"), type = "scatter", mode = "markers") %>%
  # Deaths MA
  add_lines(name = "Deaths MA", yaxis = "y2", y = ~deaths_ma, x = ~Date,
            color = I("darkorchid1"), alpha = .8/4) %>%
  layout(
    title = "Michigan COVID Cases/Deaths<br>With 7-day Moving Average",
    yaxis2 = ay, legend = list(y = .97, x = .6, bgcolor = "rgba(0,0,0,0)"),
    margin = list(r = 50, t = 50)
  ) %>%
  rangeslider()
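The moving-average lines in the plot come from `rollapply` with a width of 7 and center alignment: each day is averaged with the three days before and the three days after it, and days without a full window receive the fill value. For readers following along in Python, here is a minimal plain-Python sketch of that idea. The function name `centered_moving_average` is hypothetical and the input values are invented.

```python
def centered_moving_average(values, window=7, fill=None):
    """Centered moving average; positions lacking a full window get `fill`."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            # Not enough neighbours on one side
            out.append(fill)
        else:
            out.append(sum(values[i - half:i + half + 1]) / window)
    return out


daily = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # invented daily counts
smoothed = centered_moving_average(daily)
print(smoothed)
```

The choice of fill value matters for the plot: a numeric fill such as 0 is drawn as real data at both ends of the series, while a missing value leaves a gap.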
Storytelling/comparison drove what I was after. Here, I’m trying to show how COVID deaths follow COVID cases, and the choices in this visualization were made to show that comparison.
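The plot makes the lag visible by eye. One hedged way to put a number on it is to shift the deaths series against the cases series and see which shift gives the highest correlation. The sketch below does that on synthetic data, where deaths are constructed as a fixed fraction of cases 21 days earlier, so it demonstrates the technique only and says nothing about the real Michigan numbers. The names `pearson` and `best_lag` and the 42-day search range are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_lag(cases, deaths, max_lag=42):
    """Lag in days at which deaths correlate best with earlier cases."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair cases on day t with deaths on day t + lag
        x = cases[:len(cases) - lag]
        y = deaths[lag:]
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get)


# Synthetic data: one wave of cases; deaths are 2% of cases 21 days earlier
days = range(200)
cases = [1000 * math.exp(-((d - 80) / 20) ** 2) for d in days]
deaths = [0.02 * cases[d - 21] if d >= 21 else 0.0 for d in days]
print(best_lag(cases, deaths))
```

On real data it would likely make sense to run this on the 7-day moving averages rather than the raw daily counts, since weekday reporting patterns add noise to the correlation.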
Nothing. I wish I could have made cases and deaths more distinguishable on the plot, but Plotly made it frustrating to differentiate the two while keeping the line and point color for each category the same.
As mentioned before, making the difference between deaths and cases clearer would have been nice.