Add data with a secondary y axis

A secondary y axis makes it possible to display two sets of data on the same chart when their ranges are very different, such as the confirmed cases of COVID-19 and the deaths from COVID-19. COVID-19 is the disease caused by a new coronavirus (SARS-CoV-2); the World Health Organization (WHO) declared it a pandemic in March 2020. This article shows how to create an interactive line chart with Plotly that tracks confirmed cases and deaths over time.

The data used in this article comes from Johns Hopkins University, which has made it available on GitHub. More information about COVID-19 and the coronavirus is available from "Coronavirus disease (COVID-19) advice for the public". Plotly is an open-source graphing library for Python that produces interactive charts.



Load the data into a Pandas dataframe

The data is available on the Johns Hopkins GitHub page. There are two sets of data to load: the global confirmed cases and the global deaths.

  • time_series_covid19_deaths_global.csv
  • time_series_covid19_confirmed_global.csv
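
Both files sit in the same folder of the repository, so their raw URLs share a prefix. The following is a small sketch of building the URLs from that prefix. The names `BASE_URL`, `FILES` and `data_url` are illustrative and are not part of the code used in this article.

```python
# Common prefix of the raw CSV files in the Johns Hopkins repository
BASE_URL = (
    'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
    'csse_covid_19_data/csse_covid_19_time_series/'
)

# Hypothetical lookup from a short label to the file name
FILES = {
    'confirmed': 'time_series_covid19_confirmed_global.csv',
    'deaths': 'time_series_covid19_deaths_global.csv',
}


def data_url(kind):
    '''Return the full raw URL for the "confirmed" or "deaths" file.'''
    return BASE_URL + FILES[kind]


print(data_url('deaths'))
```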

The data can be loaded directly from GitHub by passing the raw file URL to pandas. The following function cleans the data by removing unwanted fields, grouping the rows by country and transposing the table so that each row is a date.

import pandas as pd


def load_clean_data(csv_path):
    '''Load a Johns Hopkins time-series CSV and return one column per country.'''
    df = pd.read_csv(csv_path)

    # 1. Drop unwanted columns
    df = df.drop(columns = ['Province/State', 'Lat', 'Long'])

    # 2. Group by country, summing the rows of each province or state
    df = df.groupby('Country/Region').sum()

    # 3. Transpose data to put dates in a single column
    #    (the file writes dates as month/day/two-digit year, e.g. 1/22/20)
    df = df.T
    df.index = pd.to_datetime(df.index, format = '%m/%d/%y')

    # 4. Remove the column axis name
    df = df.rename_axis(None, axis = 1)

    return df
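
To see what each cleaning step does without downloading the full file, the same steps can be run on a tiny in-memory CSV. This is only a sketch: the rows and numbers below are made up for illustration, and only the column layout follows the Johns Hopkins files.

```python
import io

import pandas as pd

# Made-up sample in the same column layout as the Johns Hopkins CSV files
sample_csv = io.StringIO(
    'Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n'
    'Region A,Examplia,10.0,20.0,1,2\n'
    'Region B,Examplia,11.0,21.0,3,4\n'
    ',Otherland,30.0,40.0,5,6\n'
)

df = pd.read_csv(sample_csv)
df = df.drop(columns=['Province/State', 'Lat', 'Long'])  # step 1
df = df.groupby('Country/Region').sum()                  # step 2
df = df.T                                                # step 3
df.index = pd.to_datetime(df.index, format='%m/%d/%y')
df = df.rename_axis(None, axis=1)                        # step 4

# The two Examplia rows are summed into one column, with one row per date
print(df)
```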

Load the data for Confirmed cases and the data for Deaths from COVID-19.

confirmed_df = load_clean_data('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
confirmed_df.shape
"""
(314, 191)
"""

confirmed_df.iloc[[0,1,2,-3,-2,-1], [0,1,2,-3,-2,-1]]
"""
            Afghanistan  Albania  Algeria  Yemen  Zambia  Zimbabwe
2020-01-22            0        0        0      0       0         0
2020-01-23            0        0        0      0       0         0
2020-01-24            0        0        0      0       0         0
2020-11-28        45966    36790    81212   2160   17589      9822
2020-11-29        46215    37625    82221   2177   17608      9822
2020-11-30        46498    38182    83199   2191   17647      9950
"""


deaths_df = load_clean_data('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
deaths_df.shape
"""
(314, 191)
"""

deaths_df.iloc[[0,1,2,-3,-2,-1], [0,1,2,-3,-2,-1]]
"""
            Afghanistan  Albania  Algeria  Yemen  Zambia  Zimbabwe
2020-01-22            0        0        0      0       0         0
2020-01-23            0        0        0      0       0         0
2020-01-24            0        0        0      0       0         0
2020-11-28         1752      787     2393    615     357       275
2020-11-29         1763      798     2410    617     357       275
2020-11-30         1774      810     2431    619     357       276
"""
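
The two shapes match, which suggests that both tables cover the same dates and countries. A quick check like the following makes that explicit before the tables are combined. The helper name `same_layout` is made up for this sketch, and the tiny frames stand in for `confirmed_df` and `deaths_df`.

```python
import pandas as pd


def same_layout(df_a, df_b):
    '''Return True when both frames share the same dates and columns.'''
    return df_a.index.equals(df_b.index) and df_a.columns.equals(df_b.columns)


# Tiny stand-ins for confirmed_df and deaths_df (values are made up)
dates = pd.to_datetime(['2020-01-22', '2020-01-23'])
confirmed_small = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=dates)
deaths_small = pd.DataFrame({'A': [0, 1], 'B': [0, 0]}, index=dates)

print(same_layout(confirmed_small, deaths_small))
```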


Global confirmed cases of COVID-19

Show the total global number of confirmed cases of COVID-19 as the number changes over time. Plotly is used to display an interactive line chart, and an annotation shows the total on the latest date. A helper function, hov_text, builds the hover text for each data point.

def hov_text(ary_x, ary_y, name):
    '''Create the hover text for each data point

    Keyword arguments:
    ary_x    -- list of all the x data points
    ary_y    -- list of all the y data points
    name     -- name to display before each value

    return: A list of hover texts for the selected data
            with format to display
    '''
    # zip pairs the values by position, which avoids integer
    # indexing into a Series that has a date index
    txt = [f'''
<b>{x.strftime('%B %d %Y')}</b><br><br>
{name}: <b>{y:,.0f}</b><br>
<extra></extra>
''' for x, y in zip(ary_x, ary_y)]
    return txt

import plotly.graph_objects as go

total_confirmed = confirmed_df.sum(axis = 1)

bg_color = 'rgba(208, 225, 242, 1.0)'
line_color = 'rgba(75, 152, 201, 1.0)'
grid_color = 'rgba(75, 152, 201, 0.3)'

fig = go.Figure()
fig.add_trace(go.Scatter(x = total_confirmed.index,
                         y = total_confirmed,
                         name = 'Confirmed',
                         hovertemplate = hov_text(total_confirmed.index,
                                                  total_confirmed,
                                                  'Confirmed Cases')))

fig.update_traces(
    hoverinfo = 'text+name',
    mode = 'lines'
)

fig.update_layout(
    # set figure title
    title = dict(
        text = '<b>Global confirmed cases from COVID-19</b>',
        xref = 'container',
        yref = 'container',
        x = 0.5,
        y = 0.9,
        xanchor = 'center',
        yanchor = 'middle',
        font = dict(family = 'Droid Sans', size = 28)
    ),
    # set legend
    legend = dict(
        orientation = 'h',
        traceorder = 'normal',
        font = dict(size = 12),
        x = 0.0,
        y = -0.3,
        xanchor = 'left',
        yanchor = 'top'
    ),
    # set x-axis
    xaxis = dict(
        title = 'Date',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    # set y-axis
    yaxis = dict(
        title = 'Number of confirmed cases',
        rangemode = 'tozero',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    showlegend = False,
    # set the plot background color
    plot_bgcolor = bg_color,
    paper_bgcolor = bg_color,
)

# Add text for current date and total
# (.iloc[-1] selects the last value by position)
fig.add_annotation(
    text = f"{total_confirmed.index[-1].strftime('%B %d')}<BR> {total_confirmed.iloc[-1]:,.0f}",
    xref = 'paper',
    yref = 'paper',
    x = 0.5,
    y = 1.0,
    showarrow = False,
    bgcolor = 'rgba(180, 210, 233, 1.0)',
    borderpad = 10,
    font = dict(family = 'Droid Sans', size = 28)
)

fig.show()

Global confirmed cases from COVID-19



Global deaths from COVID-19

Show the total global number of deaths from COVID-19 as the number changes over time. The code is the same as for the previous chart except for the data source and the titles.

total_deaths = deaths_df.sum(axis = 1)

bg_color = 'rgba(208, 225, 242, 1.0)'
line_color = 'rgba(75, 152, 201, 1.0)'
grid_color = 'rgba(75, 152, 201, 0.3)'

fig = go.Figure()
fig.add_trace(go.Scatter(x = total_deaths.index,
                         y = total_deaths,
                         name = 'Deaths',
                         hovertemplate = hov_text(total_deaths.index,
                                                  total_deaths,
                                                  'Deaths')))

fig.update_traces(
    hoverinfo = 'text+name',
    mode = 'lines'
)

fig.update_layout(
    # set figure title
    title = dict(
        text = '<b>Global deaths from COVID-19</b>',
        xref = 'container',
        yref = 'container',
        x = 0.5,
        y = 0.9,
        xanchor = 'center',
        yanchor = 'middle',
        font = dict(family = 'Droid Sans', size = 28)
    ),
    # set legend
    legend = dict(
        orientation = 'h',
        traceorder = 'normal',
        font = dict(size = 12),
        x = 0.0,
        y = -0.3,
        xanchor = 'left',
        yanchor = 'top'
    ),
    # set x-axis
    xaxis = dict(
        title = 'Date',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    # set y-axis
    yaxis = dict(
        title = 'Number of deaths',
        rangemode = 'tozero',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    showlegend = False,
    # set the plot background color
    plot_bgcolor = bg_color,
    paper_bgcolor = bg_color,
)

# Add text for current date and total
# (.iloc[-1] selects the last value by position)
fig.add_annotation(
    text = f"{total_deaths.index[-1].strftime('%B %d')}<BR> {total_deaths.iloc[-1]:,.0f}",
    xref = 'paper',
    yref = 'paper',
    x = 0.5,
    y = 1.0,
    showarrow = False,
    bgcolor = 'rgba(180, 210, 233, 1.0)',
    borderpad = 10,
    font = dict(family = 'Droid Sans', size = 28)
)

fig.show()

Global deaths from COVID-19
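
Because the two charts differ only in the data and the titles, the shared layout settings could be collected in one helper. This is a possible refactoring rather than the code used above. The function name `base_layout` and its parameters are assumptions made for this sketch, and the legend settings are left out because the legend is hidden in both charts.

```python
def base_layout(title, y_title,
                bg_color='rgba(208, 225, 242, 1.0)',
                line_color='rgba(75, 152, 201, 1.0)',
                grid_color='rgba(75, 152, 201, 0.3)'):
    '''Return keyword arguments for fig.update_layout().'''
    def axis(axis_title, **extra):
        # Settings shared by the x and y axes
        return dict(title=axis_title, linecolor=line_color, linewidth=2,
                    gridcolor=grid_color, showticklabels=True,
                    ticks='outside', **extra)

    return dict(
        title=dict(text=f'<b>{title}</b>',
                   xref='container', yref='container', x=0.5, y=0.9,
                   xanchor='center', yanchor='middle',
                   font=dict(family='Droid Sans', size=28)),
        xaxis=axis('Date'),
        yaxis=axis(y_title, rangemode='tozero'),
        showlegend=False,
        plot_bgcolor=bg_color,
        paper_bgcolor=bg_color,
    )


layout = base_layout('Global deaths from COVID-19', 'Number of deaths')
```

With a figure `fig` built as above, `fig.update_layout(**layout)` would apply these settings, so each chart only has to supply its own title and y-axis label.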



Show Confirmed cases and deaths from COVID-19 on a single graph

Simply plotting confirmed cases and deaths against the same y axis does not work well because the two ranges differ so much in scale. The deaths line looks flat next to the confirmed cases, even though the number of deaths is growing.

Global confirmed cases and deaths from COVID-19 on a shared y axis
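
The size of the gap can be estimated from the sample rows printed earlier. Taking the Afghanistan figures for 2020-11-30 (46,498 confirmed cases and 1,774 deaths), the deaths line would reach only a few percent of the height of the confirmed line on a shared axis. A rough calculation:

```python
# Figures for Afghanistan on 2020-11-30, from the tables printed earlier
confirmed = 46498
deaths = 1774

# Share of the y axis the deaths line would reach if both series
# were drawn against one axis scaled to the confirmed cases
fraction = deaths / confirmed
print(f'{fraction:.1%}')  # prints 3.8%
```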

One solution is to plot the deaths data against a second y axis so that each set of data uses the full vertical space available. Each y axis is also drawn in a color associated with the data that uses it.

The code to generate this chart is wrapped in a function that takes the confirmed cases and the deaths as two pandas Series, together with a region name for the title, and returns a Plotly figure.

from plotly.subplots import make_subplots


def plot_confirmed_and_deaths(df1, df2, region):
    '''Plot confirmed cases (df1) and deaths (df2) on two y axes.'''
    bg_color = 'rgba(208, 225, 242, 1.0)'
    line_color = 'rgba(75, 152, 201, 1.0)'
    grid_color = 'rgba(75, 152, 201, 0.3)'

    # Line chart with secondary y axis
    fig = make_subplots(specs=[[{"secondary_y": True}]])

    fig.add_trace(
        go.Scatter(
            x = df1.index,
            y = df1,
            name = 'Confirmed Cases',
            hovertemplate = hov_text(df1.index,
                                     df1,
                                     'Confirmed Cases')),
        secondary_y = False)

    fig.add_trace(
        go.Scatter(
            x = df2.index,
            y = df2,
            name = 'Deaths',
            line = dict(dash = 'dashdot'),
            hovertemplate = hov_text(df2.index,
                                     df2,
                                     'Deaths')),
        secondary_y = True)

    fig.update_layout(
        # set figure title
        title = dict(
            text = f'Confirmed cases and deaths from COVID-19<BR><b>{region}</b>',
            xref = 'container',
            yref = 'container',
            x = 0.5,
            y = 0.9,
            xanchor = 'center',
            yanchor = 'middle',
            font = dict(family = 'Droid Sans', size = 28)
        ),
        # set legend
        legend = dict(
            orientation = 'v',
            traceorder = 'normal',
            font = dict(size = 12),
            x = 0.1,
            y = 0.9,
            xanchor = "left",
            yanchor = "top"
        ),
        # set x-axis
        xaxis = dict(
            title = 'Date',
            linecolor = line_color,
            linewidth = 2,
            gridcolor = grid_color,
            showticklabels = True,
            ticks = 'outside',
        ),
        # set the primary y-axis
        yaxis = dict(
            title = 'Number of Confirmed Cases',
            color = 'rgba(80, 80, 250, 1.0)',
            rangemode = 'tozero',
            linecolor = 'rgba(80, 80, 250, 1.0)',
            linewidth = 2,
            gridcolor = 'rgba(80, 80, 250, 0.3)',
            showticklabels = True,
            ticks = 'outside'
        ),
        # set the plot background color
        plot_bgcolor = bg_color,
        paper_bgcolor = bg_color,
    )

    # set the secondary y-axis; the range ends 8% above the latest value
    # (.iloc[-1] selects the last value by position)
    fig.update_yaxes(
        title_text = 'Number of Deaths',
        color = 'rgba(237, 34, 13, 1.0)',
        range = [0, (df2.iloc[-1] * 1.08)],
        linecolor = 'rgba(237, 34, 13, 1.0)',
        linewidth = 2,
        gridcolor = 'rgba(237, 34, 13, 0.3)',
        showticklabels = True,
        ticks = 'outside',
        secondary_y = True
    )

    return fig

This plot_confirmed_and_deaths function is used to create the line plot showing both confirmed cases and deaths from COVID-19 globally.

df1 = confirmed_df.sum(axis = 1)
df2 = deaths_df.sum(axis = 1)

f = plot_confirmed_and_deaths(df1, df2, 'Global')
f.show()

Global confirmed cases and deaths from COVID-19 with a secondary y axis



Display data for selected country

The same function can be used to create a chart for any country.

country = 'US'
f = plot_confirmed_and_deaths(
    confirmed_df[country],
    deaths_df[country],
    country)
f.show()

Confirmed cases and deaths from COVID-19 in the United States


Confirmed cases and deaths from COVID-19 in the United Kingdom


Confirmed cases and deaths from COVID-19 in Ireland


Confirmed cases and deaths from COVID-19 in Brazil
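
Country names must match the column labels of the data exactly; the United States, for example, is selected with 'US' in the code above. When charts are generated for several countries, a small lookup that suggests near matches can make a misspelled name easier to diagnose. The helper `resolve_country` is hypothetical, and the list of names below is only a sample standing in for the real columns.

```python
import difflib


def resolve_country(name, available):
    '''Return name if it is a known column, else raise with suggestions.'''
    if name in available:
        return name
    close = difflib.get_close_matches(name, available, n=3)
    hint = f' Did you mean {close}?' if close else ''
    raise KeyError(f'Unknown country {name!r}.{hint}')


# A sample of column names; in practice use list(confirmed_df.columns)
columns = ['Brazil', 'Ireland', 'US', 'United Kingdom']

for wanted in ['US', 'Ireland', 'Brasil']:
    try:
        print(resolve_country(wanted, columns))
    except KeyError as err:
        # The misspelled name is reported together with near matches
        print(err)
```

The resolved name can then be passed to `plot_confirmed_and_deaths` in place of the hard-coded `country` value.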



Conclusion

A pandas DataFrame is well suited to loading data from CSV files and quickly filtering it to an area of interest, and Plotly is well suited to creating interactive charts. It is informative to see the rise in the number of deaths from COVID-19 in relation to the rise in the total number of cases detected, and this relationship is easier to see when both series are shown on a single graph. A secondary y axis is used to display the deaths because the scales are so different. All of the loading, cleaning and chart creation can be wrapped up in a couple of functions, forming a pipeline that generates these charts for any country of interest. The interactive charts allow the viewer to zoom in on a particular date range and to select or deselect data series to focus on an area of interest.