Countries with highest confirmed cases of COVID-19 with Plotly

COVID-19 is the disease caused by a new coronavirus called SARS-CoV-2. The World Health Organization (WHO) first learned of the virus on 31 December 2019 and declared the coronavirus outbreak a pandemic in March 2020. This article shows how to create interactive bar and line charts with Plotly that show the countries with the most confirmed cases and how those cases changed over time.

The data used in this article comes from Johns Hopkins University, which has made the data available on GitHub. More information about COVID-19 and the coronavirus is available from the WHO page Coronavirus disease (COVID-19) advice for the public.

Plotly is an open-source graphing library for Python that produces interactive charts.



Load the data into a Pandas dataframe

The data is available on the Johns Hopkins GitHub page. The confirmed case counts are in the time_series_covid19_confirmed_global.csv file. This file can either be downloaded and loaded into a dataframe, or it can be loaded directly from GitHub as in the code below. Load and review the data.

import pandas as pd

confirmed_path = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
confirmed_df = pd.read_csv(confirmed_path)
confirmed_df.shape
"""
(271, 315)
"""

# Preview the first and last three rows and columns
confirmed_df.iloc[[0,1,2,-3,-2,-1], [0,1,2,-3,-2,-1]]
"""
    Province/State Country/Region        Lat  11/25/20  11/26/20  11/27/20
0              NaN    Afghanistan  33.939110     45490     45716     45839
1              NaN        Albania  41.153300     34944     35600     36245
2              NaN        Algeria  28.033900     78025     79110     80168
268            NaN          Yemen  15.552727      2124      2137      2148
269            NaN         Zambia -13.133897     17535     17553     17569
270            NaN       Zimbabwe -19.015438      9508      9623      9714
"""
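The same pd.read_csv call also accepts a local path or any file-like object, which can be handy for trying the later steps offline. The sketch below reads a tiny in-memory sample laid out like the Johns Hopkins file. The rows and numbers in sample_csv are invented placeholders, not real data.

```python
import io

import pandas as pd

# Placeholder sample with the same column layout as the real file.
# The country names and values are invented for illustration only.
sample_csv = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Examplia,10.0,20.0,0,1
North,Samplestan,30.0,40.0,2,3
South,Samplestan,31.0,41.0,4,5
"""

# read_csv accepts a file-like object as well as a URL or a path
sample_df = pd.read_csv(io.StringIO(sample_csv))

print(sample_df.shape)
print(sample_df.columns.tolist())
```

An empty Province/State field is read as NaN, which matches the NaN values seen in the real data for countries that are not split into provinces.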


The data for some countries, such as China or the United Kingdom, is further broken down by Province/State. A single row is created for each country by grouping the data by Country/Region and summing the numbers. The columns 'Province/State', 'Lat' and 'Long' are removed from the dataset first, since they are not needed for the charts.

# Drop columns 'Province/State', 'Lat', 'Long'
confirmed_df.drop(['Province/State', 'Lat', 'Long'], axis = 1, inplace = True)

# One row per country: sum the province-level rows
country_df = confirmed_df.groupby('Country/Region').sum()
country_df.shape
"""
(191, 311)
"""

# Country/Region is now the index, so only date columns remain
country_df.iloc[[0,1,2,-3,-2,-1], [0,1,2,-3,-2,-1]]
"""
                1/22/20  1/23/20  1/24/20  11/25/20  11/26/20  11/27/20
Country/Region
Afghanistan           0        0        0     45490     45716     45839
Albania               0        0        0     34944     35600     36245
Algeria               0        0        0     78025     79110     80168
Yemen                 0        0        0      2124      2137      2148
Zambia                0        0        0     17535     17553     17569
Zimbabwe              0        0        0      9508      9623      9714
"""
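The effect of the grouping step is easiest to see on a small example. The sketch below uses an invented three-row frame (the country names and counts are placeholders) in which one country is split into two provinces, and shows that the province rows collapse into a single summed row.

```python
import pandas as pd

# Invented placeholder rows: Samplestan is split into two provinces.
toy_df = pd.DataFrame({
    'Province/State': [None, 'North', 'South'],
    'Country/Region': ['Examplia', 'Samplestan', 'Samplestan'],
    'Lat': [10.0, 30.0, 31.0],
    'Long': [20.0, 40.0, 41.0],
    '1/22/20': [0, 2, 4],
    '1/23/20': [1, 3, 5],
})

# Drop the columns that should not be summed, then group by country.
toy_country_df = (toy_df
                  .drop(columns=['Province/State', 'Lat', 'Long'])
                  .groupby('Country/Region')
                  .sum())

print(toy_country_df)
```

After the groupby, Country/Region becomes the index and only the date columns remain.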


Display confirmed cases in top ten countries on the latest date

Sort the data by the values in the last column in descending order and keep the top ten rows. A horizontal bar chart is then created from these values using Plotly. The chart is interactive and shows the value for each country when its bar is hovered over.

import plotly.graph_objects as go

# Keep only the latest date column for the ten countries with the most cases
latest_date = country_df.columns[-1]
top_10 = (country_df
          .sort_values(by = latest_date, ascending = False)
          [[latest_date]]
          .head(10))
bg_color = 'rgba(208, 225, 242, 1.0)'

fig = go.Figure(
    data = [go.Bar(x = list(top_10.iloc[:, 0]),
                   y = list(top_10.index),
                   hovertemplate = "%{y}: <br><br>Confirmed: %{x:,.0f}<extra></extra>",
                   orientation = 'h')]
)

fig.update_layout(
    # Set default font
    font = dict(
        family = "Droid Sans",
        size = 16
    ),
    # Set figure title
    title = dict(
        text = "<b>Top ten countries with highest confirmed cases of COVID-19</b>",
        xref = 'container',
        yref = 'container',
        x = 0.5,
        y = 0.91,
        xanchor = 'center',
        yanchor = 'middle',
        font = dict(family = 'Droid Sans', size = 24)
    ),
    # set x-axis
    xaxis = dict(
        title = dict(
            text = 'Number of confirmed cases',
            font = dict(family = 'Droid Sans', size = 18)
        ),
        showgrid = False,
        linecolor = bg_color,
        linewidth = 2,
        showticklabels = True,
    ),
    # set y-axis
    yaxis = dict(
        showgrid = False,
        linecolor = bg_color,
        linewidth = 4,
    ),
    # set the plot background color
    plot_bgcolor = bg_color,
    # set the hover background color
    hoverlabel = dict(
        bgcolor = 'rgba(75, 152, 201, 0.2)',
        font_size = 16
    ),
    paper_bgcolor = bg_color,
)

# Add annotation for the latest date
latest_label = pd.to_datetime(latest_date, format = '%m/%d/%y').strftime('%b %d %Y')
fig.add_annotation(
    xref = "paper",
    yref = "paper",
    x = 0.9,
    y = 0.2,
    text = f"<b>{latest_label}</b>",
    showarrow = False,
    font = dict(family = 'Droid Sans', size = 40))

fig.update_traces(marker_color = 'rgb(75, 152, 201)')
# Reverse the y-axis so the country with the most cases is at the top
fig.update_yaxes(autorange = 'reversed')

fig.show()

Top ten countries with highest confirmed cases of COVID-19
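The selection of the top rows can be checked on a small frame before plotting. The sketch below uses invented placeholder totals for five made-up countries and compares the sort-then-head approach with pandas' nlargest method, which is a different but equivalent way to get the same rows here.

```python
import pandas as pd

# Invented placeholder totals for five made-up countries.
toy_country_df = pd.DataFrame(
    {'11/26/20': [5, 40, 10, 30, 20], '11/27/20': [6, 50, 12, 35, 21]},
    index=pd.Index(['A', 'B', 'C', 'D', 'E'], name='Country/Region'),
)

latest = toy_country_df.columns[-1]

# Sort by the latest column, keep only that column, take the top three rows.
top_sorted = (toy_country_df
              .sort_values(by=latest, ascending=False)
              [[latest]]
              .head(3))

# Alternative: let pandas pick the three largest values directly.
top_nlargest = toy_country_df.nlargest(3, latest)[[latest]]

print(top_sorted)
print(top_sorted.equals(top_nlargest))
```

Both approaches return the rows in descending order, which is why the bar chart needs a reversed y-axis to put the largest value at the top.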



Display confirmed cases in US

Display a line chart showing the change in the number of confirmed cases over time for a specific country. Start with the US, as it has the highest number of confirmed cases in the world. The index is converted to datetime format so that Plotly treats the x-axis as dates and labels it accordingly.

# Copy the row so the index can be changed without touching country_df
us_data = country_df.loc['US'].copy()
# Convert the index to datetime format
us_data.index = pd.to_datetime(us_data.index, format = '%m/%d/%y')

def hov_text(ary_x, ary_y):
    '''Create the hover text for each data point

    Keyword arguments:
    ary_x    -- list of all the x data points
    ary_y    -- list of all the y data points

    return: A list of hover texts for the selected country
            with format to display
    '''
    # Pair the values by position instead of indexing with integers
    txt = [f'''
<b>{x.strftime('%b %d %Y')}</b><br><br>
Confirmed Cases: <b>{y:,.0f}</b><br>
<extra></extra>
''' for x, y in zip(ary_x, ary_y)]
    return txt

bg_color = 'rgba(208, 225, 242, 1.0)'
line_color = 'rgba(75, 152, 201, 1.0)'
grid_color = 'rgba(75, 152, 201, 0.3)'

fig = go.Figure()
fig.add_trace(go.Scatter(x = us_data.index,
                         y = us_data,
                         name = 'US',
                         hovertemplate = hov_text(us_data.index, us_data)))

fig.update_traces(
    hoverinfo = 'text+name',
    mode = 'lines'
)

fig.update_layout(
    # Set figure title
    title = dict(
        text = '<b>Confirmed cases of COVID-19 in US</b>',
        xref = 'container',
        yref = 'container',
        x = 0.5,
        y = 0.9,
        xanchor = 'center',
        yanchor = 'middle',
        font = dict(family = 'Droid Sans', size = 28)
    ),
    # set legend
    legend = dict(
        orientation = "h",
        traceorder = 'normal',
        font_size = 12,
        x = 0.0,
        y = -0.3,
        xanchor = "left",
        yanchor = "top"
    ),
    # set x-axis
    xaxis = dict(
        title = 'Date',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    # set y-axis
    yaxis = dict(
        title = 'Number of confirmed cases',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    # set the plot background color
    plot_bgcolor = bg_color,
    paper_bgcolor = bg_color,
)

fig.show()

Confirmed cases of COVID-19 in US
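The date labels in this dataset are strings such as 1/22/20 (month, day, two-digit year). As a small sketch of the conversion step on its own, the example below parses three labels taken from the column names shown earlier, passing an explicit format so pandas does not have to guess the order of day and month.

```python
import pandas as pd

# Column labels as they appear in the dataset: month/day/two-digit year.
labels = pd.Index(['1/22/20', '1/23/20', '11/27/20'])

# An explicit format avoids any day/month guessing.
dates = pd.to_datetime(labels, format='%m/%d/%y')

print(dates)
# The same strftime pattern is used for the hover text in the charts.
print(dates[-1].strftime('%b %d %Y'))
```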



Create a function to display confirmed cases for any country

The code above can be wrapped up in a function that creates the chart for any country in the dataframe. The country name is used for both the trace name and the chart title.

def hov_text(ary_x, ary_y):
    '''Create the hover text for each data point

    Keyword arguments:
    ary_x    -- list of all the x data points
    ary_y    -- list of all the y data points

    return: A list of hover texts for the selected country
            with format to display
    '''
    # Pair the values by position instead of indexing with integers
    txt = [f'''
<b>{x.strftime('%b %d %Y')}</b><br><br>
Confirmed Cases: <b>{y:,.0f}</b><br>
<extra></extra>
''' for x, y in zip(ary_x, ary_y)]
    return txt


def plot_confirmed_cases_for_country(df, country):
    '''Create a line chart of confirmed cases over time for one country

    Keyword arguments:
    df       -- dataframe with one row per country and one column per date
    country  -- name of the country as it appears in the index of df

    return: A Plotly figure
    '''
    # Copy the row so the index can be changed without touching df
    sel_data = df.loc[country].copy()
    # Convert the index to datetime format
    sel_data.index = pd.to_datetime(sel_data.index, format = '%m/%d/%y')

    bg_color = 'rgba(208, 225, 242, 1.0)'
    line_color = 'rgba(75, 152, 201, 1.0)'
    grid_color = 'rgba(75, 152, 201, 0.3)'

    fig = go.Figure()
    fig.add_trace(go.Scatter(x = sel_data.index,
                             y = sel_data,
                             name = country,
                             hovertemplate = hov_text(sel_data.index, sel_data)))

    fig.update_traces(
        hoverinfo = 'text+name',
        mode = 'lines'
    )

    fig.update_layout(
        # Set figure title
        title = dict(
            text = f'<b>Confirmed cases of COVID-19 in {country}</b>',
            xref = 'container',
            yref = 'container',
            x = 0.5,
            y = 0.9,
            xanchor = 'center',
            yanchor = 'middle',
            font = dict(family = 'Droid Sans', size = 28)
        ),
        # set legend
        legend = dict(
            orientation = "h",
            traceorder = 'normal',
            font_size = 12,
            x = 0.0,
            y = -0.3,
            xanchor = "left",
            yanchor = "top"
        ),
        # set x-axis
        xaxis = dict(
            title = 'Date',
            linecolor = line_color,
            linewidth = 2,
            gridcolor = grid_color,
            showticklabels = True,
            ticks = 'outside',
        ),
        # set y-axis
        yaxis = dict(
            title = 'Number of confirmed cases',
            linecolor = line_color,
            linewidth = 2,
            gridcolor = grid_color,
            showticklabels = True,
            ticks = 'outside',
        ),
        # set the plot background color
        plot_bgcolor = bg_color,
        paper_bgcolor = bg_color,
    )

    return fig

Call this function with the dataframe and the country name to create the plot.

fig = plot_confirmed_cases_for_country(country_df, 'China')
fig.show()

Confirmed cases of COVID-19 in China
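Country names must match the labels in the dataset exactly (for example 'US' and 'United Kingdom'), otherwise the lookup by index label raises a KeyError. One possible way to make typos easier to spot is a small lookup helper. The function find_country below is a hypothetical addition, not part of the code above, and the frame it is tried on is a placeholder containing only three labels.

```python
import difflib

import pandas as pd


def find_country(df, name):
    """Return `name` if it is an index label of df, else raise with suggestions.

    Hypothetical helper for illustration only.
    """
    if name in df.index:
        return name
    # Offer close spellings from the index to make typos easier to spot.
    suggestions = difflib.get_close_matches(name, list(df.index), n=3)
    raise KeyError(f"{name} not found; close matches: {suggestions}")


# Placeholder frame that only carries a few labels spelled as in the dataset.
toy_country_df = pd.DataFrame(
    {'11/27/20': [0, 0, 0]},
    index=pd.Index(['US', 'United Kingdom', 'China'], name='Country/Region'),
)

print(find_country(toy_country_df, 'US'))
```

The returned label could then be passed as the country argument of the plotting function.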



Display confirmed cases for top ten countries

Get the names of the ten countries with the highest number of confirmed cases on the latest date.

top_countries = (country_df
                 .sort_values(by = country_df.columns[-1], ascending = False)
                 .head(10)
                 .index)

list(top_countries)
"""
['US',   'India',          'Brazil', 'France',    'Russia',
'Spain', 'United Kingdom', 'Italy',  'Argentina', 'Colombia']
"""

Transpose the data so that the dates form the index and each country becomes a column. Each row then holds the values for all countries on one date. Convert the index to datetime format.

wide_df = country_df.T
wide_df.index = pd.to_datetime(wide_df.index)

wide_df.shape
"""
(311, 191)
"""

wide_df.iloc[[0,1,2,-3,-2,-1], [0,1,2,-3,-2,-1]]
"""
Country/Region  Afghanistan  Albania  Algeria  Yemen  Zambia  Zimbabwe
2020-01-22                0        0        0      0       0         0
2020-01-23                0        0        0      0       0         0
2020-01-24                0        0        0      0       0         0
2020-11-25            45490    34944    78025   2124   17535      9508
2020-11-26            45716    35600    79110   2137   17553      9623
2020-11-27            45839    36245    80168   2148   17569      9714
"""
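The reshaping can be tried on a tiny frame first. The sketch below uses two invented countries with placeholder counts, transposes the frame, and converts the date labels, after which each country can be selected as a column indexed by date.

```python
import pandas as pd

# Invented placeholder frame: one row per country, one column per date.
toy_country_df = pd.DataFrame(
    {'1/22/20': [0, 2], '1/23/20': [1, 3]},
    index=pd.Index(['Examplia', 'Samplestan'], name='Country/Region'),
)

# Swap rows and columns, then parse the date labels.
toy_wide_df = toy_country_df.T
toy_wide_df.index = pd.to_datetime(toy_wide_df.index, format='%m/%d/%y')

# Each country is now a column that can be plotted against the date index.
print(toy_wide_df['Samplestan'])
```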

The columns for the top ten countries are selected and each one is added to the line chart as a separate trace.

def hov_text(ary_x, ary_y, country):
    '''Create the hover text for each data point

    Keyword arguments:
    ary_x    -- list of all the x data points
    ary_y    -- list of all the y data points
    country  -- name of country

    return: A list of hover texts for the selected country
            with format to display
    '''
    # Pair the values by position instead of indexing with integers
    txt = [f'''
<b>{country}</b><br><br>
Date: <b>{x.strftime('%b %d %Y')}</b><br>
Confirmed Cases: <b>{y:,.0f}</b><br>
<extra></extra>
''' for x, y in zip(ary_x, ary_y)]
    return txt

bg_color = 'rgba(208, 225, 242, 1.0)'
line_color = 'rgba(75, 152, 201, 1.0)'
grid_color = 'rgba(75, 152, 201, 0.3)'

# Keep only the columns for the top ten countries
df = wide_df[top_countries]

fig = go.Figure()
# Add one line per country
for c in top_countries:
    fig.add_trace(go.Scatter(x = df.index,
                             y = df[c],
                             name = c,
                             hovertemplate = hov_text(df.index, df[c], c)))

fig.update_traces(
    hoverinfo = 'text+name',
    mode = 'lines'
)

fig.update_layout(
    # Set figure title
    title = dict(
        text = '<b>Confirmed cases of COVID-19 in top ten countries</b>',
        xref = 'container',
        yref = 'container',
        x = 0.5,
        y = 0.9,
        xanchor = 'center',
        yanchor = 'middle',
        font = dict(family = 'Droid Sans', size = 28)
    ),
    # set legend
    legend = dict(
        orientation = "h",
        traceorder = 'normal',
        font_size = 12,
        x = 0.0,
        y = -0.3,
        xanchor = "left",
        yanchor = "top"
    ),
    # set x-axis
    xaxis = dict(
        title = 'Date',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    # set y-axis
    yaxis = dict(
        title = 'Number of confirmed cases',
        linecolor = line_color,
        linewidth = 2,
        gridcolor = grid_color,
        showticklabels = True,
        ticks = 'outside',
    ),
    showlegend = True,
    # set the plot background color
    plot_bgcolor = bg_color,
    paper_bgcolor = bg_color,
)

fig.show()

Confirmed cases of COVID-19 for top ten countries



Conclusion

Pandas and Plotly are a powerful combination for loading data, cleaning data and creating interactive charts. Johns Hopkins University has done the hard work of collecting the COVID-19 data and making it available. The data can be loaded directly from the CSV file on GitHub. Plotly is used to create bar charts and line charts that display information on the countries with the highest confirmed cases of COVID-19.