Bar chart race showing changes in Covid-19 deaths

COVID-19 is the disease caused by a new coronavirus called SARS-CoV-2. The World Health Organization (WHO) first learned of the virus on 31 December 2019 and declared the outbreak a pandemic in March 2020. This article shows how to create a bar chart race depicting the countries with the highest number of deaths from coronavirus as the totals change from day to day.

The data used in this article comes from Johns Hopkins University, which has made it available on GitHub. More information about COVID-19 and the coronavirus is available from the WHO page Coronavirus disease (COVID-19) advice for the public.



Retrieve the data and load into dataframe

The data is available on the Johns Hopkins GitHub page. The cumulative number of deaths from coronavirus for each day is in the time_series_covid19_deaths_global.csv file. This file can either be downloaded and loaded into a dataframe or loaded directly from GitHub, as in the code below. Load and review the data.

# Imports used throughout the article (not shown in the original listing)
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as anim

# raw csv files from GitHub
deaths_path = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
deaths_df = pd.read_csv(deaths_path)

deaths_df.shape
"""
(271, 313)
"""

# Show the first and last four rows with the first and last few columns
deaths_df.iloc[[0,1,2,3,-4,-3,-2,-1], [0,1,2,3,4,5,6,-3,-2,-1]]
"""
    Province/State  Country/Region        Lat       Long  1/22/20  1/23/20  1/24/20  11/23/20  11/24/20  11/25/20
0              NaN     Afghanistan  33.939110  67.709953        0        0        0      1695      1712      1725
1              NaN         Albania  41.153300  20.168300        0        0        0       716       735       743
2              NaN         Algeria  28.033900   1.659600        0        0        0      2294      2309      2329
3              NaN         Andorra  42.506300   1.521800        0        0        0        76        76        76
267            NaN  Western Sahara  24.215500 -12.885800        0        0        0         1         1         1
268            NaN           Yemen  15.552727  48.516388        0        0        0       609       609       611
269            NaN          Zambia -13.133897  27.849332        0        0        0       357       357       357
270            NaN        Zimbabwe -19.015438  29.154857        0        0        0       273       274       274
"""


Group the data by Country/Region

For some countries, such as China or the United Kingdom, the data is further broken down by Province/State. Grouping by Country/Region and summing the counts gives a single total for each country.

deaths_df[deaths_df['Country/Region'] == 'China'].head()

"""
   Province/State Country/Region      Lat      Long  1/22/20  1/23/20  1/24/20  1/25/20  1/26/20  1/27/20  ...  11/16/20  11/17/20  11/18/20  11/19/20  11/20/20  11/21/20  11/22/20  11/23/20  11/24/20  11/25/20
58          Anhui          China  31.8257  117.2264        0        0        0        0        0        0  ...         6         6         6         6         6         6         6         6         6         6
59        Beijing          China  40.1824  116.4142        0        0        0        0        0        1  ...         9         9         9         9         9         9         9         9         9         9
60      Chongqing          China  30.0572  107.8740        0        0        0        0        0        0  ...         6         6         6         6         6         6         6         6         6         6
61         Fujian          China  26.0789  117.9874        0        0        0        0        0        0  ...         1         1         1         1         1         1         1         1         1         1
62          Gansu          China  35.7518  104.2861        0        0        0        0        0        0  ...         2         2         2         2         2         2         2         2         2         2
"""

# Group by Country
# numeric_only = True keeps the Province/State text column out of the sum
country_deaths_df = deaths_df.groupby(by = ['Country/Region'],
                                      as_index = False).sum(numeric_only = True)
country_deaths_df.shape
"""
(191, 312)
"""

Map countries to specific colors

It is necessary to get a list of all the countries that appear in the top ten at any point in the dataset.

# Keep the country name and the date columns only
df = country_deaths_df.drop(['Lat', 'Long'], axis = 1)
countries = []
# For each date, collect the ten countries with the most deaths
for c in df.columns[1:]:
    df_num = (df.loc[df[c] > 0, ['Country/Region', c]]
              .sort_values(by = c, ascending = False)
              .head(10))
    countries.extend(list(df_num['Country/Region']))
# Remove repeated names
countries = list(set(countries))

len(countries)
"""
24
"""
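The same set of countries can also be collected without an explicit loop. The following is a sketch of an alternative, not the article's method: it ranks every date column at once and keeps any country that is ever inside the top N. The frame `toy` and the value of `top_n` are made up for illustration.

```python
import pandas as pd

# Hypothetical totals for four countries on two dates
toy = pd.DataFrame({
    'Country/Region': ['A', 'B', 'C', 'D'],
    'day1': [5, 3, 0, 1],
    'day2': [6, 9, 8, 1],
}).set_index('Country/Region')

top_n = 2

# Ignore zero counts, then rank each date column from highest to lowest
ranks = toy.where(toy > 0).rank(axis=0, method='first', ascending=False)

# A country qualifies if it is inside the top N on at least one date
in_top = (ranks <= top_n).any(axis=1)
top_countries = sorted(in_top[in_top].index)
print(top_countries)
```

Sorting the result gives a stable order from run to run, which also keeps the color assignment reproducible.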

A fixed color is mapped to each of these 24 countries so that every country keeps the same color throughout the race. This is necessary because each bar chart in the race is drawn independently of the previous one, and the default would reuse the same ten colors. Display the 24 countries with their assigned color.

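# Combine three qualitative palettes to get enough distinct colors
cols = list(plt.cm.Dark2.colors + plt.cm.Set3.colors + plt.cm.tab10.colors)
# Drop one Set3 color from the list
cols.remove(plt.cm.Set3.colors[8])
# Pair each country with one color
color_dict = dict(zip(countries, cols))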

Colors for countries with highest deaths from Covid-19
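The listing above builds the mapping but does not show how the swatches were drawn. One possible way to display such a mapping is sketched below; the short country list, the figure size and the output file name are assumptions made for the example, and the Agg backend is chosen only so the sketch runs without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use('Agg')  # headless backend so no window is needed
import matplotlib.pyplot as plt

# Hypothetical stand-in for the list of top-ten countries
demo_countries = ['US', 'Brazil', 'Italy', 'Iran']
demo_cols = list(plt.cm.Dark2.colors) + list(plt.cm.Set3.colors)
demo_color_dict = dict(zip(demo_countries, demo_cols))

# One bar of equal width per country, filled with its assigned color
fig, ax = plt.subplots(figsize=(4, 2))
ax.barh(y=range(len(demo_color_dict)),
        width=1,
        color=list(demo_color_dict.values()),
        tick_label=list(demo_color_dict.keys()))
ax.set_xticks([])
ax.invert_yaxis()  # first country at the top

out_path = Path('country_colors.png')
fig.savefig(str(out_path))
```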



Create bar chart for top 10 countries for latest day

Extract the data for the country list on the latest date. Sort the data by the number of deaths and create a bar chart of the top ten countries with the highest number of deaths. Add labels to the end of each bar showing the number of deaths for each country and add a label for the date.

# Select the countries of interest, indexed by country with one column per date
# (sel_df is not defined in the original listing; this definition is an assumption)
sel_df = df.set_index('Country/Region').loc[countries]

# Top ten countries on the latest date
top_df = (sel_df.iloc[:, [-1]]
          .sort_values(by = sel_df.columns[-1], ascending = False)
          .head(10))
fig, ax = plt.subplots(nrows = 1,
                       ncols = 1,
                       figsize = (10, 7),
                       facecolor = plt.cm.Blues(.2),
                       tight_layout = True)
bars = ax.barh(y = range(1, len(top_df.index) + 1),
               tick_label = top_df.index,
               width = top_df.iloc[: , 0],
               color = [color_dict[col] for col in top_df.index])
day = pd.to_datetime(top_df.columns[0]).strftime('%b %d %Y')
ax.set_title(f'Top ten countries with highest deaths from Covid-19 - {day}',
             fontsize = 'xx-large',
             fontweight = 'bold')
# Reversed limits put rank 1 at the top
ax.set_ylim(10.8, 0.2)
ax.set_facecolor(plt.cm.Blues(.2))
ax.tick_params(labelsize = 'medium')
ax.grid(True, axis = 'x', color=plt.cm.Blues(0.05))
for spine in ax.spines.values():
    spine.set_visible(False)

# Label the end of each bar with the number of deaths
for bar in bars:
    width = bar.get_width()
    ax.annotate(f'{width:,.0f}',
                xy = (width , bar.get_y() + bar.get_height() / 2),
                xytext = (25, 0),
                textcoords = "offset points",
                fontsize = 'x-large',
                fontweight = 'bold',
                ha = 'left',
                va = 'center')

# Add large Date in bottom right on chart
ax.annotate(pd.to_datetime(top_df.columns[0]).strftime('%b %d'),
            xy = (1.05, 0.1),
            xycoords='axes fraction',
            fontsize = 40,
            fontweight = 'bold',
            ha = 'right',
            va = 'bottom')

plt.show()

Top ten countries with highest deaths from Covid-19 on November 23


Transpose the data to wide format and expand the data

The data is transposed and the date is set as the index so that each row represents the data for a particular date and can be shown in a bar chart.

# Transpose the data for the countries of interest
wide_df = sel_df.T[countries].copy()

# Remove the column name
wide_df.rename_axis(None, axis=1, inplace=True)

# Set index to datetime
wide_df.index = pd.to_datetime([f"{x}" for x in wide_df.index])

wide_df.iloc[[0,1,2,3,-4,-3,-2,-1], [0,1,2,3,-3,-2,-1]]
"""
            United Kingdom  Netherlands   Iran  Italy      US  Brazil   Peru
2020-01-22               0            0      0      0       0       0      0
2020-01-23               0            0      0      0       0       0      0
2020-01-24               0            0      0      0       0       0      0
2020-01-25               0            0      0      0       0       0      0
2020-11-21           54721         8946  44327  49261  255946  168989  35549
2020-11-22           55120         8967  44802  49823  256866  169183  35549
2020-11-23           55327         9021  45255  50453  257779  169485  35595
2020-11-24           55935         9111  45738  51306  259925  170115  35641
"""
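The transpose step is easier to follow on a frame small enough to read at a glance. The sketch below uses a made-up two-country frame (`toy_sel` is hypothetical) and passes an explicit date format, which is an assumption about how the month/day/year column labels should be read.

```python
import pandas as pd

# Hypothetical frame: one row per country, one column per date label
toy_sel = pd.DataFrame({'1/22/20': [0, 1], '1/23/20': [2, 3]},
                       index=pd.Index(['Italy', 'US'], name='Country/Region'))

# Transpose so each row is a date, and drop the leftover axis name
toy_wide = toy_sel.T.rename_axis(None, axis=1)

# Parse the month/day/two-digit-year labels into real timestamps
toy_wide.index = pd.to_datetime(toy_wide.index, format='%m/%d/%y')
print(toy_wide)
```

A datetime index is what later allows the data to be resampled to a finer frequency.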

Expand the data to a 4-hour frequency so that the bars slide smoothly between the daily values instead of jumping from one position to the next.

# Insert empty rows every 4 hours between the daily values
expanded_df = wide_df.asfreq('4h')

expanded_df.shape
"""
(1843, 24)
"""

expanded_df.iloc[-8:, [0,1,2,3,-3,-2,-1]]
"""
                     United Kingdom  Netherlands     Iran    Italy        US    Brazil     Peru
2020-11-22 20:00:00             NaN          NaN      NaN      NaN       NaN       NaN      NaN
2020-11-23 00:00:00         55327.0       9021.0  45255.0  50453.0  257779.0  169485.0  35595.0
2020-11-23 04:00:00             NaN          NaN      NaN      NaN       NaN       NaN      NaN
2020-11-23 08:00:00             NaN          NaN      NaN      NaN       NaN       NaN      NaN
2020-11-23 12:00:00             NaN          NaN      NaN      NaN       NaN       NaN      NaN
2020-11-23 16:00:00             NaN          NaN      NaN      NaN       NaN       NaN      NaN
2020-11-23 20:00:00             NaN          NaN      NaN      NaN       NaN       NaN      NaN
2020-11-24 00:00:00         55935.0       9111.0  45738.0  51306.0  259925.0  170115.0  35641.0
"""
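The row count of 1843 follows from the date range shown in the output: every day except the last contributes six 4-hour steps, and the final day contributes one row. A small sketch with a placeholder column (`demo` is hypothetical) reproduces the arithmetic:

```python
import pandas as pd

# Daily index covering the dates shown in the article's output
idx = pd.date_range('2020-01-22', '2020-11-24', freq='D')
daily = pd.DataFrame({'demo': range(len(idx))}, index=idx)

# Re-index to one row every 4 hours; the new rows are empty (NaN)
expanded = daily.asfreq('4h')

steps_per_day = 24 // 4
expected_rows = (len(daily) - 1) * steps_per_day + 1
print(len(daily), len(expanded), expected_rows)
```

Choosing a finer frequency gives a smoother race at the cost of more frames to render.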

Create a ranking dataset that ranks the countries in each row. The rank is used to position each bar on the chart, while the color mapping keeps the color consistent for each country.

# Rank the countries within each row, highest number of deaths first
rank_df = expanded_df.rank(axis = 1, method = 'first', ascending = False)

rank_df.shape
"""
(1843, 24)
"""

rank_df.iloc[-8:, [0,1,2,3,-3,-2,-1]]
"""
                     United Kingdom  Netherlands  Iran  Italy   US  Brazil  Peru
2020-11-22 20:00:00             NaN          NaN   NaN    NaN  NaN     NaN   NaN
2020-11-23 00:00:00             5.0         16.0   8.0    6.0  1.0     2.0  11.0
2020-11-23 04:00:00             NaN          NaN   NaN    NaN  NaN     NaN   NaN
2020-11-23 08:00:00             NaN          NaN   NaN    NaN  NaN     NaN   NaN
2020-11-23 12:00:00             NaN          NaN   NaN    NaN  NaN     NaN   NaN
2020-11-23 16:00:00             NaN          NaN   NaN    NaN  NaN     NaN   NaN
2020-11-23 20:00:00             NaN          NaN   NaN    NaN  NaN     NaN   NaN
2020-11-24 00:00:00             5.0         16.0   8.0    6.0  1.0     2.0  11.0
"""
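The choice of method = 'first' matters when two countries have the same total. The sketch below, with made-up totals, compares it with method = 'average': the former hands tied countries distinct whole-number ranks, while the latter gives them the same fractional rank, which would place two bars on top of each other.

```python
import pandas as pd

# Hypothetical totals for four countries on one date; C and D are tied
totals = pd.DataFrame({'A': [30], 'B': [20], 'C': [10], 'D': [10]},
                      index=pd.to_datetime(['2020-11-23']))

# Distinct ranks for tied values
first = totals.rank(axis=1, method='first', ascending=False)

# Shared rank for tied values
average = totals.rank(axis=1, method='average', ascending=False)

print(first)
print(average)
```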

Interpolate the results to create a smooth transition from one day to the next. Add incremental values every 4 hours between the given values.

# Fill the 4-hourly gaps with evenly spaced values
expanded_df = expanded_df.interpolate()
rank_df = rank_df.interpolate()

expanded_df.iloc[-8:, [0,1,2,3,-3,-2,-1]]
"""
                     United Kingdom  Netherlands     Iran         Italy             US         Brazil          Peru
2020-11-22 20:00:00    55292.500000       9012.0  45179.5  50348.000000  257626.833333  169434.666667  35587.333333
2020-11-23 00:00:00    55327.000000       9021.0  45255.0  50453.000000  257779.000000  169485.000000  35595.000000
2020-11-23 04:00:00    55428.333333       9036.0  45335.5  50595.166667  258136.666667  169590.000000  35602.666667
2020-11-23 08:00:00    55529.666667       9051.0  45416.0  50737.333333  258494.333333  169695.000000  35610.333333
2020-11-23 12:00:00    55631.000000       9066.0  45496.5  50879.500000  258852.000000  169800.000000  35618.000000
2020-11-23 16:00:00    55732.333333       9081.0  45577.0  51021.666667  259209.666667  169905.000000  35625.666667
2020-11-23 20:00:00    55833.666667       9096.0  45657.5  51163.833333  259567.333333  170010.000000  35633.333333
2020-11-24 00:00:00    55935.000000       9111.0  45738.0  51306.000000  259925.000000  170115.000000  35641.000000
"""
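The interpolated figures can be checked by hand: each gap between two daily values is split into six equal steps. The sketch below repeats the calculation for two of the columns, using the values for 23 and 24 November from the output above.

```python
import pandas as pd

# Two consecutive daily totals taken from the output above
daily = pd.DataFrame({'United Kingdom': [55327, 55935],
                      'Netherlands': [9021, 9111]},
                     index=pd.to_datetime(['2020-11-23', '2020-11-24']))

# Expand to 4-hour rows, then fill the gaps linearly
smooth = daily.asfreq('4h').interpolate()

# Each step adds one sixth of the daily increase
uk_step = (55935 - 55327) / 6
print(smooth)
print(uk_step)
```

Linear interpolation is the default for interpolate() and treats the rows as evenly spaced, which holds here because the index has a fixed 4-hour frequency.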

Remove duplicate ranks so that one country does not hide another when they have the exact same rank.

# Remove any duplicate ranks from the same row
# A repeated rank is multiplied by 1.01 until every rank in the row is unique
while rank_df.apply(pd.Series.duplicated, axis=1).any(axis = None):
    rank_df = rank_df.where(~rank_df.apply(pd.Series.duplicated, axis=1), rank_df*1.01)
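Duplicate ranks appear when two countries swap places, because their interpolated ranks cross at the same value. The sketch below shows this with two made-up countries and a 12-hour step (both are assumptions chosen to keep the example short), then applies the same nudge of 1% to the repeated value.

```python
import pandas as pd

# Hypothetical ranks: A and B swap places from one day to the next
ranks = pd.DataFrame({'A': [1.0, 2.0], 'B': [2.0, 1.0]},
                     index=pd.to_datetime(['2020-11-23', '2020-11-24']))

# At the midpoint both interpolated ranks are 1.5
ranks = ranks.asfreq('12h').interpolate()

def has_duplicates(df):
    """Return True if any row contains the same rank more than once."""
    return df.apply(pd.Series.duplicated, axis=1).any(axis=None)

had_duplicates = has_duplicates(ranks)

# Nudge the second occurrence of a repeated rank until rows are unique
while has_duplicates(ranks):
    dup_mask = ranks.apply(pd.Series.duplicated, axis=1)
    ranks = ranks.where(~dup_mask, ranks * 1.01)

print(ranks)
```

The nudge is small enough that the two bars still sit almost at the same height for that frame, but neither hides the other completely.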

Display a sample bar chart from the expanded dataframe using the ranking to order the bars. Add annotations to each bar to display the number of deaths. Add a large display of the current day, which will make the animation more understandable.

fig, ax = plt.subplots(nrows = 1,
                       ncols = 1,
                       figsize = (10, 7),
                       facecolor = plt.cm.Blues(.2),
                       tight_layout = True)
# Longest country name, used to right-align the tick labels
p = expanded_df.columns.map(len).max()
bar_num = 10
# Row of the expanded dataframe to display
i = 1117
# Keep only the countries ranked in the top ten for this row
sel_df = expanded_df.iloc[:, list(rank_df.iloc[i] <= bar_num)]
bars = ax.barh(y = rank_df.iloc[:, list(rank_df.iloc[i] <= bar_num)].iloc[i],
               tick_label = [x.rjust(p, ' ') for x in sel_df.columns],
               width = sel_df.iloc[i],
               color = [color_dict[col] for col in sel_df.columns],
               alpha = 0.8)

plt.setp(ax.get_xticklabels(), fontsize='small')
plt.setp(ax.get_yticklabels(), fontsize='medium', fontfamily = 'monospace')
cur_day = expanded_df.index[i].strftime('%Y-%m-%d')
ax.set_title(f'Countries with highest deaths from Covid-19 on {cur_day}',
             fontsize = 'xx-large',
             fontweight = 'bold')
ax.set_ylim(10.8, 0.2)
ax.set_facecolor(plt.cm.Blues(.2))
ax.grid(True, axis = 'x', color=plt.cm.Blues(.05))
ax.set_axisbelow(True)
for spine in ax.spines.values():
    spine.set_visible(False)

# Label the end of each bar with the number of deaths
for bar in bars:
    width = bar.get_width()
    ax.annotate(f'{width:,.0f}',
                xy = (width , bar.get_y() + bar.get_height() / 2),
                xytext = (25, 0),
                textcoords = "offset points",
                fontsize = 'large',
                fontweight = 'bold',
                ha = 'left',
                va = 'center')

plt.show()

Countries with highest deaths from Covid-19 on sample date



Display a sample of the bar charts for random dates to ensure the bar charts are displaying correctly with the correct ranking and countries maintain their color.

sample_num = 5
# Pick random rows and the matching ranks
d_df = expanded_df.sample(n = sample_num, random_state = 12).sort_index()
r_df = rank_df.loc[d_df.index]

fig, axs = plt.subplots(nrows = 1,
                        ncols = sample_num,
                        figsize = (15, 5),
                        facecolor = plt.cm.Blues(.2),
                        tight_layout = True)

for i, ax in enumerate(axs.flatten()):
    sel_df = d_df.iloc[:, list(r_df.iloc[i] <= bar_num)]
    bars = ax.barh(y = r_df.iloc[:, list(r_df.iloc[i] <= bar_num)].iloc[i],
                   tick_label = sel_df.columns,
                   width = sel_df.iloc[i],
                   color = [color_dict[col] for col in sel_df.columns],
                   alpha = 0.8)

    cur_day = d_df.index[i].strftime('%Y-%m-%d')
    ax.set_title(f'{cur_day}',
                 fontsize = 'large',
                 fontweight = 'bold')
    ax.set_ylim(10.8, 0.2)
    ax.set_facecolor(plt.cm.Blues(.2))
    ax.tick_params(labelsize = 'xx-small')
    ax.grid(True, axis = 'x', color=plt.cm.Blues(.1))
    ax.set_axisbelow(True)
    for spine in ax.spines.values():
        spine.set_visible(False)

    for bar in bars:
        width = bar.get_width()
        ax.annotate(f'{width:,.0f}',
                    xy = (width , bar.get_y() + bar.get_height() / 2),
                    xytext = (5, 0),
                    textcoords = "offset points",
                    fontsize = 'small',
                    fontweight = 'bold',
                    ha = 'left',
                    va = 'center')
plt.show()

Sample of bar charts for countries with highest deaths from Covid-19



Create the animation

The animation is created by generating a bar chart for each row in the expanded dataframe, using the rankings dataframe for the bar positions. These frames are then played in sequence using the FuncAnimation class in Matplotlib.

def update(i):
    # Redraw the whole chart for frame i
    ax.clear()

    p = expanded_df.columns.map(len).max()
    bar_num = 10
    sel_df = expanded_df.iloc[:, list(rank_df.iloc[i] <= bar_num)]
    bars = ax.barh(y = rank_df.iloc[:, list(rank_df.iloc[i] <= bar_num)].iloc[i],
                   tick_label = [x.rjust(p, ' ') for x in sel_df.columns],
                   width = sel_df.iloc[i],
                   color = [color_dict[col] for col in sel_df.columns],
                   alpha = 0.8)
    plt.setp(ax.get_xticklabels(), fontsize='x-small')
    plt.setp(ax.get_yticklabels(), fontsize='small', fontfamily = 'monospace')

    cur_day = expanded_df.index[i].strftime('%Y-%b-%d')
    ax.set_title(f'Deaths from Covid-19 - {cur_day}',
                 fontsize = 'x-large',
                 fontweight = 'bold',
                 loc = 'center')
    ax.set_ylim(10.8, 0.2)
    ax.set_facecolor(plt.cm.Blues(.2))
    ax.grid(True, axis = 'x', color=plt.cm.Blues(.1))
    ax.set_axisbelow(True)
    for spine in ax.spines.values():
        spine.set_visible(False)

    for bar in bars:
        width = bar.get_width()
        ax.annotate(f'{width:,.0f}',
                    xy = (width , bar.get_y() + bar.get_height() / 2),
                    xytext = (10, 0),
                    textcoords = "offset points",
                    fontsize = 'small',
                    ha = 'left',
                    va = 'center')

    # Add large Date in bottom right on chart
    ax.annotate(expanded_df.index[i].strftime('%b %d'),
                xy = (1.25, 0.1),
                xycoords='axes fraction',
                fontsize = 40,
                fontweight = 'bold',
                ha = 'right',
                va = 'bottom')

fig, ax = plt.subplots(figsize = (8, 4),
                       facecolor = plt.cm.Blues(.2),
                       dpi = 50,
                       tight_layout = True)

# One frame per row of the expanded dataframe, 100 ms apart
covid_anim = anim.FuncAnimation(
    fig = fig,
    func = update,
    frames = len(expanded_df),
    interval = 100)

covid_anim.save('COVID_bar_chart_race_2020.gif')


Countries with highest deaths from Covid-19

The animation can also be converted to HTML5 video or saved as MP4.

# Both calls need a movie writer such as ffmpeg to be installed
html = covid_anim.to_html5_video()

covid_anim.save('COVID_bar_chart_race_2020.mp4')
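Rendering all the frames of the full race takes a while, so it can help to test the animation pipeline on a few frames first. The sketch below is a stripped-down stand-in for the update function above: the frame values, labels and file name are made up, and the GIF writer is named explicitly as PillowWriter, which is an assumption about what is installed.

```python
from pathlib import Path

import matplotlib
matplotlib.use('Agg')  # headless backend so no window is needed
import matplotlib.pyplot as plt
import matplotlib.animation as anim

# Hypothetical bar widths for three frames
values = [[1, 2, 3], [2, 2, 4], [3, 2, 5]]
labels = ['A', 'B', 'C']

fig, ax = plt.subplots(figsize=(4, 2), dpi=50)

def update(i):
    # Clear and redraw the bars for frame i
    ax.clear()
    ax.barh(y=labels, width=values[i])
    ax.set_title(f'frame {i}')

demo_anim = anim.FuncAnimation(fig=fig,
                               func=update,
                               frames=len(values),
                               interval=100)

# Write a small GIF using the Pillow-based writer
out_path = Path('demo_race.gif')
demo_anim.save(str(out_path), writer=anim.PillowWriter(fps=10))
```

Once a short run like this works, the same save call can be pointed at the full animation.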

MP4 version available here - MP4 bar chart race for Countries with highest deaths from Covid-19



Conclusion

A bar chart race can be an effective way to visualise the increase in deaths from COVID-19 in different countries. Creating the animated chart takes a bit of work, which has been laid out in this article. The concept is straightforward: step through the data and display a bar chart for each day, and Matplotlib's FuncAnimation is well suited to creating the animation. Two things to keep in mind are to add intermediate data points for a smooth transition and to maintain a consistent color for each country.