Plotly - Display an Interactive Chart
Pandas is great for searching, filtering and manipulating data, and the results can easily be visualised in charts using the Matplotlib module. These charts are static: they provide a snapshot of the data that can be annotated to highlight certain data points, which makes them great for posters, slides and printed materials. As the world has moved to digital media and more data visualisations are viewed on screens, it would be useful if these charts could also be interactive. In this article, Plotly is used to recreate the changes in the under five mortality rate over time as an interactive chart.
Python modules used in this article
# import modules
import pandas as pd
import numpy as np
import shutil as sh
import requests
import os

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
Plotly
Plotly is an open-source graphing library for Python that produces interactive charts. The user can hover over data points to display exact values, zoom in on sections of a chart, deselect data series to focus on other data, and reset the chart to its original view. Plotly can be used to create extremely rich visualisations with many user interactions.
Load the data into a dataframe
How to download and load the Excel data is covered in "Pandas - Load data from Excel file and Display Chart". This article focuses on creating charts with Plotly from the Pandas dataframe. The following code loads the data from Excel and filters it to the median values. The source file is available from the UNICEF datasets page for the under-five mortality rate.
# Load the excel worksheet into a dataframe
u5mr_df = pd.read_excel(
    "/tmp/data/Under-five-mortality-rate_2020.xlsx",
    sheet_name = 'Country estimates (both sexes)',
    header = 14)

# Drop the last two rows
u5mr_df.drop(u5mr_df.tail(2).index, inplace = True)

# Rename the year columns by stripping the '.5' suffix (e.g. '2019.5' -> '2019')
u5mr_df.columns = [x[:-2] if x.endswith('.5') else x for x in u5mr_df.columns]

# Rename 'Uncertainty.Bounds*' column to 'Uncertainty.Bounds'
u5mr_df = u5mr_df.rename(columns={'Uncertainty.Bounds*': 'Uncertainty.Bounds'})

# Filter to the Median values
u5mr_med_df = u5mr_df[u5mr_df['Uncertainty.Bounds'] == 'Median']

# Review the data
u5mr_med_df.iloc[[0,1,2,3,-4,-3,-2,-1], [0,1,2,3,4,5,6,-4,-3,-2,-1]]
"""
     ISO.Code  Country.Name  Uncertainty.Bounds  1950  1951  1952        1953       2016       2017       2018       2019
1         AFG   Afghanistan              Median   NaN   NaN   NaN         NaN  67.572190  64.940759  62.541196  60.269399
4         ALB       Albania              Median   NaN   NaN   NaN         NaN   9.419110   9.418052   9.525133   9.682407
7         DZA       Algeria              Median   NaN   NaN   NaN         NaN  24.792098  24.319482  23.805926  23.256168
10        AND       Andorra              Median   NaN   NaN   NaN         NaN   3.369056   3.218925   3.085839   2.966929
574       VNM      Viet Nam              Median   NaN   NaN   NaN         NaN  21.220796  20.843125  20.405423  19.935167
577       YEM         Yemen              Median   NaN   NaN   NaN         NaN  56.823614  56.966430  58.460003  58.356138
580       ZMB        Zambia              Median   NaN   NaN   NaN  234.418232  66.510929  64.337901  63.294182  61.663465
583       ZWE      Zimbabwe              Median   NaN   NaN   NaN         NaN  59.538505  58.234924  55.856832  54.612967
"""
Show simple Plotly chart
This code gets the ten countries with the highest Under Five Mortality Rates in 2019 and uses plotly.graph_objects to create a simple bar chart. While the layout of this chart could be made nicer, hovering over each bar displays the country and its U5MR. The hover text is displayed outside the bar and automatically switches to inside the bar when there is not enough room. There is also a Plotly toolbar at the top right of the chart, and selecting a region inside the chart zooms in on the selected area.
# Sort by the latest year (2019) and keep the ten highest rates
Top_10 = u5mr_med_df.sort_values(by=u5mr_med_df.columns[-1], ascending=False)[["Country.Name", "2019"]].head(10)
"""
                 Country.Name        2019
376                   Nigeria  117.202078
481                   Somalia  116.972096
100                      Chad  113.790418
97   Central African Republic  110.053912
466              Sierra Leone  109.236528
214                    Guinea   98.802973
487               South Sudan   96.229299
316                      Mali   94.035418
55                      Benin   90.286429
79               Burkina Faso   87.542426
"""

import plotly.graph_objects as go
fig = go.Figure(
    data = [go.Bar(x = list(Top_10["2019"]),
                   y = list(Top_10["Country.Name"]),
                   orientation = "h")],
    layout = go.Layout(
        title = go.layout.Title(
            text = "Top ten countries with highest Under Five Mortality Rate in 2019"
        )
    )
)

fig.show()
Simple horizontal bar chart showing top ten countries with highest Under 5 Mortality Rates in 2019
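Selecting the ten highest rates can also be written with `DataFrame.nlargest`, which returns the `n` rows with the largest values in a column, already sorted in descending order. A minimal sketch with a made-up stand-in for `u5mr_med_df`:

```python
import pandas as pd

# Hypothetical stand-in for u5mr_med_df (values are illustrative only)
med_df = pd.DataFrame({
    'Country.Name': ['A', 'B', 'C', 'D'],
    '2019': [10.0, 117.2, 95.4, 60.1],
})

# nlargest(n, column) keeps the n rows with the highest values in that column
top_n = med_df.nlargest(2, '2019')[['Country.Name', '2019']]

# The sort_values(...).head(n) approach used in the article, for comparison
top_n_sorted = (med_df.sort_values(by='2019', ascending=False)
                [['Country.Name', '2019']].head(2))

print(top_n)
```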
Show top ten U5MR bar chart
The following code modifies the default bar chart to improve the display. The chart layout, colors and font are changed as well as the format of the hover text.
Top_10 = u5mr_med_df.sort_values(by=u5mr_med_df.columns[-1], ascending=False)[["Country.Name", "2019"]].head(10)
bg_color = 'rgba(208,225,242,1.0)'

fig = go.Figure(
    data = [go.Bar(x = list(Top_10["2019"]),
                   y = list(Top_10["Country.Name"]),
                   hovertemplate = "%{y}: <br><br>U5MR: %{x:.1f}<extra></extra>",
                   orientation = 'h')]
)

fig.update_layout(
    # Set default font
    font = dict(
        family = "Droid Sans",
        size = 16
    ),
    # Set figure title
    title = dict(
        text = "<b>Top ten countries with highest Under Five Mortality Rate in 2019</b>",
        xref = 'container',
        yref = 'container',
        x = 0.5,
        y = 0.91,
        xanchor = 'center',
        yanchor = 'middle',
        font = dict(family = 'Droid Sans', size = 24)
    ),
    # set x-axis
    xaxis = dict(
        title = dict(
            text = 'Under Five Mortality Rate (per 1000 live births)',
            font = dict(family = 'Droid Sans', size = 18)
        ),
        showgrid = False,
        linecolor = bg_color,
        linewidth = 2,
        showticklabels = True,
    ),
    # set y-axis
    yaxis = dict(
        showgrid = False,
        linecolor = bg_color,
        linewidth = 4,
    ),
    # set the plot background color
    plot_bgcolor = bg_color,
    # set the hover background color
    hoverlabel = dict(
        bgcolor = 'rgba(75,152,201,0.2)',
        font_size = 16
    ),
    # set the background color of the area around the plot
    paper_bgcolor = bg_color,
)

# Color the bars and list the highest rate at the top of the chart
fig.update_traces(marker_color = 'rgb(75,152,201)')
fig['layout']['yaxis']['autorange'] = "reversed"

fig.show()
Nicer bar chart showing top ten countries with highest Under 5 Mortality Rates in 2019
Show top ten U5MR over time
There is a limit to how useful an interactive chart is when it only displays a single value for each of the top ten countries, as in the bar chart above. An interactive chart is much more useful when it shows multiple sets of data plotted over time, such as the changes for the current top ten countries. The following line chart shows the changes in Under 5 Mortality Rate for the countries with the highest rates in an interactive Plotly chart. It is similar to the static chart previously created with Matplotlib. The advantages are that the exact value for a particular point can be seen by hovering over it, and some countries can be hidden to focus on the countries of interest.
This function extracts the mortality rates for the top number of countries, based on the rates in the latest year.
def get_top_countries(data_df, lowest = True, num = 10):
    '''Extract the mortality rates for the top number of countries
    based on rates in the latest year

    Keyword arguments:
    data_df -- dataframe of all the mortality rates for all the countries
    lowest -- True to return the countries with the lowest rates,
              False for the highest rates (default is True)
    num -- number of countries to return (default is 10)

    return: dataframe that has been transposed and filtered to the top n countries
    '''
    # Sort by the latest year, keep the first num countries and transpose
    # the dataframe to use year as the x-axis
    df = data_df.sort_values(by = data_df.columns[-1], ascending = lowest).head(num).T
    df.reset_index(drop = False, inplace = True)

    # Set the Country.Name as the heading for the columns
    df.columns = df.iloc[np.where(df['index'] == 'Country.Name')[0][0]]

    # Rename the Country.Name column to Year
    df = df.rename(columns = {'Country.Name': 'Year'})

    # Drop the rows that do not contain u5mr data
    df = df[~df['Year'].isin(['ISO.Code',
                              'Country.Name',
                              'Uncertainty.Bounds'])]
    df.reset_index(drop = True, inplace = True)

    return df
This function creates the Plotly line chart showing the changes in each country over time. It uses a separate function to format the hover text to make it easier to modify and maintain.
def hov_text(ary_x, ary_y, country):
    '''Create the hover text for each data point

    Keyword arguments:
    ary_x -- list of all the x data points
    ary_y -- list of all the y data points
    country -- name of country

    return: A list of hover texts for the selected country
            with format to display
    '''
    # One hover text per point; the empty <extra></extra> tag hides the
    # secondary box that would otherwise show the trace name
    txt = [f'<b>{country}</b><br><br>'
           f'Year: <b>{x}</b><br>'
           f'U5MR: <b>{y:.1f}</b><br>'
           '<extra></extra>'
           for x, y in zip(ary_x, ary_y)]
    return txt


def create_top_case_chart(df, title):
    '''Create a plotly line chart from the dataframe, which must contain
    a column for 'Year' and the other columns as countries

    Keyword arguments:
    df -- dataframe of the mortality rates for all the countries
    title -- text to be displayed as the title of the chart

    return: a Plotly Graph Object figure
    '''
    bg_color = 'rgba(208, 225, 242, 1.0)'
    line_color = 'rgba(75, 152, 201, 1.0)'
    grid_color = 'rgba(75, 152, 201, 0.3)'

    # Round the largest rate up to the next hundred for the y-axis limit
    # and pad the x-axis by two years on each side
    y_max = ((max(df.drop(['Year'], axis = 'columns').max()) // 100) + 1) * 100
    y_min = 0
    x_min = int(min(df['Year'])) - 2
    x_max = int(max(df['Year'])) + 2

    fig = go.Figure()
    # Add one line (trace) per country
    for c in df.drop(['Year'], axis = 'columns').columns:
        fig.add_trace(go.Scatter(x = df.Year,
                                 y = df[c],
                                 name = c,
                                 hovertemplate = hov_text(df.Year, df[c], c)))

    fig.update_traces(
        hoverinfo = 'text+name',
        mode = 'lines'
    )

    fig.update_layout(
        # Set figure title
        title = dict(
            text = f'<b>{title}</b>',
            xref = 'container',
            yref = 'container',
            x = 0.5,
            y = 0.9,
            xanchor = 'center',
            yanchor = 'middle',
            font = dict(family = 'Droid Sans', size = 28)
        ),
        # set legend
        legend = dict(
            orientation = "h",
            traceorder = 'normal',
            font_size = 12,
            x = 0.0,
            y = -0.3,
            xanchor = "left",
            yanchor = "top"
        ),
        # set x-axis
        xaxis = dict(
            title = 'Year',
            range = [x_min, x_max],
            linecolor = line_color,
            linewidth = 2,
            gridcolor = grid_color,
            showticklabels = True,
            ticks = 'outside',
        ),
        # set y-axis
        yaxis = dict(
            title = 'Under Five Mortality Rate',
            range = [y_min, y_max],
            linecolor = line_color,
            linewidth = 2,
            gridcolor = grid_color,
            showticklabels = True,
            ticks = 'outside',
        ),
        showlegend = True,
        # set the plot background color
        plot_bgcolor = bg_color,
        paper_bgcolor = bg_color,
    )

    return fig
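`hov_text` builds a separate hover string for every point. Because `hovertemplate` itself substitutes `%{x}` and `%{y}` (with d3-style formats such as `:.1f`) for each point, a single string per country would also work. A minimal sketch, using a hypothetical helper named `make_hover_template`:

```python
def make_hover_template(country):
    '''Hypothetical alternative to hov_text: one hovertemplate per trace.
    Only the first piece is an f-string, so %{x} and %{y:.1f} stay as
    literal placeholders for Plotly to fill in for each point.'''
    return (f'<b>{country}</b><br><br>'
            'Year: <b>%{x}</b><br>'
            'U5MR: <b>%{y:.1f}</b><extra></extra>')


print(make_hover_template('Chad'))
# <b>Chad</b><br><br>Year: <b>%{x}</b><br>U5MR: <b>%{y:.1f}</b><extra></extra>
```

Inside `create_top_case_chart`, `hovertemplate = make_hover_template(c)` would take the place of `hovertemplate = hov_text(df.Year, df[c], c)`.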
Finally, these functions are used to create the chart for the top ten countries.
top_10_fig = create_top_case_chart(
    get_top_countries(u5mr_med_df, False),
    f'Changes in Under Five Mortality Rate <BR>for countries with the highest in {u5mr_df.columns[-1]}')
top_10_fig.show()
Interactive chart showing changes in Under 5 Mortality Rate for countries with the highest rates
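The y-axis limit in `create_top_case_chart` is `((max_rate // 100) + 1) * 100`, which rounds the highest rate up to the next multiple of 100. For example, a maximum of 117.2 gives `(1.0 + 1) * 100 = 200.0`. One side effect is that a maximum of exactly 200 also moves up to 300. A sketch comparing it with a `math.ceil` variant that keeps exact multiples unchanged (both helper names are hypothetical):

```python
import math


def y_limit_floor(max_rate):
    '''The formula used in create_top_case_chart.'''
    return ((max_rate // 100) + 1) * 100


def y_limit_ceil(max_rate):
    '''Hypothetical variant: smallest multiple of 100 that is >= max_rate.'''
    return math.ceil(max_rate / 100) * 100


for rate in [117.2, 234.418232, 200.0]:
    print(rate, y_limit_floor(rate), y_limit_ceil(rate))
# 117.2 200.0 200
# 234.418232 300.0 300
# 200.0 300.0 200
```

The floor version always leaves some headroom above the highest line, which may be the intent; the ceiling version gives the tightest round limit.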
Show lowest ten U5MR over time
It can take some time to get the layout and colors of a chart just right. The good news is that the same functions can be used to display similar sets of data. The following code is all that is needed to show the ten countries with the current lowest Under Five Mortality Rates.
lowest_10_fig = create_top_case_chart(
    get_top_countries(u5mr_med_df, True),
    f'Changes in Under Five Mortality Rate <BR>for countries with the lowest in {u5mr_df.columns[-1]}')
lowest_10_fig.show()
Interactive chart showing changes in Under 5 Mortality Rate for countries with the lowest rates
Display data for specific countries
The same chart function can be used to display the changes in mortality rates over time for any selection of countries. The function above that retrieves the top ten countries is a little too specific and could be split into two functions: one to filter the data of interest and a second to transpose the data and prepare it for creating the chart.
countries = [
    'Canada', 'China', 'France', 'Iceland',
    'India', 'Ireland', 'Mexico', 'Sweden',
    'United Kingdom', 'United States of America'
]

select_df = u5mr_med_df[u5mr_med_df['Country.Name'].isin(countries)]

# Need to transpose the dataframe to use year as the x-axis
select_df = select_df.sort_values(by = select_df.columns[-1]).T
select_df.reset_index(drop = False, inplace = True)

# Set the Country.Name as the heading for the columns
select_df.columns = select_df.iloc[np.where(select_df['index'] == 'Country.Name')[0][0]]

# Rename the Country.Name column to Year
select_df = select_df.rename(columns = {'Country.Name': 'Year'})

# Drop the rows that do not contain u5mr data
select_df = (select_df[select_df['Year']
             .isin(['ISO.Code',
                    'Country.Name',
                    'Uncertainty.Bounds']) == False]
             )
select_df.reset_index(drop = True, inplace = True)

select_fig = create_top_case_chart(
    select_df,
    'Changes in Under Five Mortality Rate for selected countries')

select_fig.show()
Interactive chart showing changes in Under 5 Mortality Rate for selected countries
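Following the suggestion above, the filtering and reshaping could be split into two reusable functions. A minimal sketch, assuming the same wide layout as `u5mr_med_df` (one row per country, with `ISO.Code`, `Country.Name` and `Uncertainty.Bounds` followed by the year columns); `filter_countries` and `prepare_for_chart` are hypothetical names, and `set_index` plus `.T` stand in for the `np.where` header lookup used in the article:

```python
import pandas as pd


def filter_countries(data_df, countries):
    '''Hypothetical first step: keep only the rows for the named countries,
    ordered by the rate in the latest year (lowest first).'''
    selected = data_df[data_df['Country.Name'].isin(countries)]
    return selected.sort_values(by=selected.columns[-1])


def prepare_for_chart(data_df):
    '''Hypothetical second step: one row per year, one column per country,
    with the years in a 'Year' column.'''
    return (data_df
            .set_index('Country.Name')                         # country names label the rows
            .drop(columns=['ISO.Code', 'Uncertainty.Bounds'])  # keep only the year columns
            .T                                                 # years become rows
            .rename_axis(index='Year', columns=None)
            .reset_index())                                    # years move into a 'Year' column


# Illustrative data in the same wide layout (values are made up)
sample_df = pd.DataFrame({
    'ISO.Code': ['AAA', 'BBB', 'CCC'],
    'Country.Name': ['Alpha', 'Beta', 'Gamma'],
    'Uncertainty.Bounds': ['Median'] * 3,
    '2018': [10.0, 20.0, 30.0],
    '2019': [9.0, 19.0, 29.0],
})

chart_df = prepare_for_chart(filter_countries(sample_df, ['Gamma', 'Alpha']))
print(chart_df)
```

The result has a `Year` column and one column per country, the layout `create_top_case_chart` expects, so passing `prepare_for_chart(filter_countries(u5mr_med_df, countries))` to it should reproduce the selected-countries chart. The top ten case would swap the first step for a `sort_values(...).head(num)` filter.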
Host the interactive chart in static web site
I've created dashboards using Dash, which builds interactive dashboards as Flask-like apps. I'd like to give a shout out to Igor Gotlibovych for providing instructions on how to host interactive Plotly charts on a Hugo static website in "Including plotly figures in Hugo posts". Thank you for providing this information.
Conclusion
Matplotlib is great for creating charts in Python, but the results are static images of the data. Plotly is great for creating interactive charts that help bring the data to life. These charts can be customised in a number of ways to present data that users can hover over to see more information, hide by deselecting series in the legend, or zoom in on a section of the chart.