Display world map with country data

Geographical maps can be a great way to present a global view of data that varies by country and region. Several third-party libraries can help create these maps, including Basemap, Cartopy, Geopandas and Geoplot. This article shows how to use Geopandas to display a world map in which countries with higher Under Five Mortality Rates are shown in darker colors.



Load the data

Details on downloading and loading the Under Five Mortality Rate data into a dataframe are described in "Pandas - Load data from Excel file and Display Chart". The following loads the data from a local file and filters on the Median values for each country. The ISO.Code field contains the international standard three-letter code (ISO 3166-1 alpha-3) for each country.

import pandas as pd

# Load the excel worksheet into a dataframe
u5mr_df = pd.read_excel(
    "/tmp/data/Under-five-mortality-rate_2020.xlsx",
    engine = "openpyxl",
    sheet_name = 'Country estimates (both sexes)',
    header = 14,
    nrows = 585)

# Rename 'Uncertainty.Bounds*' column to 'Uncertainty.Bounds'
u5mr_df = u5mr_df.rename(columns={'Uncertainty.Bounds*': 'Uncertainty.Bounds'})

# Strip the '.5' suffix from the year column names, e.g. '1950.5' becomes '1950'
u5mr_df.columns = [x[:-2] if x.endswith('.5') else x for x in u5mr_df.columns]

# Filter to the Median values
u5mr_med_df = u5mr_df[u5mr_df['Uncertainty.Bounds'] == 'Median']

# Review the data
u5mr_med_df.shape
u5mr_med_df.iloc[[0,1,2,3,-4,-3,-2,-1], [0,1,2,3,12,-3,-2,-1]]

"""
(195, 73)
    ISO.Code Country.Name Uncertainty.Bounds  1950        1959       2017       2018       2019
1        AFG  Afghanistan             Median   NaN         NaN  64.940759  62.541196  60.269399
4        ALB      Albania             Median   NaN         NaN   9.418052   9.525133   9.682407
7        DZA      Algeria             Median   NaN  240.344776  24.319482  23.805926  23.256168
10       AND      Andorra             Median   NaN         NaN   3.218925   3.085839   2.966929
574      VNM     Viet Nam             Median   NaN         NaN  20.843125  20.405423  19.935167
577      YEM        Yemen             Median   NaN         NaN  56.966430  58.460003  58.356138
580      ZMB       Zambia             Median   NaN  208.929172  64.337901  63.294182  61.663465
583      ZWE     Zimbabwe             Median   NaN  155.256789  58.234924  55.856832  54.612967
"""
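The Excel file itself is not reproduced here, so the cleanup steps above can be tried on a small hand-made dataframe. The following is a minimal sketch: the rows, country names and values in toy_df are invented for illustration, the 'Lower' and 'Upper' labels are assumed, and only the column naming follows the layout shown above.

```python
import pandas as pd

# Hypothetical stand-in for the worksheet: all values are illustrative only
toy_df = pd.DataFrame({
    'ISO.Code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB'],
    'Country.Name': ['Country A'] * 3 + ['Country B'] * 3,
    'Uncertainty.Bounds*': ['Lower', 'Median', 'Upper'] * 2,
    '2018.5': [9.0, 10.0, 11.0, 19.0, 20.0, 21.0],
    '2019.5': [8.0, 9.5, 10.5, 18.0, 19.5, 20.5],
})

# Same steps as the real load: rename, strip the '.5' suffix, keep Median rows
toy_df = toy_df.rename(columns={'Uncertainty.Bounds*': 'Uncertainty.Bounds'})
toy_df.columns = [c[:-2] if c.endswith('.5') else c for c in toy_df.columns]
toy_med_df = toy_df[toy_df['Uncertainty.Bounds'] == 'Median']

print(list(toy_med_df.columns))
print(toy_med_df['2019'].tolist())
```

Only one row per country remains after the filter, which is what makes the later one-row-per-country merge with the map data possible.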


Load Geopandas

Geopandas and Geoplot are required to create these charts. These can be installed with pip install geopandas and pip install geoplot. Geopandas requires the Geospatial Data Abstraction Library (GDAL) to be installed; good instructions are available in "How to install GDAL". Geopandas has a dataset for the contours of all the countries, along with some information such as population estimates, that can be used to plot the world map and color countries based on population. The following code loads the 'naturalearth_lowres' dataset and plots a map with countries colored by population, with darker greens representing higher populations. China and India dominate this map.

import geopandas
import geoplot
import mapclassify
import matplotlib.pyplot as plt

# Note: geopandas.datasets is only available in Geopandas versions before 1.0
world = geopandas.read_file(
    geopandas.datasets.get_path('naturalearth_lowres')
)
fig, ax = plt.subplots(figsize = (10,4), facecolor = plt.cm.Blues(.2))
fig.suptitle('Country Populations',
             fontsize = 'xx-large',
             fontweight = 'bold')
ax.set_facecolor(plt.cm.Blues(.2))
world.plot(column = 'pop_est',
           cmap = 'Greens',
           ax = ax,
           legend = True)

plt.show()

Map showing country color based on population



Explore 'naturalearth_lowres' dataset

The 'naturalearth_lowres' dataset is loaded into a Geodataframe, which is a specialised form of a Pandas Dataframe, so all of the functions and properties of a Dataframe also apply to a geodataframe. A geodataframe always contains one geoseries, here called geometry, that holds the spatial data (the shape of each country).

In the world geodataframe loaded from naturalearth_lowres dataset there are 6 columns and 177 rows.

type(world)
"""
<class 'geopandas.geodataframe.GeoDataFrame'>
"""

world.shape
"""
(177, 6)
"""

world.columns
"""
Index(['pop_est', 'continent', 'name', 'iso_a3', 'gdp_md_est', 'geometry'], dtype='object')
"""

world.index
"""
RangeIndex(start=0, stop=177, step=1)
"""

world.iloc[[0,1,2,3,-4,-3,-2,-1], :]
"""
      pop_est      continent                 name iso_a3 gdp_md_est                                           geometry
0      920938        Oceania                 Fiji    FJI     8374.0  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1    53950935         Africa             Tanzania    TZA   150600.0  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2      603253         Africa            W. Sahara    ESH      906.5  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3    35623680  North America               Canada    CAN  1674000.0  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
173    642550         Europe           Montenegro    MNE    10610.0  POLYGON ((20.07070 42.58863, 19.80161 42.50009...
174   1895250         Europe               Kosovo    -99    18490.0  POLYGON ((20.59025 41.85541, 20.52295 42.21787...
175   1218208  North America  Trinidad and Tobago    TTO    43570.0  POLYGON ((-61.68000 10.76000, -61.10500 10.890...
176  13026129         Africa             S. Sudan    SSD    20880.0  POLYGON ((30.83385 3.50917, 29.95350 4.17370, ...
"""
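The relationship between a geodataframe and a regular dataframe can be checked on a tiny hand-built example. This is a sketch only: the two rectangular "countries" and their populations are invented, and shapely's box helper is used to create the shapes.

```python
import geopandas
import pandas as pd
from shapely.geometry import box

# Two made-up rectangular "countries"; names and numbers are illustrative
toy_world = geopandas.GeoDataFrame(
    {
        'name': ['Squareland', 'Rectangula'],
        'pop_est': [1_000_000, 250_000_000],
        'geometry': [box(0, 0, 1, 1), box(2, 0, 5, 1)],
    },
    geometry = 'geometry',
)

# A geodataframe is still a Pandas dataframe
print(isinstance(toy_world, pd.DataFrame))
# Its geometry column is a geoseries
print(type(toy_world.geometry).__name__)

# Ordinary dataframe filtering works and keeps the geometry
big = toy_world[toy_world.pop_est > 200_000_000]
print(big['name'].tolist(), big.geometry.area.tolist())
```

The same boolean filtering is what the population and continent examples in this section rely on.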

Show countries with population greater than 200 million

world[world.pop_est > 200000000]
"""
        pop_est      continent                      name iso_a3  gdp_md_est                                           geometry
4     326625791  North America  United States of America    USA  18560000.0  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
8     260580739           Asia                 Indonesia    IDN   3028000.0  MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ...
29    207353391  South America                    Brazil    BRA   3081000.0  POLYGON ((-53.37366 -33.76838, -53.65054 -33.2...
98   1281935911           Asia                     India    IND   8721000.0  POLYGON ((97.32711 28.26158, 97.40256 27.88254...
102   204924861           Asia                  Pakistan    PAK    988200.0  POLYGON ((77.83745 35.49401, 76.87172 34.65354...
139  1379302771           Asia                     China    CHN  21140000.0  MULTIPOLYGON (((109.47521 18.19770, 108.65521 ...
"""

Plot the relative population of countries in Africa. This is done by filtering the geodataframe on the continent column for 'Africa'.

fig, ax = plt.subplots(figsize = (6,5), facecolor = plt.cm.Blues(.2))
fig.suptitle('Africa Populations',
             fontsize = 'xx-large',
             fontweight = 'bold')
ax.set_facecolor(plt.cm.Blues(.2))
ax = world[world.continent == 'Africa'].plot(
    column = 'pop_est',
    cmap = 'Greens',
    ax = ax,
    legend = True)

plt.show()

Map of Africa showing country color based on population




Display country color based on Under Five Mortality Rates in 2019

A global map of the Under Five Mortality Rates can be plotted by merging the Under Five Mortality Rates dataframe with the world geodataframe. The merge is done on the ISO country code. There is a discrepancy between the two: there are 195 countries in the Under Five Mortality Rates dataframe and only 177 countries in the world geodataframe.

# Countries in ufmr dataframe that are not in the world geodataframe
u5mr_med_df[~u5mr_med_df['ISO.Code'].isin(list(world['iso_a3']))][['ISO.Code', 'Country.Name']]
"""
    ISO.Code                      Country.Name
10       AND                           Andorra
16       ATG               Antigua and Barbuda
37       BHR                           Bahrain
43       BRB                          Barbados
85       CPV                        Cabo Verde
112      COM                           Comoros
118      COK                      Cook Islands
151      DMA                          Dominica
187      FRA                            France
208      GRD                           Grenada
271      KIR                          Kiribati
313      MDV                          Maldives
319      MLT                             Malta
322      MHL                  Marshall Islands
328      MUS                         Mauritius
334      FSM  Micronesia (Federated States of)
337      MCO                            Monaco
358      NRU                             Nauru
379      NIU                              Niue
382      NOR                            Norway
391      PLW                             Palau
436      KNA             Saint Kitts and Nevis
439      LCA                       Saint Lucia
442      VCT  Saint Vincent and the Grenadines
445      WSM                             Samoa
448      SMR                        San Marino
451      STP             Sao Tome and Principe
463      SYC                        Seychelles
469      SGP                         Singapore
526      TON                             Tonga
541      TUV                            Tuvalu
"""

Show countries that are in the world geodataframe, but not in the Under Five Mortality Rates dataframe.

# List all countries in world not in ufmr
world[~world['iso_a3'].isin(list(u5mr_med_df['ISO.Code']))][['iso_a3', 'name']]
"""
    iso_a3                    name
2      ESH               W. Sahara
20     FLK            Falkland Is.
21     -99                  Norway
22     GRL               Greenland
23     ATF  Fr. S. Antarctic Lands
43     -99                  France
45     PRI             Puerto Rico
134    NCL           New Caledonia
140    TWN                  Taiwan
159    ATA              Antarctica
160    -99               N. Cyprus
167    -99              Somaliland
174    -99                  Kosovo
"""

There are five countries in the world geodataframe with the iso_a3 code set to "-99". Three of these (N. Cyprus, Somaliland and Kosovo) are disputed or don't yet have an ISO designation. France and Norway have ISO codes of "FRA" and "NOR" respectively, and these are updated in the geodataframe with the following.

# Update ISO codes for France and Norway
world.loc[world.name == 'France', 'iso_a3'] = 'FRA'
world.loc[world.name == 'Norway', 'iso_a3'] = 'NOR'
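The two isin checks can also be combined into a single step. As a sketch, an outer merge with indicator=True labels each row as left_only, right_only or both. The two small frames below are invented stand-ins for the world geodataframe and the mortality dataframe, with the France row deliberately left at "-99" to show how the mismatch appears on both sides.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables; rows chosen for illustration
geo_df = pd.DataFrame({'iso_a3': ['AFG', 'GRL', '-99'],
                       'name': ['Afghanistan', 'Greenland', 'France']})
rates_df = pd.DataFrame({'ISO.Code': ['AFG', 'FRA', 'AND'],
                         'Country.Name': ['Afghanistan', 'France', 'Andorra']})

# An outer join keeps every row from both sides; _merge records the origin
check = geo_df.merge(rates_df, left_on = 'iso_a3', right_on = 'ISO.Code',
                     how = 'outer', indicator = True)

only_geo = sorted(check.loc[check['_merge'] == 'left_only', 'name'])
only_rates = sorted(check.loc[check['_merge'] == 'right_only', 'Country.Name'])
print(only_geo)
print(only_rates)
```

A country name that shows up in both lists, as France does here, points to a code problem rather than genuinely missing data.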

Merge the data from Under Five Mortality Rates with the world geodataframe. The merge is done with a left join on the ISO country code, which keeps all 177 countries in the original world geodataframe along with their geometry data. Of these, 166 countries have both mortality information and geometry information. The 11 countries (such as Greenland) that only appear in the world geodataframe will have NaN for all of the under five mortality rate data.

u5mr_world_df = world.merge(u5mr_med_df,
                            left_on = 'iso_a3',
                            right_on = 'ISO.Code',
                            how = 'left')
u5mr_world_df.shape
"""
(177, 79)
"""
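The effect of the left join can be seen on a small example. This is a sketch with invented three-row frames; the 2019 values are rounded from the table near the top of the article.

```python
import pandas as pd

# Invented stand-ins: three map rows, two of which have a mortality rate
geo_df = pd.DataFrame({'iso_a3': ['AFG', 'GRL', 'ZMB'],
                       'name': ['Afghanistan', 'Greenland', 'Zambia']})
rates_df = pd.DataFrame({'ISO.Code': ['AFG', 'ZMB', 'AND'],
                         '2019': [60.3, 61.7, 3.0]})

merged = geo_df.merge(rates_df,
                      left_on = 'iso_a3',
                      right_on = 'ISO.Code',
                      how = 'left')

# Every map row survives; rows with no match get NaN in the rate columns
n_rows = len(merged)
n_matched = int(merged['ISO.Code'].notna().sum())
n_unmatched = int(merged['2019'].isna().sum())
print(n_rows, n_matched, n_unmatched)
```

Andorra is dropped because it has no row on the left side, while Greenland is kept with a missing rate, which mirrors what happens to the real data.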

Create a plot showing the under five mortality rates per country for the year 2019. The data in a geodataframe can be displayed in a number of ways, and it can be confusing to know which parameters to set. This code creates the same plot in four different ways. The first three use the plot function on the geodataframe, which uses matplotlib to generate the plot. The first plot sets the choropleth classification scheme to 'quantiles', which colors the countries based on discrete intervals. When a classification scheme is used, the legend is created with matplotlib.pyplot.legend, so the legend_kwds parameters are those accepted by that function. The default scheme is None, in which case the legend is created with matplotlib.pyplot.colorbar and legend_kwds takes colorbar parameters instead. This is used in charts 2 and 3, with the colorbar being changed from vertical to horizontal in chart 3. Finally, the fourth plot is rendered using the choropleth function in the geoplot module.

u5mr_year = u5mr_world_df['2019']
fig, axs = plt.subplots(
    nrows = 2,
    ncols = 2,
    figsize = (12,5),
    facecolor = plt.cm.Blues(.2))
fig.suptitle('National Under Five Mortality Rates in 2019',
             fontsize = 'xx-large',
             fontweight = 'bold')
for ax in axs.flatten():
    ax.set_facecolor(plt.cm.Blues(.2))

# Chart 1: 'quantiles' classification scheme with a discrete legend
ax1 = axs[0][0]
u5mr_world_df.plot(
    ax = ax1,
    color = 'white',
    edgecolor = 'black'
)

u5mr_world_df.plot(
    column = u5mr_year,
    scheme = 'quantiles',
    k = 6,
    cmap = 'OrRd',
    ax = ax1,
    legend = True,
    legend_kwds = {'title': "UFMR per 1000",
                   'title_fontsize': 'small',
                   'frameon': False,
                   'loc': 'lower center',
                   'bbox_to_anchor': (-0.2, 0.1, 0.5, 1),
                   'fontsize': 'xx-small',
                  },
)
for spine in ax1.spines.values():
    spine.set_visible(False)
ax1.xaxis.set_visible(False)
ax1.yaxis.set_visible(False)

# Chart 2: default scheme with a vertical colorbar
ax2 = axs[0][1]
u5mr_world_df.plot(
    ax = ax2,
    color = 'white',
    edgecolor = 'black'
)

u5mr_world_df.plot(
    column = u5mr_year,
    cmap = 'OrRd',
    ax = ax2,
    legend = True,
    legend_kwds = {'label': "UFMR per 1000"},
)
for spine in ax2.spines.values():
    spine.set_visible(False)
ax2.xaxis.set_visible(False)
ax2.yaxis.set_visible(False)

# Chart 3: default scheme with a horizontal colorbar
ax3 = axs[1][0]
u5mr_world_df.plot(
    ax = ax3,
    color = 'white',
    edgecolor = 'black'
)

u5mr_world_df.plot(
    column = u5mr_year,
    cmap = 'OrRd',
    ax = ax3,
    legend = True,
    legend_kwds = {'label': "UFMR per 1000",
                   'orientation': 'horizontal',
                   'shrink': 0.7,
                  },
)
for spine in ax3.spines.values():
    spine.set_visible(False)
ax3.xaxis.set_visible(False)
ax3.yaxis.set_visible(False)

# Chart 4: the choropleth function from the geoplot module
geoplot.choropleth(
    u5mr_world_df,
    hue = u5mr_year,
    cmap = 'OrRd',
    ax = axs[1][1],
    legend = True
)

plt.show()

Map showing country Under Five Mortality Rates in 2019
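To get a feel for what scheme = 'quantiles' does in the first chart, the idea can be sketched with pandas alone. This is an illustration of quantile binning using pd.qcut, not the mapclassify implementation that Geopandas calls, so the exact bin edges may differ. The rate values are made up.

```python
import pandas as pd

# Made-up rates per 1000; a few large values skew the distribution
rates = pd.Series([3, 4, 5, 7, 9, 12, 20, 35, 60, 80, 95, 117], dtype = float)

# Quantile binning: each of the k bins receives the same number of values
k = 6
quantile_counts = pd.qcut(rates, q = k).value_counts().sort_index()
print(quantile_counts.tolist())

# Equal-width bins, for contrast: the low values crowd into the first bin
width_counts = pd.cut(rates, bins = k).value_counts().sort_index()
print(width_counts.tolist())
```

With skewed data such as mortality rates, quantile bins spread the colors evenly across countries, whereas a continuous or equal-width scale leaves most countries in the palest shades.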



Display country color based on Under Five Mortality Rates in 1985

A function can be created to wrap up the creation of a world map for a particular year, based on chart 2 above. Gray is used for countries with missing data by setting the missing_kwds parameter. This is valuable when dealing with earlier years where there is no data available for many countries, since displaying white or no color can be misleading.

def create_map_for_year(year, title):
    u5mr_year = u5mr_world_df[year]
    fig, ax = plt.subplots(
        nrows = 1,
        ncols = 1,
        figsize = (15,6),
        facecolor = plt.cm.Blues(.2))
    fig.suptitle(title,
                 fontsize = 'xx-large',
                 fontweight = 'bold')
    ax.set_facecolor(plt.cm.Blues(.2))

    # Base layer: country outlines in white
    u5mr_world_df.plot(
        ax = ax,
        color = 'white',
        edgecolor = 'black'
    )

    # Data layer: countries with no value for this year are drawn in gray
    u5mr_world_df.plot(
        column = u5mr_year,
        cmap = 'OrRd',
        ax = ax,
        legend = True,
        legend_kwds = {'label': "UFMR per 1000",
                      'shrink': 0.7},
        missing_kwds = {'facecolor':'Gray'},
    )
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)
    return fig

Use the function to create a map for 1985.

fig = create_map_for_year('1985', 'National Under Five Mortality Rates in 1985')
plt.show()

Map showing country Under Five Mortality Rates in 1985
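How much gray to expect on a map can be estimated before plotting by counting the missing values in each year column. A minimal sketch on an invented dataframe, where the names and values are illustrative only:

```python
import numpy as np
import pandas as pd

# Invented example: earlier years have more gaps than recent ones
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    '1950': [np.nan, np.nan, 240.0, np.nan],
    '1985': [np.nan, 95.0, 120.0, 60.0],
    '2019': [60.0, 9.7, 23.3, 3.0],
})

# Number of countries with no estimate, per year column
missing_per_year = toy[['1950', '1985', '2019']].isna().sum().to_dict()
print(missing_per_year)
```

The same isna().sum() call on the year columns of the merged geodataframe would give the number of countries drawn in gray for each year.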



Display changes over the decades

The function is modified to draw the map on an axis that is passed in, so that several maps can be placed on one figure.

def plot_ax_for_year(ax, year):
    u5mr_year = u5mr_world_df[year]
    ax.set_title(year,
                 fontsize = 'xx-large',
                 fontweight = 'bold')
    ax.set_facecolor(plt.cm.Blues(.2))

    # Base layer: country outlines in white
    u5mr_world_df.plot(
        ax = ax,
        color = 'white',
        edgecolor = 'black'
    )

    # Data layer: countries with no value for this year are drawn in gray
    u5mr_world_df.plot(
        column = u5mr_year,
        cmap = 'OrRd',
        ax = ax,
        legend = True,
        legend_kwds = {'label': "UFMR per 1000",
                      'shrink': 0.7},
        missing_kwds = {'facecolor':'Gray'},
    )
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)
    return ax

Create a plot with one map for the start of each decade from 1950 to 2010, plus one for 2019.

fig, axs = plt.subplots(
    nrows = 4,
    ncols = 2,
    figsize = (15,15),
    facecolor = plt.cm.Blues(.2))
fig.suptitle('Changes in national Under Five Mortality Rates over time',
             fontsize = 'xx-large',
             fontweight = 'bold')
# Seven decade starts (1950 to 2010) plus 2019 fill the 4 x 2 grid
years = [f'{x}' for x in range(1950, 2020, 10)] + ['2019']
for ax, year in zip(axs.flatten(), years):
    plot_ax_for_year(ax, year)
fig.tight_layout(pad=2)
plt.show()

Map showing country Under Five Mortality Rates changes from 1950 to 2019
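In the grid each map gets its own colorbar, scaled to that year's values, so the same shade can mean different rates in different panels. One possible refinement, which is not part of the original code, is to compute a single range across all years and pass it to each plot call through the vmin and vmax parameters of the geodataframe plot function. A sketch of the range calculation on an invented dataframe:

```python
import numpy as np
import pandas as pd

years = [f'{x}' for x in range(1950, 2020, 10)] + ['2019']

# Invented stand-in for the merged data, one column per plotted year
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.uniform(2, 400, size = (5, len(years))), columns = years)
toy.loc[0, '1950'] = np.nan  # missing values are skipped by min and max

# One color range shared by every panel
vmin = toy[years].min().min()
vmax = toy[years].max().max()
print(len(years), vmin <= vmax)

# Inside plot_ax_for_year the range could then be passed on, for example:
# u5mr_world_df.plot(column = year, cmap = 'OrRd', ax = ax, vmin = vmin, vmax = vmax)
```

With a shared range the fall in mortality over the decades shows up as the maps getting lighter, rather than each map being rescaled to its own maximum.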



Conclusion

Geographical maps are a great way to present a global view of data that varies by country and region. Geopandas and Geoplot are used here to plot world maps with countries colored based on the data of interest. The world maps produced show the changes over time in Under Five Mortality Rates. This helps visualise the regions of the world that have consistently fared worse in addressing child mortality.

Matplotlib creates nice static images, but one drawback of these is precisely that they are static. This data could be more informative if the maps were either interactive or displayed as an animation.





Under-five mortality rate: the probability of dying between birth and exactly 5 years of age, expressed per 1,000 live births.
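As a worked example of this definition, a rate per 1,000 live births can be converted into a plain probability by dividing by 1,000. The helper name below is hypothetical, and the input is Afghanistan's 2019 median from the table near the top of the article.

```python
def rate_to_probability(rate_per_1000):
    """Convert a rate per 1,000 live births into a probability between 0 and 1."""
    return rate_per_1000 / 1000

# Afghanistan, 2019 median value from the loaded data
afg_2019 = 60.269399
prob = rate_to_probability(afg_2019)

as_percent = f"{prob:.1%}"
one_in_n = round(1 / prob)
print(as_percent, f"or about 1 in {one_in_n} live births")
```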