Exploring data with Swift TabularData and DataFrame
TabularData was introduced in iOS 15 and can be used to load data from csv or json format into a DataFrame to explore and manipulate. This is not Python Pandas, but it does provide some basic functionality for exploring data and is fast and efficient. In this article, the data on the deaths from Covid is loaded, grouped and sorted to see the countries with the highest death toll from Covid.
Load data into a Swift DataFrame from a CSV file
The csv file on the deaths from Covid-19 will be used to work with the Swift DataFrame. As shown previously in Countries with highest confirmed cases of Covid-19 with Plotly and Countries with highest deaths from covid the data on the global deaths from Covid-19 is available from Johns Hopkins University - who have made the data available on GitHub. More information about COVID-19 and the coronavirus is available from Coronavirus disease (COVID-19) advice for the public.
Swift DataFrame can load the latest data on deaths from Covid-19 directly from the csv file in github. DataFrame provides an initialisation to create the DataFame with the contents of a csv file - contentsOfCSVFile.
1import TabularData
2import Foundation
3
4let deaths_path = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
5
6let options = CSVReadingOptions(hasHeaderRow: true, delimiter: ",")
7guard let fileUrl = URL(string: deaths_path) else {
8 fatalError("Error creating Url")
9}
10
11var covidDeathsDf = try! DataFrame(
12 contentsOfCSVFile: fileUrl,
13 options: options)
14
15print("\(covidDeathsDf)")
Loading data into a Swift DataFrame from a csv file
Use FormattingOptions to see more or less data
Use FormattingOptions to configure how to display the TabularData.
1// Use FormattingOptions to see more or less data
2let formatingOptions = FormattingOptions(maximumLineWidth: 150,
3 maximumCellWidth: 10,
4 maximumRowCount: 30,
5 includesColumnTypes: false)
6print(covidDeathsDf.description(options: formatingOptions))
Use FormattingOptions to print out more or less data from the DataFrame
List out all of the Column Names
It can be useful to list the column names from the loaded DataFrame.
1print(covidDeathsDf.columns.map { col in col.name } )
2
3["Province/State", "Country/Region", "Lat", "Long", "1/22/20", "1/23/20", "1/24/20",
4"1/25/20", "1/26/20", "1/27/20", "1/28/20", "1/29/20", "1/30/20", "1/31/20",
5"2/1/20", "2/2/20", "2/3/20", "2/4/20", "2/5/20", "2/6/20", "2/7/20", "2/8/20",
6"2/9/20", "2/10/20", "2/11/20", "2/12/20", "2/13/20", "2/14/20", "2/15/20",
7"2/16/20", "2/17/20", "2/18/20", "2/19/20", "2/20/20", "2/21/20", "2/22/20",
8"2/23/20", "2/24/20", "2/25/20", "2/26/20", "2/27/20", "2/28/20", "2/29/20",
9"3/1/20", "3/2/20", "3/3/20", "3/4/20", "3/5/20", "3/6/20", "3/7/20", "3/8/20",
10"3/9/20", "3/10/20", "3/11/20", "3/12/20", "3/13/20", "3/14/20", "3/15/20",
11"3/16/20", "3/17/20", "3/18/20", "3/19/20", "3/20/20", "3/21/20", "3/22/20",
12"3/23/20", "3/24/20", "3/25/20", "3/26/20", "3/27/20", "3/28/20", "3/29/20",
13"3/30/20", "3/31/20", "4/1/20", "4/2/20", "4/3/20", "4/4/20", "4/5/20", "4/6/20",
14"4/7/20", "4/8/20", "4/9/20", "4/10/20", "4/11/20", "4/12/20", "4/13/20", "4/14/20",
15"4/15/20", "4/16/20", "4/17/20", "4/18/20", "4/19/20", "4/20/20", "4/21/20",
16"4/22/20", "4/23/20", "4/24/20", "4/25/20", "4/26/20", "4/27/20", "4/28/20",
17"4/29/20", "4/30/20", "5/1/20", "5/2/20", "5/3/20", "5/4/20", "5/5/20", "5/6/20",
18"5/7/20", "5/8/20", "5/9/20", "5/10/20", "5/11/20", "5/12/20", "5/13/20", "5/14/20",
19"5/15/20", "5/16/20", "5/17/20", "5/18/20", "5/19/20", "5/20/20", "5/21/20",
20"5/22/20", "5/23/20", "5/24/20", "5/25/20", "5/26/20", "5/27/20", "5/28/20",
21"5/29/20", "5/30/20", "5/31/20", "6/1/20", "6/2/20", "6/3/20", "6/4/20", "6/5/20",
22"6/6/20", "6/7/20", "6/8/20", "6/9/20", "6/10/20", "6/11/20", "6/12/20", "6/13/20",
23"6/14/20", "6/15/20", "6/16/20", "6/17/20", "6/18/20", "6/19/20", "6/20/20",
24"6/21/20", "6/22/20", "6/23/20", "6/24/20", "6/25/20", "6/26/20", "6/27/20",
25"6/28/20", "6/29/20", "6/30/20", "7/1/20", "7/2/20", "7/3/20", "7/4/20", "7/5/20",
26"7/6/20", "7/7/20", "7/8/20", "7/9/20", "7/10/20", "7/11/20", "7/12/20", "7/13/20",
27"7/14/20", "7/15/20", "7/16/20", "7/17/20", "7/18/20", "7/19/20", "7/20/20",
28"7/21/20", "7/22/20", "7/23/20", "7/24/20", "7/25/20", "7/26/20", "7/27/20",
29"7/28/20", "7/29/20", "7/30/20", "7/31/20", "8/1/20", "8/2/20", "8/3/20", "8/4/20",
30"8/5/20", "8/6/20", "8/7/20", "8/8/20", "8/9/20", "8/10/20", "8/11/20", "8/12/20",
31"8/13/20", "8/14/20", "8/15/20", "8/16/20", "8/17/20", "8/18/20", "8/19/20",
32"8/20/20", "8/21/20", "8/22/20", "8/23/20", "8/24/20", "8/25/20", "8/26/20",
33"8/27/20", "8/28/20", "8/29/20", "8/30/20", "8/31/20", "9/1/20", "9/2/20", "9/3/20",
34"9/4/20", "9/5/20", "9/6/20", "9/7/20", "9/8/20", "9/9/20", "9/10/20", "9/11/20",
35"9/12/20", "9/13/20", "9/14/20", "9/15/20", "9/16/20", "9/17/20", "9/18/20",
36"9/19/20", "9/20/20", "9/21/20", "9/22/20", "9/23/20", "9/24/20", "9/25/20",
37"9/26/20", "9/27/20", "9/28/20", "9/29/20", "9/30/20", "10/1/20", "10/2/20",
38"10/3/20", "10/4/20", "10/5/20", "10/6/20", "10/7/20", "10/8/20", "10/9/20",
39"10/10/20", "10/11/20", "10/12/20", "10/13/20", "10/14/20", "10/15/20", "10/16/20",
40"10/17/20", "10/18/20", "10/19/20", "10/20/20", "10/21/20", "10/22/20", "10/23/20",
41"10/24/20", "10/25/20", "10/26/20", "10/27/20", "10/28/20", "10/29/20", "10/30/20",
42"10/31/20", "11/1/20", "11/2/20", "11/3/20", "11/4/20", "11/5/20", "11/6/20",
43"11/7/20", "11/8/20", "11/9/20", "11/10/20", "11/11/20", "11/12/20", "11/13/20",
44"11/14/20", "11/15/20", "11/16/20", "11/17/20", "11/18/20", "11/19/20", "11/20/20",
45"11/21/20", "11/22/20", "11/23/20", "11/24/20", "11/25/20", "11/26/20", "11/27/20",
46"11/28/20", "11/29/20", "11/30/20", "12/1/20", "12/2/20", "12/3/20", "12/4/20",
47"12/5/20", "12/6/20", "12/7/20", "12/8/20", "12/9/20", "12/10/20", "12/11/20",
48"12/12/20", "12/13/20", "12/14/20", "12/15/20", "12/16/20", "12/17/20", "12/18/20",
49"12/19/20", "12/20/20", "12/21/20", "12/22/20", "12/23/20", "12/24/20", "12/25/20",
50"12/26/20", "12/27/20", "12/28/20", "12/29/20", "12/30/20", "12/31/20", "1/1/21",
51"1/2/21", "1/3/21", "1/4/21", "1/5/21", "1/6/21", "1/7/21", "1/8/21", "1/9/21",
52"1/10/21", "1/11/21", "1/12/21", "1/13/21", "1/14/21", "1/15/21", "1/16/21",
53"1/17/21", "1/18/21", "1/19/21", "1/20/21", "1/21/21", "1/22/21", "1/23/21",
54"1/24/21", "1/25/21", "1/26/21", "1/27/21", "1/28/21", "1/29/21", "1/30/21",
55"1/31/21", "2/1/21", "2/2/21", "2/3/21", "2/4/21", "2/5/21", "2/6/21", "2/7/21",
56"2/8/21", "2/9/21", "2/10/21", "2/11/21", "2/12/21", "2/13/21", "2/14/21", "2/15/21",
57"2/16/21", "2/17/21", "2/18/21", "2/19/21", "2/20/21", "2/21/21", "2/22/21",
58"2/23/21", "2/24/21", "2/25/21", "2/26/21", "2/27/21", "2/28/21", "3/1/21", "3/2/21",
59"3/3/21", "3/4/21", "3/5/21", "3/6/21", "3/7/21", "3/8/21", "3/9/21", "3/10/21",
60"3/11/21", "3/12/21", "3/13/21", "3/14/21", "3/15/21", "3/16/21", "3/17/21",
61"3/18/21", "3/19/21", "3/20/21", "3/21/21", "3/22/21", "3/23/21", "3/24/21",
62"3/25/21", "3/26/21", "3/27/21", "3/28/21", "3/29/21", "3/30/21", "3/31/21",
63"4/1/21", "4/2/21", "4/3/21", "4/4/21", "4/5/21", "4/6/21", "4/7/21", "4/8/21",
64"4/9/21", "4/10/21", "4/11/21", "4/12/21", "4/13/21", "4/14/21", "4/15/21",
65"4/16/21", "4/17/21", "4/18/21", "4/19/21", "4/20/21", "4/21/21", "4/22/21",
66"4/23/21", "4/24/21", "4/25/21", "4/26/21", "4/27/21", "4/28/21", "4/29/21",
67"4/30/21", "5/1/21", "5/2/21", "5/3/21", "5/4/21", "5/5/21", "5/6/21", "5/7/21",
68"5/8/21", "5/9/21", "5/10/21", "5/11/21", "5/12/21", "5/13/21", "5/14/21", "5/15/21",
69"5/16/21", "5/17/21", "5/18/21", "5/19/21", "5/20/21", "5/21/21", "5/22/21",
70"5/23/21", "5/24/21", "5/25/21", "5/26/21", "5/27/21", "5/28/21", "5/29/21",
71"5/30/21", "5/31/21", "6/1/21", "6/2/21", "6/3/21", "6/4/21", "6/5/21", "6/6/21",
72"6/7/21", "6/8/21", "6/9/21", "6/10/21", "6/11/21", "6/12/21", "6/13/21", "6/14/21",
73"6/15/21", "6/16/21", "6/17/21", "6/18/21", "6/19/21", "6/20/21", "6/21/21",
74"6/22/21", "6/23/21", "6/24/21", "6/25/21", "6/26/21", "6/27/21", "6/28/21",
75"6/29/21", "6/30/21", "7/1/21", "7/2/21", "7/3/21", "7/4/21", "7/5/21", "7/6/21",
76"7/7/21", "7/8/21", "7/9/21", "7/10/21", "7/11/21", "7/12/21", "7/13/21", "7/14/21",
77"7/15/21", "7/16/21", "7/17/21", "7/18/21", "7/19/21", "7/20/21", "7/21/21",
78"7/22/21", "7/23/21", "7/24/21", "7/25/21", "7/26/21", "7/27/21", "7/28/21",
79"7/29/21", "7/30/21", "7/31/21", "8/1/21", "8/2/21", "8/3/21", "8/4/21", "8/5/21",
80"8/6/21", "8/7/21", "8/8/21", "8/9/21", "8/10/21", "8/11/21", "8/12/21", "8/13/21",
81"8/14/21", "8/15/21", "8/16/21", "8/17/21", "8/18/21", "8/19/21", "8/20/21",
82"8/21/21", "8/22/21", "8/23/21", "8/24/21", "8/25/21", "8/26/21", "8/27/21",
83"8/28/21", "8/29/21", "8/30/21", "8/31/21", "9/1/21", "9/2/21", "9/3/21", "9/4/21",
84"9/5/21", "9/6/21", "9/7/21", "9/8/21", "9/9/21", "9/10/21", "9/11/21", "9/12/21",
85"9/13/21", "9/14/21", "9/15/21", "9/16/21", "9/17/21", "9/18/21", "9/19/21",
86"9/20/21", "9/21/21", "9/22/21", "9/23/21", "9/24/21", "9/25/21", "9/26/21",
87"9/27/21", "9/28/21", "9/29/21", "9/30/21", "10/1/21", "10/2/21", "10/3/21",
88"10/4/21", "10/5/21", "10/6/21", "10/7/21", "10/8/21", "10/9/21", "10/10/21",
89"10/11/21", "10/12/21", "10/13/21", "10/14/21", "10/15/21", "10/16/21", "10/17/21",
90"10/18/21", "10/19/21", "10/20/21", "10/21/21", "10/22/21", "10/23/21", "10/24/21",
91"10/25/21", "10/26/21", "10/27/21", "10/28/21", "10/29/21", "10/30/21", "10/31/21",
92"11/1/21", "11/2/21", "11/3/21", "11/4/21", "11/5/21", "11/6/21", "11/7/21",
93"11/8/21", "11/9/21", "11/10/21", "11/11/21", "11/12/21", "11/13/21", "11/14/21",
94"11/15/21", "11/16/21", "11/17/21", "11/18/21", "11/19/21", "11/20/21", "11/21/21",
95"11/22/21", "11/23/21", "11/24/21", "11/25/21", "11/26/21", "11/27/21", "11/28/21",
96"11/29/21", "11/30/21", "12/1/21", "12/2/21", "12/3/21", "12/4/21", "12/5/21",
97"12/6/21", "12/7/21", "12/8/21", "12/9/21", "12/10/21", "12/11/21", "12/12/21",
98"12/13/21", "12/14/21", "12/15/21", "12/16/21", "12/17/21", "12/18/21", "12/19/21",
99"12/20/21", "12/21/21", "12/22/21", "12/23/21", "12/24/21", "12/25/21", "12/26/21",
100"12/27/21", "12/28/21", "12/29/21", "12/30/21", "12/31/21", "1/1/22", "1/2/22",
101"1/3/22", "1/4/22", "1/5/22", "1/6/22", "1/7/22", "1/8/22", "1/9/22", "1/10/22",
102"1/11/22", "1/12/22", "1/13/22", "1/14/22", "1/15/22", "1/16/22", "1/17/22",
103"1/18/22", "1/19/22", "1/20/22", "1/21/22", "1/22/22", "1/23/22", "1/24/22",
104"1/25/22", "1/26/22", "1/27/22", "1/28/22", "1/29/22", "1/30/22", "1/31/22",
105"2/1/22", "2/2/22", "2/3/22", "2/4/22", "2/5/22", "2/6/22", "2/7/22", "2/8/22",
106"2/9/22", "2/10/22", "2/11/22", "2/12/22", "2/13/22", "2/14/22", "2/15/22",
107"2/16/22", "2/17/22", "2/18/22", "2/19/22", "2/20/22", "2/21/22", "2/22/22",
108"2/23/22", "2/24/22", "2/25/22", "2/26/22", "2/27/22", "2/28/22", "3/1/22", "3/2/22",
109"3/3/22", "3/4/22", "3/5/22", "3/6/22", "3/7/22", "3/8/22", "3/9/22", "3/10/22",
110"3/11/22", "3/12/22", "3/13/22", "3/14/22", "3/15/22", "3/16/22", "3/17/22",
111"3/18/22", "3/19/22", "3/20/22", "3/21/22", "3/22/22", "3/23/22", "3/24/22",
112"3/25/22", "3/26/22", "3/27/22", "3/28/22", "3/29/22", "3/30/22", "3/31/22",
113"4/1/22", "4/2/22", "4/3/22", "4/4/22", "4/5/22", "4/6/22", "4/7/22", "4/8/22",
114"4/9/22", "4/10/22", "4/11/22", "4/12/22", "4/13/22", "4/14/22", "4/15/22",
115"4/16/22", "4/17/22", "4/18/22", "4/19/22", "4/20/22", "4/21/22", "4/22/22",
116"4/23/22", "4/24/22", "4/25/22", "4/26/22", "4/27/22", "4/28/22", "4/29/22",
117"4/30/22"]
Load selected columns
Not all of the data in the csv file needs to be loaded. This is very useful when the data can contain hundreds of thousands of rows and columns, which can take some time to load. A lot of exploration and experimentation can be done quickly on a subset of the data. The column names can be specified using an array of strings and the rows can be be specified by row index. This code specifies the most recent columns and just loads the first 15 rows of data.
1import TabularData
2import Foundation
3
4let deaths_path = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
5
6guard let fileUrl = URL(string: deaths_path) else {
7 fatalError("Error creating Url")
8}
9
10let columnNames = ["Province/State",
11 "Country/Region",
12 "Lat",
13 "Long",
14 "4/25/22",
15 "4/26/22",
16 "4/27/22",
17 "4/28/22",
18 "4/29/22",
19 "4/30/22"]
20
21let options = CSVReadingOptions(hasHeaderRow: true, delimiter: ",")
22let covidDeathsDf = try! DataFrame(
23 contentsOfCSVFile: fileUrl,
24 columns: columnNames,
25 rows: 0..<20,
26 options: options)
27
28print("\(covidDeathsDf)")
The data type can also be specified for each column when loading data from a csv file. This has the advantage of throwing an error if the data does not match the expected type.
1let columnNames = ["Province/State",
2 "Country/Region",
3 "Lat",
4 "Long",
5 "4/25/22",
6 "4/26/22",
7 "4/27/22",
8 "4/28/22",
9 "4/29/22",
10 "4/30/22"]
11
12// Specify types for the data in the columns
13let columnTypes: [String : CSVType] = ["Province/State": .string,
14 "Country/Region": .string,
15 "Lat": .double,
16 "Long": .double,
17 "4/25/22": .integer,
18 "4/26/22": .integer,
19 "4/27/22": .integer,
20 "4/28/22": .integer,
21 "4/29/22": .integer,
22 "4/30/22": .integer]
23
24let options = CSVReadingOptions(hasHeaderRow: true, delimiter: ",")
25let covidDeathsDf = try! DataFrame(
26 contentsOfCSVFile: fileUrl,
27 columns: columnNames,
28 rows: 0..<20,
29 types: columnTypes,
30 options: options)
31
32print("\(covidDeathsDf)")
33
34
35"""
36┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
37┃ ┃ Province/State ┃ Country/Region ┃ Lat ┃ Long ┃ 4/25/22 ┃ 4/26/22 ┃ 4/27/22 ┃ 4/28/22 ┃ 4/29/22 ┃ 4/30/22 ┃
38┃ ┃ <String> ┃ <String> ┃ <Double> ┃ <Double> ┃ <Int> ┃ <Int> ┃ <Int> ┃ <Int> ┃ <Int> ┃ <Int> ┃
39┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
40│ 0 │ nil │ Afghanistan │ 33.93911 │ 67.709953 │ 7,683 │ 7,683 │ 7,683 │ 7,683 │ 7,683 │ 7,683 │
41│ 1 │ nil │ Albania │ 41.1533 │ 20.1683 │ 3,496 │ 3,496 │ 3,496 │ 3,496 │ 3,496 │ 3,496 │
42│ 2 │ nil │ Algeria │ 28.0339 │ 1.6596 │ 6,875 │ 6,875 │ 6,875 │ 6,875 │ 6,875 │ 6,875 │
43│ 3 │ nil │ Andorra │ 42.5063 │ 1.5218 │ 153 │ 153 │ 153 │ 153 │ 153 │ 153 │
44│ 4 │ nil │ Angola │ -11.2027 │ 17.8739 │ 1,900 │ 1,900 │ 1,900 │ 1,900 │ 1,900 │ 1,900 │
45│ 5 │ nil │ Antarctica │ -71.9499 │ 23.347 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │
46│ 6 │ nil │ Antigua and Barbuda │ 17.0608 │ -61.7964 │ 135 │ 135 │ 135 │ 136 │ 136 │ 136 │
47│ 7 │ nil │ Argentina │ -38.4161 │ -63.6167 │ 128,542 │ 128,542 │ 128,542 │ 128,542 │ 128,542 │ 128,542 │
48│ 8 │ nil │ Armenia │ 40.0691 │ 45.0382 │ 8,622 │ 8,622 │ 8,622 │ 8,622 │ 8,622 │ 8,622 │
49│ 9 │ Australian Capital Territory │ Australia │ -35.4735 │ 149.0124 │ 50 │ 50 │ 50 │ 50 │ 50 │ 51 │
50│ 10 │ New South Wales │ Australia │ -33.8688 │ 151.2093 │ 2,714 │ 2,724 │ 2,743 │ 2,749 │ 2,769 │ 2,774 │
51│ 11 │ Northern Territory │ Australia │ -12.4634 │ 130.8456 │ 40 │ 40 │ 42 │ 43 │ 43 │ 43 │
52│ 12 │ Queensland │ Australia │ -27.4698 │ 153.0251 │ 841 │ 850 │ 857 │ 863 │ 863 │ 868 │
53│ 13 │ South Australia │ Australia │ -34.9285 │ 138.6007 │ 332 │ 332 │ 339 │ 342 │ 346 │ 348 │
54│ 14 │ Tasmania │ Australia │ -42.8821 │ 147.3272 │ 48 │ 49 │ 49 │ 50 │ 50 │ 51 │
55│ 15 │ Victoria │ Australia │ -37.8136 │ 144.9631 │ 2,902 │ 2,932 │ 2,942 │ 2,948 │ 2,967 │ 2,967 │
56│ 16 │ Western Australia │ Australia │ -31.9505 │ 115.8605 │ 125 │ 135 │ 142 │ 142 │ 148 │ 148 │
57│ 17 │ nil │ Austria │ 47.5162 │ 14.5501 │ 18,054 │ 18,076 │ 18,100 │ 18,117 │ 18,138 │ 18,156 │
58│ 18 │ nil │ Azerbaijan │ 40.1431 │ 47.5769 │ 9,707 │ 9,707 │ 9,707 │ 9,707 │ 9,709 │ 9,709 │
59│ 19 │ nil │ Bahamas │ 25.025885 │ -78.035889 │ 789 │ 789 │ 789 │ 789 │ 789 │ 798 │
60└────┴──────────────────────────────┴─────────────────────┴───────────┴────────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
6120 rows, 10 columns
62"""
Use array of strings to specify the columns to load and array of numbers to limit to first 20 rows
Group data by a row
The data for Covid-19 contains fields for Country and State and it can be seen that there are many entries for some countries such as Australia. It is possible to combine all the data by a certain column using the grouped(by:) method. The following code groups the data by country and sums up the values for the latest column in the dataframe.
1import TabularData
2import Foundation
3
4let deaths_path = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
5
6guard let fileUrl = URL(string: deaths_path) else {
7 fatalError("Error creating Url")
8}
9
10let columnNames = ["Country/Region",
11 "4/30/22"]
12
13let columnTypes: [String : CSVType] = ["Country/Region": .string,
14 "4/30/22": .integer]
15
16let options = CSVReadingOptions(hasHeaderRow: true, delimiter: ",")
17let covidDeathsDf = try! DataFrame(
18 contentsOfCSVFile: fileUrl,
19 columns: columnNames,
20 types: columnTypes,
21 options: options)
22
23print("\(covidDeathsDf)")
24
25let countryTotals = covidDeathsDf
26 .grouped(by: "Country/Region")
27 .sums("4/30/22", Int.self, order: .descending)
28
29print("\(countryTotals)")
30
31"""
32┏━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
33┃ ┃ Country/Region ┃ sum(4/30/22) ┃
34┃ ┃ <String> ┃ <Int> ┃
35┡━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
36│ 0 │ US │ 993,712 │
37│ 1 │ Brazil │ 663,736 │
38│ 2 │ India │ 523,843 │
39│ 3 │ Russia │ 368,319 │
40│ 4 │ Mexico │ 324,334 │
41│ 5 │ Peru │ 212,810 │
42│ 6 │ United Kingdom │ 175,552 │
43│ 7 │ Italy │ 163,507 │
44│ 8 │ Indonesia │ 156,257 │
45│ 9 │ France │ 146,967 │
46│ 10 │ Iran │ 141,083 │
47│ 11 │ Colombia │ 139,797 │
48│ 12 │ Germany │ 135,461 │
49│ 13 │ Argentina │ 128,542 │
50│ 14 │ Poland │ 116,059 │
51│ 15 │ Ukraine │ 112,459 │
52│ 16 │ Spain │ 104,456 │
53│ 17 │ South Africa │ 100,363 │
54│ 18 │ Turkey │ 98,771 │
55│ 19 │ Romania │ 65,486 │
56┢╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍╍╍╍╍┪
57┇ ... ┇
58┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛
59198 rows, 2 columns
60"""
Grouping data by country and getting the total for each country
Rename columns in the Dataframe
Finally, it is possible to rename the column names in the dataframe.
1import TabularData
2import Foundation
3
4let deaths_path = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
5
6guard let fileUrl = URL(string: deaths_path) else {
7 fatalError("Error creating Url")
8}
9
10let columnNames = ["Country/Region",
11 "4/30/22"]
12
13let columnTypes: [String : CSVType] = ["Country/Region": .string,
14 "4/30/22": .integer]
15
16let options = CSVReadingOptions(hasHeaderRow: true, delimiter: ",")
17let covidDeathsDf = try! DataFrame(
18 contentsOfCSVFile: fileUrl,
19 columns: columnNames,
20 types: columnTypes,
21 options: options)
22
23var countryTotals = covidDeathsDf
24 .grouped(by: "Country/Region")
25 .sums("4/30/22", Int.self, order: .descending)
26
27countryTotals.renameColumn("Country/Region", to: "Country")
28countryTotals.renameColumn("sum(4/30/22)", to: "Total Deaths")
29
30let formatingOptions = FormattingOptions(maximumLineWidth: 100, includesColumnTypes: false)
31print("\(countryTotals.prefix(20).description(options: formatingOptions))")
Grouping data by country and displaying countries with the highest deaths from Covid-19
Countries with the highest number of deaths from Covid-19 as of April 30 2022
Country | Total Deaths | |
---|---|---|
0 | US | 993,712 |
1 | Brazil | 663,736 |
2 | India | 523,843 |
3 | Russia | 368,319 |
4 | Mexico | 324,334 |
5 | Peru | 212,810 |
6 | United Kingdom | 175,552 |
7 | Italy | 163,507 |
8 | Indonesia | 156,257 |
9 | France | 146,967 |
10 | Iran | 141,083 |
11 | Colombia | 139,797 |
12 | Germany | 135,461 |
13 | Argentina | 128,542 |
14 | Poland | 116,059 |
15 | Ukraine | 112,459 |
16 | Spain | 104,456 |
17 | South Africa | 100,363 |
18 | Turkey | 98,771 |
19 | Romania | 65,486 |
Conclusion
The TabularData framework provides functionality to easily load data into table format and manipulate the data to provide useful information from the raw data. This article loaded data from a csv file containing daily global deaths from Covid-19 for each country. The DataFrame is great for exploring data and it is easy to group, filter and sort data. The documentation on TabularData is rather sparse with function definitions and it would benefit from more (or any) sample code and a more detailed explanation of the available functionality.