Pandas - How to use iloc and loc

Pandas - How to use iloc and loc

Pandas dataframes can store very large amounts of data and iloc and loc can be used to slice and dice the data into manageable chunks.


Summary of iloc and loc functions

Select a row df.iloc[3] df.loc['data_5']
Select a column df.iloc[:, 3] df.loc[:, 'indigo']
Select a cell df.iloc[2, 3] df.loc['data_2_', 'indigo']
Select rows and columns df.iloc[[0,1,2,-3,-2,-1], [0,1,-2,-1]] df.loc[['data_2', 'data_5'], ['green', 'red']]
Select sequence of rows and columns df.iloc[2:3, 4:] df.loc['data_2':'data_5', 'red':'green']


Python modules used in this article

1# import modules
2import pandas as pd
3import numpy as np


iloc

iloc is integer index based method of selecting rows and columns from a Pandas dataframe. This can be used in a variety of ways to slice and dice a dataframe. iloc can take one or two parameters, the first parameter specifies the row index and the second specifies the column index. The entire row is returned when just one parameter is used.

There is also flexibility in the parameters passed in; either an integer; a list of integers; an array slice; or a callable function


Create dataframe with sample data

A dataframe was created with a sequence of numbers for the colors of the rainbow. The index is changed to be text data to illustrate that it is independent of the iloc integer number.

 1
 2# Create Dataframe
 3d = {}
 4for i,c in enumerate(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']):
 5    d[c] = [(10*i)+x for x in range(10)]
 6df = pd.DataFrame(d, index = [f'data_{x+1}' for x in range(10)])
 7
 8print(df.shape)
 9"""
10(10, 7)
11"""
12
13print(df)
14"""
15         red  orange  yellow  green  blue  indigo  violet
16data_1     0      10      20     30    40      50      60
17data_2     1      11      21     31    41      51      61
18data_3     2      12      22     32    42      52      62
19data_4     3      13      23     33    43      53      63
20data_5     4      14      24     34    44      54      64
21data_6     5      15      25     35    45      55      65
22data_7     6      16      26     36    46      56      66
23data_8     7      17      27     37    47      57      67
24data_9     8      18      28     38    48      58      68
25data_10    9      19      29     39    49      59      69
26"""


Retrieve a single row of data with iloc

A single row is returned by passing in the index of the row to retrieve. Note that the index is zero-based, so that iloc[3] returns the fourth row in the dataframe. Passing an index that is out of bounds of the dataframe will throw an error. The single row is returned as a series.

 1df.iloc[3]
 2
 3"""
 4red        3
 5orange    13
 6yellow    23
 7green     33
 8blue      43
 9indigo    53
10violet    63
11Name: data_4, dtype: int64
12"""
13
14print(type(df.iloc[3]))
15"""
16<class 'pandas.core.series.Series'>
17"""

Select a single row using iloc



Retrieve multiple rows of data with iloc

A list of row indexes can be passed in to select the selected rows.

1df.iloc[[3,4,8]]
2
3"""
4        red  orange  yellow  green  blue  indigo  violet
5data_4    3      13      23     33    43      53      63
6data_5    4      14      24     34    44      54      64
7data_9    8      18      28     38    48      58      68
8"""

The list of indexes can contain just a single index, which will return just the selected row in a dataframe.

 1df.iloc[[3]]
 2
 3"""
 4        red  orange  yellow  green  blue  indigo  violet
 5data_4    3      13      23     33    43      53      63
 6"""
 7
 8print(type(df.iloc[[3]]))
 9"""
10<class 'pandas.core.frame.DataFrame'>
11"""

Any valid array slice or list comprehension can be used to specify the list of row indexes to be selected.

 1# Array slice from 2 to 5
 2df.iloc[2:5]
 3"""
 4        red  orange  yellow  green  blue  indigo  violet
 5data_3    2      12      22     32    42      52      62
 6data_4    3      13      23     33    43      53      63
 7data_5    4      14      24     34    44      54      64
 8"""
 9
10
11# Array from 1 to the end in steps of 3
12df.iloc[1::3]
13"""
14        red  orange  yellow  green  blue  indigo  violet
15data_2    1      11      21     31    41      51      61
16data_5    4      14      24     34    44      54      64
17data_8    7      17      27     37    47      57      67
18"""
19
20
21# Array from 8 to the end
22df.iloc[8:]
23"""
24         red  orange  yellow  green  blue  indigo  violet
25data_9     8      18      28     38    48      58      68
26data_10    9      19      29     39    49      59      69
27"""
28
29# Array specifying the first and last 2 rows
30df.iloc[[0,1,-2,-1]]
31"""
32         red  orange  yellow  green  blue  indigo  violet
33data_1     0      10      20     30    40      50      60
34data_2     1      11      21     31    41      51      61
35data_9     8      18      28     38    48      58      68
36data_10    9      19      29     39    49      59      69
37"""

Select multiple rows using iloc



Retrieving data using Row and Column Index

Two parameters can be passed in to iloc to specify the row and the column indexes. The two parameters are separated by a comma "," and the entire column can be selected by using ":" for the first parameter.

 1# Select a single cell at row 3 and column 5
 2df.iloc[3, 5]
 3"""
 453
 5"""
 6
 7# Select all of column 5
 8df.iloc[:, 5]
 9"""
10data_1     50
11data_2     51
12data_3     52
13data_4     53
14data_5     54
15data_6     55
16data_7     56
17data_8     57
18data_9     58
19data_10    59
20Name: indigo, dtype: int64
21"""
22
23# Select all of column 2 and 5
24df.iloc[:, [2,5]]
25"""
26         yellow  indigo
27data_1       20      50
28data_2       21      51
29data_3       22      52
30data_4       23      53
31data_5       24      54
32data_6       25      55
33data_7       26      56
34data_8       27      57
35data_9       28      58
36data_10      29      59
37"""

Select single column using iloc



Retrieve a sub grid from a dataframe

iloc returns a sub grid from a dataframe when a range of rows and a range of columns are used. It can be useful to return the four corners of a dataframe when working with large datasets.

 1df.iloc[2:5, 2:5]
 2"""
 3        yellow  green  blue
 4data_3      22     32    42
 5data_4      23     33    43
 6data_5      24     34    44
 7
 8# select four corners
 9df.iloc[[0,1,2,-3,-2,-1], [0,1,-2,-1]]
10"""
11         red  orange  indigo  violet
12data_1     0      10      50      60
13data_2     1      11      51      61
14data_3     2      12      52      62
15data_8     7      17      57      67
16data_9     8      18      58      68
17data_10    9      19      59      69
18"""

Select multiple rows and columns using iloc



loc

loc is label-based method of selecting rows and columns from a Pandas dataframe. This can also be used in a variety of ways to slice and dice a dataframe. loc can take one parameter or two parameters corresponding to the labels of the row index and the column names. There is the same flexibility as iloc in the parameters passed in; either an index label; a list of index labels; an array slice; or a callable function.

Passing an index label that is not in the dataframe throws an error.

Retrieve Row data with loc

Passing a single parameter retrieves the entire row for index labels matching the parameter. The row is returned as a series.

 1df.loc['data_5']
 2"""
 3red        4
 4orange    14
 5yellow    24
 6green     34
 7blue      44
 8indigo    54
 9violet    64
10Name: data_5, dtype: int64
11"""


Retrieve multiple rows with loc

A list of index labels returns the corresponding rows. A list of just one index label returns a dataframe with just one row.

 1df.loc[['data_2', 'data_5']]
 2"""
 3        red  orange  yellow  green  blue  indigo  violet
 4data_2    1      11      21     31    41      51      61
 5data_5    4      14      24     34    44      54      64
 6"""
 7
 8# A list of just one index label
 9df.loc[['data_5']]
10"""
11        red  orange  yellow  green  blue  indigo  violet
12data_5    4      14      24     34    44      54      64
13"""

A slice object of index labels returns the corresponding rows, inclusive of the stop index (this is different to usual python slices). Specifying the index labels in the wrong order will return an empty dataframe.

 1# loc slice includes the start and stop index
 2df.loc['data_2':'data_5']
 3"""
 4        red  orange  yellow  green  blue  indigo  violet
 5data_2    1      11      21     31    41      51      61
 6data_3    2      12      22     32    42      52      62
 7data_4    3      13      23     33    43      53      63
 8data_5    4      14      24     34    44      54      64
 9"""
10
11# compare to iloc slice that does not include the stop index
12df.iloc[1:4]
13"""
14        red  orange  yellow  green  blue  indigo  violet
15data_2    1      11      21     31    41      51      61
16data_3    2      12      22     32    42      52      62
17data_4    3      13      23     33    43      53      63
18"""
19
20# Specifying the index labels in the wrong order will return an empty dataframe
21df.loc['data_5':'data_2']
22# equivalent to df.loc[[]]
23"""
24Empty DataFrame
25Columns: [red, orange, yellow, green, blue, indigo, violet]
26Index: []
27"""
28
29# All rows from data_8 to the end
30df.loc['data_8':]
31"""
32         red  orange  yellow  green  blue  indigo  violet
33data_8     7      17      27     37    47      57      67
34data_9     8      18      28     38    48      58      68
35data_10    9      19      29     39    49      59      69
36"""

loc can take a boolean array of the same length as the axis being sliced. This boolean array is usually created as the result of applying some other logic to the data set. This first example finds the rows where the value for 'yellow' is greater than 26. The logic to filter the data can get more complicated to narrow down to the data of interese once the logic creates a boolean array of the same length of the axis.

 1df['yellow'] > 26
 2"""
 3data_1     False
 4data_2     False
 5data_3     False
 6data_4     False
 7data_5     False
 8data_6     False
 9data_7     False
10data_8      True
11data_9      True
12data_10     True
13Name: yellow, dtype: bool
14"""
15
16# All rows where yellow values are greater than 26
17df.loc[df['yellow'] > 26]
18"""
19         red  orange  yellow  green  blue  indigo  violet
20data_8     7      17      27     37    47      57      67
21data_9     8      18      28     38    48      58      68
22data_10    9      19      29     39    49      59      69
23"""
24
25# All rows where yellow is a multiple of 2 and green is a multiple of 3
26df.loc[(df['yellow'] % 2 == 0) & (df['green'] % 3 == 0)]
27"""
28        red  orange  yellow  green  blue  indigo  violet
29data_1    0      10      20     30    40      50      60
30data_7    6      16      26     36    46      56      66
31"""


Retrieve Rows and Columns with loc

Similar to iloc, two parameters can be used for row labels and column labels. Use df.index to see a list of all the index labels and df.columns to see a list of all the column.

 1print(df.index)
 2"""
 3Index(['data_1', 'data_2', 'data_3', 'data_4', 'data_5', 'data_6', 'data_7',
 4       'data_8', 'data_9', 'data_10'],
 5      dtype='object')
 6"""
 7
 8print(df.columns)
 9"""
10Index(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'], dtype='object')
11"""
12
13# Select a single cell - data_8 for indigo
14df.loc['data_8', 'indigo']
15"""
1657
17"""

Select the entire column by passing : as the first parameter.

 1# select full column for indigo and red
 2df.loc[:, ['indigo', 'red']]
 3"""
 4         indigo  red
 5data_1       50    0
 6data_2       51    1
 7data_3       52    2
 8data_4       53    3
 9data_5       54    4
10data_6       55    5
11data_7       56    6
12data_8       57    7
13data_9       58    8
14data_10      59    9
15"""
16
17
18df.loc[['data_2', 'data_5'], ['green', 'red']]
19"""
20        green  red
21data_2     31    1
22data_5     34    4
23"""
24
25df.loc[(df['yellow'] % 2 == 0) & (df['green'] % 3 == 0), ['yellow', 'blue']]
26"""
27        yellow  blue
28data_1      20    40
29data_7      26    46
30"""

Select multiple rows and columns using loc



Conclusion

iloc and loc are very useful and powerful methods to quickly filter down data in a dataframe. Both are index-based for the rows and column-based for the columns. iloc retrieves data based on the index specified for the row and/or column, whereas loc is based on the values of the index and the values of the columns. iloc can be useful to sample data based on position in the dataframe without needing to know mush about the data. loc is more useful to filter down the data based on specific rows and column data.

Both methods can accept one or two parameters and return single cell content or a range of rows or columns or specific intersections of rows and columns.