Pandas - Dataframe Where function
The Where method of Pandas Dataframe is used to replace data in a dataframe where the condition is false. It can return a new dataframe or modify the data in the original dataframe, thereby saving memory and object creation.
The syntax of the Dataframe Where
function can be confusing. At a minumum it requires
just one parameter - the condition to apply to the dataframe, but it replaces all the
data where the condition is false. It can also be used with two parameters where the
first parameter is the condition and the second is a value or an operation to be
applied when the condition is false.
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
Where
Pandas dataframe where function is used to replace data in a dataframe where the condition is false. It is not analogous to the Where clause in SQL databases, which is used to filter on data where the condition is true. It is easier to demonstrate with a couple of examples.
Where numbers are even
Create a condition to identify even numbers in the dataframe. The following code applies modulo 2 to each element in the dataframe and shows true where the result is equal to 0.
Create a dataframe.
1# Create Dataframe
2np.random.seed(2)
3df = pd.DataFrame(np.random.randint(0, 10, 21).reshape(-1, 7),
4 columns=['A', 'B', 'C', 'D', 'E', 'F', 'G'])
5
6"""
7 A B C D E F G
80 8 8 6 2 8 7 2
91 1 5 4 4 5 7 3
102 6 4 3 7 6 1 3
11"""
Create a condition to identify even numbers.
1cond = df % 2 == 0
2df % 2 == 0
3
4 A B C D E F G
50 True True True True True False True
61 False False True True False False False
72 True True False False True False False
Applying this condition in a where clause to the dataframe replaces the data where the
condition is false with NaN
. In this case where the condition is false is all the
odd numbers.
1df.where(cond)
2
3"""
4 A B C D E F G
50 8.0 8.0 6.0 2.0 8.0 NaN 2.0
61 NaN NaN 4.0 4.0 NaN NaN NaN
72 6.0 4.0 NaN NaN 6.0 NaN NaN
8"""
Subtract 1 from the original dataframe values to round down all the odd numbers to the nearest even number.
1df.where(cond, df-1)
2
3"""
4 A B C D E F G
50 8 8 6 2 8 6 2
61 0 4 4 4 4 6 2
72 6 4 2 6 6 0 2
8"""
Pandas - Use Where to replace odd numbers
Replace negative numbers in place with zero
It is frequently required as part of data cleaning to replace certain values with more
valid data. The following replaces negative numbers with zero. The inplace
parameter
is set to true to replace the values in the original dataframe.
1# Create a dataframe with some negative numbers
2np.random.seed(22)
3df2 = pd.DataFrame(np.random.randint(-20, 100, 21).reshape(-1, 7),
4 columns=['A', 'B', 'C', 'D', 'E', 'F', 'G'])
5
6df2
7"""
8 A B C D E F G
90 97 -16 24 44 80 82 98
101 87 64 -12 75 80 73 -2
112 -6 25 68 73 19 14 21
12"""
1df2.where(df2>=0, 0, inplace=True)
2
3df2
4"""
5 A B C D E F G
60 97 0 24 44 80 82 98
71 87 64 0 75 80 73 0
82 0 25 68 73 19 14 21
9"""
Pandas - Use Where to replace negative numbers
Replace duplicates in rows
The condition can get more complicated such as looking for duplicate values in a row.
The opposite of a condition can then be used by preceding the condition with the tilde
character ~
to replace where the condition is true. Technically, the where is still
replacing where the condition is false or in this case where the opposite of the
condition is false.
1# Create a new dataframe
2np.random.seed(22)
3df3 = pd.DataFrame(np.random.randint(1, 20, 21).reshape(-1, 7),
4 columns=['A', 'B', 'C', 'D', 'E', 'F', 'G'])
5
6df3
7"""
8 A B C D E F G
90 5 13 1 5 7 12 9
101 5 19 15 14 8 3 10
112 9 9 6 16 3 19 18
12"""
Create a condition to identify duplicates in each row of the dataframe. Use the opposite
of this condition to display a True
wherever there is a duplicate.
1# Create a condition to identify duplicates in a row
2cond = df3.apply(pd.Series.duplicated, axis = 1)
3df3.where(~cond, True)
4
5"""
6 A B C D E F G
70 5 13 1 True 7 12 9
81 5 19 15 14 8 3 10
92 9 True 6 16 3 19 18
10"""
Modify the duplicates by applying a factor to the original dataframe value.
1df3.where(~cond, df3*1.01)
2
3"""
4 A B C D E F G
50 5 13.00 1 5.05 7 12 9
61 5 19.00 15 14.00 8 3 10
72 9 9.09 6 16.00 3 19 18
8"""
Pandas - Use Where to modify duplicate numbers in each row
Conclusion
Pandas Dataframe Where
method is used to replace data in the dataframe where
the condition is false and is not used for filtering data like the Where
clause in SQL.
Filtering data based on conditions in rows and columns can be done with loc
and iloc
. This is more analagous to the Where clause in SQL and more
information can be found in Pandas - How to use iloc and loc
Where replaces all values in a dataframe that do not meet the where
condition either with NaN
or with the specified value(s).