Pandas: How to Create Boolean Column Based on Condition

by Erma Khan

You can use the following basic syntax to create a boolean column based on a condition in a pandas DataFrame:

df['boolean_column'] = np.where(df['some_column'] > 15, True, False)

This particular syntax creates a new boolean column with two possible values:

  • True if the value in some_column is greater than 15.
  • False if the value in some_column is less than or equal to 15.
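Because the comparison df['some_column'] > 15 already evaluates to a boolean Series, the np.where wrapper is optional. The following minimal sketch shows that both approaches produce the same column. It uses a small made-up DataFrame, and the column names with_where and direct are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd

#hypothetical data for illustration only
df = pd.DataFrame({'some_column': [10, 15, 20]})

#approach 1: np.where with explicit True/False
df['with_where'] = np.where(df['some_column'] > 15, True, False)

#approach 2: assign the result of the comparison directly
df['direct'] = df['some_column'] > 15

#check that both columns hold identical values
print((df['with_where'] == df['direct']).all())
```

The np.where form is mainly useful when you want values other than True and False in the new column.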

The following example shows how to use this syntax in practice.

Example: Create Boolean Column Based on Condition in Pandas

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [5, 17, 7, 19, 12, 13, 9, 24]})

#view DataFrame
print(df)

  team  points
0    A       5
1    A      17
2    A       7
3    A      19
4    B      12
5    B      13
6    B       9
7    B      24   

We can use the following code to create a new column called good_player that contains True if the value in the points column is greater than 15 and False otherwise:

import numpy as np

#create new boolean column based on value in points column
df['good_player'] = np.where(df['points'] > 15, True, False)

#view updated DataFrame
print(df)

  team  points  good_player
0    A       5        False
1    A      17         True
2    A       7        False
3    A      19         True
4    B      12        False
5    B      13        False
6    B       9        False
7    B      24         True

Notice that the new column called good_player only contains two values: True or False.

We can use the dtype attribute to verify that the new good_player column is indeed a boolean column:

#display data type of good_player column
df['good_player'].dtype

dtype('bool')

The output confirms that the good_player column has the bool data type.
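If you would rather check the type programmatically, for example inside a script, one option is the is_bool_dtype helper from pandas.api.types. The sketch below rebuilds the same DataFrame so that it runs on its own. It also sums the column, which works because True counts as 1 and False as 0:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype

#recreate the DataFrame from the example
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [5, 17, 7, 19, 12, 13, 9, 24]})

#create boolean column based on value in points column
df['good_player'] = np.where(df['points'] > 15, True, False)

#check whether each column is boolean
print(is_bool_dtype(df['good_player']))
print(is_bool_dtype(df['points']))

#count the rows where good_player is True
print(df['good_player'].sum())
```

The first check prints True, the second prints False, and the sum is 3, since three players scored more than 15 points.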

Also note that you could return numeric values such as 1 and 0 instead of True and False if you’d like, in which case the new column has an integer data type rather than bool:

import numpy as np

#create new column of 1s and 0s based on value in points column
df['good_player'] = np.where(df['points'] > 15, 1, 0)

#view updated DataFrame
print(df)

  team  points  good_player
0    A       5            0
1    A      17            1
2    A       7            0
3    A      19            1
4    B      12            0
5    B      13            0
6    B       9            0
7    B      24            1

The good_player column now contains a 1 if the corresponding value in the points column is greater than 15.

Otherwise, it contains a value of 0.
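An alternative way to get 1s and 0s is to cast the boolean comparison to integers with astype(int). The sketch below compares the two options side by side. The column names flag_where and flag_astype are hypothetical, and the exact integer width (for example int64) may vary by platform:

```python
import numpy as np
import pandas as pd

#recreate the points column from the example
df = pd.DataFrame({'points': [5, 17, 7, 19, 12, 13, 9, 24]})

#option 1: np.where with 1 and 0
df['flag_where'] = np.where(df['points'] > 15, 1, 0)

#option 2: cast the boolean comparison to integers
df['flag_astype'] = (df['points'] > 15).astype(int)

#view both columns as plain lists
print(df['flag_where'].tolist())
print(df['flag_astype'].tolist())
```

Both lists are [0, 1, 0, 1, 0, 0, 0, 1], so either option can be used to produce the numeric version of the column.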

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Select Rows by Multiple Conditions in Pandas
How to Create a New Column Based on a Condition in Pandas
How to Filter a Pandas DataFrame on Multiple Conditions
