You can use the following methods to check if a string in a pandas DataFrame contains multiple substrings:
Method 1: Check if String Contains One of Several Substrings
df['string_column'].str.contains('|'.join(['string1', 'string2']))
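By default, str.contains treats the pattern as a regular expression (regex=True), so a substring containing characters such as +, . or ( is read as regex syntax rather than literal text. Here is a minimal sketch, assuming you want each substring matched literally. It escapes every substring with Python's re.escape before joining them with |. The Series s and the substrings are illustrative only.

```python
import re

import pandas as pd

# Illustrative data: some strings contain regex metacharacters
s = pd.Series(['C++ developer', 'Python developer', 'Rust (beginner)', 'Java'])

substrings = ['C++', '(beginner)']

# Escape each substring so '+', '(' and ')' are matched literally,
# then join with '|' so a match on any one substring counts
pattern = '|'.join(re.escape(sub) for sub in substrings)

mask = s.str.contains(pattern)
print(mask.tolist())
```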
Method 2: Check if String Contains All of Several Substrings
df['string_column'].str.contains(r'^(?=.*string1)(?=.*string2)')
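Each (?=.*X) in this pattern is a lookahead. It asserts that X appears somewhere after the start of the string without consuming any characters, so every lookahead is tested from the same position and all of them must succeed. The sketch below is a hedged illustration rather than part of the original method. It builds the same kind of pattern from an arbitrary list, assuming the substrings should be matched literally (hence re.escape). The Series s and the list required are illustrative.

```python
import re

import pandas as pd

s = pd.Series(['Good East Team', 'Good West Team', 'Great East Team'])

required = ['Good', 'East']

# '^' anchors at the start; one lookahead per required substring
pattern = '^' + ''.join(f'(?=.*{re.escape(sub)})' for sub in required)

mask = s.str.contains(pattern)
print(pattern)
print(mask.tolist())
```

Because . does not match a newline by default, strings that span several lines may need flags=re.DOTALL passed to str.contains.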
The following examples show how to use each method in practice with this pandas DataFrame:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['Good East Team', 'Good West Team', 'Great East Team',
                            'Great West Team', 'Bad East Team', 'Bad West Team'],
                   'points': [93, 99, 105, 110, 85, 88]})
#view DataFrame
print(df)
              team  points
0   Good East Team      93
1   Good West Team      99
2  Great East Team     105
3  Great West Team     110
4    Bad East Team      85
5    Bad West Team      88
Example 1: Check if String Contains One of Several Substrings
We can use the following syntax to check if each string in the team column contains either the substring “Good” or “East”:
#create new column that checks if each team name contains 'Good' or 'East'
df['good_or_east'] = df['team'].str.contains('|'.join(['Good', 'East']))
#view updated DataFrame
print(df)
              team  points  good_or_east
0   Good East Team      93          True
1   Good West Team      99          True
2  Great East Team     105          True
3  Great West Team     110         False
4    Bad East Team      85          True
5    Bad West Team      88         False
The new good_or_east column returns the following values:
- True if team contains “Good” or “East”
- False if team contains neither “Good” nor “East”
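Note: In a regular expression, the | character means “or”, so '|'.join(['Good', 'East']) builds the pattern 'Good|East', which matches any string containing either substring.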
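The | symbol also has a separate meaning in pandas itself. Applied to two boolean Series, it computes an element-wise “or”. The following sketch shows an equivalent check that avoids regex entirely, assuming literal substring matching is all you need. It runs one regex=False check per substring and combines the results with |.

```python
import pandas as pd

df = pd.DataFrame({'team': ['Good East Team', 'Good West Team', 'Great East Team',
                            'Great West Team', 'Bad East Team', 'Bad West Team']})

# regex=False treats each substring as literal text
has_good = df['team'].str.contains('Good', regex=False)
has_east = df['team'].str.contains('East', regex=False)

# | on two boolean Series gives an element-wise "or"
df['good_or_east'] = has_good | has_east
print(df['good_or_east'].tolist())
```

When mixing | with comparisons such as df['points'] > 90, wrap each condition in parentheses, since | binds more tightly than the comparison operators.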
Example 2: Check if String Contains All of Several Substrings
We can use the following syntax to check if each string in the team column contains both the substring “Good” and the substring “East”:
#create new column that checks if each team name contains 'Good' and 'East'
df['good_and_east'] = df['team'].str.contains(r'^(?=.*Good)(?=.*East)')
#view team, points, and the new column (df still holds good_or_east from Example 1)
print(df[['team', 'points', 'good_and_east']])
              team  points  good_and_east
0   Good East Team      93           True
1   Good West Team      99          False
2  Great East Team     105          False
3  Great West Team     110          False
4    Bad East Team      85          False
5    Bad West Team      88          False
The new good_and_east column returns the following values:
- True if team contains “Good” and “East”
- False if team doesn’t contain both “Good” and “East”
Notice that only one True value is returned since there is only one team name that contains the substring “Good” and the substring “East.”
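As an alternative to the lookahead pattern, you can build one boolean Series per substring and combine them with the element-wise “and” operator &. The following is a minimal sketch, assuming literal matching (regex=False) and an arbitrary list of required substrings. It uses functools.reduce with operator.and_, which calls & on each pair of Series.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'team': ['Good East Team', 'Good West Team', 'Great East Team',
                            'Great West Team', 'Bad East Team', 'Bad West Team']})

required = ['Good', 'East']

# One boolean Series per required substring (literal matching, no regex)
checks = [df['team'].str.contains(sub, regex=False) for sub in required]

# Combine with element-wise "and": True only where every substring appears
df['good_and_east'] = reduce(operator.and_, checks)
print(df['good_and_east'].tolist())
```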