Pandas: Check if String Contains Multiple Substrings

by Erma Khan

You can use the following methods to check if each string in a pandas DataFrame column contains multiple substrings:

Method 1: Check if String Contains One of Several Substrings

df['string_column'].str.contains('|'.join(['string1', 'string2']))

Method 2: Check if String Contains All of Several Substrings

df['string_column'].str.contains(r'^(?=.*string1)(?=.*string2)')

The examples below show how to use each method in practice with this pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team' : ['Good East Team', 'Good West Team', 'Great East Team',
                             'Great West Team', 'Bad East Team', 'Bad West Team'],
                   'points' : [93, 99, 105, 110, 85, 88]})

#view DataFrame
print(df)

              team  points
0   Good East Team      93
1   Good West Team      99
2  Great East Team     105
3  Great West Team     110
4    Bad East Team      85
5    Bad West Team      88

Example 1: Check if String Contains One of Several Substrings

We can use the following syntax to check if each string in the team column contains the substring “Good”, the substring “East”, or both:

#create new column that checks if each team name contains 'Good' or 'East'
df['good_or_east'] = df['team'].str.contains('|'.join(['Good', 'East']))

#view updated DataFrame
print(df)

              team  points  good_or_east
0   Good East Team      93          True
1   Good West Team      99          True
2  Great East Team     105          True
3  Great West Team     110         False
4    Bad East Team      85          True
5    Bad West Team      88         False

The new good_or_east column returns the following values:

  • True if team contains “Good” or “East” (or both)
  • False if team contains neither “Good” nor “East”
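A boolean result like this is often used to filter rows rather than being stored as a column. The sketch below rebuilds the same DataFrame and keeps only the matching rows; the names mask and good_or_east_rows are illustrative, and na=False is an optional argument that treats missing team names as non-matches.

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'team': ['Good East Team', 'Good West Team', 'Great East Team',
                            'Great West Team', 'Bad East Team', 'Bad West Team'],
                   'points': [93, 99, 105, 110, 85, 88]})

# boolean mask: True where team contains 'Good' or 'East'
# na=False makes missing values count as False instead of NaN
mask = df['team'].str.contains('|'.join(['Good', 'East']), na=False)

# keep only the rows where the mask is True (illustrative variable name)
good_or_east_rows = df[mask]
print(good_or_east_rows)
```

With this data, the filtered result should keep rows 0, 1, 2 and 4.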

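Note: The | character is the regular-expression “or” (alternation) operator. It works here because str.contains() treats the pattern as a regular expression by default.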
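Because str.contains() interprets the joined pattern as a regular expression by default, substrings that contain special characters such as . or ( can match more than intended. One way to guard against this is to pass each substring through re.escape() before joining. The Series and substrings below are hypothetical and are used only to show the difference.

```python
import re
import pandas as pd

# hypothetical data for illustration (not part of the team DataFrame)
versions = pd.Series(['v1.0', 'v110', 'v2.5'])
substrings = ['1.0', '2.5']

# unescaped: '.' matches any character, so 'v110' also matches '1.0'
unescaped = versions.str.contains('|'.join(substrings))

# escaped: each substring is matched literally
escaped = versions.str.contains('|'.join(re.escape(s) for s in substrings))

print(unescaped.tolist())
print(escaped.tolist())
```

For plain alphanumeric substrings such as “Good” and “East”, escaping leaves the pattern unchanged, so the result is the same as in the example above.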

Example 2: Check if String Contains All of Several Substrings

We can use the following syntax to check if each string in the team column contains both the substring “Good” and the substring “East”:

#create new column that checks if each team name contains 'Good' and 'East'
df['good_and_east'] = df['team'].str.contains(r'^(?=.*Good)(?=.*East)')

#view updated DataFrame
print(df)

              team  points  good_and_east
0   Good East Team      93           True
1   Good West Team      99          False
2  Great East Team     105          False
3  Great West Team     110          False
4    Bad East Team      85          False
5    Bad West Team      88          False

The new good_and_east column returns the following values:

  • True if team contains “Good” and “East”
  • False if team is missing “Good”, “East”, or both

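Notice that only one True value is returned, since “Good East Team” is the only team name that contains both the substring “Good” and the substring “East.” In the pattern, each (?=.*substring) is a lookahead that checks whether the substring appears anywhere after the start of the string (^) without consuming any characters, so every lookahead must succeed for the string to match.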
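The same “contains all” check can also be written for an arbitrary list of substrings. The sketch below shows two possible approaches: combining one literal str.contains() check per substring with a logical AND, and building the lookahead pattern programmatically. The names substrings, all_mask and lookahead_mask are illustrative and not part of the original example.

```python
import operator
import re
from functools import reduce

import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'team': ['Good East Team', 'Good West Team', 'Great East Team',
                            'Great West Team', 'Bad East Team', 'Bad West Team'],
                   'points': [93, 99, 105, 110, 85, 88]})

substrings = ['Good', 'East']

# Option A: one literal (regex=False) check per substring, combined with AND
masks = [df['team'].str.contains(s, regex=False) for s in substrings]
all_mask = reduce(operator.and_, masks)

# Option B: build the lookahead pattern from the list, escaping each substring
pattern = '^' + ''.join(f'(?=.*{re.escape(s)})' for s in substrings)
lookahead_mask = df['team'].str.contains(pattern)

print(all_mask.tolist())
print(lookahead_mask.tolist())
```

Option A avoids regular expressions entirely, which can be easier to read. Option B keeps everything in a single pattern, matching the method shown in this example.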

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: Add Column from One DataFrame to Another
Pandas: Get Rows Which Are Not in Another DataFrame
Pandas: How to Check if Multiple Columns are Equal
