You can use the following syntax to filter for rows that contain a certain string in a pandas DataFrame:
df[df["col"].str.contains("this string")]
This tutorial explains several examples of how to use this syntax in practice with the following DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'], 'conference': ['East', 'East', 'East', 'West', 'West', 'East'], 'points': [11, 8, 10, 6, 6, 5]}) #view DataFrame df team conference points 0 A East 11 1 A East 8 2 A East 10 3 B West 6 4 B West 6 5 C East 5
Example 1: Filter Rows that Contain a Specific String
The following code shows how to filter for rows in the DataFrame that contain ‘A’ in the team column:
df[df["team"].str.contains("A")] team conference points 0 A East 11 1 A East 8 2 A East 10
Only the rows where the team column contains ‘A’ are kept.
Example 2: Filter Rows that Contain a String in a List
The following code shows how to filter for rows in the DataFrame that contain ‘A’ or ‘B’ in the team column:
df[df["team"].str.contains("A|B")] team conference points 0 A East 11 1 A East 8 2 A East 10 3 B West 6 4 B West 6
Only the rows where the team column contains ‘A’ or ‘B’ are kept.
Example 3: Filter Rows that Contain a Partial String
In the previous examples, we filtered based on rows that exactly matched one or more strings.
However, if we’d like to filter for rows that contain a partial string then we can use the following syntax:
#identify partial string to look for keep= ["Wes"] #filter for rows that contain the partial string "Wes" in the conference column df[df.conference.str.contains('|'.join(keep))] team conference points 3 B West 6 4 B West 6
Only the rows where the conference column contains “Wes” are kept.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Drop Rows in Pandas DataFrame Based on Condition
How to Filter a Pandas DataFrame on Multiple Conditions
How to Use “NOT IN” Filter in Pandas DataFrame