Home » Pandas: Search for String in All Columns of DataFrame

Pandas: Search for String in All Columns of DataFrame

by Erma Khan

You can use the following syntax to search for a particular string in each column of a pandas DataFrame and filter for rows that contain the string in at least one column:

#define filter
mask = np.column_stack([df[col].str.contains(r"my_string", na=False) for col in df])

#filter for rows where any column contains 'my_string'
df.loc[mask.any(axis=1)]

The following example shows how to use this syntax in practice.

Example: Search for String in All Columns of Pandas DataFrame

Suppose we have the following pandas DataFrame that contains information about the first role and second role of various basketball players on a team:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'first_role': ['P Guard', 'P Guard', 'S Guard', 'S Forward',
                                  'P Forward', 'Center', 'Center', 'Center'],
                   'second_role': ['S Guard', 'S Guard', 'Forward', 'S Guard',
                                   'S Guard', 'S Forward', 'P Forward', 'P Forward']})

#view DataFrame
print(df)

  player first_role second_role
0      A    P Guard     S Guard
1      B    P Guard     S Guard
2      C    S Guard     Forward
3      D  S Forward     S Guard
4      E  P Forward     S Guard
5      F     Center   S Forward
6      G     Center   P Forward
7      H     Center   P Forward

The following code shows how to filter the pandas DataFrame for rows where the string “Guard” occurs in any column:

import numpy as np

#define filter
mask = np.column_stack([df[col].str.contains(r"Guard", na=False) for col in df])

#filter for rows where any column contains 'Guard'
df.loc[mask.any(axis=1)]

        player	first_role  second_role
0	A	P Guard	    S Guard
1	B	P Guard	    S Guard
2	C	S Guard	    Forward
3	D	S Forward   S Guard
4	E	P Forward   S Guard

Notice that each row in the resulting DataFrame contains the string “Guard” in at least one column.

You could also filter for rows where one of several strings occurs in at least one column by using the “OR” ( | ) operator in pandas.

For example, the following code shows how to filter for rows where either “P Guard” or “Center” occurs in at least one column:

import numpy as np

#define filter
mask = np.column_stack([df[col].str.contains(r"P Guard|Center", na=False) for col in df])

#filter for rows where any column contains 'P Guard' or 'Center'
df.loc[mask.any(axis=1)]

        player	first_role  second_role
0	A	P Guard	    S Guard
1	B	P Guard	    S Guard
5	F	Center	    S Forward
6	G	Center	    P Forward
7	H	Center	    P Forward

Notice that each row in the resulting DataFrame contains “P Guard” or “Center” in at least one column.

Note: It’s important to include the argument na=False within the contains() function or else you will encounter an error if NaN values are present in the DataFrame.

Additional Resources

The following tutorials explain how to perform other common filtering operations in pandas:

How to Filter a Pandas DataFrame by Column Values
How to Filter Pandas DataFrame Rows by Date
How to Filter a Pandas DataFrame on Multiple Conditions

Related Posts