Pandas: How to Get Substring of Entire Column

by Erma Khan

You can use the following basic syntax to get the substring of an entire column in a pandas DataFrame:

df['some_substring'] = df['string_column'].str[1:4]

This particular example creates a new column called some_substring that contains the characters at positions 1 through 3 of each value in string_column. Positions are counted from 0, and the stop position (4) is not included in the slice.
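As a quick check on the slicing rule, the sketch below compares .str[1:4] with ordinary Python string slicing. The Series s and its values are made up for illustration and are not part of the example that follows.

```python
import pandas as pd

# hypothetical Series used only for illustration
s = pd.Series(['abcdef', 'xy', 'pandas'])

# .str[1:4] slices every element the way a plain Python slice would:
# position 1 is included, position 4 is excluded
sub = s.str[1:4]

# the same result computed element by element with ordinary slicing
expected = [value[1:4] for value in s]

print(sub.tolist())   # ['bcd', 'y', 'and']
print(expected)       # ['bcd', 'y', 'and']
```

Strings shorter than the stop position, such as 'xy' here, simply return whatever characters exist in the range rather than raising an error.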

The following example shows how to use this syntax in practice.

Example: Get Substring of Entire Column in Pandas

Suppose we have the following pandas DataFrame that contains information about various basketball teams:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['Mavericks', 'Warriors', 'Rockets', 'Hornets', 'Lakers'],
                   'points': [120, 132, 108, 118, 106]})

#view DataFrame
print(df)

        team  points
0  Mavericks     120
1   Warriors     132
2    Rockets     108
3    Hornets     118
4     Lakers     106

We can use the following syntax to create a new column that contains the characters at positions 1 through 3 of each value in the team column (the stop position, 4, is excluded):

#create column that extracts characters in positions 1 through 3 in team column
df['team_substring'] = df['team'].str[1:4]

#view updated DataFrame
print(df)

        team  points team_substring
0  Mavericks     120            ave
1   Warriors     132            arr
2    Rockets     108            ock
3    Hornets     118            orn
4     Lakers     106            ake

The new column called team_substring contains the characters at positions 1 through 3 of each value in the team column. For example, 'Mavericks' becomes 'ave'.
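Other slices work the same way. The following is a minimal sketch on the same team values; the variable names first_three, last_three, and same_as_1_4 are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'team': ['Mavericks', 'Warriors', 'Rockets', 'Hornets', 'Lakers']})

# first three characters (positions 0, 1 and 2)
first_three = df['team'].str[:3]

# last three characters, using a negative start position
last_three = df['team'].str[-3:]

# method form of .str[1:4], using Series.str.slice
same_as_1_4 = df['team'].str.slice(start=1, stop=4)

print(first_three.tolist())  # ['Mav', 'War', 'Roc', 'Hor', 'Lak']
print(last_three.tolist())   # ['cks', 'ors', 'ets', 'ets', 'ers']
print(same_as_1_4.tolist())  # ['ave', 'arr', 'ock', 'orn', 'ake']
```

The bracket form and str.slice() give the same result, so the choice between them is mostly a matter of readability.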

Note that if you attempt to use this syntax to extract a substring from a numeric column, you’ll receive an error:

#attempt to extract first two characters (positions 0 and 1) in points column
df['points_substring'] = df['points'].str[:2]

AttributeError: Can only use .str accessor with string values!

Instead, you must convert the numeric column to a string by using astype(str) first:

#extract first two characters (positions 0 and 1) in points column
df['points_substring'] = df['points'].astype(str).str[:2]

#view updated DataFrame
print(df)

        team  points team_substring points_substring
0  Mavericks     120            ave               12
1   Warriors     132            arr               13
2    Rockets     108            ock               10
3    Hornets     118            orn               11
4     Lakers     106            ake               10

This time we’re able to successfully extract the first two characters (positions 0 and 1) of each value in the points column because we first converted it to a string. Note that the values in points_substring are strings, not numbers.
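If the extracted digits are needed as numbers again, one option is to convert them back with astype(int). The sketch below assumes every extracted substring is a valid integer; the column name points_prefix is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'points': [120, 132, 108, 118, 106]})

# first two characters of each value, stored as strings
df['points_substring'] = df['points'].astype(str).str[:2]

# optional: convert back to integers if numeric operations are needed
# (assumes every substring is a valid integer)
df['points_prefix'] = df['points_substring'].astype(int)

print(df['points_substring'].tolist())  # ['12', '13', '10', '11', '10']
print(df['points_prefix'].tolist())     # [12, 13, 10, 11, 10]
```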

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: Check if String Contains Multiple Substrings
Pandas: How to Add String to Each Value in Column
Pandas: How to Select Columns Containing a Specific String
