You can use the following basic syntax to extract a substring from every value in a column of a pandas DataFrame:
df['some_substring'] = df['string_column'].str[1:4]
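This particular example creates a new column called some_substring that contains the characters in positions 1 through 3 of each value in string_column. The slice follows Python's rules: positions start at 0, and the stop position (4) is excluded.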
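Because the slice uses Python's zero-based, stop-exclusive rules, str[1:4] keeps the characters at positions 1, 2, and 3. The short sketch below uses a few hypothetical team names to check that the bracket syntax gives the same result as the explicit Series.str.slice method and as a plain Python slice applied to each value.

```python
import pandas as pd

# Hypothetical sample values, used only for illustration
s = pd.Series(['Mavericks', 'Warriors', 'Lakers'])

# Bracket slicing through the .str accessor: position 1 is kept, position 4 is not
bracket = s.str[1:4]

# The same slice written with the explicit Series.str.slice method
sliced = s.str.slice(start=1, stop=4)

# The same slice done in plain Python, one value at a time
manual = [value[1:4] for value in s]

print(bracket.tolist())  # ['ave', 'arr', 'ake']
print(bracket.tolist() == sliced.tolist() == manual)  # True
```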
The following example shows how to use this syntax in practice.
Example: Get Substring of Entire Column in Pandas
Suppose we have the following pandas DataFrame that contains information about various basketball teams:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['Mavericks', 'Warriors', 'Rockets', 'Hornets', 'Lakers'],
                   'points': [120, 132, 108, 118, 106]})
#view DataFrame
print(df)
        team  points
0  Mavericks     120
1   Warriors     132
2    Rockets     108
3    Hornets     118
4     Lakers     106
We can use the following syntax to create a new column that contains the characters in positions 1 through 3 of the team column:
#create column that extracts characters in positions 1 through 3 in team column
df['team_substring'] = df['team'].str[1:4]
#view updated DataFrame
print(df)
        team  points team_substring
0  Mavericks     120            ave
1   Warriors     132            arr
2    Rockets     108            ock
3    Hornets     118            orn
4     Lakers     106            ake
The new column called team_substring contains the characters in positions 1 through 3 of each value in the team column.
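Since each value is sliced with ordinary Python slice rules, a stop position past the end of a shorter string is not an error, and negative positions count back from the end of each string. Here is a small sketch with hypothetical values, including strings shorter than the slice:

```python
import pandas as pd

# Hypothetical sample values, some shorter than the slice being taken
s = pd.Series(['Mavericks', 'Lakers', 'Al', 'X'])

# A stop index past the end of a string simply stops at its last character;
# a start index past the end yields an empty string
print(s.str[1:4].tolist())  # ['ave', 'ake', 'l', '']

# Negative positions count from the end, so [-3:] keeps the last three characters
print(s.str[-3:].tolist())  # ['cks', 'ers', 'Al', 'X']
```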
Note that if you attempt to use this syntax to extract a substring from a numeric column, you’ll receive an error:
#attempt to extract the first two characters (positions 0 and 1) in points column
df['points_substring'] = df['points'].str[:2]
AttributeError: Can only use .str accessor with string values!
Instead, you must convert the numeric column to a string by using astype(str) first:
#extract the first two characters (positions 0 and 1) in points column
df['points_substring'] = df['points'].astype(str).str[:2]
#view updated DataFrame
print(df)
        team  points team_substring points_substring
0  Mavericks     120            ave               12
1   Warriors     132            arr               13
2    Rockets     108            ock               10
3    Hornets     118            orn               11
4     Lakers     106            ake               10
This time we’re able to extract the first two characters (positions 0 and 1) of each value in the points column because we first converted it to a string.
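Note that points_substring holds strings such as '12', not numbers. If numeric values are needed afterwards, one option (a sketch, not the only approach) is to convert the substrings back with astype(int). The points_prefix column name is hypothetical.

```python
import pandas as pd

# Recreate the example DataFrame from this tutorial
df = pd.DataFrame({'team': ['Mavericks', 'Warriors', 'Rockets', 'Hornets', 'Lakers'],
                   'points': [120, 132, 108, 118, 106]})

# Slice the text form of each number: the result holds strings such as '12'
df['points_substring'] = df['points'].astype(str).str[:2]
print(df['points_substring'].tolist())  # ['12', '13', '10', '11', '10']

# Convert the substrings back to integers (hypothetical column name)
df['points_prefix'] = df['points_substring'].astype(int)
print(df['points_prefix'].tolist())  # [12, 13, 10, 11, 10]
print(df['points_prefix'].sum())  # 56
```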