Home » How to Extract Number from String in Pandas

How to Extract Number from String in Pandas

by Erma Khan

You can use the following basic syntax to extract numbers from a string in pandas:

df['my_column'].str.extract('(d+)')

This particular syntax will extract the numbers from each string in a column called my_column in a pandas DataFrame.

Note: When using a regular expression, d represents “any digit” and + stands for “one or more.”

The following example shows how to use this function in practice.

Example: Extract Number from String in Pandas

Suppose we have the following pandas DataFrame that contains information about the sales of various products:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'product': ['A33', 'B34', 'A22', 'A50', 'C200', 'D7', 'A9', 'A13'],
                   'sales': [18, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  product  sales
0     A33     18
1     B34     22
2     A22     19
3     A50     14
4    C200     14
5      D7     11
6      A9     20
7     A13     28

Suppose we would like to extract the number from each string in the product column.

We can use the following syntax to do so:

#extract numbers from strings in 'product' column
df['product'].str.extract('(d+)')

	0
0	33
1	34
2	22
3	50
4	200
5	7
6	9
7	13

The result is a DataFrame that contains only the numbers from each row in the product column.

For example:

  • The formula extracts 33 from the string A33 in the first row.
  • The formula extracts 34 from the string B34 in the first row.
  • The formula extracts 22 from the string A22 in the first row.

And so on.

If you’d like, you can also store these numerical values in a new column in the DataFrame:

#extract numbers from strings in 'product' column and store them in new column
df['product_numbers'] = df['product'].str.extract('(d+)')

#view updated DataFrame
print(df)

  product  sales product_numbers
0     A33     18              33
1     B34     22              34
2     A22     19              22
3     A50     14              50
4    C200     14             200
5      D7     11               7
6      A9     20               9
7     A13     28              13

The new column called product_numbers contains only the numbers from each string in the product column.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Sort DataFrame Based on String Column
Pandas: How to Remove Specific Characters from Strings
Pandas: Search for String in All Columns of DataFrame

Related Posts