You can use the following basic syntax to create a column in a pandas DataFrame if it doesn’t already exist:
df['my_column'] = df.get('my_column', df['col1'] * df['col2'])
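This syntax creates a new column called my_column, defined as the product of the existing columns col1 and col2, only if my_column doesn’t already exist in the DataFrame. If my_column does exist, DataFrame.get() returns the existing column, which is then assigned back to itself unchanged. Note that Python evaluates the default argument before get() is called, so df['col1'] * df['col2'] is computed (and col1 and col2 must exist) in either case.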
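If computing the default is expensive, or the source columns might be missing, an explicit membership check on df.columns avoids the eager evaluation described above. The following is a minimal sketch of that alternative; the small DataFrame and its values are made up purely for illustration, reusing the placeholder names col1, col2 and my_column.

```python
import pandas as pd

# Hypothetical example data using the placeholder column names from above
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# The product is only computed when 'my_column' is missing;
# unlike df.get(), nothing is evaluated if the column already exists
if 'my_column' not in df.columns:
    df['my_column'] = df['col1'] * df['col2']

print(df['my_column'].tolist())  # [4, 10, 18]
```

Running the if block a second time leaves my_column untouched, because the condition is then false.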
The following example shows how to use this syntax in practice.
Example: Create Column in Pandas If It Doesn’t Exist
Suppose we have the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'sales': [4, 6, 5, 8, 14, 13, 13, 12, 9, 8, 19, 14],
                   'price': [1, 2, 2, 1, 2, 4, 4, 3, 3, 2, 2, 3]})

#view DataFrame
print(df)

    day  sales  price
0     1      4      1
1     2      6      2
2     3      5      2
3     4      8      1
4     5     14      2
5     6     13      4
6     7     13      4
7     8     12      3
8     9      9      3
9    10      8      2
10   11     19      2
11   12     14      3
Now suppose we attempt to add a column called price if it doesn’t already exist and define it as a column in which each value is equal to 100:
#attempt to add column called 'price'
df['price'] = df.get('price', 100)
#view updated DataFrame
print(df)
    day  sales  price
0     1      4      1
1     2      6      2
2     3      5      2
3     4      8      1
4     5     14      2
5     6     13      4
6     7     13      4
7     8     12      3
8     9      9      3
9    10      8      2
10   11     19      2
11   12     14      3
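Since a column called price already exists, df.get() returns the existing price column instead of the default value of 100. The column is simply reassigned to itself, so its values are unchanged.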
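To see why, it can help to look at what df.get() returns in each case. This is a small sketch using the first three rows of the example data; the column name discount is hypothetical and chosen only because it does not exist in the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'sales': [4, 6, 5], 'price': [1, 2, 2]})

# 'price' exists, so get() returns the existing Series and ignores the default
existing = df.get('price', 100)
print(existing.tolist())  # [1, 2, 2]

# 'discount' (hypothetical name) does not exist, so get() returns the default
missing = df.get('discount', 100)
print(missing)  # 100

# Assigning a scalar to a new column broadcasts it to every row
df['discount'] = missing
print(df['discount'].tolist())  # [100, 100, 100]
```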
However, suppose we attempt to add a new column called revenue if it doesn’t already exist and define it as a column in which the values are the product of the sales and price columns:
#attempt to add column called 'revenue'
df['revenue'] = df.get('revenue', df['sales'] * df['price'])
#view updated DataFrame
print(df)
    day  sales  price  revenue
0     1      4      1        4
1     2      6      2       12
2     3      5      2       10
3     4      8      1        8
4     5     14      2       28
5     6     13      4       52
6     7     13      4       52
7     8     12      3       36
8     9      9      3       27
9    10      8      2       16
10   11     19      2       38
11   12     14      3       42
The revenue column is added to the DataFrame because it did not already exist.
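If you need this pattern in several places, one option is to wrap it in a small function that takes a callable, so the new values are only computed when the column is actually missing. The helper below, add_column_if_missing, is a hypothetical name and not part of pandas; this is a sketch under that assumption, applied to the first three rows of the example data.

```python
import pandas as pd

def add_column_if_missing(df, name, make_values):
    """Hypothetical helper: add column `name` only if it is missing.

    make_values is a callable that receives the DataFrame, so the new
    values are computed only when the column actually has to be created.
    """
    if name not in df.columns:
        df[name] = make_values(df)
    return df

df = pd.DataFrame({'day': [1, 2, 3],
                   'sales': [4, 6, 5],
                   'price': [1, 2, 2]})

# 'revenue' is missing, so it is created from sales * price
add_column_if_missing(df, 'revenue', lambda d: d['sales'] * d['price'])

# 'price' already exists, so the lambda is never called and values stay the same
add_column_if_missing(df, 'price', lambda d: 100)

print(df)
```

Passing a callable rather than a precomputed Series is the design choice here: it keeps the "only if it doesn't exist" logic in one place and skips the computation entirely when the column is already present.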