Often you may want to save a pandas DataFrame for later use without the hassle of importing the data again from a CSV file.
The easiest way to do this is by using to_pickle() to save the DataFrame as a pickle file:
df.to_pickle("my_data.pkl")
This will save the DataFrame to a file named my_data.pkl in your current working directory.
You can then use read_pickle() to quickly read the DataFrame from the pickle file:
df = pd.read_pickle("my_data.pkl")
The following example shows how to use these functions in practice.
Example: Save and Load Pandas DataFrame
Suppose we create the following pandas DataFrame that contains information about various basketball teams:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})
#view DataFrame
print(df)
  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12
We can use df.info() to view the data type of each variable in the DataFrame:
#view DataFrame info
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   team      8 non-null      object
 1   points    8 non-null      int64
 2   assists   8 non-null      int64
 3   rebounds  8 non-null      int64
dtypes: int64(3), object(1)
memory usage: 292.0+ bytes
We can use the to_pickle() method to save this DataFrame to a pickle file with a .pkl extension:
#save DataFrame to pickle file
df.to_pickle("my_data.pkl")
Our DataFrame is now saved as a pickle file in our current working directory.
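If it is unclear where the file ended up, the relative path can be resolved to an absolute one. The snippet below is a minimal sketch of that check, not part of the original example: the file name demo_data.pkl and the small two-row DataFrame are hypothetical, chosen so the demo does not touch my_data.pkl.

```python
from pathlib import Path

import pandas as pd

# Hypothetical small DataFrame used only for this demonstration
demo_df = pd.DataFrame({'team': ['A', 'B'], 'points': [18, 22]})

# A relative path is interpreted relative to the current working directory
pkl_path = Path("demo_data.pkl")
demo_df.to_pickle(pkl_path)

# Confirm the file exists and show its full location on disk
saved = pkl_path.exists()
print(saved)
print(pkl_path.resolve())

# Remove the demo file so it does not clutter the working directory
pkl_path.unlink()
```

Passing a full path to to_pickle() instead of a bare file name saves the file to that location rather than the working directory.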
We can then use the read_pickle() function to quickly read the DataFrame:
#read DataFrame from pickle file
df = pd.read_pickle("my_data.pkl")
#view DataFrame
print(df)
  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12
We can use df.info() again to confirm that the data type of each column is the same as before:
#view DataFrame info
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   team      8 non-null      object
 1   points    8 non-null      int64
 2   assists   8 non-null      int64
 3   rebounds  8 non-null      int64
dtypes: int64(3), object(1)
memory usage: 292.0+ bytes
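Comparing two info() printouts by eye works for a small DataFrame, but the same check can also be done programmatically. The following is a minimal sketch, assuming the same basketball data as above; the variable names original and restored and the use of a temporary directory are choices made for this illustration only.

```python
import os
import tempfile

import pandas as pd

# Same data as the example DataFrame above
original = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                         'points': [18, 22, 19, 14, 14, 11, 20, 28],
                         'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                         'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Save and reload inside a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_data.pkl")
    original.to_pickle(path)
    restored = pd.read_pickle(path)

# equals() compares shape, values, and column dtypes
same_frame = original.equals(restored)

# Explicit column-by-column dtype comparison
same_dtypes = bool((original.dtypes == restored.dtypes).all())

print(same_frame, same_dtypes)
```

If both values are True, the reloaded DataFrame matches the one that was saved.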
The benefit of using pickle files is that the data type of each column is retained when we save and load the DataFrame.
This provides an advantage over saving and loading CSV files. A CSV file stores only text, so columns such as dates or categories must be converted back to their original types after calling read_csv(), whereas the pickle file preserves the original state of the DataFrame.
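The difference is easiest to see with column types that CSV cannot represent. The sketch below is an illustration under stated assumptions rather than part of the original example: the DataFrame with a categorical team column and a game_date column is hypothetical, as are the file names and the temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrame with a categorical column and a datetime column
df = pd.DataFrame({
    'team': pd.Categorical(['A', 'B', 'C']),
    'game_date': pd.to_datetime(['2023-01-05', '2023-01-06', '2023-01-07']),
    'points': [18, 22, 19],
})

# Write both formats to a temporary directory and read them back
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "my_data.csv")
    pkl_path = os.path.join(tmp_dir, "my_data.pkl")

    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

# Compare the column types of the original and both reloaded copies
print(df.dtypes)
print(from_csv.dtypes)
print(from_pkl.dtypes)
```

With default read_csv() settings, the team and game_date columns come back as plain text and would need to be converted again, while the copy loaded from the pickle file keeps the categorical and datetime types.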
Additional Resources
The following tutorials explain how to fix common errors in Python:
How to Fix KeyError in Pandas
How to Fix: ValueError: cannot convert float NaN to integer
How to Fix: ValueError: operands could not be broadcast together with shapes