
How to Use describe() Function in Pandas (With Examples)

by Erma Khan

You can use the describe() function to generate descriptive statistics for a pandas DataFrame.

This function uses the following basic syntax:

df.describe()

The following examples show how to use this syntax in practice with this pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

	team	points	assists	rebounds
0	A	25	5	11
1	A	12	7	8
2	B	15	7	10
3	B	14	9	6
4	B	19	12	6
5	C	23	9	5
6	C	25	9	9
7	C	29	4	12

Example 1: Describe All Numeric Columns

By default, the describe() function only generates descriptive statistics for numeric columns in a pandas DataFrame:

#generate descriptive statistics for all numeric columns
df.describe()

	points	assists	rebounds
count	8.000000	8.00000	8.000000
mean	20.250000	7.75000	8.375000
std	6.158618	2.54951	2.559994
min	12.000000	4.00000	5.000000
25%	14.750000	6.50000	6.000000
50%	21.000000	8.00000	8.500000
75%	25.000000	9.00000	10.250000
max	29.000000	12.00000	12.000000

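Descriptive statistics are shown for the three numeric columns (points, assists, and rebounds), while the non-numeric team column is skipped. The std row is the sample standard deviation, and the 25%, 50%, and 75% rows are percentiles, with 50% being the median.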
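To make the meaning of each row concrete, the following sketch recomputes the statistics for the points column with individual Series methods and checks them against describe(). It assumes pandas' defaults: std uses ddof=1 (sample standard deviation) and quantile() uses linear interpolation. The names summary, manual, and matches are just illustrative variable names.

```python
import pandas as pd

# same DataFrame as in the examples above
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

summary = df.describe()

# recompute each describe() row for the 'points' column individually
points = df['points']
manual = pd.Series({
    'count': points.count(),          # number of non-missing values
    'mean': points.mean(),
    'std': points.std(),              # sample standard deviation (ddof=1)
    'min': points.min(),
    '25%': points.quantile(0.25),     # linear interpolation by default
    '50%': points.median(),
    '75%': points.quantile(0.75),
    'max': points.max(),
}, dtype=float)

# True if every recomputed value matches the corresponding describe() row
matches = all(abs(summary.loc[stat, 'points'] - value) < 1e-9
              for stat, value in manual.items())
print(matches)  # True
```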

Note: If any columns contain missing values, pandas automatically excludes them when calculating the descriptive statistics, so the count row reports the number of non-missing values in each column.
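A quick way to see this behavior is to describe a column that contains NaN values. The column below is a hypothetical variant of points with two values replaced by np.nan; count drops to 6 and the mean is taken over the remaining six values only.

```python
import numpy as np
import pandas as pd

# hypothetical column with two missing values (np.nan)
df_missing = pd.DataFrame({'points': [25, np.nan, 15, 14, 19, np.nan, 25, 29]})

stats = df_missing.describe()

# count only includes the 6 non-missing values,
# and the mean is computed over those 6 values only
print(stats.loc['count', 'points'])  # 6.0
print(stats.loc['mean', 'points'] == df_missing['points'].dropna().mean())  # True
```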

Example 2: Describe All Columns

To calculate descriptive statistics for every column in the DataFrame, including non-numeric ones, we can use the include='all' argument:

#generate descriptive statistics for all columns
df.describe(include='all')

	team	points	assists	rebounds
count	8	8.000000	8.00000	8.000000
unique	3	NaN	NaN	NaN
top	B	NaN	NaN	NaN
freq	3	NaN	NaN	NaN
mean	NaN	20.250000	7.75000	8.375000
std	NaN	6.158618	2.54951	2.559994
min	NaN	12.000000	4.00000	5.000000
25%	NaN	14.750000	6.50000	6.000000
50%	NaN	21.000000	8.00000	8.500000
75%	NaN	25.000000	9.00000	10.250000
max	NaN	29.000000	12.00000	12.000000
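For a text column, unique is the number of distinct values, top is the most frequent value, and freq is how often that value appears. In this DataFrame B and C both appear three times, so top shows only one of the tied values, and which one is reported is best not relied on. Besides 'all', the include argument accepts a list of data types, and exclude removes data types instead. The sketch below uses np.number to split the summary into the non-numeric and numeric columns; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# summarize only the non-numeric column(s): count, unique, top, freq
text_summary = df.describe(exclude=[np.number])
print(text_summary)

# summarize only the numeric columns (same as the default describe() here)
numeric_summary = df.describe(include=[np.number])
print(numeric_summary)
```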

Example 3: Describe Specific Columns

The following code shows how to calculate descriptive statistics for one specific column in the pandas DataFrame:

#calculate descriptive statistics for 'points' column only
df['points'].describe()

count     8.000000
mean     20.250000
std       6.158618
min      12.000000
25%      14.750000
50%      21.000000
75%      25.000000
max      29.000000
Name: points, dtype: float64
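Because describe() on a single column returns an ordinary pandas Series indexed by statistic name, individual values can be pulled out by label. As an illustration, the sketch below reads the mean and computes the interquartile range from the 25% and 75% rows (mean_points and iqr are just example variable names).

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

stats = df['points'].describe()

# each statistic is available by its row label
mean_points = stats['mean']
iqr = stats['75%'] - stats['25%']   # interquartile range
print(mean_points, iqr)  # 20.25 10.25
```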

The following code shows how to calculate descriptive statistics for several specific columns:

#calculate descriptive statistics for 'points' and 'assists' columns only
df[['points', 'assists']].describe()

	points	assists
count	8.000000	8.00000
mean	20.250000	7.75000
std	6.158618	2.54951
min	12.000000	4.00000
25%	14.750000	6.50000
50%	21.000000	8.00000
75%	25.000000	9.00000
max	29.000000	12.00000
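The percentile rows default to 25%, 50%, and 75%. The percentiles argument takes a list of values between 0 and 1 to report other percentiles instead; the median (50%) is always included. A sketch using the 10th and 90th percentiles for the same two columns:

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# report the 10th and 90th percentiles (50% is added automatically)
summary = df[['points', 'assists']].describe(percentiles=[0.1, 0.9])
print(summary)
```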

You can find the complete documentation for the describe() function in the pandas API reference under pandas.DataFrame.describe.

Additional Resources

The following tutorials explain how to perform other common functions in pandas:

Pandas: How to Find Unique Values in a Column
Pandas: How to Find the Difference Between Two Rows
Pandas: How to Count Missing Values in DataFrame
