Often you may want to select the rows of a pandas DataFrame based on their index value.
If you’d like to select rows by integer position, you can use the .iloc indexer.
If you’d like to select rows by index label, you can use the .loc indexer.
This tutorial provides an example of how to use each of these indexers in practice.
Example 1: Select Rows Based on Integer Indexing
The following code shows how to create a pandas DataFrame and use .iloc to select the row at integer position 4 (the fifth row):
import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame
df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

#view DataFrame
df

           A         B
0   0.548814  0.715189
3   0.602763  0.544883
6   0.423655  0.645894
9   0.437587  0.891773
12  0.963663  0.383442
15  0.791725  0.528895

#select the 5th row of the DataFrame
df.iloc[[4]]

           A         B
12  0.963663  0.383442
We can use similar syntax to select multiple rows:
#select the 3rd, 4th, and 5th rows of the DataFrame
df.iloc[[2, 3, 4]]

           A         B
6   0.423655  0.645894
9   0.437587  0.891773
12  0.963663  0.383442
Or we could select all rows in a range:
#select the rows at positions 2 through 4 (the end position 5 is excluded)
df.iloc[2:5]

           A         B
6   0.423655  0.645894
9   0.437587  0.891773
12  0.963663  0.383442
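The short sketch below rebuilds the same DataFrame to make two details of .iloc explicit: a slice stops before its end position, and a list of positions returns a DataFrame while a bare integer returns a Series. The variable names (by_slice, by_list, row_as_frame, row_as_series) are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# rebuild the DataFrame from the example above
np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# a slice stops before its end position, so 2:5 covers positions 2, 3, and 4
by_slice = df.iloc[2:5]
by_list = df.iloc[[2, 3, 4]]
print(by_slice.index.tolist())  # [6, 9, 12]
print(by_list.index.tolist())   # [6, 9, 12]

# a list of positions returns a DataFrame, a bare integer returns a Series
row_as_frame = df.iloc[[4]]
row_as_series = df.iloc[4]
print(type(row_as_frame).__name__)   # DataFrame
print(type(row_as_series).__name__)  # Series
```

The double brackets used throughout this tutorial keep the result as a DataFrame, which is usually more convenient when the selected rows feed into further DataFrame operations.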
Example 2: Select Rows Based on Label Indexing
The following code shows how to create a pandas DataFrame and use .loc to select the row with an index label of 3:
import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame
df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

#view DataFrame
df

           A         B
0   0.548814  0.715189
3   0.602763  0.544883
6   0.423655  0.645894
9   0.437587  0.891773
12  0.963663  0.383442
15  0.791725  0.528895

#select the row with index label '3'
df.loc[[3]]

          A         B
3  0.602763  0.544883
We can use similar syntax to select multiple rows with different index labels:
#select the rows with index labels '3', '6', and '9'
df.loc[[3, 6, 9]]

          A         B
3  0.602763  0.544883
6  0.423655  0.645894
9  0.437587  0.891773
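Label-based selection also accepts a slice. As a minimal sketch on the same DataFrame, the code below shows that a .loc slice includes both endpoint labels, whereas the same numbers passed to .iloc are read as positions and the end is excluded. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# rebuild the DataFrame from the example above
np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# a label slice includes both endpoints, so 3:9 returns labels 3, 6, and 9
by_label_slice = df.loc[3:9]
print(by_label_slice.index.tolist())  # [3, 6, 9]

# the same numbers used as positions start at the 4th row and run to the end,
# because the DataFrame only has 6 rows
by_position_slice = df.iloc[3:9]
print(by_position_slice.index.tolist())  # [9, 12, 15]
```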
The Difference Between .iloc and .loc
The examples above illustrate the subtle difference between .iloc and .loc:
- .iloc selects rows based on integer position. So, if you want to select the 5th row in a DataFrame, you would use df.iloc[[4]] since the first row is at position 0, the second row is at position 1, and so on.
- .loc selects rows based on index label. So, if you want to select the row with an index label of 5, you would directly use df.loc[[5]], provided that label exists in the index.
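To see the difference side by side, the sketch below passes the same numbers to both indexers on the DataFrame from the examples above. It assumes that DataFrame, whose index labels are 0, 3, 6, 9, 12, and 15; the variable names, including the label_found flag, are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# rebuild the DataFrame from the examples above
np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# position 3 is the 4th row, which carries the label 9
by_position = df.iloc[[3]]
print(by_position.index.tolist())  # [9]

# label 3 belongs to the 2nd row
by_label = df.loc[[3]]
print(by_label.index.tolist())  # [3]

# 5 is a valid position (the 6th row) but not a label in this index
last_row = df.iloc[[5]]
print(last_row.index.tolist())  # [15]

try:
    df.loc[[5]]
    label_found = True
except KeyError:
    # .loc raises KeyError when a requested label is missing
    label_found = False
print(label_found)  # False
```

When the index is the default 0, 1, 2, ... range, positions and labels coincide, which is why the two indexers are easy to confuse.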