Pandas is a popular Python library used for data analysis and manipulation. One of the core data structures in Pandas is the DataFrame, which allows you to store and manipulate tabular data in rows and columns.
Selecting columns from a Pandas DataFrame is an essential skill for any data analyst or data scientist working with Python. There are several different ways to select columns in a Pandas DataFrame, including:
- Selecting by column label or name
- Selecting by column index
- Column selection with loc and iloc
- Column slicing using [] operator
In this comprehensive guide, we will explore the various methods for selecting columns in Pandas DataFrames using Python.
Table of Contents
- Overview of Pandas DataFrame Column Selection
- Selecting Columns by Label
- Selecting Columns by Index
- Using loc and iloc for Column Selection
- Column Selection by Slicing
- Using Column Names and Labels
- Selecting Columns by Data Type
- Dropping Columns in a DataFrame
- Additional Column Selection Methods
- Practice Examples
- Key Takeaways
- Conclusion
Overview of Pandas DataFrame Column Selection
A Pandas DataFrame contains columns that can be retrieved by their name or index position. Some key things to know:
- DataFrame columns have a label or name that is used to reference the column.
- Columns also have an integer index starting from 0.
- There are three primary methods for column selection:
  - loc - Select columns by label.
  - iloc - Select columns by integer position.
  - [] - Select one column by label, or several columns with a list of labels.
Let’s start by creating a simple Pandas DataFrame to demonstrate column selection:
import pandas as pd
data = {'Product': ['Widget', 'Gadget', 'Doohickey'],
'Price': [100, 200, 300],
'Stock': [10, 20, 30]}
df = pd.DataFrame(data)
print(df)
Product Price Stock
0 Widget 100 10
1 Gadget 200 20
2 Doohickey 300 30
This DataFrame has:
- 3 rows and 3 columns
- The column labels are: ‘Product’, ‘Price’, ‘Stock’
- The column indexes are: 0, 1, 2
Let’s now go over how to select columns from this sample DataFrame by label, index, slicing, and more.
Selecting Columns by Label
To select a column by its label or name, use the [] operator after the DataFrame.
For example:
# Select 'Product' column
product_col = df['Product']
print(product_col)
0 Widget
1 Gadget
2 Doohickey
Name: Product, dtype: object
The column label is passed as a string inside [] to extract the entire column as a Pandas Series.
You can select multiple columns by passing a list of column names:
# Select 'Product' and 'Price' columns
product_price = df[['Product', 'Price']]
print(product_price)
Product Price
0 Widget 100
1 Gadget 200
2 Doohickey 300
The DataFrame product_price contains just the ‘Product’ and ‘Price’ columns selected by label.
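The difference between a single label and a one-item list is easy to miss, so here is a minimal sketch of it. It rebuilds the sample DataFrame so that it runs on its own; the variable names as_series and as_frame are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Price": [100, 200, 300],
    "Stock": [10, 20, 30],
})

# A single label returns a one-dimensional Series
as_series = df["Product"]

# A list of labels returns a DataFrame, even when the list has one item
as_frame = df[["Product"]]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```

Use the list form when later code expects a two-dimensional object with column labels.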
Selecting Columns by Index
To select columns by their integer index, use iloc, because plain [] treats an integer as a column label rather than a position:
# Select column index 1
col_index1 = df.iloc[:,1]
print(col_index1)
0 100
1 200
2 300
Name: Price, dtype: int64
Here we select the column at index 1, which is the ‘Price’ column.
Multiple columns can be selected by passing a list of integer indexes:
# Select column indexes 0, 2
col_indexes = df.iloc[:,[0, 2]]
print(col_indexes)
Product Stock
0 Widget 10
1 Gadget 20
2 Doohickey 30
This returns the columns at index 0 and 2, ‘Product’ and ‘Stock’.
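The sketch below, which assumes the same sample data, shows how a position maps to a label through df.columns, and what happens when an integer is passed to plain [] on a DataFrame whose labels are strings. The flag integer_key_worked is a name made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Price": [100, 200, 300],
    "Stock": [10, 20, 30],
})

# Translate a position into the label stored at that position
label = df.columns[1]
print(label)  # Price

# Selecting by position and by the translated label gives the same column
by_position = df.iloc[:, 1]
by_label = df[label]
print(by_position.equals(by_label))  # True

# Plain [] looks for a column *labelled* 1, which does not exist here
try:
    df[1]
    integer_key_worked = True
except KeyError:
    integer_key_worked = False
print(integer_key_worked)  # False
```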
Using loc and iloc for Column Selection
Pandas provides the loc and iloc attributes on DataFrames for label and integer-based indexing.
loc for Label Column Selection
loc allows selecting columns by their label using plain Python strings:
# Select 'Price' column
price_col = df.loc[:, 'Price']
print(price_col)
0 100
1 200
2 300
Name: Price, dtype: int64
For multiple columns, pass a list of labels:
# Select 'Product' and 'Stock' column
product_stock = df.loc[:, ['Product', 'Stock']]
print(product_stock)
Product Stock
0 Widget 10
1 Gadget 20
2 Doohickey 30
loc provides an explicit way to retrieve columns by label.
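The first position in loc, shown as : in the examples above, selects rows, so a row condition can be combined with a column list in one call. The following is a small sketch of that; the price threshold of 100 is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Price": [100, 200, 300],
    "Stock": [10, 20, 30],
})

# Rows where Price is above 100, restricted to two columns by label
expensive = df.loc[df["Price"] > 100, ["Product", "Price"]]
print(expensive)
```

Only the ‘Gadget’ and ‘Doohickey’ rows pass the condition, and only the two named columns are returned.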
iloc for Index Column Selection
For selecting by integer index, use iloc:
# Select column index 0
product_col = df.iloc[:, 0]
print(product_col)
0 Widget
1 Gadget
2 Doohickey
Name: Product, dtype: object
Specify multiple column indexes in a list:
# Select column indexes 1, 2
cols = df.iloc[:, [1, 2]]
print(cols)
Price Stock
0 100 10
1 200 20
2 300 30
iloc selects columns explicitly by integer index.
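Like Python lists, iloc also accepts negative positions that count from the end, which helps when the number of columns is not known in advance. A brief sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Price": [100, 200, 300],
    "Stock": [10, 20, 30],
})

# -1 refers to the last column
last_col = df.iloc[:, -1]
print(last_col.name)  # Stock

# A list of negative positions selects the last two columns
last_two = df.iloc[:, [-2, -1]]
print(list(last_two.columns))  # ['Price', 'Stock']
```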
Column Selection by Slicing
You can select consecutive columns of a Pandas DataFrame by passing a start:stop slice to iloc.
For example:
# Select columns from index 1 to index 2
col_slice = df.iloc[:, 1:3]
print(col_slice)
Price Stock
0 100 10
1 200 20
2 300 30
This slice starts at index 1 and stops before index 3, so it selects ‘Price’ and ‘Stock’.
Slicing also works with labels through loc, where both endpoints are included:
# Slice columns 'Price' through 'Stock'
df_slice = df.loc[:, 'Price':'Stock']
print(df_slice)
Price Stock
0 100 10
1 200 20
2 300 30
Column slicing provides a way to get a subset of consecutive columns by index.
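Slices can also be open-ended or use a step, as in ordinary Python slicing. The sketch below shows a few variations on the sample data and also illustrates that a slice passed to plain [] selects rows, not columns. All variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Price": [100, 200, 300],
    "Stock": [10, 20, 30],
})

first_two = df.iloc[:, :2]      # everything before position 2
from_second = df.iloc[:, 1:]    # position 1 through the end
every_other = df.iloc[:, ::2]   # step of 2: positions 0 and 2

print(list(first_two.columns))    # ['Product', 'Price']
print(list(from_second.columns))  # ['Price', 'Stock']
print(list(every_other.columns))  # ['Product', 'Stock']

# A slice inside plain [] works on ROWS: this keeps rows 1 and 2, all columns
rows_not_columns = df[1:3]
print(rows_not_columns.shape)     # (2, 3)
```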
Using Column Names and Labels
When selecting columns, keep in mind:
- Use column names or labels to retrieve columns by a meaningful identifier.
- The column label is the name attribute of the Pandas Series returned for that column.
- Labels are set when the DataFrame is created and can be changed later with rename().
For example:
print(df.columns)
Index(['Product', 'Price', 'Stock'], dtype='object')
This shows the column labels ‘Product’, ‘Price’, and ‘Stock’.
We can rename columns and use the new labels:
df = df.rename(columns={'Price': 'Cost'})
print(df['Cost'])
0 100
1 200
2 300
Name: Cost, dtype: int64
Selecting columns by descriptive names using labels makes your code more readable and maintainable.
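When working with labels, it can help to check whether a label exists and where it sits before selecting it. The following sketch, using a fresh copy of the sample data, also shows that rename() returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Price": [100, 200, 300],
    "Stock": [10, 20, 30],
})

# rename() returns a new DataFrame; df itself keeps its labels
renamed = df.rename(columns={"Price": "Cost"})
print(list(df.columns))       # ['Product', 'Price', 'Stock']
print(list(renamed.columns))  # ['Product', 'Cost', 'Stock']

# Membership test on the column index
print("Price" in renamed.columns)  # False

# Position of a label within the column index
cost_position = renamed.columns.get_loc("Cost")
print(cost_position)  # 1
```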
Selecting Columns by Data Type
You can also select columns based on their data types using the DataFrame select_dtypes method.
For example, to select numeric columns:
import numpy as np
numeric_cols = df.select_dtypes(include=[np.number])
print(numeric_cols)
Cost Stock
0 100 10
1 200 20
2 300 30
This selects the ‘Cost’ and ‘Stock’ columns which are numeric.
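Passing include='object' selects object-dtype columns (use the string 'object' rather than np.object, which recent NumPy releases no longer provide), while the exclude parameter leaves out the listed dtypes.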
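Here is a short sketch of include and exclude side by side. It assumes a variant of the sample data with an extra float column, ‘Rating’, which is not part of the original example and is added only to have a second numeric dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Cost": [100, 200, 300],
    "Stock": [10, 20, 30],
    "Rating": [4.5, 3.8, 4.9],  # hypothetical column for illustration
})

# 'number' covers integer and float columns alike
numeric = df.select_dtypes(include="number")
print(list(numeric.columns))      # ['Cost', 'Stock', 'Rating']

# exclude returns everything that is NOT of the given dtype
non_numeric = df.select_dtypes(exclude="number")
print(list(non_numeric.columns))  # ['Product']

# A specific dtype narrows the selection further
floats_only = df.select_dtypes(include="float64")
print(list(floats_only.columns))  # ['Rating']
```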
Dropping Columns in a DataFrame
The DataFrame drop() method allows you to drop or remove one or more columns.
For example:
# Drop 'Stock' column
df_dropped = df.drop(columns=['Stock'])
print(df_dropped)
Product Cost
0 Widget 100
1 Gadget 200
2 Doohickey 300
This drops the ‘Stock’ column by label.
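drop() works on labels, so to drop columns by index, slice, or list of indexes, look up the labels through df.columns first, for example df.drop(columns=df.columns[[0, 2]]).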
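As a sketch of dropping by position, the code below indexes df.columns to get the labels and hands them to drop(). It also shows the errors='ignore' option for labels that may not exist. The label ‘Missing’ is a made-up name for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Cost": [100, 200, 300],
    "Stock": [10, 20, 30],
})

# Drop the columns at positions 0 and 2 by looking up their labels
dropped_by_position = df.drop(columns=df.columns[[0, 2]])
print(list(dropped_by_position.columns))  # ['Cost']

# Drop a positional slice: everything from position 1 onward
dropped_slice = df.drop(columns=df.columns[1:])
print(list(dropped_slice.columns))        # ['Product']

# errors='ignore' skips labels that are not present instead of raising
unchanged = df.drop(columns=["Missing"], errors="ignore")
print(list(unchanged.columns))            # ['Product', 'Cost', 'Stock']
```

In each case drop() returns a new DataFrame and leaves df as it was.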
Additional Column Selection Methods
There are some additional ways to select columns in a Pandas DataFrame:
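- df.filter() - Subset columns by label, using exact names, a substring, or a regular expression.
- df.reindex() - Select and rearrange columns by passing labels to the columns parameter.
- df.loc[:, bool_array] or df.iloc[:, bool_array] - Boolean indexing to filter columns.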
For example:
import numpy as np
# Boolean selection
bool_array = np.array([True, False, True])
selected = df.iloc[:, bool_array]
print(selected)
Product Stock
0 Widget 10
1 Gadget 20
2 Doohickey 30
These provide additional options for more complex column selection scenarios.
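The filter() and reindex() methods listed above can be sketched as follows. The substring and regular expression used here are arbitrary choices for the sample labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Doohickey"],
    "Cost": [100, 200, 300],
    "Stock": [10, 20, 30],
})

# Exact labels
by_items = df.filter(items=["Product", "Stock"])
print(list(by_items.columns))   # ['Product', 'Stock']

# Labels containing a substring (case-sensitive)
by_like = df.filter(like="ock")
print(list(by_like.columns))    # ['Stock']

# Labels matching a regular expression: here, starting with 'P' or 'C'
by_regex = df.filter(regex="^[PC]")
print(list(by_regex.columns))   # ['Product', 'Cost']

# reindex selects and reorders columns in one step
reordered = df.reindex(columns=["Stock", "Product"])
print(list(reordered.columns))  # ['Stock', 'Product']
```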
Practice Examples
Let’s review some practice examples of selecting columns from a DataFrame by label, index, and slicing:
import pandas as pd
data = {'Product': ['Widget', 'Gadget', 'Gizmo'],
'Quantity': [100, 200, 300],
'Price': [10.5, 20.5, 30.5]}
products = pd.DataFrame(data)
# By label
print(products['Product'])
# By index
print(products.iloc[:, 1])
# Slicing
print(products.loc[:, 'Quantity':'Price'])
# loc and iloc
print(products.loc[:, ['Product', 'Quantity']])
print(products.iloc[:, [0, 2]])
These examples demonstrate the various selection methods in action on a sample DataFrame.
Key Takeaways
The key points about selecting Pandas DataFrame columns in Python covered in this guide:
- Use [] with a column name to select columns by label
- loc and iloc explicitly select by label and index
- Slice columns using df.iloc[:, 1:3] syntax
- Select multiple columns by passing a list
- Use column labels and names for readable code
- Drop columns with the drop() method
Selecting DataFrame columns by label, index, or slicing is a fundamental Pandas skill for data exploration and analysis. Using proper indexing and selection techniques goes a long way in Pandas.
Conclusion
This concludes our guide on how to select columns from a Pandas DataFrame by label, index, slicing and more using Python. The methods shown provide flexible ways to extract single or multiple columns in Pandas.
Mastering DataFrame column selection will enable you to efficiently access, query, and understand your data. These skills form the foundation for advanced DataFrame operations like join, merge, reshape, pivot, and more. Combining column selection with other Pandas techniques will give you the tools to wrangle data effectively for your Python data science and analytics projects.