Selecting Pandas DataFrame Columns by Label, Index, Slicing in Python

Pandas is a popular Python library used for data analysis and manipulation. One of the core data structures in Pandas is the DataFrame, which allows you to store and manipulate tabular data in rows and columns.

Selecting columns from a Pandas DataFrame is an essential skill for any data analyst or data scientist working with Python. There are several different ways to select columns in a Pandas DataFrame, including:

Selecting by column label or name
Selecting by column index
Column selection with loc and iloc
Column slicing with loc and iloc

In this comprehensive guide, we will explore the various methods for selecting columns in Pandas DataFrames using Python.

Overview of Pandas DataFrame Column Selection
Selecting Columns by Label
Selecting Columns by Index
Using loc and iloc for Column Selection
- loc for Label Column Selection
- iloc for Index Column Selection
Column Selection by Slicing
Using Column Names and Labels
Selecting Columns by Data Type
Dropping Columns in a DataFrame
Additional Column Selection Methods
Practice Examples
Key Takeaways
Conclusion

Overview of Pandas DataFrame Column Selection

A Pandas DataFrame contains columns that can be retrieved by their name or index position. Some key things to know:

DataFrame columns have a label or name that is used to reference the column.
Columns also have an integer index starting from 0.
There are three primary methods for column selection:
- loc - Select columns by label, including label slices.
- iloc - Select columns by integer position, including position slices.
- [] - Select columns by a single label or a list of labels.

Let’s start by creating a simple Pandas DataFrame to demonstrate column selection:

import pandas as pd

data = {'Product': ['Widget', 'Gadget', 'Doohickey'],
        'Price': [100, 200, 300],
        'Stock': [10, 20, 30]}

df = pd.DataFrame(data)

print(df)

     Product  Price  Stock
0     Widget    100     10
1     Gadget    200     20
2  Doohickey    300     30

This DataFrame has:

3 rows and 3 columns
The column labels are: ‘Product’, ‘Price’, ‘Stock’
The column indexes are: 0, 1, 2

Let’s now go over how to select columns from this sample DataFrame by label, index, slicing, and more.

Selecting Columns by Label

To select a column by its label or name, use the [] operator after the DataFrame.

For example:

# Select 'Product' column
product_col = df['Product']

print(product_col)

0       Widget
1       Gadget
2    Doohickey
Name: Product, dtype: object

The column label is passed as a string inside [] to extract the entire column as a Pandas Series.

You can select multiple columns by passing a list of column names:

# Select 'Product' and 'Price' columns
product_price = df[['Product', 'Price']]

print(product_price)

     Product  Price
0     Widget    100
1     Gadget    200
2  Doohickey    300

The DataFrame product_price contains just the ‘Product’ and ‘Price’ columns selected by label.
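The difference between passing a single label and passing a list of labels shows up in the return type. The following is a minimal sketch that rebuilds the sample DataFrame so it runs on its own; the variable names as_series and as_frame are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Price': [100, 200, 300],
                   'Stock': [10, 20, 30]})

# A single label returns a one-dimensional Series
as_series = df['Price']

# A list of labels returns a DataFrame, even when the list has one item
as_frame = df[['Price']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```

Use the list form when later code expects a two-dimensional object with column labels.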

Selecting Columns by Index

The plain [] operator treats an integer as a column label rather than a position, so to select columns by their integer index, use iloc with : for the rows:

# Select column index 1
col_index1 = df.iloc[:,1]

print(col_index1)

0    100
1    200
2    300
Name: Price, dtype: int64

Here we select the column at index 1, which is the ‘Price’ column.

Multiple columns can be selected by passing a list of integer indexes:

# Select column indexes 0, 2
col_indexes = df.iloc[:,[0, 2]]

print(col_indexes)

     Product  Stock
0     Widget     10
1     Gadget     20
2  Doohickey     30

This returns the columns at index 0 and 2, ‘Product’ and ‘Stock’.
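To illustrate why iloc is needed here, the sketch below shows what happens when an integer is passed to [] on a DataFrame whose labels are strings, and how df.columns can translate between positions and labels. The variable names are illustrative, and the sample DataFrame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Price': [100, 200, 300],
                   'Stock': [10, 20, 30]})

# df[1] looks for a column *labeled* 1, which this DataFrame does not have
try:
    df[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Translate between positions and labels through df.columns
label_at_1 = df.columns[1]                        # position -> label
position_of_stock = df.columns.get_loc('Stock')   # label -> position

print(lookup_failed)       # True
print(label_at_1)          # Price
print(position_of_stock)   # 2
```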

Using loc and iloc for Column Selection

Pandas provides the loc and iloc attributes on DataFrames for label and integer-based indexing.

loc for Label Column Selection

loc allows selecting columns by their label using plain Python strings:

# Select 'Price' column
price_col = df.loc[:, 'Price']

print(price_col)

0    100
1    200
2    300
Name: Price, dtype: int64

For multiple columns, pass a list of labels:

# Select 'Product' and 'Stock' column
product_stock = df.loc[:, ['Product', 'Stock']]

print(product_stock)

     Product  Stock
0     Widget     10
1     Gadget     20
2  Doohickey     30

loc provides an explicit way to retrieve columns by label.
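The : before the comma in these calls means "all rows". As a brief sketch of why that slot matters, it can be replaced with a row condition so that rows and columns are narrowed in one step. The threshold of 100 and the name expensive are illustrative assumptions, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Price': [100, 200, 300],
                   'Stock': [10, 20, 30]})

# Row selector: a boolean condition; column selector: a list of labels
expensive = df.loc[df['Price'] > 100, ['Product', 'Price']]

print(list(expensive.columns))  # ['Product', 'Price']
print(list(expensive.index))    # [1, 2]
```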

iloc for Index Column Selection

For selecting by integer index, use iloc:

# Select column index 0
product_col = df.iloc[:, 0]

print(product_col)

0       Widget
1       Gadget
2    Doohickey
Name: Product, dtype: object

Specify multiple column indexes in a list:

# Select column indexes 1, 2
cols = df.iloc[:, [1, 2]]

print(cols)

   Price  Stock
0    100     10
1    200     20
2    300     30

iloc selects columns explicitly by integer index.
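Because iloc works on positions, it follows Python's usual indexing conventions. The sketch below, using the same sample data, shows negative positions counting from the end and an out-of-range position raising an error. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Price': [100, 200, 300],
                   'Stock': [10, 20, 30]})

# Negative positions count from the last column
last_col = df.iloc[:, -1]
first_and_last = df.iloc[:, [0, -1]]

# A position past the end raises IndexError
try:
    df.iloc[:, 5]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(last_col.name)                 # Stock
print(list(first_and_last.columns))  # ['Product', 'Stock']
print(out_of_bounds)                 # True
```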

Column Selection by Slicing

You can slice a Pandas DataFrame to select consecutive columns using the [start:stop] syntax.

For example:

# Select columns from index 1 to index 2
col_slice = df.iloc[:, 1:3]

print(col_slice)

   Price  Stock
0    100     10
1    200     20
2    300     30

The stop position is excluded, so 1:3 covers positions 1 and 2 and selects ‘Price’ and ‘Stock’.

Label-based slicing works through loc, where both endpoints are included:

# Slice columns from 'Price' through 'Stock'
df_slice = df.loc[:, 'Price':'Stock']

print(df_slice)

   Price  Stock
0    100     10
1    200     20
2    300     30

Column slicing provides a way to get a subset of consecutive columns by position or by label. Note that a bare slice such as df[1:3] selects rows, not columns.
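The endpoint rules differ between the two indexers, which is a common source of off-by-one mistakes. A short sketch on the sample data, with illustrative variable names, compares them and also shows a step and an open-ended slice:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Price': [100, 200, 300],
                   'Stock': [10, 20, 30]})

# iloc: the stop position is excluded
by_position = df.iloc[:, 0:2]

# loc: the stop label is included
by_label = df.loc[:, 'Product':'Price']

# A step of 2 takes every other column; an open end runs to the last column
every_other = df.iloc[:, ::2]
from_second = df.iloc[:, 1:]

print(list(by_position.columns))  # ['Product', 'Price']
print(list(by_label.columns))     # ['Product', 'Price']
print(list(every_other.columns))  # ['Product', 'Stock']
print(list(from_second.columns))  # ['Price', 'Stock']
```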

Using Column Names and Labels

When selecting columns, keep in mind:

Use column names or labels to retrieve columns by a meaningful identifier.
The column label becomes the name attribute of the Series returned when you select that column.
Labels are set when the DataFrame is created and can be changed later, for example with rename().

For example:

print(df.columns)

Index(['Product', 'Price', 'Stock'], dtype='object')

This shows the column labels ‘Product’, ‘Price’, and ‘Stock’.

We can rename columns and use the new labels:

df = df.rename(columns={'Price': 'Cost'})

print(df['Cost'])

0    100
1    200
2    300
Name: Cost, dtype: int64

Selecting columns by descriptive names using labels makes your code more readable and maintainable.
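One detail worth checking when renaming is that rename() returns a new DataFrame by default and leaves the one it was called on untouched. The sketch below uses separate names (original and renamed, both illustrative) to make that visible:

```python
import pandas as pd

original = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                         'Price': [100, 200, 300],
                         'Stock': [10, 20, 30]})

# rename() returns a new DataFrame; the original keeps its labels
renamed = original.rename(columns={'Price': 'Cost'})

print(list(original.columns))       # ['Product', 'Price', 'Stock']
print(list(renamed.columns))        # ['Product', 'Cost', 'Stock']
print('Price' in renamed.columns)   # False
print(renamed['Cost'].name)         # Cost
```

Testing membership with the in operator on df.columns is a simple way to confirm a label exists before selecting it.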

Selecting Columns by Data Type

You can also select columns based on their data types using the DataFrame select_dtypes method.

For example, to select numeric columns:

import numpy as np

numeric_cols = df.select_dtypes(include=[np.number])

print(numeric_cols)

   Cost  Stock
0   100     10
1   200     20
2   300     30

This selects the ‘Cost’ and ‘Stock’ columns which are numeric.

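Passing include='object' selects object columns (the dtype pandas has traditionally used for strings), while the exclude parameter leaves out the listed dtypes. The older np.object alias was removed in NumPy 1.24, so use the string 'object' or the built-in object instead.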
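As a rough sketch of include and exclude side by side, the example below adds a hypothetical float column named Rating (not part of the original data) so that there is more than one numeric dtype to choose from. The string 'number' is used here as an alternative to np.number.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Cost': [100, 200, 300],
                   'Stock': [10, 20, 30],
                   'Rating': [4.5, 3.8, 4.9]})  # hypothetical column for illustration

# All numeric columns, integer and float alike
numeric = df.select_dtypes(include='number')

# Only 64-bit float columns
floats = df.select_dtypes(include='float64')

# Everything that is not numeric
non_numeric = df.select_dtypes(exclude='number')

print(list(numeric.columns))      # ['Cost', 'Stock', 'Rating']
print(list(floats.columns))       # ['Rating']
print(list(non_numeric.columns))  # ['Product']
```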

Dropping Columns in a DataFrame

The DataFrame drop() method allows you to drop or remove one or more columns.

For example:

# Drop 'Stock' column
df_dropped = df.drop(columns=['Stock'])

print(df_dropped)

     Product  Cost
0     Widget   100
1     Gadget   200
2  Doohickey   300

This drops the ‘Stock’ column by label.

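drop() works on labels, so to drop by position, slice, or list of positions, first look up the labels through df.columns, for example df.drop(columns=df.columns[[0, 2]]).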
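A minimal sketch of position-based dropping, assuming the renamed sample DataFrame with the columns ‘Product’, ‘Cost’, and ‘Stock’. The label 'Missing' is a made-up name used to show the errors parameter.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Cost': [100, 200, 300],
                   'Stock': [10, 20, 30]})

# Look up labels by position, then hand them to drop()
without_first = df.drop(columns=df.columns[0])
without_ends = df.drop(columns=df.columns[[0, 2]])
without_slice = df.drop(columns=df.columns[1:3])

# errors='ignore' skips labels that are not present ('Missing' is made up)
safe = df.drop(columns=['Stock', 'Missing'], errors='ignore')

print(list(without_first.columns))  # ['Cost', 'Stock']
print(list(without_ends.columns))   # ['Cost']
print(list(without_slice.columns))  # ['Product']
print(list(safe.columns))           # ['Product', 'Cost']
```

Each call returns a new DataFrame, so df itself still has all three columns afterwards.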

Additional Column Selection Methods

There are some additional ways to select columns in a Pandas DataFrame:

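df.filter() - Subset columns by label using the items, like, or regex parameters.
df.reindex() - Select and rearrange columns by passing labels to the columns parameter.
df.loc[:, bool_array] or df.iloc[:, bool_array] - Boolean indexing to filter columns (a bare df[bool_array] filters rows instead).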

For example:

import numpy as np

# Boolean selection
bool_array = np.array([True, False, True])
selected = df.iloc[:, bool_array]

print(selected)

     Product  Stock
0     Widget     10
1     Gadget     20
2  Doohickey     30

These provide additional options for more complex column selection scenarios.
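The following sketch tries filter(), reindex(), and a boolean mask on the renamed sample DataFrame. The patterns 'Co' and 'k$' and the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doohickey'],
                   'Cost': [100, 200, 300],
                   'Stock': [10, 20, 30]})

# filter(): keep columns by exact label, substring, or regular expression
by_items = df.filter(items=['Product', 'Stock'])
by_like = df.filter(like='Co')      # labels containing 'Co'
by_regex = df.filter(regex='k$')    # labels ending in 'k'

# reindex(): select and reorder columns in one step
reordered = df.reindex(columns=['Stock', 'Product'])

# Boolean mask over the columns, applied through loc
mask = df.columns.isin(['Product', 'Stock'])
masked = df.loc[:, mask]

print(list(by_items.columns))   # ['Product', 'Stock']
print(list(by_like.columns))    # ['Cost']
print(list(by_regex.columns))   # ['Stock']
print(list(reordered.columns))  # ['Stock', 'Product']
print(list(masked.columns))     # ['Product', 'Stock']
```

One difference to keep in mind: reindex() does not raise an error for a label that is missing from the DataFrame. It adds that column filled with NaN instead.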

Practice Examples

Let’s review some practice examples of selecting columns from a DataFrame by label, index, and slicing:

import pandas as pd

data = {'Product': ['Widget', 'Gadget', 'Gizmo'],
        'Quantity': [100, 200, 300],
        'Price': [10.5, 20.5, 30.5]}

products = pd.DataFrame(data)

# By label
print(products['Product'])

# By index
print(products.iloc[:, 1])

# Slicing
print(products.loc[:, 'Quantity':'Price'])

# loc and iloc
print(products.loc[:, ['Product', 'Quantity']])
print(products.iloc[:, [0, 2]])

These examples demonstrate the various selection methods in action on a sample DataFrame.
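As a quick check that the different approaches agree, the sketch below compares their results on the same practice DataFrame using the equals() method. The variable names are illustrative.

```python
import pandas as pd

products = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Gizmo'],
                         'Quantity': [100, 200, 300],
                         'Price': [10.5, 20.5, 30.5]})

# Three ways to reach the same single column
by_label = products['Quantity']
by_loc = products.loc[:, 'Quantity']
by_position = products.iloc[:, 1]

# Two ways to reach the same pair of adjacent columns
label_slice = products.loc[:, 'Quantity':'Price']
position_slice = products.iloc[:, 1:3]

print(by_label.equals(by_loc))             # True
print(by_label.equals(by_position))        # True
print(label_slice.equals(position_slice))  # True
```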

Key Takeaways

The key points about selecting Pandas DataFrame columns in Python covered in this guide:

Use [] with a column name, or a list of names, to select columns by label
loc and iloc explicitly select by label and index
Slice columns using df.iloc[:, 1:3] syntax
Select multiple columns by passing a list
Use column labels and names for readable code
Drop columns with drop() method

Selecting DataFrame columns by label, index, or slicing is a fundamental Pandas skill for data exploration and analysis. Using proper indexing and selection techniques goes a long way in Pandas.

Conclusion

This concludes our guide on how to select columns from a Pandas DataFrame by label, index, slicing and more using Python. The methods shown provide flexible ways to extract single or multiple columns in Pandas.

Mastering DataFrame column selection will enable you to efficiently access, query, and understand your data. These skills form the foundation for advanced DataFrame operations like join, merge, reshape, pivot, and more. Combining column selection with other Pandas techniques will give you the tools to wrangle data effectively for your Python data science and analytics projects.