Adding, Inserting, Removing, and Renaming Columns in Pandas

Pandas is a popular Python library used for data analysis and manipulation. One of Pandas’ most useful features is the ability to easily modify DataFrame columns. This allows developers to shape datasets to best suit their needs. In this comprehensive guide, we will explore the various methods for adding, inserting, removing, and renaming columns in Pandas DataFrames.

Overview
Adding Columns
Modifying Columns
Inserting Columns
Removing Columns
Renaming Columns
Adding Columns Via Parameters
Inserting Columns Via Assigning Entire Rows
Concatenating DataFrames
Conclusion

Overview

A Pandas DataFrame is a two-dimensional, tabular data structure with labeled columns that can hold different data types such as strings, numbers, and booleans. Each column is a Pandas Series identified by its label, much like a named variable in a dataset. When analyzing and preparing data in Python, it is often necessary to add, delete, or modify columns.

Pandas provides several methods to make these modifications efficiently without affecting the rest of the DataFrame. The key methods for column manipulation are:

df[column_name] = column_values - Add/modify columns by assignment
df.insert(loc, column_name, column_values) - Insert column at specified location
df.drop(columns=[column_names]) - Delete columns by name
df.rename(columns={old_name: new_name}) - Rename columns by specifying a mapping

In the following sections, we will explore the proper usage of each method with examples.

Adding Columns

New columns can be added to a Pandas DataFrame by simply assigning the new column with a name and values.

The basic syntax is:

df[new_column_name] = column_values

The column values can be a Python list, NumPy array, Pandas Series, or scalar value that is broadcast across all rows.

For example:

import pandas as pd

data = {'Name': ['John', 'Mary'], 'Age': [25, 27]}

df = pd.DataFrame(data)

# Add new column with scalar value
df['Country'] = 'United States'

# Add column with list
df['Hobby'] = ['Tennis', 'Hiking']

print(df)

   Name  Age        Country   Hobby
0  John   25  United States  Tennis
1  Mary   27  United States  Hiking

The new columns are appended to the right end of the DataFrame. A list or array must have the same length as the DataFrame, otherwise Pandas raises a ValueError. A Series is instead aligned on its index, and rows without a matching label are filled with NaN.
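
The difference between the length check and index alignment is easy to miss, so here is a minimal sketch of both cases. The DataFrame mirrors the example above; the `scores` Series and the `length_error` variable are illustrative names, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary'], 'Age': [25, 27]})

# A list of the wrong length is rejected with a ValueError.
try:
    df['Hobby'] = ['Tennis', 'Hiking', 'Chess']
    length_error = None
except ValueError as exc:
    length_error = exc

# A Series is matched on index labels rather than position, so a
# row with no matching label gets NaN instead of raising an error.
scores = pd.Series([90], index=[1])
df['Score'] = scores

print(df)
```

When values should be assigned by position regardless of the Series index, converting the Series to a list or array first avoids the alignment step.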

We can also insert a column at a specific location using insert(), which will be covered later.

Modifying Columns

Existing columns in a DataFrame can be modified by simply assigning new values to the column:

df[column_name] = new_column_values

The new values must match the length of the DataFrame, similar to adding new columns.

For example:

df['Age'] = [24, 26] # Modify Age column

print(df)

   Name  Age        Country   Hobby
0  John   24  United States  Tennis
1  Mary   26  United States  Hiking

Columns can also be modified with scalar values:

df['Country'] = 'Canada' # Set all rows to Canada

Or using columnar operations like applying mathematical functions:

df['Age'] = df['Age'] + 1 # Increment Age by 1

Inserting Columns

The insert() method allows inserting a new column at a specified location in the DataFrame.

The syntax is:

df.insert(loc, column_name, column_values)

Where loc is the zero-indexed insertion location (the numeric index of the column before which the new column will be inserted).

For example:

new_col = [10, 20]

df.insert(1, 'Points', new_col)

print(df)

   Name  Points  Age        Country   Hobby
0  John      10   24  United States  Tennis
1  Mary      20   26  United States  Hiking

Here we inserted the ‘Points’ column with values [10, 20] at index position 1, between the ‘Name’ and ‘Age’ columns.

Inserting a column modifies the DataFrame in-place. The column index positions of existing columns will be shifted right by 1 after the insert location.
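
Because insert() works in place, a few of its behaviors are worth seeing directly. The following is a small sketch on a fresh two-column DataFrame; the `Level` column and the variable names `result` and `duplicate_error` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary'], 'Age': [24, 26]})

# insert() mutates df and returns None, so do not reassign its result.
result = df.insert(1, 'Points', [10, 20])

# Passing loc=len(df.columns) places the column at the far right.
df.insert(len(df.columns), 'Level', ['A', 'B'])

# Re-using an existing column name raises ValueError by default.
try:
    df.insert(0, 'Points', [0, 0])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(list(df.columns))
```

If duplicate column labels are really wanted, insert() accepts allow_duplicates=True, though duplicate labels tend to complicate later selection.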

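insert() adds one column per call; it does not accept a list of column names. To insert several columns into a DataFrame that does not yet contain them, call it once per column and advance the position each time:

df.insert(1, 'Points', [10, 20])
df.insert(2, 'Score', [20, 30])

This inserts ‘Points’ at index 1 and ‘Score’ at index 2.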

Removing Columns

To remove one or more columns, use the drop() method on the DataFrame:

df.drop(columns=[column_names], inplace=True)

The columns parameter accepts the name of the column(s) to remove as a list.

Setting inplace=True will modify the DataFrame in-place, otherwise drop() will return a copy with the columns removed.
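
To make the difference concrete, here is a short sketch of drop() without inplace=True, along with the errors='ignore' option for labels that may not exist. The sample DataFrame and the names `trimmed`, `safe`, and `NotAColumn` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary'], 'Age': [24, 26], 'Points': [10, 20]})

# Without inplace=True, drop() leaves df untouched and returns a new DataFrame.
trimmed = df.drop(columns=['Points'])

# A missing label raises KeyError unless errors='ignore' is passed.
safe = df.drop(columns=['Points', 'NotAColumn'], errors='ignore')

print(list(df.columns))       # original still has all three columns
print(list(trimmed.columns))
```

Assigning the result back (df = df.drop(...)) is a common alternative to inplace=True and keeps method chains readable.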

For example:

# Remove single column
df.drop(columns=['Points'], inplace=True)

# Remove multiple columns
df.drop(columns=['Country', 'Hobby'], inplace=True)

print(df)

   Name  Age
0  John   24
1  Mary   26

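drop() matches column labels, not positions. To remove columns by position, look up their labels through df.columns first:

df.drop(columns=df.columns[[0, 3]], inplace=True)

This removes the first and fourth columns of a DataFrame that has at least four columns. The columns keyword already targets the column axis, so axis=1 is only needed with the positional form df.drop(labels, axis=1).

The remaining columns shift left to fill the gaps.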

Renaming Columns

The rename() method is used to rename one or more DataFrame column names.

The basic syntax is:

df.rename(columns={old_name: new_name}, inplace=True)

This specifies a dictionary mapping between the old and new column names.

For example:

df.rename(columns={'Name': 'First Name'}, inplace=True)

print(df)

  First Name  Age
0       John   24
1       Mary   26

We can rename multiple columns at once:

df.rename(columns={'Name': 'First Name', 'Age': 'Age Years'}, inplace=True)

With inplace=True the original DataFrame is changed. Without it, rename() returns a new DataFrame and leaves the original untouched.

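rename() also matches labels rather than positions, so integer keys only work when the column labels are themselves integers, as they are when a DataFrame is built without column names:

df.rename(columns={0: 'First Name', 1: 'Age Years'}, inplace=True)

To rename by position when the labels are strings, look up the current label through df.columns:

df.rename(columns={df.columns[0]: 'First Name'}, inplace=True)

This can be useful when the original column names are missing or invalid.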

The rename() method does not modify dtype or any values in the columns. It only changes the column labels.
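
A brief sketch of this behavior follows, assuming a small two-column DataFrame. It also shows two details of rename() that are easy to overlook: keys that match no column are skipped by default, and a callable can be passed in place of a dictionary. The `Missing` key and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary'], 'Age': [24, 26]})
dtypes_before = list(df.dtypes)

# Keys that match no column are skipped silently by default
# (pass errors='raise' to get a KeyError instead).
renamed = df.rename(columns={'Name': 'First Name', 'Missing': 'Ignored'})

# A callable is applied to every label; here each name is upper-cased.
upper = df.rename(columns=str.upper)

print(list(renamed.columns))
print(list(upper.columns))
```

The silent skip is convenient for reusable mappings, but it can also hide a typo in a column name, which is when errors='raise' helps.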

Adding Columns Via Parameters

There are a few other ways to inject new columns when creating a Pandas DataFrame:

1. Column Parameter

The columns parameter sets the column names when constructing a DataFrame from row data:

data = [[25, 'John'], [27, 'Mary']]

df = pd.DataFrame(data, columns=['Age', 'Name'])

print(df)

   Age  Name
0   25  John
1   27  Mary

2. Using Dictionary

A dictionary passed to the DataFrame constructor creates one column per key, using each key as the column name:

data = {'Age': [25, 27], 'Name': ['John', 'Mary']}

df = pd.DataFrame(data)

print(df)

   Age  Name
0   25  John
1   27  Mary

3. Assign During Creation

We can also add a column by assignment immediately after creating the DataFrame:

df = pd.DataFrame(data, columns=['Age', 'Name'])
df['Country'] = 'United States'

print(df)

   Age  Name        Country
0   25  John  United States
1   27  Mary  United States

Inserting Columns Via Assigning Entire Rows

In some cases, it is useful to add an entire row that supplies a value for every column at once. This can be done by:

Creating a new DataFrame from the row data
Appending that DataFrame to the bottom of the original

For example:

new_row = {'Name': 'Joe', 'Age': 22, 'Country': 'Canada'}

df_new = pd.DataFrame(new_row, index=[2])

# DataFrame.append() was removed in pandas 2.0; use pd.concat() instead
df = pd.concat([df, df_new], ignore_index=True)

print(df)

   Age  Name        Country
0   25  John  United States
1   27  Mary  United States
2   22   Joe         Canada

Here we created a single-row DataFrame df_new and concatenated it onto the bottom of the original df. Columns are matched by name, so the key order in new_row does not matter. Passing ignore_index=True makes Pandas renumber the rows sequentially.

The same process can insert multiple rows by creating a multi-row DataFrame and appending.
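
A minimal sketch of the multi-row case with pd.concat() is shown below. The second row ('Ana', 31, 'Mexico') is made-up sample data, and `new_rows` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 27], 'Name': ['John', 'Mary'],
                   'Country': ['United States', 'United States']})

# Build the new rows as their own DataFrame (one dict per row).
new_rows = pd.DataFrame([
    {'Name': 'Joe', 'Age': 22, 'Country': 'Canada'},
    {'Name': 'Ana', 'Age': 31, 'Country': 'Mexico'},
])

# Columns are matched by name; ignore_index=True renumbers rows 0..n-1.
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

Collecting rows in a list and concatenating once is generally cheaper than concatenating inside a loop, since every concat call builds a new DataFrame.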

Concatenating DataFrames

An alternative method to inject new columns is concatenating Pandas DataFrames using concat():

df1 = pd.DataFrame({'Age': [25, 27]})
df2 = pd.DataFrame({'Name': ['John', 'Mary']})

df = pd.concat([df1, df2], axis=1)

print(df)

   Age  Name
0   25  John
1   27  Mary

Passing axis=1 concatenates column-wise, placing the columns of df2 to the right of those of df1.

This allows assembling DataFrames created separately into a combined dataset with the desired columns.

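Note that ignore_index=True renumbers the axis being concatenated along. With axis=1 it replaces the column labels with 0, 1, and so on, and it renumbers the rows only when concatenating row-wise (axis=0).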
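
One thing to watch with column-wise concatenation is that rows are paired by index label, not by position. The sketch below assumes two small DataFrames whose indexes only partly overlap; `misaligned` and `aligned` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'Age': [25, 27]})                          # index 0, 1
df2 = pd.DataFrame({'Name': ['John', 'Mary']}, index=[1, 2])   # index 1, 2

# Rows are paired by index label, so unmatched labels are filled with NaN.
misaligned = pd.concat([df1, df2], axis=1)

# Resetting both indexes first pairs the rows by position instead.
aligned = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)

print(misaligned)
print(aligned)
```

Whether positional pairing is appropriate depends on the data: it is only safe when both DataFrames are known to list the same records in the same order.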

Conclusion

Pandas provides a versatile set of methods for adding, inserting, removing, and renaming columns in DataFrames. Mastering these column manipulation techniques enables wrangling tabular data in Python to best fit the needs of data science and analysis workflows.

In summary:

Use column assignment to add new columns or modify existing ones
Insert columns at specific positions with insert()
Remove columns with drop()
Rename columns with rename()
Define columns at creation with the columns parameter or a dictionary, and add rows by appending a new DataFrame
Concatenate DataFrames with concat()

With these tools, developers can shape Pandas DataFrames into the ideal schema for modeling, visualization, and machine learning tasks.