NumPy is a popular Python library used for scientific computing and working with multi-dimensional data arrays. One of the most common tasks when using NumPy arrays is adding or removing elements. NumPy provides several methods that allow you to easily append, insert, and delete elements from arrays without affecting the original data. Mastering these array modification techniques is key for efficiently pre-processing and cleaning data for analysis.
This comprehensive guide will demonstrate how to use numpy.append(), numpy.delete(), and numpy.insert() to add and remove elements from NumPy arrays. We will cover basic usage, parameters, return values, and examples applying these functions in real-world scenarios.
Overview of NumPy Appending, Deleting, and Inserting
numpy.append()
The append() function concatenates values onto the end of an array, optionally along a specified axis. It joins the inputs end-to-end and returns a new array with the combined elements; the original array is left unchanged.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr_appended = np.append(arr1, arr2)
print(arr_appended)
# [1 2 3 4 5 6]
numpy.delete()
The delete() function removes elements from an array by index position, optionally along a specified axis. It returns a new array without those elements; the original array is not modified, so you need to keep the result.
import numpy as np
arr = np.array([1, 2, 3, 4])
# np.delete returns a new array; arr itself is unchanged
arr_deleted = np.delete(arr, 1)
print(arr_deleted)
# [1 3 4]
numpy.insert()
The insert() function inserts values into an array before a given index position, optionally along a specified axis. Like delete(), it returns a new array and leaves the original unchanged.
import numpy as np
arr = np.array([1, 2, 4, 5])
# Insert the value 3 before index 2; the result is a new array
arr_inserted = np.insert(arr, 2, 3)
print(arr_inserted)
# [1 2 3 4 5]
Now let’s explore the full syntax, parameters, and examples of using these functions for modifying NumPy arrays.
numpy.append() Detail and Examples
The full syntax for numpy.append() is:
numpy.append(arr, values, axis=None)
Where:
- arr is the array you want to append to.
- values are the array or value(s) you want to append. When axis is given, values must have the same shape as arr in every dimension except axis.
- axis (optional) is the axis along which you want to append the values. By default it is None, which flattens both arr and values before appending.
The return value is a new NumPy array with the concatenated elements; arr itself is not modified.
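A quick check of the default axis=None behavior: both inputs are flattened before they are joined, so a 2D array loses its shape, and the input array is left untouched. The variable names are illustrative only.

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])

# With axis=None (the default), np.append flattens both inputs first.
flat = np.append(grid, [[5, 6]])
print(flat)        # [1 2 3 4 5 6]
print(flat.shape)  # (6,)

# The input array is not modified; np.append returned a new array.
print(grid.shape)  # (2, 2)
```

Pass an explicit axis whenever the original dimensions should be preserved.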
Let’s look at some examples of appending arrays and values:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Append arr2 to arr1
arr_appended = np.append(arr1, arr2)
print(arr_appended)
# [1 2 3 4 5 6]
# Append a value to arr1
arr_appended = np.append(arr1, 100)
print(arr_appended)
# [  1   2   3 100]
We can also append arrays along an axis. For example, appending rows:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
# Append new row
arr2 = np.array([[7, 8, 9]])
arr_appended = np.append(arr1, arr2, axis=0)
print(arr_appended)
# [[1 2 3]
# [4 5 6]
# [7 8 9]]
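Appending a column by specifying axis=1. The appended array needs one entry per row, so it is shaped as a column:
arr1 = np.array([[1, 2], [3, 4]])
# One value per row, shaped (2, 1)
arr2 = np.array([[5], [6]])
arr_appended = np.append(arr1, arr2, axis=1)
print(arr_appended)
# [[1 2 5]
#  [3 4 6]]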
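With an explicit axis, np.append() passes both inputs to np.concatenate(), so the same shape rule applies: every dimension except axis must match. This sketch compares the two calls and shows the ValueError raised when the row counts differ (the exact message text varies between NumPy versions, so only the exception type is relied on).

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[5], [6]])

# With an axis given, np.append and np.concatenate produce the same array.
via_append = np.append(left, right, axis=1)
via_concat = np.concatenate([left, right], axis=1)
print(np.array_equal(via_append, via_concat))  # True

# A (1, 2) array cannot be appended as columns to a (2, 2) array:
# the sizes along axis 0 (2 vs 1) do not match.
try:
    np.append(left, np.array([[5, 6]]), axis=1)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```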
numpy.delete() Detail and Examples
The full syntax for numpy.delete() is:
numpy.delete(arr, obj, axis=None)
Where:
- arr is the array you want to delete from.
- obj is the index, list of indices, or slice to delete.
- axis (optional) is the axis along which to delete elements. By default it is None, in which case obj is applied to the flattened array.
Like append(), delete() returns a new array and leaves the input array unchanged, so assign the result to a variable.
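A small demonstration of that point: calling np.delete() without using its return value leaves the array exactly as it was.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# The result of this call is discarded, so arr does not change.
np.delete(arr, 0)
print(arr)  # [1 2 3 4 5]

# Assign the returned array to keep the deletion.
arr = np.delete(arr, 0)
print(arr)  # [2 3 4 5]
```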
Let’s go through some examples of deleting by index position or by a list of indices:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Delete index 1
arr_deleted = np.delete(arr, 1)
print(arr_deleted)
# [1 3 4 5]
# Delete several indices at once (arr itself was not changed above)
arr_deleted = np.delete(arr, [1, 3])
print(arr_deleted)
# [1 3 5]
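obj can also be a slice, which np.s_ builds conveniently, and recent NumPy versions treat a boolean array as a mask of elements to remove. A short sketch of both forms on an illustrative array:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# np.s_[1:3] is the slice object slice(1, 3): delete positions 1 and 2.
no_middle = np.delete(arr, np.s_[1:3])
print(no_middle)  # [10 40 50]

# A slice with a step: delete every other element, starting at position 0.
odd_positions = np.delete(arr, np.s_[::2])
print(odd_positions)  # [20 40]

# A boolean array of the same length marks the elements to remove.
small_only = np.delete(arr, arr > 25)
print(small_only)  # [10 20]
```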
We can also delete along an axis, removing whole rows or columns:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Delete second row
print(np.delete(arr, 1, axis=0))
# [[1 2 3]
#  [7 8 9]]
# Delete first column
print(np.delete(arr, 0, axis=1))
# [[2 3]
#  [5 6]
#  [8 9]]
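Several rows or columns can be removed in one call by passing a list of indices together with axis, and negative indices count from the end as in ordinary indexing. A sketch on an illustrative 3 x 4 array:

```python
import numpy as np

table = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Drop columns 0 and 2 in a single call.
trimmed = np.delete(table, [0, 2], axis=1)
print(trimmed)
# [[ 1  3]
#  [ 5  7]
#  [ 9 11]]

# Drop the last row using a negative index.
no_last_row = np.delete(table, -1, axis=0)
print(no_last_row.shape)  # (2, 4)
```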
numpy.insert() Detail and Examples
The full syntax for numpy.insert() is:
numpy.insert(arr, obj, values, axis=None)
Where:
- arr is the input array to insert into.
- obj is the index, or sequence of indices, before which values are inserted.
- values are the value(s) to insert.
- axis (optional) is the axis along which to insert. By default it is None, in which case arr is flattened first.
Like delete(), insert() returns a new array rather than modifying the input array.
Let’s insert some values:
import numpy as np
arr = np.array([1, 2, 4, 5])
# Insert 3 before index 2, keeping the returned array
arr = np.insert(arr, 2, 3)
print(arr)
# [1 2 3 4 5]
# Insert multiple values: 100 before index 2 and 200 before index 3
arr_multi = np.insert(arr, [2, 3], [100, 200])
print(arr_multi)
# [  1   2 100   3 200   4   5]
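When obj is a sequence, each index refers to a position in the original array, which is why 200 lands before the original 4 above rather than before the newly shifted element. Two consequences, sketched on an illustrative array: values aimed at the same index end up side by side in the order given, and a single value is repeated at every listed position.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Both values go before original index 1, in the order given.
paired = np.insert(arr, [1, 1], [10, 20])
print(paired)  # [ 1 10 20  2  3]

# One scalar value is inserted at each listed position.
padded = np.insert(arr, [0, 3], 0)
print(padded)  # [0 1 2 3 0]
```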
We can also insert along an axis, adding whole rows or columns:
arr = np.array([[1, 2], [5, 6]])
# Insert a row before index 1
print(np.insert(arr, 1, [[3, 4]], axis=0))
# [[1 2]
#  [3 4]
#  [5 6]]
# Insert a column before index 1 (with a scalar index, pass the column as a flat list)
print(np.insert(arr, 1, [10, 20], axis=1))
# [[ 1 10  2]
#  [ 5 20  6]]
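The NumPy documentation points out that a scalar obj and a one-element sequence obj broadcast values differently. This sketch shows the two combinations that add a single column and the combination that quietly adds two:

```python
import numpy as np

arr = np.array([[1, 2], [5, 6]])

# Scalar index + flat list: the list becomes one new column.
one_col = np.insert(arr, 1, [10, 20], axis=1)
print(one_col)
# [[ 1 10  2]
#  [ 5 20  6]]

# One-element list index + column-shaped values: also one new column.
same_col = np.insert(arr, [1], [[10], [20]], axis=1)
print(np.array_equal(one_col, same_col))  # True

# Scalar index + column-shaped values: the values are reoriented and
# broadcast, so two columns are inserted instead of one.
surprise = np.insert(arr, 1, [[10], [20]], axis=1)
print(surprise.shape)  # (2, 4)
```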
Practical Examples and Use Cases
Now let’s go through some practical examples of how you can use these functions for real-world data processing and analysis.
Adding a Row to a DataFrame
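Pandas DataFrames can be converted to and from NumPy arrays, so we can use numpy.append() to add a new row to a DataFrame by appending to its underlying array and rebuilding the DataFrame. Note that this round trip converts all columns to one common dtype; in pandas code, pd.concat() is usually the more idiomatic choice.
import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
new_row = np.array([[5, 6]])
# Append the row to the underlying array, then rebuild the DataFrame
df = pd.DataFrame(np.append(df.to_numpy(), new_row, axis=0), columns=df.columns)
print(df)
#    A  B
# 0  1  2
# 1  3  4
# 2  5  6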
Deleting Outliers from Data
We can use numpy.delete() to remove outlier values that fall outside an expected range. This helps clean up the data before analysis.
import numpy as np
data = np.array([1.1, 5.5, -100, 7.4, 9.0])
max_limit = 9.0
min_limit = -5.0
# Find indices of outliers (np.where returns a tuple; [0] is the index array)
outlier_indices = np.where((data > max_limit) | (data < min_limit))[0]
# Delete outliers into a new array; 9.0 is not greater than max_limit, so it stays
cleaned = np.delete(data, outlier_indices)
print(cleaned)
# [1.1 5.5 7.4 9. ]
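Boolean masking expresses the same filter without computing indices first, by keeping the in-range values instead of deleting the others. A minimal sketch reusing the data and limits from the example above:

```python
import numpy as np

data = np.array([1.1, 5.5, -100, 7.4, 9.0])
max_limit = 9.0
min_limit = -5.0

# Keep values inside [min_limit, max_limit] rather than deleting the rest.
in_range = (data >= min_limit) & (data <= max_limit)
cleaned = data[in_range]
print(cleaned)  # [1.1 5.5 7.4 9. ]

# Same result as deleting the outlier indices with np.delete.
outliers = np.where(~in_range)[0]
print(np.array_equal(cleaned, np.delete(data, outliers)))  # True
```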
Inserting Missing Values
When a reading is missing from a sequence, we can insert NaN or another placeholder value at its position using numpy.insert():
import numpy as np
data = np.array([1.1, 2.2, 3.3, 4.4])
# Position of the missing reading (here, between 3.3 and 4.4)
missing_index = 3
# Insert NaN as a placeholder; the array must have a float dtype to hold NaN
filled = np.insert(data, missing_index, np.nan)
print(filled)
# [1.1 2.2 3.3 nan 4.4]
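In practice the missing position usually has to be found first. One possible sketch, assuming each reading has a timestamp at a fixed one-second interval (the timestamps array and the one-second spacing are illustrative assumptions, and each gap is assumed to be a single missing reading):

```python
import numpy as np

# Hypothetical readings taken once per second; the reading at t=3 is missing.
timestamps = np.array([0, 1, 2, 4, 5])
readings = np.array([1.1, 2.2, 3.3, 4.4, 5.5])

# A gap exists wherever consecutive timestamps differ by more than 1.
# np.diff has one fewer element; +1 gives the index just after each gap.
gap_positions = np.where(np.diff(timestamps) > 1)[0] + 1

# Insert the missing timestamp and a NaN placeholder before each gap position.
# (Assumes every gap is exactly one reading long.)
filled_times = np.insert(timestamps, gap_positions, timestamps[gap_positions] - 1)
filled_readings = np.insert(readings, gap_positions, np.nan)

print(filled_times)     # [0 1 2 3 4 5]
print(filled_readings)  # [1.1 2.2 3.3 nan 4.4 5.5]
```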
Appending Chunks of Data
When loading large datasets, we may need to append chunks of data together as we parse through the files. numpy.append() can concatenate these chunks into the full dataset, although each call copies all of the data accumulated so far:
import numpy as np
# Hypothetical stand-in for chunks parsed from files: ten 1,000 x 3 blocks
data_chunks = [np.random.rand(1000, 3) for _ in range(10)]
full_data = np.empty(shape=(0, 3))
for chunk in data_chunks:
    full_data = np.append(full_data, chunk, axis=0)
print(full_data.shape)
# (10000, 3)
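Because every np.append() call in that loop copies the whole accumulated array, the total work grows quadratically with the number of chunks. A common alternative, sketched here with hypothetical chunk data, is to collect the chunks in a Python list and concatenate them once at the end:

```python
import numpy as np

# Hypothetical chunks: ten blocks of 1,000 rows x 3 columns,
# each filled with its chunk number so the order is easy to check.
data_chunks = [np.full((1000, 3), i, dtype=float) for i in range(10)]

# Appending to a Python list is cheap; no array data is copied here.
parts = []
for chunk in data_chunks:
    parts.append(chunk)

# A single copy builds the full dataset.
full_data = np.concatenate(parts, axis=0)
print(full_data.shape)  # (10000, 3)
```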
Adding Columns to NumPy Arrays
We can add new columns to a 2D NumPy array by appending entire new columns along axis 1:
arr = np.array([[1, 2, 3], [4, 5, 6]])
new_col = np.array([[7], [8]])
arr = np.append(arr, new_col, axis=1)
print(arr)
# [[1 2 3 7]
# [4 5 6 8]]
This makes it easy to extend a NumPy array with new columns as new data comes in.
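A new column is often computed from the existing data. In this sketch, a row-total column is derived with sum(axis=1, keepdims=True), which keeps the result two-dimensional with shape (2, 1) so it lines up with the rows of arr:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True returns the row sums as a (2, 1) column instead of shape (2,).
row_totals = arr.sum(axis=1, keepdims=True)
with_totals = np.append(arr, row_totals, axis=1)
print(with_totals)
# [[ 1  2  3  6]
#  [ 4  5  6 15]]
```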
Performance Considerations
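When adding and removing elements from arrays, be mindful of the performance implications:
- append(), insert(), and delete() never resize an array in place: each call allocates a new array and copies the data into it, so its cost grows with the size of the whole array, not just the number of elements changed.
- Calling append() or insert() repeatedly in a loop is therefore expensive for large arrays, because the total copying grows quadratically with the number of calls. Collecting the pieces in a Python list and joining them once with np.concatenate() avoids this.
- If the final size is known, pre-allocate the array once (for example with np.empty() or np.zeros()) and fill it by index instead of appending.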
None of these functions change the original array, so always assign the result: a bare np.delete(arr, 1) statement has no effect on arr. The upside is that the original data stays intact if you still need it.
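When the final size is known in advance, the usual alternative to repeated appends is to allocate the array once and fill it in place. A minimal sketch, where the per-row computation is a hypothetical placeholder:

```python
import numpy as np

n_rows, n_cols = 5, 3

# Allocate the final array once (np.empty leaves the contents uninitialized)...
result = np.empty((n_rows, n_cols))

# ...then fill each row by index instead of growing the array with np.append.
for i in range(n_rows):
    result[i] = np.arange(n_cols) + i  # hypothetical per-row computation

print(result)
```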
Conclusion
In this guide, we looked at how to use NumPy’s append(), insert(), and delete() functions to add and remove elements from arrays, key skills for manipulating numerical data sets for analysis.
Key takeaways include:
- numpy.append() concatenates arrays and appends values, flattening the inputs unless an axis is given
- numpy.insert() inserts values before a specific index or position in an array
- numpy.delete() removes elements from an array by index, list of indices, or slice
- All three functions return a new array and leave the input unchanged, so assign the result
- Examples of appending rows/columns, removing outliers, filling missing data, and more
Mastering efficient array modifications like these will help you wrangle and preprocess data to feed into NumPy, SciPy and machine learning workflows. The NumPy API provides a versatile toolkit to slice and dice data arrays as needed.