NumPy is a fundamental Python package for scientific computing and data analysis. It provides an efficient multidimensional array object called ndarray that allows fast mathematical operations on arrays of data. One of the most common data manipulation tasks is joining and splitting these arrays. This guide gives an overview of the key functions for concatenating and splitting NumPy arrays in Python: np.concatenate() and np.split().
The sections below cover each topic in depth with example code snippets.
Overview of NumPy Arrays
NumPy arrays are the building blocks of numerical computing in Python. Unlike Python lists, NumPy arrays are homogeneous in data type, fast, and memory-efficient for large data sets.
Some key properties of NumPy arrays:
- Homogeneous data types: All elements in an array have the same data type unlike Python lists.
- Fixed size: An array has a fixed size at creation unlike Python lists which can grow dynamically.
- Fast mathematical operations: NumPy arrays allow faster element-wise operations like addition, multiplication, etc. without Python for-loops.
- Multidimensional: Arrays can have 1, 2, or more dimensions. 1D array = vector, 2D array = matrix.
Let’s create a simple 1D array:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
# [1 2 3 4]
The key difference between Python lists and NumPy arrays is that arrays are restricted to having elements of the same data type while lists can have elements of different data types.
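The single-dtype rule can be seen in a small sketch. The variable names here are illustrative only; the point is that NumPy picks one dtype able to hold every element, and that arithmetic then applies element-wise without a Python loop.

```python
import numpy as np

# A Python list can mix integers and floats freely.
mixed_list = [1, 2.5, 3]

# NumPy converts every element to one common dtype (float64 here).
arr = np.array(mixed_list)
print(arr.dtype)  # float64

# Element-wise arithmetic runs without an explicit Python loop.
doubled = arr * 2
print(doubled.tolist())  # [2.0, 5.0, 6.0]
```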
Joining Arrays using concatenate()
np.concatenate() joins 1D or multidimensional arrays along a specified axis into a single array. It is one of the most commonly used functions for combining NumPy arrays.
The syntax for basic concatenation is:
np.concatenate((arr1, arr2, arr3), axis=0)
Here arr1, arr2, and arr3 are the arrays to be joined, and axis specifies the axis along which concatenation occurs.
Basic 1D Concatenation Along Different Axes
For 1D arrays, we can concatenate along axis 0:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
concat_arr = np.concatenate((arr1, arr2))
print(concat_arr)
# [1 2 3 4 5 6]
This appends arr2 after arr1, returning a new 1D array.
For 2D arrays, the axis parameter allows concatenation along rows (axis 0) or columns (axis 1).
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
concat_1 = np.concatenate((arr1, arr2), axis=0)
# [[1 2]
# [3 4]
# [5 6]
# [7 8]]
concat_2 = np.concatenate((arr1, arr2), axis=1)
# [[1 2 5 6]
# [3 4 7 8]]
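One way to keep the two axes straight is to look at the resulting shapes. The following is a minimal sketch with made-up arrays a, b, and c: only the axis being joined grows, and every other axis has to match.

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((4, 3))  # same number of columns as a
c = np.zeros((2, 5))  # same number of rows as a

# axis=0 adds rows: 2 + 4 rows, still 3 columns.
rows = np.concatenate((a, b), axis=0)
print(rows.shape)  # (6, 3)

# axis=1 adds columns: still 2 rows, 3 + 5 columns.
cols = np.concatenate((a, c), axis=1)
print(cols.shape)  # (2, 8)
```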
Concatenating 3 or More Arrays
To join more than 2 arrays, pass them as a tuple:
arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
arr3 = np.array([5, 6])
concat_arr = np.concatenate((arr1, arr2, arr3))
print(concat_arr)
# [1 2 3 4 5 6]
This extends to higher dimensional arrays as well:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
concat_arr = np.concatenate((arr1, arr2), axis=0)
print(concat_arr)
# [[1 2]
# [3 4]
# [5 6]]
Concatenating Arrays with Different Dimensions
For concatenate() to work, all input arrays must have the same number of dimensions, and their shapes must match on every axis except the one being joined. Otherwise it raises a ValueError.
For example:
arr1 = np.array([1, 2])
arr2 = np.array([[3, 4], [5, 6]])
np.concatenate((arr1, arr2))
# ValueError: all the input arrays must have same number of dimensions
To fix this, you can reshape the arrays to have the same number of dimensions before concatenating:
arr1 = np.array([1, 2])
arr2 = np.array([[3, 4],
[5, 6]])
arr1 = arr1.reshape(1, 2)
concat_arr = np.concatenate((arr1, arr2), axis=0)
print(concat_arr)
# [[1 2]
# [3 4]
# [5 6]]
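As an alternative to reshape(), a new axis can be added with np.newaxis or np.atleast_2d(). This is only a sketch of the same fix using different calls; which one reads best is a matter of taste.

```python
import numpy as np

arr1 = np.array([1, 2])
arr2 = np.array([[3, 4], [5, 6]])

# Turn the 1D array into a single row of shape (1, 2).
row = arr1[np.newaxis, :]
also_row = np.atleast_2d(arr1)
as_rows = np.concatenate((row, arr2), axis=0)
print(as_rows.tolist())  # [[1, 2], [3, 4], [5, 6]]

# Or turn it into a column of shape (2, 1) and join along axis 1.
col = arr1[:, np.newaxis]
as_cols = np.concatenate((col, arr2), axis=1)
print(as_cols.tolist())  # [[1, 3, 4], [2, 5, 6]]
```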
Concatenating Stacked Arrays
np.vstack() and np.hstack() are convenience functions for the two most common stacking directions and can be used instead of concatenate():
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
stack_h = np.hstack((arr1, arr2))
# [1 2 3 4 5 6]
arr3 = np.array([7, 8, 9])
stack_v = np.vstack((arr1, arr2, arr3))
# [[1 2 3]
# [4 5 6]
# [7 8 9]]
vstack() stacks arrays vertically (row-wise) while hstack() stacks them horizontally (column-wise).
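The relationship between the stacking helpers and concatenate() can be checked directly. In this sketch (array names are illustrative), the 2D results are identical to concatenating along axis 0 or axis 1, while 1D inputs behave differently because vstack() first promotes each one to a row.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# For 2D inputs the helpers match concatenate() along axis 0 and axis 1.
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
print(same_v, same_h)  # True True

# For 1D inputs, vstack() builds rows while hstack() joins end to end.
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(np.vstack((x, y)).shape)  # (2, 3)
print(np.hstack((x, y)).shape)  # (6,)
```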
Splitting Arrays using split()
np.split() divides an array into multiple sub-arrays along a specified axis. The syntax is:
np.split(array, indices_or_sections, axis)
Where:
- array is the array to split
- indices_or_sections specifies how to split: an integer for the number of equal sections, or a list of indices at which to split
- axis is the axis along which to split (default 0)
Let’s look at different ways to split arrays:
Splitting Along a Given Axis
Split an array into 2 parts along axis 0:
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, 2)
print(split_arr)
# [array([1, 2, 3]), array([4, 5, 6])]
For 2D arrays, you can split along rows (axis 0) or columns (axis 1):
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
row_split = np.split(arr, 2, axis=0)
# [array([[1, 2], [3, 4]]), array([[5, 6], [7, 8]])]
col_split = np.split(arr, 2, axis=1)
# [array([[1], [3], [5], [7]]), array([[2], [4], [6], [8]])]
Specifying Number of Split Sections
We can also specify the number of sections to split the array into using an integer:
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, 3)
print(split_arr)
# [array([1, 2]), array([3, 4]), array([5, 6])]
Here the array is divided into 3 equal-sized parts.
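The indices_or_sections argument also accepts a list of indices instead of a count. In this short sketch (the values are arbitrary), the split points are positions in the array, so the pieces do not have to be the same size.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Split before index 2 and before index 5:
# the pieces are arr[:2], arr[2:5], and arr[5:].
parts = np.split(arr, [2, 5])
print([p.tolist() for p in parts])  # [[10, 20], [30, 40, 50], [60]]
```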
Splitting Into Sections of Unequal Size
np.split() raises a ValueError when the array cannot be divided into equal parts. Use np.array_split() instead when the length is not evenly divisible by the number of sections:
arr = np.array([1, 2, 3, 4, 5, 6, 7])
split_arr = np.array_split(arr, 3)
print(split_arr)
# [array([1, 2, 3]), array([4, 5]), array([6, 7])]
The sub-arrays are as close to equal in size as possible, with the larger ones first.
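The difference between np.split() and np.array_split() shows up once the axis length is not divisible by the number of sections. The following sketch uses an illustrative 2 x 5 array split along its columns.

```python
import numpy as np

arr = np.arange(10).reshape(2, 5)

# 5 columns cannot be divided into 2 equal sections, so np.split() fails.
try:
    np.split(arr, 2, axis=1)
    split_failed = False
except ValueError as err:
    split_failed = True
    print("np.split failed:", err)

# np.array_split() accepts the uneven division instead.
parts = np.array_split(arr, 2, axis=1)
print([p.shape for p in parts])  # [(2, 3), (2, 2)]
```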
Use Cases and Applications
Joining and splitting NumPy arrays is useful in many common scenarios:
Combining Data from Multiple Sources
import numpy as np
data1 = np.genfromtxt('data1.csv', delimiter=',')
data2 = np.genfromtxt('data2.csv', delimiter=',')
full_data = np.concatenate((data1, data2), axis=0)
Splitting Data into Training and Test Sets
from sklearn.model_selection import train_test_split
data = np.arange(10).reshape((5, 2))
train, test = train_test_split(data, test_size=0.33)
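If scikit-learn is not available, a similar split can be sketched with NumPy alone by shuffling the rows and cutting at an index with np.split(). The seed value and the 67/33 ratio below are assumptions chosen to mirror the example above, not part of any library default.

```python
import numpy as np

data = np.arange(10).reshape((5, 2))

# Shuffle the rows; the seed is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(data)

# Keep roughly 67% of the rows for training and the rest for testing.
n_train = int(len(data) * 0.67)
train, test = np.split(shuffled, [n_train])
print(train.shape, test.shape)  # (3, 2) (2, 2)
```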
Reshaping Arrays by Joining and Splitting
arr = np.arange(9).reshape(3,3)
row_arr = np.split(arr, 3, axis=0)
concat_arr = np.concatenate(row_arr, axis=1)
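To make the effect of this reshaping visible, the sketch below prints the shapes involved and also joins the pieces back along axis 0, which should recover the original array. The names wide and restored are illustrative.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# Three sub-arrays, each one row of shape (1, 3).
row_arr = np.split(arr, 3, axis=0)

# Joining the rows side by side gives one long row.
wide = np.concatenate(row_arr, axis=1)
print(wide.shape)  # (1, 9)

# Joining them along axis 0 again restores the original layout.
restored = np.concatenate(row_arr, axis=0)
print(np.array_equal(restored, arr))  # True
```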
Other applications include combining image data, audio samples, and time series data.
Performance Comparisons to Python Lists
NumPy array operations are much faster than Python lists due to optimized C and Fortran backends.
Let’s concatenate two 1D arrays with 1 million elements each:
import numpy as np
import time
arr1 = np.arange(1000000)
arr2 = np.arange(1000000)
start = time.time()
arr3 = np.concatenate([arr1, arr2])
print("NumPy runtime:", time.time() - start)
# NumPy runtime: 0.009985446939086914
start = time.time()
arr4 = arr1.tolist() + arr2.tolist()
print("List runtime:", time.time() - start)
# List runtime: 0.9321310520172119
In this run, NumPy was roughly 100x faster than the list version. Note that the list timing also includes the tolist() conversions, and exact numbers depend on the machine and the array size.
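A somewhat fairer comparison converts the arrays to lists before the timer starts and repeats each operation several times. The sketch below uses the standard library's timeit module for this; the repeat count of 10 is an arbitrary choice, and no output is shown because timings vary from machine to machine.

```python
import timeit

import numpy as np

arr1 = np.arange(1_000_000)
arr2 = np.arange(1_000_000)

# Convert up front so the conversion cost is not part of the list timing.
list1 = arr1.tolist()
list2 = arr2.tolist()

repeats = 10
numpy_time = timeit.timeit(lambda: np.concatenate([arr1, arr2]), number=repeats)
list_time = timeit.timeit(lambda: list1 + list2, number=repeats)

print(f"NumPy: {numpy_time / repeats:.6f} s per call")
print(f"List:  {list_time / repeats:.6f} s per call")
```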
Common Errors and Solutions
Here are some common errors encountered when using concatenate() and split(), along with fixes:
Error:
ValueError: all the input arrays must have same number of dimensions
Fix: Reshape the arrays so they have the same number of dimensions before concatenating.
Error:
ValueError: array split does not result in an equal division
Fix: Pass a section count that divides the axis length evenly, or use np.array_split(), which allows sections of unequal size.
Error:
AxisError: axis 1 is out of bounds for array of dimension 1
Fix: Use axis=0 for 1D arrays, or reshape them to 2D first.
Error:
ValueError: not enough values to unpack (expected 3, got 2)
Fix: Make sure the number of variables on the left-hand side matches the number of sub-arrays that np.split() returns.
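Each of these errors can be reproduced in a few lines, which helps when matching a traceback to its cause. The helper error_name() below is a hypothetical convenience for this sketch, not a NumPy function; it relies on AxisError being a subclass of ValueError. The exact message text may differ between NumPy versions.

```python
import numpy as np

def error_name(func):
    """Call func and return the name of the ValueError (sub)class it raises."""
    try:
        func()
    except ValueError as err:  # AxisError is a subclass of ValueError
        print(f"{type(err).__name__}: {err}")
        return type(err).__name__
    return None

def unpack_three():
    # np.split() returns 2 sub-arrays here, but 3 names are expected.
    a, b, c = np.split(np.arange(6), 2)

one_d = np.array([1, 2])
two_d = np.array([[3, 4], [5, 6]])

names = [
    error_name(lambda: np.concatenate((one_d, two_d))),          # dimension mismatch
    error_name(lambda: np.split(np.arange(7), 3)),               # unequal division
    error_name(lambda: np.concatenate((one_d, one_d), axis=1)),  # axis out of bounds
    error_name(unpack_three),                                    # unpacking mismatch
]
```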
Conclusion
In this guide, we covered how to use np.concatenate() and np.split() to join and divide NumPy arrays along given axes. Manipulating array data using these functions is fast, flexible, and avoids slow Python loops.
Key points to remember:
- concatenate() joins arrays along an axis into a single array
- split() divides an array into multiple sub-arrays along an axis
- Specify axis=0 to concatenate row-wise and axis=1 for column-wise
- Set the number of splits or the split indices to control how the array is divided
- Use vstack() and hstack() to stack arrays vertically or horizontally
- Reshape arrays to match dimensions before concatenating
- Prefer array operations over Python lists for performance
With this knowledge, you can now efficiently join and split array data for tasks like combining data sources, transforming array shapes, training/testing splits and more. The practices discussed will help you write fast, robust NumPy code in Python.