
Joining and Splitting NumPy Arrays: A Comprehensive Guide to `concatenate()` and `split()`

Updated: at 02:44 AM

NumPy is a fundamental Python package for scientific computing and data analysis. It provides an efficient multidimensional array object called ndarray that allows fast mathematical operations on arrays of data. One of the most common data manipulation tasks is joining and splitting these NumPy arrays. This guide will provide a comprehensive overview of the key functions to concatenate and split NumPy arrays in Python - np.concatenate() and np.split().

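We will cover the following topics in depth with example code snippets: an overview of NumPy arrays, joining arrays with concatenate(), splitting arrays with split(), use cases and applications, performance compared to Python lists, and common errors and their solutions.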

Overview of NumPy Arrays

NumPy arrays are the building blocks of numerical computing in Python. Unlike Python lists, NumPy arrays are homogeneous in data type, fast, and memory-efficient for large data sets.

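Some key properties of NumPy arrays are the element type (dtype), the shape (the size along each axis), and the number of dimensions (ndim).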

Let’s create a simple 1D array:

import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr)
# [1 2 3 4]

The key difference between Python lists and NumPy arrays is that arrays are restricted to having elements of the same data type while lists can have elements of different data types.
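To make the type restriction concrete, here is a minimal sketch. The variable names are illustrative only. It shows that a list keeps each element's own type, while np.array() settles on a single dtype for the whole array.

```python
import numpy as np

# A Python list keeps each element's own type.
mixed_list = [1, 2.5, 3]
print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']

# np.array() chooses one dtype that can represent every element,
# so the integers are stored as floats here.
arr = np.array(mixed_list)
print(arr.dtype)  # float64

# shape and ndim describe the layout of the array.
print(arr.shape)  # (3,)
print(arr.ndim)   # 1
```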

Joining Arrays using concatenate()

np.concatenate() joins 1D or multidimensional arrays along a specified axis into a single array. It is one of the most commonly used functions for combining NumPy arrays.

The syntax for basic concatenation is:

np.concatenate((arr1, arr2, arr3), axis=0)

Where arr1, arr2, arr3 are the arrays to be joined and axis specifies the axis along which concatenation occurs.

Basic Concatenation Along Different Axes

For 1D arrays, we can concatenate along axis 0:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

concat_arr = np.concatenate((arr1, arr2))
print(concat_arr)
# [1 2 3 4 5 6]

This appends the elements of arr2 after those of arr1, returning a new 1D array. A 1D array has only axis 0, so the axis argument can be left at its default.

For 2D arrays, the axis parameter allows concatenation along rows (axis 0) or columns (axis 1).

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

concat_1 = np.concatenate((arr1, arr2), axis=0)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

concat_2 = np.concatenate((arr1, arr2), axis=1)
# [[1 2 5 6]
#  [3 4 7 8]]
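The axis choice also determines which shapes are compatible. The following sketch uses two hypothetical arrays of shapes (2, 3) and (4, 3) to show that they can be joined along axis 0, where only the column counts need to match, but not along axis 1.

```python
import numpy as np

a = np.ones((2, 3))   # 2 rows, 3 columns
b = np.zeros((4, 3))  # 4 rows, 3 columns

# Joining along axis 0 works because both arrays have 3 columns.
rows = np.concatenate((a, b), axis=0)
print(rows.shape)  # (6, 3)

# Joining along axis 1 needs equal row counts (2 != 4), so it fails.
axis1_error = None
try:
    np.concatenate((a, b), axis=1)
except ValueError as exc:
    axis1_error = str(exc)
    print("axis=1 failed:", axis1_error)
```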

Concatenating 3 or More Arrays

To join more than 2 arrays, pass them as a tuple:

arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
arr3 = np.array([5, 6])

concat_arr = np.concatenate((arr1, arr2, arr3))
print(concat_arr)
# [1 2 3 4 5 6]

The same applies to 2D arrays. The inputs may differ in size along the concatenation axis, as long as their other dimensions match:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

concat_arr = np.concatenate((arr1, arr2), axis=0)
print(concat_arr)
# [[1 2]
#  [3 4]
#  [5 6]]

Concatenating Arrays with Different Dimensions

For concatenate() to work, all the input arrays must have the same number of dimensions, and their shapes must match on every axis except the one being joined. Otherwise, it raises a ValueError.

For example:

arr1 = np.array([1, 2])
arr2 = np.array([[3, 4], [5, 6]])

np.concatenate((arr1, arr2))
# ValueError: all the input arrays must have same number of dimensions

To fix this, you can reshape the arrays to have the same number of dimensions before concatenating:

arr1 = np.array([1, 2])
arr2 = np.array([[3, 4],
                 [5, 6]])

arr1 = arr1.reshape(1, 2)

concat_arr = np.concatenate((arr1, arr2), axis=0)
print(concat_arr)

# [[1 2]
#  [3 4]
#  [5 6]]
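reshape() is not the only way to add the missing dimension. As a sketch of two alternatives, np.newaxis indexing and np.atleast_2d() both turn the 1D array into a single row, and np.newaxis can also turn it into a column for joining along axis 1.

```python
import numpy as np

arr1 = np.array([1, 2])
arr2 = np.array([[3, 4],
                 [5, 6]])

# Two equivalent ways to view arr1 as one row of shape (1, 2).
as_row = arr1[np.newaxis, :]
also_row = np.atleast_2d(arr1)

stacked_rows = np.concatenate((as_row, arr2), axis=0)
print(stacked_rows.tolist())  # [[1, 2], [3, 4], [5, 6]]

# Viewing arr1 as a column of shape (2, 1) allows joining along axis 1.
as_col = arr1[:, np.newaxis]
stacked_cols = np.concatenate((as_col, arr2), axis=1)
print(stacked_cols.tolist())  # [[1, 3, 4], [2, 5, 6]]
```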

Stacking Arrays with vstack() and hstack()

For the common cases of stacking arrays row-wise or column-wise, np.vstack() and np.hstack() are convenient shorthands for concatenate().

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

stack_h = np.hstack((arr1, arr2))
# [1 2 3 4 5 6]

arr3 = np.array([7, 8, 9])
stack_v = np.vstack((arr1, arr2, arr3))
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

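vstack() stacks arrays vertically (row-wise), treating each 1D array as one row of the result, while hstack() stacks them horizontally (column-wise). For 1D inputs, hstack() simply joins them end to end.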
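The relationship between these helpers and concatenate() can be checked directly. This is a small sketch with illustrative arrays, not a statement about how NumPy implements the helpers internally.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack() on 1D inputs behaves like promoting each to a row,
# then concatenating along axis 0.
v = np.vstack((a, b))
v_manual = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)
print(np.array_equal(v, v_manual))  # True

# hstack() on 1D inputs behaves like concatenating along axis 0.
h = np.hstack((a, b))
print(np.array_equal(h, np.concatenate((a, b), axis=0)))  # True

# hstack() on 2D inputs behaves like concatenating along axis 1.
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5], [6]])
h2 = np.hstack((m1, m2))
print(h2.tolist())  # [[1, 2, 5], [3, 4, 6]]
```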

Splitting Arrays using split()

np.split() divides an array into multiple sub-arrays along a specified axis. The syntax is:

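np.split(ary, indices_or_sections, axis=0)

Where ary is the array to split, indices_or_sections is either an integer number of equal sections or a sorted sequence of indices at which to split, and axis is the axis to split along (0 by default).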

Let’s look at different ways to split arrays:

Splitting Along a Given Axis

Split an array into 2 parts along axis 0:

arr = np.array([1, 2, 3, 4, 5, 6])

split_arr = np.split(arr, 2)
print(split_arr)
# [array([1, 2, 3]), array([4, 5, 6])]

For 2D arrays, you can split along rows (axis 0) or columns (axis 1):

arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

row_split = np.split(arr, 2, axis=0)
# [array([[1, 2], [3, 4]]), array([[5, 6], [7, 8]])]

col_split = np.split(arr, 2, axis=1)
# [array([[1], [3], [5], [7]]), array([[2], [4], [6], [8]])]

Specifying Number of Split Sections

We can also specify the number of sections to split the array into using an integer:

arr = np.array([1, 2, 3, 4, 5, 6])

split_arr = np.split(arr, 3)
print(split_arr)
# [array([1, 2]), array([3, 4]), array([5, 6])]

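Here the array is divided into 3 equal-sized parts. np.split() requires the length along the split axis to be evenly divisible by the number of sections; otherwise it raises a ValueError.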
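Besides an integer, the second argument of np.split() can be a sequence of indices, which allows sections of different lengths. A minimal sketch with an illustrative array:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Split before index 2 and before index 5,
# which yields arr[:2], arr[2:5], and arr[5:].
parts = np.split(arr, [2, 5])
print([p.tolist() for p in parts])  # [[10, 20], [30, 40, 50], [60]]
```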

Splitting Into Arrays of Nearly Equal Size

Use np.array_split() when the array length may not be evenly divisible by the number of sections:

arr = np.array([1, 2, 3, 4, 5, 6])

split_arr = np.array_split(arr, 3)
print(split_arr)
# [array([1, 2]), array([3, 4]), array([5, 6])]

Here the length divides evenly, so the result matches np.split(). When it does not, array_split() returns sub-arrays whose sizes differ by at most one element instead of raising an error.
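The difference between the two functions only shows up when the length is not divisible by the number of sections. The sketch below uses a 7-element array (an illustrative input) to contrast them: np.split() raises a ValueError, while np.array_split() returns sections of unequal size.

```python
import numpy as np

arr = np.arange(1, 8)  # 7 elements: 1..7

# np.split() refuses an uneven division.
split_error = None
try:
    np.split(arr, 3)
except ValueError as exc:
    split_error = str(exc)
    print("np.split failed:", split_error)

# np.array_split() puts the extra element in the first section.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])  # [[1, 2, 3], [4, 5], [6, 7]]
```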

Use Cases and Applications

Joining and splitting NumPy arrays is useful in many common scenarios:

Combining Data from Multiple Sources

import numpy as np

data1 = np.genfromtxt('data1.csv', delimiter=',')
data2 = np.genfromtxt('data2.csv', delimiter=',')

full_data = np.concatenate((data1, data2), axis=0)
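One detail to watch when combining loaded files: np.genfromtxt() returns a 1D array for a file with a single row, which would break concatenation along axis 0. The sketch below uses in-memory text via io.StringIO in place of the CSV files above, and assumes every source has the same multiple columns; np.atleast_2d() restores the row dimension.

```python
import io
import numpy as np

# In-memory stand-ins for two CSV files (hypothetical contents).
csv_a = io.StringIO("1,2,3\n4,5,6")
csv_b = io.StringIO("7,8,9")  # a single row

data1 = np.genfromtxt(csv_a, delimiter=",")
data2 = np.genfromtxt(csv_b, delimiter=",")
print(data1.shape, data2.shape)  # (2, 3) (3,)

# Promote both to 2D so a single-row source still counts as one row.
full_data = np.concatenate(
    (np.atleast_2d(data1), np.atleast_2d(data2)), axis=0
)
print(full_data.shape)  # (3, 3)
```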

Splitting Data into Training and Test Sets

import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10).reshape((5, 2))

train, test = train_test_split(data, test_size=0.33)
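The same kind of split can be sketched with NumPy alone by shuffling the rows and then calling np.split() with a single index. The seed and the 67/33 ratio are assumptions chosen to mirror the example above; this is not a replacement for train_test_split(), which offers more options.

```python
import numpy as np

data = np.arange(10).reshape((5, 2))

# Shuffle the row order reproducibly (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)
shuffled = data[rng.permutation(len(data))]

# Keep roughly 67% of the rows for training: int(5 * 0.67) == 3.
n_train = int(len(shuffled) * 0.67)
train, test = np.split(shuffled, [n_train])

print(train.shape, test.shape)  # (3, 2) (2, 2)
```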

Reshaping Arrays by Joining and Splitting

arr = np.arange(9).reshape(3,3)

row_arr = np.split(arr, 3, axis=0)
concat_arr = np.concatenate(row_arr, axis=1)
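To see what the reshaping above produces, the following sketch prints the intermediate shapes and also shows that concatenating the pieces along the original axis restores the input.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# Splitting along axis 0 gives three arrays of shape (1, 3).
row_arr = np.split(arr, 3, axis=0)
print([r.shape for r in row_arr])  # [(1, 3), (1, 3), (1, 3)]

# Joining those rows side by side gives a single row of shape (1, 9).
concat_arr = np.concatenate(row_arr, axis=1)
print(concat_arr.tolist())  # [[0, 1, 2, 3, 4, 5, 6, 7, 8]]

# Joining them along the axis they were split on restores the original.
restored = np.concatenate(row_arr, axis=0)
print(np.array_equal(restored, arr))  # True
```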

Other applications include combining image data, audio samples, and time series data.

Performance Comparisons to Python Lists

NumPy array operations are often much faster than the equivalent operations on Python lists because they run in optimized compiled code.

Let’s concatenate two 1D arrays with 1 million elements each:

import numpy as np
import time

arr1 = np.arange(1000000)
arr2 = np.arange(1000000)

start = time.time()
arr3 = np.concatenate([arr1, arr2])
print("NumPy runtime:", time.time() - start)
# NumPy runtime: 0.009985446939086914

start = time.time()
arr4 = arr1.tolist() + arr2.tolist()
print("List runtime:", time.time() - start)
# List runtime: 0.9321310520172119

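In this run, NumPy was around 100x faster. Note that the list timing also includes the two tolist() conversions, so it overstates the cost of list concatenation itself, and single-shot timings like these vary from machine to machine.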
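For a more controlled comparison, the lists can be built before the clock starts and each operation repeated several times. The following is a sketch using the standard timeit module; the repeat counts are arbitrary choices, and no timings are quoted because they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

arr1 = np.arange(1_000_000)
arr2 = np.arange(1_000_000)

# Convert to lists up front so the conversion is not timed.
list1 = arr1.tolist()
list2 = arr2.tolist()

# Take the best of 3 runs, each running the operation 10 times.
numpy_time = min(
    timeit.repeat(lambda: np.concatenate([arr1, arr2]), number=10, repeat=3)
)
list_time = min(
    timeit.repeat(lambda: list1 + list2, number=10, repeat=3)
)

print(f"NumPy concatenate: {numpy_time / 10:.6f} s per call")
print(f"List concatenation: {list_time / 10:.6f} s per call")
```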

Common Errors and Solutions

Here are some common errors faced while using concatenate() and split(), along with fixes:

Error:

ValueError: all the input arrays must have same number of dimensions

Fix: Reshape the arrays so they have the same number of dimensions before concatenating.

Error:

ValueError: array split does not result in an equal division

Fix: Choose a number of sections that divides the array length evenly, or use np.array_split(), which allows sections of unequal size.

Error:

AxisError: axis 1 is out of bounds for array of dimension 1

Fix: Specify axis=0 for 1D arrays, or reshape them to 2D first if you need to join along axis 1.

Error:

ValueError: not enough values to unpack (expected 3, got 2)

Fix: Make sure the number of variables on the left-hand side matches the number of sub-arrays that np.split() returns.
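Each of these errors can be reproduced in a few lines, which helps when matching a traceback to its cause. In the sketch below, the helper capture() is a hypothetical name introduced only for this illustration. It relies on AxisError being a subclass of ValueError, so a single except clause catches all four cases.

```python
import numpy as np

a = np.array([1, 2, 3])

def capture(func):
    """Run func and return the error message of any ValueError it raises."""
    try:
        func()
    except ValueError as exc:
        return f"{type(exc).__name__}: {exc}"
    return "no error"

def unpack_mismatch():
    # np.split() returns 2 sub-arrays here, but 3 names are expected.
    x, y, z = np.split(np.arange(4), 2)

messages = [
    capture(lambda: np.concatenate((a, np.array([[4, 5, 6]])))),  # ndim mismatch
    capture(lambda: np.split(a, 2)),                              # uneven split
    capture(lambda: np.concatenate((a, a), axis=1)),              # bad axis
    capture(unpack_mismatch),                                     # unpacking
]

for msg in messages:
    print(msg)
```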

Conclusion

In this comprehensive guide, we covered how to use np.concatenate() and np.split() to join and divide NumPy arrays along given axes. Manipulating array data using these functions is fast, flexible, and avoids slow Python loops.

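Key points to remember: concatenate() needs inputs with the same number of dimensions and matching shapes on every axis except the one being joined; split() needs an even division or explicit split indices; and array_split() handles lengths that do not divide evenly.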

With this knowledge, you can now efficiently join and split array data for tasks like combining data sources, transforming array shapes, training/testing splits and more. The practices discussed will help you write fast, robust NumPy code in Python.