NumPy’s ndarray is an N-dimensional array object that enables efficient numerical computing in Python. Because it is NumPy’s core data structure, understanding it is crucial for using the library effectively in scientific computing and data analysis. This guide provides an overview of NumPy ndarrays, including creation, attributes, indexing and slicing, operations, broadcasting, array manipulations, comparisons, input/output, and more.
Introduction to NumPy ndarrays
The NumPy ndarray (N-dimensional array object) is an efficient container for homogeneous data. Arrays allow vectorized operations that are fast and concise compared to equivalent loops over Python lists and tuples. Key properties of NumPy arrays include:
- N-dimensional - arrays can have arbitrary dimensions, allowing storage of multidimensional data.
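- Homogeneous data - all array elements share a single NumPy data type (dtype), such as int64 or float64.
- Fixed size - the total number of elements is fixed at creation; functions that appear to grow an array, such as np.append, return a new array.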
- Fast mathematical operations - optimized for numerical operations without Python for-loops.
- Broadcastable operations - arithmetic between arrays of different but compatible shapes is applied element-wise without explicit loops.
Below is a simple example of creating a 1-dimensional NumPy array:
import numpy as np
arr = np.array([1, 2, 3])
print(arr)
# Output
[1 2 3]
NumPy arrays provide substantial performance and productivity benefits for computing with numeric data compared to Python lists.
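To make the contrast with lists concrete, here is a minimal sketch. The variable names (prices, mixed, and so on) are illustrative only and do not come from the guide; the snippet shows the same arithmetic written with a list comprehension and as a vectorized expression, plus how mixed input is converted to one common dtype.

```python
import numpy as np

# Plain Python list: element-wise arithmetic needs a loop or comprehension.
prices = [10.0, 20.0, 30.0]
doubled_list = [p * 2 for p in prices]

# NumPy array: the same arithmetic is one vectorized expression.
prices_arr = np.array(prices)
doubled_arr = prices_arr * 2

print(doubled_list)  # [20.0, 40.0, 60.0]
print(doubled_arr)   # [20. 40. 60.]

# Mixing ints and floats produces a single common dtype for all elements.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```

Note that `prices * 2` on the list itself would repeat the list rather than double its elements, which is one reason the array form is less error-prone for numeric work.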
Creating NumPy Arrays
There are several ways to create NumPy arrays:
From Python Lists
Convert Python lists and tuples directly into arrays:
import numpy as np
py_list = [1, 2, 3]
arr = np.array(py_list)
py_tuple = (4, 5, 6)
arr = np.array(py_tuple)
Multidimensional arrays can be created by passing nested Python sequences:
py_matrix = [[1, 2], [3, 4]]
arr = np.array(py_matrix)
# 2D array
print(arr)
[[1 2]
 [3 4]]
With NumPy Functions
Use NumPy functions like np.zeros, np.ones, np.full, np.arange, etc. to create arrays:
np.zeros(2) # 1D array of 2 zeros
np.ones((2, 3)) # 2D array with 2 x 3 ones
np.full((3, 2), 99) # 3x2 array filled with 99
np.arange(5) # 1D array from 0 to 4 (stop is exclusive, like range)
np.linspace(0, 1, 5) # 1D array of 5 evenly spaced values from 0 to 1 inclusive
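The difference between np.arange (exclusive stop) and np.linspace (inclusive endpoints) is easy to mix up, so a short sketch may help. The variable names below are illustrative, and the dtype argument is shown as one optional way to override the float64 default of np.zeros.

```python
import numpy as np

# arange takes start, stop (exclusive), and step, like range().
steps = np.arange(0, 10, 2)
print(steps)          # [0 2 4 6 8]

# linspace takes the number of points and includes both endpoints by default.
grid = np.linspace(0, 1, 5)
print(grid)           # [0.   0.25 0.5  0.75 1.  ]

# full takes a shape tuple and a fill value.
filled = np.full((3, 2), 99)
print(filled.shape)   # (3, 2)

# zeros and ones default to float64; dtype overrides that.
counts = np.zeros(4, dtype=np.int32)
print(counts.dtype)   # int32
```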
Reading Arrays From Disk
Build arrays from data in files using np.loadtxt, np.genfromtxt, etc.:
arr = np.loadtxt('data.txt')
arr = np.genfromtxt('data.csv', delimiter=',')
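The two readers differ mainly in how they treat incomplete data. The sketch below uses io.StringIO objects as stand-ins for files on disk, with made-up contents, so it runs without any data files; it assumes the default float dtype in both calls.

```python
import io
import numpy as np

# In-memory stand-ins for data.txt and data.csv (hypothetical contents).
clean_text = io.StringIO("1 2 3\n4 5 6\n")
csv_with_gap = io.StringIO("1,2,3\n4,,6\n")

# loadtxt expects every row to be complete; whitespace is the default delimiter.
clean = np.loadtxt(clean_text)
print(clean.shape)            # (2, 3)

# genfromtxt tolerates missing fields and fills them with nan for float data.
gappy = np.genfromtxt(csv_with_gap, delimiter=",")
print(np.isnan(gappy[1, 1]))  # True
```

As a rule of thumb, np.loadtxt suits clean, regular files, while np.genfromtxt is the more forgiving choice when values may be missing.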
Array Attributes
NumPy arrays have various attributes that provide information about the data:
- ndarray.shape - Tuple of array dimensions
- ndarray.dtype - Data type of array elements
- ndarray.size - Total number of array elements
- ndarray.ndim - Number of array dimensions
For example:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # (2, 3)
print(arr.dtype) # int64 (platform dependent; can be int32 on some systems)
print(arr.size) # 6
print(arr.ndim) # 2
Other attributes like itemsize, nbytes, etc. provide additional details.
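As a small illustration of the memory-related attributes, the following sketch pins the dtype explicitly so the numbers do not depend on the platform. The array values are arbitrary.

```python
import numpy as np

# An explicit dtype keeps the sizes below independent of the platform default.
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(arr.itemsize)  # 8  (bytes per element)
print(arr.nbytes)    # 48 (itemsize * size)

# Converting to a smaller dtype shrinks the memory footprint.
small = arr.astype(np.int8)
print(small.itemsize)  # 1
print(small.nbytes)    # 6
```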
Array Indexing and Slicing
NumPy arrays can be indexed and sliced like Python lists, but extended for N dimensions:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Indexing
print(arr[1, 2]) # 6
# Slicing
print(arr[:, 1]) # [2 5 8]
print(arr[1:3, :]) # 2D array of rows 1 and 2
Important! Basic slicing returns a view, not a copy, so changes made through the slice also modify the original array. Call .copy() on the slice when an independent array is needed.
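The view behavior is easiest to see by writing through a slice. This is a minimal sketch with illustrative names (row_view, row_copy); np.shares_memory is used only to confirm which arrays share a buffer.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: writing through it changes the original.
row_view = arr[1, :]
row_view[0] = 40
print(arr[1, 0])   # 40

# An explicit copy is independent of the original.
row_copy = arr[2, :].copy()
row_copy[0] = 70
print(arr[2, 0])   # 7

print(np.shares_memory(arr, row_view))  # True
print(np.shares_memory(arr, row_copy))  # False
```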
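Arrays can also be indexed with boolean masks built from conditions. Unlike basic slicing, boolean indexing returns a copy:
mask = arr > 5 # avoid the name "filter", which shadows a Python built-in
arr[mask] # [6 7 8 9]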
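Boolean indexing behaves differently depending on whether it is used to read or to assign. The sketch below, with illustrative variable names, shows that selecting with a mask yields an independent 1D array, while assigning through a mask updates the original in place.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = arr > 5

# Selecting with a mask returns a flattened copy of the matching elements.
selected = arr[mask]
selected[0] = -1
print(arr[1, 2])   # 6 (original unchanged)

# Assigning through a mask modifies the original array in place.
arr[mask] = 0
print(arr)
# [[1 2 3]
#  [4 5 0]
#  [0 0 0]]
```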
Array Operations
NumPy makes array operations fast and convenient. Common mathematical operations are overloaded as vectorized element-wise operations:
arr = np.array([1, 2, 3])
arr + 2 # [3 4 5]
arr - 1 # [0 1 2]
arr * 10 # [10 20 30]
arr / 2 # [0.5 1. 1.5]
Trigonometric, exponential, logarithmic, and similar functions are provided as universal functions (ufuncs) that also operate element-wise:
np.sin(arr)
np.log(arr)
np.abs(arr)
Matrix multiplication uses the @ operator (for 1D arrays it computes the dot product):
matrix_a @ matrix_b
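Since matrix_a and matrix_b are not defined above, here is a small self-contained sketch with made-up 2x2 values. It contrasts the element-wise * operator with @, which is a common source of confusion.

```python
import numpy as np

# Hypothetical 2x2 matrices chosen so the results are easy to check by hand.
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# * multiplies matching elements.
print(matrix_a * matrix_b)
# [[ 5 12]
#  [21 32]]

# @ performs matrix multiplication (rows of a times columns of b).
print(matrix_a @ matrix_b)
# [[19 22]
#  [43 50]]

# For two 1D arrays, @ gives the scalar dot product.
print(np.array([1, 2, 3]) @ np.array([4, 5, 6]))  # 32
```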
Benefits: No slow Python loops needed! The work runs in compiled code over whole arrays.
Broadcasting
Broadcasting allows vectorized operations between arrays of different shapes. NumPy virtually stretches the size-1 dimensions of the smaller array so that it “broadcasts” across the larger array:
a = np.array([1, 2, 3]) # Shape (3,)
b = np.array([[10], [20], [30]]) # Shape (3, 1)
a + b # Shape (3, 3) with broadcasting
"""
[[11 12 13]
 [21 22 23]
 [31 32 33]]
"""
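Rules:
- Shapes are compared starting from the trailing (rightmost) dimension; missing leading dimensions are treated as size 1.
- Two dimensions are compatible when they are equal or when one of them is 1.
- The smaller array is not physically copied; the stretching is virtual.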
Broadcasting prevents slow for-loops and enables fast vectorized calculations.
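The following sketch walks through how shapes combine under broadcasting, using made-up data. It shows a compatible pair, a typical use (subtracting per-column means), and an incompatible pair that raises ValueError. The names data, col_means, and centered are illustrative.

```python
import numpy as np

# (3,) and (3, 1) are compatible: the result shape is (3, 3).
a = np.ones((3,))
b = np.ones((3, 1))
print(np.broadcast(a, b).shape)   # (3, 3)

# Typical use: subtract each column's mean from every row.
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # shape (3, 2)
col_means = data.mean(axis=0)                           # shape (2,)
centered = data - col_means                             # shape (3, 2)
print(centered)
# [[-2. -2.]
#  [ 0.  0.]
#  [ 2.  2.]]

# (3,) and (4,) differ and neither is 1, so broadcasting fails.
try:
    np.ones((3,)) + np.ones((4,))
except ValueError as exc:
    print("incompatible shapes:", exc)
```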
Array Manipulations
NumPy provides various manipulation methods like sorting, reshaping, joining, splitting, appending, etc:
arr = np.random.randint(10, size=6) # One dimensional
arr.sort() # In-place sorting
arr = arr.reshape(2, 3) # Reshape to two-dimensional
arr = np.vstack([arr, arr]) # Stack arrays vertically
arr = np.hstack([arr, arr]) # Stack arrays horizontally
arr = np.append(arr, [11, 12]) # Append values (flattens to 1D when no axis is given)
lower, upper = np.split(arr, 2) # Split into 2 equal parts (use [2] to split at index 2)
Other functions like concatenate, delete, insert, etc. provide more flexibility.
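A brief sketch of these functions on a small 1D array follows; the variable names are illustrative. Each call returns a new array and leaves the input unchanged. The last two lines show the two meanings of the second argument to np.split: an integer gives the number of equal sections, while a list gives the indices at which to cut.

```python
import numpy as np

arr = np.arange(6)                        # [0 1 2 3 4 5]

joined = np.concatenate([arr, [6, 7]])    # [0 1 2 3 4 5 6 7]
without_first = np.delete(arr, 0)         # [1 2 3 4 5]
with_extra = np.insert(arr, 1, 99)        # [ 0 99  1  2  3  4  5]

# An integer means "this many equal sections".
halves = np.split(arr, 2)                 # [0 1 2] and [3 4 5]

# A list means "cut at these indices".
head, tail = np.split(arr, [2])           # [0 1] and [2 3 4 5]

print(arr)  # the original is untouched: [0 1 2 3 4 5]
```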
Array Comparisons
Element-wise comparisons produce boolean arrays:
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 2, 3, 4])
arr1 == arr2 # [False True True True]
arr1 < arr2 # [ True False False False]
arr1 != arr2 # [ True False False False]
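Comparison operators like <, >, ==, etc. can be used directly on arrays.
The operators & (and) and | (or) are also overloaded to combine boolean arrays element-wise. Wrap each comparison in parentheses, because & and | bind more tightly than the comparison operators.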
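Combining conditions is where operator precedence tends to cause trouble, so here is a minimal sketch with illustrative names. It also shows np.array_equal and the any/all methods, which reduce a boolean array to a single truth value.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Parentheses are required around each comparison.
in_range = (arr > 2) & (arr < 6)
print(arr[in_range])    # [3 4 5]

extremes = (arr < 2) | (arr > 5)
print(arr[extremes])    # [1 6]

# Whole-array equality as a single boolean.
print(np.array_equal(arr, arr.copy()))   # True

# any() and all() collapse a boolean array to one value.
print((arr > 3).any(), (arr > 0).all())  # True True
```

Python's `and` and `or` keywords do not work element-wise; using them on arrays with more than one element raises a ValueError.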
Input and Output
Converting arrays to and from other formats:
To/from NumPy
- np.save('file.npy', arr) - Save array to disk in binary .npy format
- arr = np.load('file.npy') - Load .npy file into array
To Python
- arr.tolist() - Convert array to normal Python list
To/from CSV
- np.savetxt('file.csv', arr, delimiter=",") - Save array to CSV file
- arr = np.loadtxt('file.csv', delimiter=",") - Load CSV into array
Display
- print(arr) - Print array contents (truncates for large arrays)
- np.set_printoptions(threshold=np.inf) - Print full ndarrays without truncation
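The save and load calls can be checked with a round trip. This sketch writes to a temporary directory so it leaves no files behind; the file names example.npy and example.csv are placeholders, not names used elsewhere in this guide.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "example.npy")   # placeholder name
    csv_path = os.path.join(tmp_dir, "example.csv")   # placeholder name

    # Binary round trip: preserves dtype and shape exactly.
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)

    # Text round trip: values pass through a decimal representation.
    np.savetxt(csv_path, arr, delimiter=",")
    from_csv = np.loadtxt(csv_path, delimiter=",")

print(np.array_equal(arr, from_npy))  # True
print(np.allclose(arr, from_csv))     # True
print(arr.tolist())                   # [[1.5, 2.0], [3.25, 4.0]]
```

The binary .npy format is generally the safer choice when the data will only be read back by NumPy, while CSV is useful for exchanging data with other tools.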
Advanced Topics
This guide provides an overview of ndarray basics. NumPy has many more advanced features:
- Array methods like sum, mean, std, min, max for math operations
- Element-wise and matrix-multiplication dot products
- Axis concepts and reductions over dimensions
- Broadcasting rules
- Performance profiling with standard Python tools such as timeit and cProfile
- Linear algebra, random sampling, FFT, etc
- And much more!
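As a first taste of the axis concept mentioned in the list above, here is a minimal sketch with a made-up 2x3 array named scores. Passing axis=0 collapses the rows (one result per column), and axis=1 collapses the columns (one result per row).

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(scores.sum())          # 21 (all elements)
print(scores.sum(axis=0))    # [5 7 9]  (one value per column)
print(scores.sum(axis=1))    # [ 6 15]  (one value per row)
print(scores.mean(axis=1))   # [2. 5.]
print(scores.max(axis=0))    # [4 5 6]
```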
Summary
Key points about NumPy ndarrays:
- N-dimensional array for fast numerical computing
- Vectorized operations improve performance
- Indexing and slicing for data access
- Methods to reshape, join, split, sort arrays
- Fast I/O with binary, CSV, and other formats
- Built-in math, logic, comparisons, etc
- Broadcasted operations on differently sized arrays
Ndarrays are essential for any Python programmer working with data and NumPy is a must-have library for array programming. This guide covers the basics, but there is much more to explore!