NumPy is a fundamental package for numeric computing in Python. It provides powerful N-dimensional array objects and tools for working with these arrays efficiently and productively. This comprehensive guide will introduce you to the key features of NumPy and how to leverage its capabilities for numeric data processing in Python.
Table of Contents
- Overview of NumPy
- The NumPy N-dimensional Array Object
- Array Creation Functions
- Array Indexing and Slicing
- Broadcasting
- Universal Array Functions
- Array Aggregations
- Array Reshaping and Transpose
- Array Concatenation and Splitting
- Linear Algebra
- Random Sampling with np.random
- Reading and Writing Array Data
- Summary
Overview of NumPy
NumPy (Numerical Python) is an open source Python library that provides a multi-dimensional array object called ndarray, derived datatypes, and a collection of routines for fast operations on arrays. Some of the key features of NumPy include:
- N-dimensional array object
- Vectorized array operations
- Broadcasting functions
- Linear algebra, Fourier transform, and random number capabilities
NumPy arrays provide a grid-like structure to store homogeneous data and are faster and more compact than Python lists. NumPy offers concise syntax for common operations on array elements, such as arithmetic, slicing, broadcasting, aggregations, and comparisons.
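As a minimal sketch of what vectorized syntax looks like in practice, the snippet below doubles the same values once with a Python list comprehension and once with a single NumPy expression. The variable names are illustrative only.

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]

# Pure Python: an explicit loop over every element
doubled_list = [v * 2 for v in values]

# NumPy: one expression applied to the whole array at once
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)  # [3.0, 4.0, 7.0, 8.0]
print(doubled_arr)   # [3. 4. 7. 8.]

# Comparisons are element-wise too and produce a boolean array
above_two = arr > 2
print(above_two.tolist())  # [False, False, True, True]
```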
Here are some common uses of NumPy:
- Math and scientific computing with arrays
- Matrix manipulation and linear algebra operations
- Data analysis and machine learning
- Image and signal processing
- Database operations on array-based datasets
To use NumPy, you first need to install it via pip:
pip install numpy
Now let’s explore some of the main features of NumPy in detail.
The NumPy N-dimensional Array Object
The foundation of NumPy is the ndarray object for multi-dimensional arrays. These arrays have a fixed size and contain elements of the same type.
To create a simple 1D array:
import numpy as np
arr = np.array([1, 2, 3])
print(arr)
# Output: [1 2 3]
The array dimensions describe the shape of the array. We can inspect the shape like this:
print(arr.shape)
# Output: (3,)
This array has one axis with 3 elements. For a 2D array with 3 rows and 2 columns:
arr_2d = np.array([[1, 2], [3, 4], [5, 6]])
print(arr_2d)
# Output:
# [[1 2]
# [3 4]
# [5 6]]
print(arr_2d.shape)
# Output: (3, 2)
We can also define the data type of the array elements like this:
float_arr = np.array([1.1, 2.2, 3.5], dtype=np.float32)
NumPy supports common data types like float, int, bool, string, datetime64, etc.
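The following sketch shows one way to inspect and convert element types. It uses the dtype and itemsize attributes and the astype method; the array names are hypothetical.

```python
import numpy as np

# Request a specific dtype at creation time
small_ints = np.array([1, 2, 3], dtype=np.int8)
print(small_ints.dtype)     # int8
print(small_ints.itemsize)  # 1 (bytes per element)

# astype returns a converted copy; the original array is unchanged
as_float = small_ints.astype(np.float64)
print(as_float.dtype)       # float64
print(as_float.itemsize)    # 8

# Converting floats to integers truncates toward zero
truncated = np.array([1.9, -2.7, 3.2]).astype(np.int32)
print(truncated.tolist())   # [1, -2, 3]
```

Choosing a smaller dtype reduces memory use, at the cost of a narrower range of representable values.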
Array Creation Functions
NumPy provides various functions to create new arrays from scratch, without converting existing Python data.
np.zeros and np.ones
Create arrays filled with zeros or ones:
np.zeros((2, 3))
# Output:
# array([[0., 0., 0.],
# [0., 0., 0.]])
np.ones((3, 4))
# Output:
# array([[1., 1., 1., 1.],
# [1., 1., 1., 1.],
# [1., 1., 1., 1.]])
np.full
Create a constant array:
np.full((3, 3), 7)
# Output:
# array([[7, 7, 7],
# [7, 7, 7],
# [7, 7, 7]])
np.arange
Return evenly spaced values within a specified interval (the stop value is excluded):
np.arange(5, 20, 2)
# Output: array([ 5, 7, 9, 11, 13, 15, 17, 19])
np.linspace
Return num evenly spaced samples over a specified interval (the endpoint is included by default):
np.linspace(0, 10, 5)
# Output: array([ 0., 2.5, 5., 7.5, 10.])
There are many other helper routines like np.random.rand(), np.identity(), etc. Refer to the NumPy documentation for more details.
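As a brief sketch of a few more creation helpers, the example below uses np.identity together with np.zeros_like (a real NumPy function that is not covered above), and contrasts how np.arange and np.linspace treat the end of the interval.

```python
import numpy as np

# 3x3 identity matrix (float64 by default)
eye = np.identity(3)
print(eye.shape)    # (3, 3)
print(eye.trace())  # 3.0

# zeros_like copies the shape and dtype of an existing array
template = np.array([[1, 2], [3, 4]], dtype=np.int32)
blank = np.zeros_like(template)
print(blank.shape, blank.dtype)  # (2, 2) int32

# arange excludes the stop value; linspace includes it by default
stepped = np.arange(0, 10, 5)
spaced = np.linspace(0, 10, 3)
print(stepped.tolist())  # [0, 5]
print(spaced.tolist())   # [0.0, 5.0, 10.0]
```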
Array Indexing and Slicing
NumPy arrays can be indexed and sliced like Python lists. For example:
arr = np.array([1, 2, 3, 4])
# Indexing
print(arr[0]) # 1
# Slicing
print(arr[1:3]) # [2 3]
For multidimensional arrays, you can provide a tuple of indices/slices to select elements:
arr_2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(arr_2d[1, 2]) # 6
print(arr_2d[:, 1]) # [2 5 8]
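The sketch below extends the indexing examples with negative indices, sub-block slices, and strides. It also illustrates that basic slices are views onto the original data rather than copies, which differs from Python lists. The names grid, block, and safe are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(grid[0].tolist())       # first row: [1, 2, 3]
print(grid[-1, -1])           # last element: 9
corners = grid[::2, ::2]      # every other row and column
print(corners.tolist())       # [[1, 3], [7, 9]]

# Basic slices are views, so writing through one changes the original
block = grid[:2, 1:]          # rows 0-1, columns 1-2
block[0, 0] = 99
print(grid[0, 1])             # 99

# Use .copy() when an independent array is needed
safe = grid[:, 0].copy()
safe[0] = -1
print(grid[0, 0])             # still 1
```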
NumPy also provides powerful broadcasting features for array operations.
Broadcasting
Broadcasting allows vectorized operations on arrays of different shapes. The smaller array is broadcast to match the larger array so that they have compatible shapes.
For example:
a = np.array([[1,2,3]]) # Shape (1, 3)
b = np.array([10, 20, 30]) # Shape (3,)
a + b
# Output:
# array([[11, 22, 33]])
Here the (3,) array b is broadcast to (1, 3) to match a for the addition.
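Broadcasting follows these rules:
- Shapes are compared dimension by dimension, starting from the trailing (rightmost) dimension.
- If one array has fewer dimensions, its shape is padded with 1s on the left.
- Two dimensions are compatible when they are equal or when one of them is 1; a dimension of size 1 is stretched to match the other.
- If any pair of dimensions is incompatible, NumPy raises a ValueError.
Broadcasting lets you apply vectorized operations to arrays of different shapes and avoid slow Python loops.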
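As a small illustration of broadcasting in two directions at once, the sketch below adds a column of shape (3, 1) to a row of shape (3,), and then shows a pair of shapes that cannot be combined. The variable names are hypothetical.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,), treated as (1, 3)

# (3, 1) and (1, 3) broadcast to (3, 3)
table = col + row
print(table.shape)  # (3, 3)
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Shapes (2, 3) and (2,) are incompatible: the trailing sizes 3 and 2 differ
try:
    np.ones((2, 3)) + np.ones(2)
    compatible = True
except ValueError:
    compatible = False
print(compatible)   # False
```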
Universal Array Functions
NumPy provides vectorized versions of many mathematical operations called universal functions, or ufuncs. These operate element-wise on arrays.
For example:
a = np.array([1, 2, 4])
np.sqrt(a)
# Output: array([1. , 1.41421356, 2. ])
Here np.sqrt() calculates the element-wise square root. Other useful ufuncs include np.exp, np.sin, np.add, np.greater, etc.
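Many ufuncs also take an out parameter to store the output in an existing array rather than create a new one:
out = np.zeros(3)
np.power(a, 2, out=out)
print(out)
# Output: [ 1.  4. 16.]
The result is stored as floats because out was created with the default float64 dtype. Reusing an existing array can be more efficient because it avoids allocating new memory for the result.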
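The sketch below shows two binary ufuncs, np.add and np.greater, and then chains two operations through a single preallocated buffer. np.empty_like and np.multiply are real NumPy functions that are not discussed above; the buffer name buf is illustrative.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 4.0, 6.0])

# Binary ufuncs work element-wise on two arrays
total = np.add(x, y)
larger = np.greater(x, y)
print(total)   # [ 3.  8. 15.]
print(larger)  # [False False  True]

# Reuse one preallocated buffer for a chain of operations
buf = np.empty_like(x)
np.sqrt(x, out=buf)            # buf = sqrt(x)
np.multiply(buf, 2, out=buf)   # buf = 2 * sqrt(x), computed in place
print(buf)                     # [2. 4. 6.]
```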
Array Aggregations
NumPy provides common aggregation functions like sum, mean, std, min, and max to aggregate array values:
arr = np.array([[1, 2], [3, 4]])
print(np.min(arr)) # 1
print(np.max(arr)) # 4
print(np.sum(arr)) # 10
print(np.mean(arr)) # 2.5
print(np.std(arr)) # 1.118033988749895
We can also specify the axis along which to compute the aggregations:
print(arr.sum(axis=0)) # [4 6]
print(arr.min(axis=1)) # [1 3]
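To make the axis argument more concrete, here is a short sketch that reduces along each axis and then uses the keepdims parameter so that the result can be broadcast back against the original array. The scores data is made up for illustration.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [3.0, 6.0, 9.0]])

# axis=0 collapses the rows, leaving one value per column
col_means = scores.mean(axis=0)
print(col_means)   # [2. 4. 6.]

# axis=1 collapses the columns, leaving one value per row
row_sums = scores.sum(axis=1)
print(row_sums)    # [ 6. 18.]

# keepdims=True keeps the reduced axis with size 1, so the result
# broadcasts against the original array
row_totals = scores.sum(axis=1, keepdims=True)   # shape (2, 1)
fractions = scores / row_totals
print(np.allclose(fractions.sum(axis=1), 1.0))   # True
```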
Array Reshaping and Transpose
The shape of an array can be modified without changing the data:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.reshape(3, 2))
# Output:
# [[1 2]
# [3 4]
# [5 6]]
The transpose() method reverses the order of the axes; for a 2D array, this swaps rows and columns:
print(arr.transpose())
# [[1 4]
# [2 5]
# [3 6]]
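A few related conveniences are sketched below: passing -1 to reshape so that one dimension is inferred, the .T shorthand for transpose, and ravel for flattening (ravel is not covered above). The example also shows that reshape returns a view when it can, so the reshaped array shares memory with the original.

```python
import numpy as np

arr = np.arange(6)       # [0 1 2 3 4 5]

# -1 asks NumPy to infer that dimension from the total size
m = arr.reshape(2, -1)
print(m.shape)           # (2, 3)

# .T is shorthand for transpose()
print(m.T.shape)         # (3, 2)

# reshape returns a view here, so m shares memory with arr
m[0, 0] = 100
print(arr[0])            # 100

# ravel() flattens back to 1D, reading the transposed array row by row
flat = m.T.ravel()
print(flat.tolist())     # [100, 3, 1, 4, 2, 5]
```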
Array Concatenation and Splitting
NumPy provides operations like np.concatenate, np.stack, np.hstack, np.vstack, etc. to combine arrays:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a, b), axis=0)
# Output:
# [[1 2]
# [3 4]
# [5 6]]
np.hstack((a, b.T)) # Horizontal stack; b.T has shape (2, 1), so the row counts match
# Output:
# [[1 2 5]
# [3 4 6]]
np.vstack((a, b)) # Vertical stack
# Output:
# [[1 2]
# [3 4]
# [5 6]]
Similarly, np.split, np.hsplit, and np.vsplit can be used to split arrays.
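As a minimal sketch of the splitting functions, the example below cuts a 4x3 array into equal parts along each axis, then into uneven parts by passing a list of split indices. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

# Split into two equal parts along the rows (axis 0)
top, bottom = np.vsplit(arr, 2)
print(top.shape, bottom.shape)   # (2, 3) (2, 3)

# Split along the columns (axis 1) into three single-column arrays
left, middle, right = np.hsplit(arr, 3)
print(middle.ravel())            # [ 1  4  7 10]

# A list of indices gives uneven pieces: rows [0:1] and rows [1:4]
first, rest = np.split(arr, [1], axis=0)
print(first.shape, rest.shape)   # (1, 3) (3, 3)

# Stacking the pieces again restores the original array
restored = np.vstack((top, bottom))
print(np.array_equal(restored, arr))  # True
```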
Linear Algebra
NumPy provides tools for linear algebra operations on arrays:
# Solve the linear system a @ x = b for x
a = np.array([[1,1], [0,1]])
b = np.array([2,2])
x = np.linalg.solve(a, b)
print(x)
# Output: [0. 2.]
Other linear algebra capabilities include eigendecomposition, determinants, vector and matrix norms, and matrix multiplication.
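The sketch below touches each of these capabilities on a small diagonal matrix, chosen so that the expected values are easy to check by hand. It assumes nothing beyond the standard np.linalg functions and the @ operator.

```python
import numpy as np

a = np.array([[2.0, 0.0],
              [0.0, 3.0]])
v = np.array([1.0, 1.0])

product = a @ v                  # matrix-vector product
print(product)                   # [2. 3.]

det = np.linalg.det(a)           # determinant, 6.0 up to rounding
norm = np.linalg.norm(v)         # Euclidean norm, sqrt(2)

# For a diagonal matrix the eigenvalues are the diagonal entries
eigenvalues, eigenvectors = np.linalg.eig(a)
print(np.sort(eigenvalues))      # [2. 3.]

# Check a solution from solve by substituting it back into the system
x = np.linalg.solve(a, v)
print(np.allclose(a @ x, v))     # True
```

Substituting the solution back, as in the last line, is a cheap way to catch a mistyped matrix or right-hand side.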
Random Sampling with np.random
NumPy’s random module np.random provides various functions for random number generation and sampling from different statistical distributions:
np.random.rand(2, 3) # Uniform distribution over [0, 1)
# Output (values differ on each run unless a seed is set):
# array([[0.69646919, 0.28613933, 0.22685145],
# [0.55131477, 0.71946897, 0.4236548 ]])
np.random.randn(2, 3) # Standard normal distribution
np.random.randint(1, 10, 5) # 5 random integers in [1, 10)
np.random.choice([1, 2, 3], 5) # 5 samples drawn with replacement
This makes it easy to generate test data, run simulations, and cover many other use cases.
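For reproducible results, one option is the Generator interface created by np.random.default_rng, which the NumPy documentation recommends for new code. The sketch below mirrors the four calls above with a seeded generator; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator yields the same sequence on every run
rng = np.random.default_rng(seed=42)

uniform = rng.random((2, 3))            # floats in [0, 1)
normal = rng.standard_normal((2, 3))    # mean 0, standard deviation 1
ints = rng.integers(1, 10, size=5)      # integers in [1, 10), upper bound excluded
picks = rng.choice([1, 2, 3], size=5)   # sampled with replacement

# Re-creating the generator with the same seed reproduces the draws
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uniform, rng_again.random((2, 3)))
print(same)  # True
```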
Reading and Writing Array Data
NumPy provides convenience functions to read data from files into arrays and write array data to files:
data = np.genfromtxt('data.csv', delimiter=',')
arr = np.array([[1, 2], [3, 4]])
np.save('arr.npy', arr)
arr_reloaded = np.load('arr.npy')
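NumPy supports delimited text formats such as CSV (np.loadtxt, np.genfromtxt, np.savetxt) and its own binary formats, .npy for a single array and .npz for several arrays. It has no built-in JSON reader or writer.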
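The sketch below runs a full round trip through a text file and a binary archive. It writes into a temporary directory so that it can run anywhere; the file names and array contents are made up, and np.savetxt, np.loadtxt, and np.savez are real NumPy functions that are not used in the example above.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.5, 2.0], [3.0, 4.25]])

with tempfile.TemporaryDirectory() as tmp:
    # Text round trip: savetxt writes CSV, loadtxt reads it back
    csv_path = os.path.join(tmp, "arr.csv")
    np.savetxt(csv_path, arr, delimiter=",")
    from_csv = np.loadtxt(csv_path, delimiter=",")

    # Binary archive: savez stores several named arrays in one .npz file
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, data=arr, ids=np.array([10, 20]))
    with np.load(npz_path) as archive:
        from_npz = archive["data"]
        ids = archive["ids"]

print(np.array_equal(arr, from_csv))  # True
print(np.array_equal(arr, from_npz))  # True
print(ids.tolist())                   # [10, 20]
```

Text files are easy to inspect and share, while the binary formats preserve dtype and shape exactly and are usually faster to load.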
Summary
In this guide, we looked at some of the key aspects of NumPy for numeric data processing in Python:
- Creating N-dimensional array objects for efficient data storage and vectorized operations.
- Array indexing, slicing, and broadcasting features.
- Ufuncs for element-wise array computations like sqrt, sin, exp.
- Aggregation methods like sum, mean, min, max.
- Tools for linear algebra, random sampling, IO with arrays.
NumPy is a foundational package for scientific computing, data analysis, and machine learning applications in Python. Mastering NumPy enables you to work efficiently with large datasets in Python.
Check out the official NumPy user guide and reference documentation for more details on all available functionality. The NumPy API is quite extensive, so focus on the essential parts relevant to your specific data tasks.