NumPy: Mathematical and Statistical Operations in Python

Updated: at 03:37 AM

NumPy is a fundamental Python package for scientific computing and data analysis. It provides an efficient implementation of multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them. NumPy is especially useful for performing mathematical, statistical, and logical operations on whole arrays without writing explicit loops.

This comprehensive guide will provide an overview of NumPy and how to leverage its capabilities for mathematical and statistical computations in Python. We will cover the key features of NumPy arrays, vectorization, broadcasting, universal functions (ufuncs), aggregation, masking, sorting, random number generation, linear algebra, statistics, and more. Code examples are provided to illustrate the functionality.

Introduction

NumPy aims to provide efficient multidimensional array and matrix manipulation for Python while interoperating with built-in sequences such as lists and tuples. Its key features include the ndarray type, vectorized operations, broadcasting, and universal functions (ufuncs).

The ndarray provided by NumPy forms the central data structure for many other Python scientific computing packages like SciPy, Matplotlib, Pandas, scikit-learn, TensorFlow, and more. Understanding NumPy arrays and mathematical operations is essential for effective data analysis and machine learning with Python.

Let’s explore the essential NumPy capabilities for performing mathematical and statistical computations on arrays.

Importing NumPy

To start using NumPy, we first need to import the numpy package:

import numpy as np

The conventional alias np is used for the numpy module to make the code more concise.

Creating NumPy Arrays

The fundamental object of NumPy is the ndarray, a homogeneous multidimensional array. An array has a fixed size and a single element type, and a newly created array stores its elements in one contiguous block of memory. We can create new arrays from lists or tuples using the np.array() function:

vector = np.array([1, 2, 3])

matrix = np.array([[1, 2], [3, 4]])

The array’s dtype (data type) is inferred from the input data but can also be explicitly specified:

int_array = np.array([1, 2, 3], dtype=np.int32)

float_array = np.array([1.1, 2.2, 3.3], dtype=np.float64)

Useful array creation functions like zeros(), ones(), full(), arange(), linspace(), etc. are also provided for generating arrays populated with specific values.
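As a minimal sketch of the creation functions named above (the shapes and values here are arbitrary choices for illustration):

```python
import numpy as np

zeros = np.zeros((2, 3))              # 2x3 array of 0.0
ones = np.ones(4, dtype=np.int64)     # four integer ones
filled = np.full((2, 2), 7)           # 2x2 array filled with 7
steps = np.arange(0, 10, 2)           # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)       # 5 evenly spaced points, both endpoints included

print(steps.tolist())  # [0, 2, 4, 6, 8]
print(grid.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note that arange() excludes its stop value while linspace() includes it by default, which is a common source of off-by-one surprises.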

Multi-dimensional arrays can be created by passing in nested Python structures like lists of lists. The dimensions and shape of an array can be accessed through its ndim and shape attributes:

three_d_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(three_d_array.ndim) # 3
print(three_d_array.shape) # (2, 2, 2)

Array Indexing and Slicing

NumPy arrays facilitate flexible indexing and slicing with basic and advanced indexing capabilities. We can access elements at specific indices, obtain sections and subsets of the array, and assign new values.

Basic slicing syntax is similar to Python lists:

array = np.array([1, 2, 3, 4, 5])

# Get first 3 elements
array[:3]

# Get last 3 elements
array[2:]

Individual elements can be accessed via integer indices:

array[0] # 1
array[2] # 3

NumPy also supports multidimensional slicing, integer array (fancy) indexing, boolean indexing, and more:

two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Get the bottom-right 2x2 sub-array
two_d_array[1:3, 1:3]
# [[5 6]
#  [8 9]]

# Integer array indexing to extract the main diagonal
two_d_array[[0,1,2], [0,1,2]]
# [1 5 9]

# Boolean indexing
two_d_array[two_d_array > 2]

Assigning new values via indexing modifies the array in place:

array[0] = 9 # Change first element to 9
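In-place assignment also works with slices and boolean masks, not only single indices. The following is a small sketch with made-up values, and it also shows stride slicing with the start:stop:step syntax:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

values[1:3] = 0           # slice assignment: the scalar fills the whole slice
values[values > 3] = -1   # boolean-mask assignment: replaces 4 and 5

print(values.tolist())       # [1, 0, 0, -1, -1]

every_other = np.arange(10)[::2]   # stride slicing: take every second element
print(every_other.tolist())  # [0, 2, 4, 6, 8]
```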

Broadcasted Operations

When an operation involves arrays of different shapes, NumPy broadcasts the smaller array across the larger one, provided their shapes are compatible. This allows vectorized operations without explicit looping.

For example, adding a scalar value to an ndarray:

array = np.array([[1, 2], [3, 4]])

array + 5
# [[6 7]
# [8 9]]

The scalar value 5 is broadcast and added to each element. The same mechanism applies to any element-wise operation between a scalar or lower-dimensional array and a larger array, as long as their shapes are compatible.

Arithmetic between arrays of the same shape is applied element by element:

array1 = np.array([1, 2, 3])
array2 = np.array([0, 2, 4])

array1 + array2
# [1 4 7]

When the shapes differ, NumPy compares them dimension by dimension from the right, and any dimension of size 1 (or a missing leading dimension) is stretched to match the other array, eliminating the need to loop over elements.
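To see stretching between arrays of genuinely different shapes, here is a brief sketch; the array names and values are illustrative only:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])         # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,)
column = np.array([[100], [200]])      # shape (2, 1)

print((matrix + row).tolist())     # [[11, 22, 33], [14, 25, 36]]
print((matrix + column).tolist())  # [[101, 102, 103], [204, 205, 206]]
print((row + column).shape)        # (2, 3)

# Shapes (2, 3) and (2,) do not line up from the right, so this fails.
broadcast_failed = False
try:
    matrix + np.array([1, 2])
except ValueError:
    broadcast_failed = True
print(broadcast_failed)            # True
```

The row is repeated down the rows, the column is repeated across the columns, and combining the two yields a full 2x3 result.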

Universal Array Functions

NumPy provides a large set of vectorized universal array functions called ufuncs that perform element-wise operations on arrays. This allows efficient mathematical operations without Python loops.

For example:

array = np.array([1, 2, 3, 4])

np.sqrt(array) # Square root of each element
np.exp(array) # Exponential of each element
np.sin(array) # Sine of each element

These work with scalars or multiple array arguments:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

np.maximum(x, y) # Element-wise maximum
# [4 5 6]

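NumPy provides ufuncs for arithmetic, comparison, trigonometric, exponential, logarithmic, and bitwise operations, among others. Statistical and linear algebra routines are generally ordinary functions rather than ufuncs.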
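As a rough illustration of what "without Python loops" means, this sketch compares an explicit loop with the equivalent ufunc call, and shows a ufunc accepting a scalar argument and being used as a reduction:

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

looped = np.array([math.sqrt(v) for v in values])   # explicit Python loop
vectorized = np.sqrt(values)                        # single ufunc call

print(np.array_equal(looped, vectorized))   # True
print(np.maximum(values, 5.0).tolist())     # [5.0, 5.0, 9.0, 16.0]
print(np.add.reduce(values))                # 30.0
```

Both approaches give the same result here; the ufunc version runs its loop in compiled code, which is where the speed advantage on large arrays comes from.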

Array Aggregations

NumPy has built-in functions to compute aggregations over array elements like sum(), mean(), std(), var(), min(), max() etc.

For example:

array = np.array([1, 3, 4, 7, 5])

array.mean() # 4.0
array.std() # 2.0
array.min() # 1
array.max() # 7

These can also be applied along specific axes of multidimensional arrays:

two_d_array = np.array([[1, 3],
                        [5, 7]])

two_d_array.sum(axis=0) # [6 10]
two_d_array.min(axis=1) # [1 5]
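Two options of the aggregation methods are worth knowing: ddof, which switches std() and var() from the population formula to the sample formula, and keepdims, which keeps the reduced axis so that the result broadcasts back against the input. A short sketch with illustrative data:

```python
import numpy as np

data = np.array([1, 3, 4, 7, 5])

population_std = data.std()         # divides by N (ddof=0 is the default)
sample_std = data.std(ddof=1)       # divides by N - 1
print(population_std)               # 2.0

table = np.array([[1, 3],
                  [5, 7]])
row_means = table.mean(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
centered = table - row_means                    # subtract each row's mean
print(centered.tolist())            # [[-1.0, 1.0], [-1.0, 1.0]]
```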

Mathematical and Statistical Functions

Alongside element-wise ufuncs such as power, square, and log, NumPy has a large library of mathematical and statistical functions that operate on entire arrays:

x = np.arange(1, 6) # [1 2 3 4 5]; starts at 1 because log(0) is -inf

np.power(x, 3) # x^3

np.square(x) # x^2

np.log(x) # ln(x)

np.median(x) # 3.0

np.corrcoef(x, np.square(x)) # 2x2 correlation matrix of x and x^2

These provide efficient implementations of commonly used mathematical formulas, norms, products, regression, etc. without explicit loops.

NumPy's random module provides a range of probability distributions and sampling methods, which are useful for simulations and probabilistic modeling:

from numpy import random

samples = random.normal(size=1000) # Gaussian

random.binomial(n=10, p=0.5, size=10) # Binomial
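The functions above use NumPy's older global-state interface. Newer code usually creates a Generator with np.random.default_rng(), which also makes results reproducible when a seed is supplied. The seed value below is an arbitrary choice for this sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed, only for reproducibility

normal_samples = rng.normal(loc=0.0, scale=1.0, size=1000)   # Gaussian
coin_flips = rng.binomial(n=10, p=0.5, size=10)              # Binomial
die_rolls = rng.integers(low=1, high=7, size=5)              # high is exclusive

print(normal_samples.shape)   # (1000,)
```

Creating a second generator with the same seed reproduces the same sequence of samples.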

Linear Algebra

NumPy has a linalg module for linear algebra operations on arrays. It includes functions for eigendecomposition, determinants, matrix inverses, norms, and solving linear systems.

For example:

import numpy.linalg as linalg

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

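matrix_product = np.dot(x, y) # Standard matrix product (same as x @ y); dot lives in the top-level namespace, not in linalg

eigenvalues, eigenvectors = linalg.eig(x) # eig returns both eigenvalues and eigenvectors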

This makes NumPy very useful for applied linear algebra.
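A few more linalg routines, sketched on a small system chosen for illustration, show how solving, determinants, and inverses fit together:

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])

solution = np.linalg.solve(a, b)    # solves a @ solution = b
determinant = np.linalg.det(a)      # approximately -2.0
inverse = np.linalg.inv(a)

print(np.allclose(a @ solution, b))          # True
print(np.allclose(a @ inverse, np.eye(2)))   # True
```

When only the solution of a system is needed, solve() is generally preferable to computing the inverse explicitly, since it is both faster and numerically better behaved.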

Sorting Arrays

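NumPy arrays can be sorted along a specified axis. The np.sort() function returns a sorted copy, the ndarray.sort() method sorts in place, and argsort() returns the indices that would sort the array: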

unsorted_array = np.array([3, 1, 2])

sorted_array = np.sort(unsorted_array)
# [1 2 3]

# Get array indices that would sort an array
sort_indices = np.argsort(unsorted_array)

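For 2D arrays, the axis argument selects whether each column (axis=0) or each row (axis=1) is sorted:

two_d_array = np.array([[5, 2], [4, 1]])

sorted_within_columns = np.sort(two_d_array, axis=0) # sort down each column
# [[4 1]
# [5 2]]

sorted_within_rows = np.sort(two_d_array, axis=1) # sort across each row
# [[2 5]
# [1 4]]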
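A common use of argsort() is reordering a second array by the values of the first. The sketch below uses hypothetical names and scores, and also contrasts the copying np.sort() with the in-place sort() method:

```python
import numpy as np

scores = np.array([3, 1, 2])
names = np.array(["carol", "alice", "bob"])   # hypothetical labels for the scores

order = np.argsort(scores)        # indices that would sort scores
print(order.tolist())             # [1, 2, 0]
print(names[order].tolist())      # ['alice', 'bob', 'carol']

sorted_copy = np.sort(scores)     # new array; scores is unchanged so far
scores.sort()                     # sorts in place and returns None
print(scores.tolist())            # [1, 2, 3]
```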

Masked Arrays

Masked arrays provide a way to handle missing or invalid data in NumPy. Masks can be applied to hide values in computations where needed.

We create masked arrays using np.ma.masked_array():

data = np.array([1, 2, 3, -999, 4])
masked = np.ma.masked_array(data, mask=[0, 0, 0, 1, 0])

print(masked)
# [1 2 3 -- 4]

The masked value is ignored in computations:

print(masked.mean()) # 2.5
print(masked.sum()) # 10

We can access the underlying values with masked.data and the boolean mask with masked.mask.
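Writing the mask by hand does not scale to real data. Assuming, as in the example above, that -999 is the sentinel for a missing reading, np.ma.masked_equal() can build the mask from the values themselves:

```python
import numpy as np

readings = np.array([1, 2, 3, -999, 4])

masked_readings = np.ma.masked_equal(readings, -999)   # mask every -999

print(masked_readings.count())             # 4 (number of unmasked values)
print(masked_readings.mean())              # 2.5
print(masked_readings.filled(0).tolist())  # [1, 2, 3, 0, 4]
```

The filled() method converts back to a regular array, substituting the given value wherever the mask is set.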

Reshaping and Transposing Arrays

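The shape of an array can be changed with reshape() and np.newaxis. reshape() returns a view of the same data whenever the memory layout allows and copies only when it must, while np.newaxis inserts a new dimension of length 1: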

array = np.array([1, 2, 3, 4])

array.reshape(2, 2)
# [[1 2]
# [3 4]]

array[np.newaxis, :] # Adds new axis
# [[1 2 3 4]]

transpose() permutes the axes of an array; called without arguments, it reverses their order:

array = np.arange(6).reshape(2, 3)

array.transpose()
# [[0 3]
# [1 4]
# [2 5]]
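Whether reshape() shares memory with the original array can be checked with np.shares_memory(). The following sketch also shows -1, which asks NumPy to infer one dimension from the others:

```python
import numpy as np

flat = np.arange(6)
grid = flat.reshape(2, -1)      # -1 is inferred, giving shape (2, 3)

print(grid.shape)                      # (2, 3)
print(np.shares_memory(flat, grid))    # True: grid is a view of flat

grid[0, 0] = 99                        # writing through the view...
print(flat[0])                         # 99 ...changes the original too

flattened = grid.T.reshape(-1)         # a transposed array is not contiguous,
print(np.shares_memory(grid, flattened))   # False: so this reshape had to copy
```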

Reading and Writing Array Data

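NumPy provides utilities to read and write array data to disk. np.save() and np.load() handle a single array in the binary .npy format, np.savez() bundles several arrays into one .npz file, and np.savetxt() and np.loadtxt() work with plain-text files.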

Large arrays can be mapped to files on disk with np.memmap without fully loading them into memory.
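A minimal round-trip sketch of the binary and text formats follows. The file names are hypothetical, and a temporary directory is used so that nothing is left on disk:

```python
import os
import tempfile
import numpy as np

data = np.arange(6, dtype=np.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Single array in binary .npy format
    npy_path = os.path.join(tmp_dir, "data.npy")
    np.save(npy_path, data)
    restored = np.load(npy_path)

    # Several named arrays in one .npz archive
    npz_path = os.path.join(tmp_dir, "bundle.npz")
    np.savez(npz_path, first=data, second=data * 2)
    with np.load(npz_path) as bundle:
        second = bundle["second"]

    # Plain-text format, readable by other tools
    txt_path = os.path.join(tmp_dir, "data.csv")
    np.savetxt(txt_path, data, delimiter=",")
    from_text = np.loadtxt(txt_path, delimiter=",")

print(np.array_equal(data, restored))    # True
print(np.array_equal(data, from_text))   # True
```

The binary formats preserve dtype and shape exactly, whereas text files are larger and slower to parse but easier to inspect and exchange.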

Conclusions

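The NumPy package enables efficient mathematical and statistical computations on arrays in Python without explicit for loops. Key capabilities covered in this guide include array creation and indexing, broadcasting, universal functions, aggregations, random sampling, linear algebra, sorting, masked arrays, reshaping, and file I/O.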

NumPy is fundamental for building mathematical and scientific applications with Python. Its array-oriented tools can often make numerical code much faster than equivalent pure-Python loops, sometimes by orders of magnitude on large arrays. This guide provided an overview of the core functionality; refer to the official NumPy documentation and resources for more details.