NumPy is a fundamental Python package for scientific computing and data analysis. It provides an efficient implementation of multidimensional arrays and matrices, along with a large library of high-level mathematical functions that operate on these arrays.

Among NumPy's most widely used features are its aggregates: functions that summarize ndarray objects by applying statistical operations to the array elements. These include sum, mean, median, minimum, maximum, and standard deviation.

In this guide, we will provide a comprehensive overview of using NumPy aggregations in Python. We will cover the following topics:

## Table of Contents

- Overview of NumPy Aggregations
- Importing NumPy
- Creating NumPy Arrays
- NumPy Sum
- NumPy Mean
- NumPy Median
- NumPy Minimum and Maximum
- NumPy Standard Deviation
- Weighted Aggregations
- Aggregates for Boolean Arrays
- Accumulate Aggregates With `reduce`
- Comparison to Built-in sum() and min()/max()
- Aggregations on Pandas Dataframes
- Conclusion

## Overview of NumPy Aggregations

NumPy aggregates allow you to condense arrays into useful summary statistics with a single function call. This enables concise data exploration and analysis.

The main aggregation functions provided by NumPy are:

- `np.sum`: Calculates the sum of array elements.
- `np.mean`: Computes the arithmetic mean or average.
- `np.median`: Finds the median or middle value of the data.
- `np.min`: Gets the minimum element of the array.
- `np.max`: Returns the maximum element.
- `np.std`: Calculates the standard deviation.

These work on both single-dimensional and multi-dimensional arrays. Additional related functions like `np.prod`, `np.cumsum`, etc. are also available.
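As a quick illustration of the two related functions just mentioned, the following sketch applies `np.prod` and `np.cumsum` to a small array. The array name `values` and its contents are made up for this example.

```python
import numpy as np

# Hypothetical sample data, for illustration only
values = np.array([1, 2, 3, 4])

# np.prod multiplies all elements together: 1 * 2 * 3 * 4
product = np.prod(values)
print(product)  # 24

# np.cumsum returns the running total, one entry per input element
running_total = np.cumsum(values)
print(running_total)  # [ 1  3  6 10]
```

Note that `np.cumsum` returns an array of the same length as its input rather than a single summary value.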

By default, these aggregations are computed over the entire array and return a single value. Passing the `axis` argument computes them along the specified axis instead. The boolean aggregates `np.all` and `np.any` follow the same convention.
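The effect of the `axis` argument can be sketched with `np.all` and `np.any`, which accept it in the same way as the other aggregates. The array `flags` below is a made-up example.

```python
import numpy as np

# Hypothetical boolean grid, used only for this illustration
flags = np.array([[True, False, True],
                  [True, True, True]])

# No axis given: the whole array collapses to a single value
print(np.all(flags))  # False
print(np.any(flags))  # True

# axis=0 collapses the rows, leaving one result per column
print(np.all(flags, axis=0))  # [ True False  True]

# axis=1 collapses the columns, leaving one result per row
print(np.all(flags, axis=1))  # [False  True]
```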

## Importing NumPy

Before using NumPy aggregates, NumPy needs to be imported:

```
import numpy as np
```

The convention is to import NumPy with `np` as the alias.

## Creating NumPy Arrays

The aggregates are applied to NumPy arrays. Let’s create a sample 1D array:

```
arr = np.array([5, 2, 9, 10, 15])
```

For multi-dimensional data, arrays of higher rank are used. For example:

```
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
```

The aggregates work consistently on arrays of any shape or size.
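To make the point about shape concrete, here is a small sketch with a 3D array. The name `cube` and its shape are chosen only for illustration.

```python
import numpy as np

# Hypothetical 3D array with shape (2, 3, 4), filled with 0..23
cube = np.arange(24).reshape(2, 3, 4)

# Reducing over one axis removes that axis from the shape
print(np.sum(cube, axis=0).shape)  # (3, 4)

# A tuple of axes reduces over several axes at once
print(np.sum(cube, axis=(0, 1)).shape)  # (4,)

# Without an axis, everything is reduced to a single number
print(np.sum(cube))  # 276
```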

## NumPy Sum

The `np.sum` function adds up all the elements in the array. For 1D arrays:

```
print(np.sum(arr))
# Output: 41
```

For 2D arrays, it sums along a particular axis:

```
print(np.sum(arr_2d, axis=0))
# Output: [5 7 9]
print(np.sum(arr_2d, axis=1))
# Output: [ 6 15]
```

The first computes column-wise sums while the second calculates row-wise sums.

We can also compute the total sum over all array elements:

```
print(np.sum(arr_2d))
# Output: 21
```
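As a sanity check on the three calls above, the per-column and per-row sums should each add up to the grand total. A minimal sketch, with variable names chosen for this example:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

column_sums = np.sum(arr_2d, axis=0)  # one sum per column
row_sums = np.sum(arr_2d, axis=1)     # one sum per row
total = np.sum(arr_2d)                # sum of every element

# Adding up either set of partial sums gives the grand total
print(column_sums.sum() == total)  # True
print(row_sums.sum() == total)     # True
```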

## NumPy Mean

The `np.mean` aggregate calculates the arithmetic mean or average:

```
print(np.mean(arr))
# Output: 8.2
```

For 2D arrays:

```
print(np.mean(arr_2d, axis=0))
# Output: [2.5 3.5 4.5]
print(np.mean(arr_2d, axis=1))
# Output: [2. 5.]
```

With `axis=0` we get the mean of each column, and with `axis=1` the mean of each row.

The overall mean is given by:

```
print(np.mean(arr_2d))
# Output: 3.5
```
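The mean is simply the sum divided by the number of elements being averaged. The sketch below reproduces the results above by hand; the `manual_*` names are illustrative only.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Column means: each column sum divided by the number of rows
manual_col_means = np.sum(arr_2d, axis=0) / arr_2d.shape[0]
print(manual_col_means)  # [2.5 3.5 4.5]

# Overall mean: total sum divided by the number of elements
manual_mean = np.sum(arr_2d) / arr_2d.size
print(manual_mean)  # 3.5
```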

## NumPy Median

The median or middle value of the data is obtained using `np.median`:

```
print(np.median(arr))
# Output: 9.0
```

For 2D arrays:

```
print(np.median(arr_2d, axis=0))
# Output: [2.5 3.5 4.5]
print(np.median(arr_2d, axis=1))
# Output: [2. 5.]
```

Medians along rows and columns are computed.

The overall median is:

```
print(np.median(arr_2d))
# Output: 3.5
```
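The overall median of 3.5 is not an element of the array because, for an even number of values, the median is the mean of the two middle values after sorting. A small sketch of both cases, using made-up arrays `odd` and `even`:

```python
import numpy as np

odd = np.array([5, 2, 9, 10, 15])   # five values
even = np.array([5, 2, 9, 10])      # four values

# Odd length: the middle element of the sorted data (2, 5, 9, 10, 15)
print(np.median(odd))   # 9.0

# Even length: the mean of the two middle elements (5 and 9)
print(np.median(even))  # 7.0

# The same even-length result computed by hand
sorted_even = np.sort(even)
manual = (sorted_even[1] + sorted_even[2]) / 2
print(manual)  # 7.0
```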

## NumPy Minimum and Maximum

`np.min` and `np.max` return the minimum and maximum elements:

```
print(np.min(arr))
# Output: 2
print(np.max(arr))
# Output: 15
```

For 2D arrays:

```
print(np.min(arr_2d, axis=0))
# Output: [1 2 3]
print(np.max(arr_2d, axis=1))
# Output: [3 6]
```

Axis-wise minimums and maximums are computed.

The overall extrema are given by:

```
print(np.min(arr_2d))
# Output: 1
print(np.max(arr_2d))
# Output: 6
```
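These aggregates are also available as methods on the array itself, which some readers find more compact. A brief sketch showing that the function form and the method form agree:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Function form and method form give the same results
print(np.min(arr_2d), arr_2d.min())  # 1 1
print(np.max(arr_2d, axis=1))        # [3 6]
print(arr_2d.max(axis=1))            # [3 6]
```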

## NumPy Standard Deviation

The standard deviation, computed with `np.std`, indicates how dispersed the data is:

```
print(np.std(arr))
# Output: 4.45 (rounded)
```

Applied to 2D arrays:

```
print(np.std(arr_2d, axis=0))
# Output: [1.5 1.5 1.5]
print(np.std(arr_2d, axis=1))
# Output: [0.82 0.82] (rounded)
```

We get standard deviations for each column and row.

The overall standard deviation is:

```
print(np.std(arr_2d))
# Output: 1.71 (rounded)
```

By default, `np.std` calculates the population standard deviation (`ddof=0`), dividing by the number of elements N. To compute the sample standard deviation, which divides by N - 1, we pass `ddof=1`:

```
print(np.std(arr, ddof=1))
# Output: 4.97 (rounded)
```
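The role of `ddof` is easiest to see by computing the standard deviation by hand: the sum of squared deviations from the mean is divided by N - `ddof` before taking the square root. The following sketch checks both variants against `np.std`; the intermediate variable names are illustrative.

```python
import numpy as np

arr = np.array([5, 2, 9, 10, 15])

# Squared deviations from the mean
squared_dev = (arr - arr.mean()) ** 2

# Population form: divide by N (matches ddof=0)
population_std = np.sqrt(squared_dev.sum() / arr.size)

# Sample form: divide by N - 1 (matches ddof=1)
sample_std = np.sqrt(squared_dev.sum() / (arr.size - 1))

print(np.isclose(population_std, np.std(arr, ddof=0)))  # True
print(np.isclose(sample_std, np.std(arr, ddof=1)))      # True
```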

## Weighted Aggregations

`np.average` computes a weighted aggregate when we pass the additional `weights` parameter:

```
arr = np.array([1, 2, 3, 4])
weights = np.array([0.2, 0.3, 0.1, 0.4])
print(np.average(arr, weights=weights))
# Output: 2.7 (up to floating-point rounding)
```

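This computes the weighted average. Note that `np.sum`, `np.mean`, and `np.std` do not accept a `weights` argument, so weighted versions of those statistics have to be built from `np.average` or computed by hand.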
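The weighted average is the sum of each value times its weight, divided by the sum of the weights. A minimal sketch that reproduces the `np.average` result by hand, using the name `values` for the data to avoid clashing with earlier arrays:

```python
import numpy as np

values = np.array([1, 2, 3, 4])
weights = np.array([0.2, 0.3, 0.1, 0.4])

# Weighted average: sum of (weight * value) divided by sum of weights
manual = np.sum(values * weights) / np.sum(weights)

print(np.isclose(manual, np.average(values, weights=weights)))  # True
print(np.isclose(manual, 2.7))  # True
```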

## Aggregates for Boolean Arrays

NumPy aggregates also work on boolean arrays, treating `True` as 1 and `False` as 0:

```
bool_arr = np.array([True, False, True])
print(np.sum(bool_arr))
# Output: 2
print(np.mean(bool_arr))
# Output: 0.6666666666666666
```

This allows aggregations directly on boolean masks.
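A common use is counting how many elements satisfy a condition. The sketch below builds a mask from a comparison and aggregates it; the threshold of 5 is arbitrary.

```python
import numpy as np

arr = np.array([5, 2, 9, 10, 15])

# A comparison produces a boolean mask, one entry per element
mask = arr > 5
print(mask)  # [False False  True  True  True]

# Summing the mask counts the True entries
print(np.sum(mask))  # 3

# The mean of the mask is the fraction of True entries
print(np.mean(mask))  # 0.6
```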

## Accumulate Aggregates With `reduce`

The `reduce` method of a NumPy ufunc, such as `np.add.reduce`, repeatedly applies the ufunc to the elements of an array until a single result remains:

```
arr = np.arange(5)
print(np.add.reduce(arr))
# Output: 10
```

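This adds all the elements and returns the total, 0 + 1 + 2 + 3 + 4 = 10. Other ufuncs reduce in the same way, for example `np.multiply.reduce` for products or `np.minimum.reduce` and `np.maximum.reduce` for extrema. To keep the running intermediate results instead of only the final value, use the ufunc's `accumulate` method, as in `np.add.accumulate`.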
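The difference between reducing and accumulating is easiest to see side by side. In this sketch the array starts at 1 rather than 0 so that the product is not trivially zero.

```python
import numpy as np

arr = np.arange(1, 5)  # [1 2 3 4]

# reduce collapses the array to a single value
print(np.add.reduce(arr))       # 10
print(np.multiply.reduce(arr))  # 24
print(np.minimum.reduce(arr))   # 1
print(np.maximum.reduce(arr))   # 4

# accumulate keeps every intermediate result
print(np.add.accumulate(arr))       # [ 1  3  6 10]
print(np.multiply.accumulate(arr))  # [ 1  2  6 24]
```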

## Comparison to Built-in sum() and min()/max()

On NumPy arrays, NumPy aggregates are much faster than the built-in Python functions, because the loop runs in compiled code rather than in the Python interpreter:

```
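import time
arr_large = np.random.rand(1000000)

# Built-in sum iterates over the array one Python object at a time
start = time.perf_counter()
res = sum(arr_large)
print(time.perf_counter() - start)
# Example output: 0.08 (machine-dependent)

# np.sum runs the loop in compiled code
start = time.perf_counter()
res = np.sum(arr_large)
print(time.perf_counter() - start)
# Example output: 0.0001 (machine-dependent)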
```

So prefer NumPy aggregations when working with NumPy arrays.
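A single timing can be noisy, so one possible refinement is to repeat the measurement with the standard library's `timeit` module. This is only a sketch: the array size, seed, and repeat count are arbitrary choices, and the absolute numbers will vary between machines.

```python
import timeit

import numpy as np

# Seeded generator so the data is reproducible; size chosen arbitrarily
rng = np.random.default_rng(0)
arr_large = rng.random(100_000)

# Time each approach over several runs
builtin_time = timeit.timeit(lambda: sum(arr_large), number=5)
numpy_time = timeit.timeit(lambda: np.sum(arr_large), number=5)

print(f"built-in sum: {builtin_time:.4f} s")
print(f"np.sum:       {numpy_time:.4f} s")

# Both approaches agree on the result up to floating-point rounding
print(np.isclose(sum(arr_large), np.sum(arr_large)))  # True
```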

## Aggregations on Pandas Dataframes

Pandas DataFrame columns can be aggregated via the `.agg()` method by passing NumPy functions:

```
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.agg(np.sum))
# Output:
# A     6
# B    15
# dtype: int64
```

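NumPy aggregates thus integrate cleanly into a Pandas workflow. Depending on the pandas version, passing a NumPy callable such as `np.sum` may emit a `FutureWarning`; the string name `'sum'` selects the same aggregation.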
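`.agg()` also accepts aggregation names as strings, and a list of names computes several statistics at once. A short sketch, assuming pandas is installed and reusing the same small DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# A string name selects the aggregation without passing a NumPy callable
totals = df.agg('sum')
print(totals['A'], totals['B'])  # 6 15

# A list of names yields one row per aggregation
summary = df.agg(['sum', 'mean'])
print(summary.loc['mean', 'A'])  # 2.0
```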

## Conclusion

In this guide, we explored how to use NumPy aggregations including `np.sum`, `np.mean`, `np.median`, `np.min`, `np.max` and `np.std` on 1D and 2D arrays. We looked at their usage, axis-wise application, weighted aggregations and performance compared to Python built-ins. Finally, we saw how NumPy aggregates can be applied to Pandas dataframes.

NumPy aggregation methods are essential for summarizing and understanding your dataset. They condense arrays into useful statistics that form the basis for analysis and visualization. With the simple examples discussed here, you should be able to start applying NumPy aggregates to real-world data manipulation and exploration tasks.