NumPy is a fundamental Python package for scientific computing and data analysis. It provides support for large, multi-dimensional arrays and matrices as well as a large library of high-level mathematical functions to operate on these arrays.
One of the core features of NumPy is its n-dimensional array object, or ndarray. The np.array() function creates these arrays, which offer significant advantages over Python’s built-in lists, such as efficient storage, vectorized operations and broadcasting.
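As a rough illustration of the vectorized operations and broadcasting mentioned above, the sketch below compares a list comprehension with the equivalent array expression. The variable names are made up for the example.

```python
import numpy as np

values = [1, 2, 3]

# Plain Python list: element-wise work needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values]

# NumPy array: the same operation is applied to every element at once.
arr = np.array(values)
doubled_arr = arr * 2

# Broadcasting: the 1D row is stretched across both rows of the 2D array.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = matrix + row

print(doubled_list)  # [2, 4, 6]
print(doubled_arr)   # [2 4 6]
print(shifted)
```

Here `shifted` has the same shape as `matrix`, with `row` added to each of its rows.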
This comprehensive guide will examine how to create NumPy arrays using np.array(), understand the shape and dtype attributes of arrays, and leverage these tools to build effective data structures for data analysis and scientific applications. Code examples are provided to illustrate key concepts. By the end, you will have a solid grasp of how to generate and manipulate NumPy arrays.
Creating Arrays with np.array()
The np.array() function creates a new NumPy array from an existing sequence like a Python list or tuple. The basic syntax is:
import numpy as np
array = np.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
The parameters are:
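- object (required): The data to convert to an array. This can be a list, tuple, another array or any other (possibly nested) sequence.
- dtype: The data type of the array. By default, it is inferred from the input data. Some common types are float, int and bool.
- copy: Controls whether the input data is copied. If True (default), the data is always copied. In NumPy 1.x, False means a copy is made only if necessary; in NumPy 2.0 and later, that behavior is spelled copy=None, and copy=False raises an error if a copy cannot be avoided.
- order: The memory layout: row-major ('C') or column-major ('F'). With the default 'K', an array built from a list or tuple is row-major, while an existing array keeps its layout as closely as possible.
- subok: If True, sub-classes are passed through; otherwise the result is a base-class array. Default is False.
- ndmin: Specifies the minimum number of dimensions. Dimensions of length 1 are prepended to the shape as needed.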
Let’s look at some examples:
import numpy as np
# From list
mylist = [1, 2, 3]
arr = np.array(mylist)
print(arr)
# [1 2 3]
# From tuple
mytuple = (8, 9, 10)
arr = np.array(mytuple)
print(arr)
# [ 8  9 10]
# 2D array from list of lists
list2d = [[11, 12, 13], [21, 22, 23]]
arr = np.array(list2d)
print(arr)
# [[11 12 13]
# [21 22 23]]
We can explicitly define the data type using the dtype parameter:
float_arr = np.array(mylist, dtype=np.float64)
print(float_arr)
# [1. 2. 3.]
bool_arr = np.array(mylist, dtype=bool)
print(bool_arr)
# [ True  True  True]
To create an array with a minimum number of dimensions, we can pass the ndmin argument:
arr = np.array([1, 2, 3], ndmin=5)
print(arr.shape)
# (1, 1, 1, 1, 3)
This creates a 5D array with shape (1, 1, 1, 1, 3) by prepending dimensions of length 1.
In summary, np.array() provides a flexible way to generate new NumPy arrays from sequences. The dtype and minimum number of dimensions can be explicitly defined.
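The copy and order parameters are described above but not demonstrated. The following is a minimal sketch of their effect. It uses np.shares_memory and np.asarray, which are not covered in this guide, only to make the memory behavior visible.

```python
import numpy as np

source = np.array([1, 2, 3])

# copy=True (the default) allocates new memory, so edits do not leak back.
copied = np.array(source, copy=True)
copied[0] = 99
print(source)                             # [1 2 3]
print(np.shares_memory(source, copied))   # False

# np.asarray avoids the copy when the input is already a suitable ndarray.
same = np.asarray(source)
print(same is source)                     # True

# order='F' requests column-major (Fortran-style) memory layout.
f_arr = np.array([[1, 2, 3], [4, 5, 6]], order='F')
print(f_arr.flags['F_CONTIGUOUS'])        # True
print(f_arr.flags['C_CONTIGUOUS'])        # False
```

The example avoids copy=False on purpose, since its meaning differs between NumPy 1.x and 2.x.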
Array Attributes: Shape and Dimension
NumPy arrays have attributes like shape and ndim that provide information about the number of elements and dimensions in the array.
The shape of an array is a tuple with each element representing the size of that dimension. For a 2D array with 3 rows and 4 columns, the shape attribute would be (3, 4).
The number of dimensions, ndim, is equal to the length of the shape tuple. A 1D array has a shape of (n,) while a 2D array is (n, m).
Let’s see some examples of shape and dimension:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)
# (2, 3)
print(arr.ndim)
# 2
arr = np.array([1, 2, 3, 4, 5])
print(arr.shape)
# (5,)
print(arr.ndim)
# 1
We can also reshape an existing array to a new shape using arr.reshape():
arr = np.array([1, 2, 3, 4, 5, 6])
arr.reshape(3, 2)
# array([[1, 2],
# [3, 4],
# [5, 6]])
Understanding shape and dimension is crucial for performing subsequent mathematical operations on arrays. Many NumPy operations, such as slicing, iteration and stacking, depend on the shape and ndim of the arrays involved.
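To make the relationship between shape, ndim and the number of elements concrete, here is a small sketch. It uses np.arange, the size attribute and the -1 placeholder in reshape(), which the text above does not mention.

```python
import numpy as np

arr = np.arange(12)          # 12 elements: 0, 1, ..., 11

grid = arr.reshape(3, 4)
print(grid.shape)            # (3, 4)
print(grid.ndim)             # 2
print(grid.size)             # 12, the product of the shape entries

# -1 asks NumPy to infer that dimension from the total size.
auto = arr.reshape(2, -1)
print(auto.shape)            # (2, 6)

# A shape whose product differs from the size is rejected.
try:
    arr.reshape(5, 3)
except ValueError as exc:
    print("reshape failed:", exc)
```

A reshape only succeeds when the product of the new shape equals the number of elements.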
Array dtype: Specifying Data Types
The data type or dtype of a NumPy array describes the type and size of its elements. It is specified when an array is created. If not explicitly defined, NumPy chooses a type based on the input data.
Some common data types are:
- int: for integer values
- float: for floating point values
- complex: for complex numbers
- bool: for Boolean values True/False
- object: for Python objects
- string: for strings
- datetime64: for date & time values
Let’s see examples of creating arrays with different data types:
import numpy as np
int_arr = np.array([1, 2, 3], dtype=np.int64)
float_arr = np.array([1.5, 2.1, 3.7], dtype=np.float64)
complex_arr = np.array([1+2j, 3-4j])
bool_arr = np.array([True, False, True])
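# np.str_ stores text; np.bytes_ is the byte-string type (the old np.string_ alias was removed in NumPy 2.0)
str_arr = np.array(['Python', 'NumPy'], dtype=np.str_)
# Use the builtin object; the np.object alias was removed in NumPy 1.24
obj_arr = np.array([np.nan, 0, 1], dtype=object)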
We can check an array’s dtype using the dtype attribute:
int_arr.dtype
# dtype('int64')
np.issubdtype(int_arr.dtype, np.integer)
# True
Casting arrays from one dtype to another is done with arr.astype(<newtype>), which returns a new array:
float32_arr = int_arr.astype(np.float32)
float32_arr.dtype
# dtype('float32')
It is crucial to set an appropriate data type when creating arrays, because the dtype determines how much memory each element uses and which values it can represent. A dtype that is too narrow can cause silent overflow or loss of precision.
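The effect of dtype on memory and correctness can be sketched as follows. The array sizes and values are arbitrary, chosen only to show the memory footprint and integer wraparound.

```python
import numpy as np

# Same number of elements, different memory footprint per dtype.
wide = np.zeros(1000, dtype=np.float64)
narrow = np.zeros(1000, dtype=np.float32)
print(wide.itemsize, wide.nbytes)      # 8 8000
print(narrow.itemsize, narrow.nbytes)  # 4 4000

# int8 holds values from -128 to 127; array arithmetic wraps around silently.
a = np.array([100, 120], dtype=np.int8)
b = np.array([100, 10], dtype=np.int8)
wrapped = a + b
print(wrapped.tolist())                # [-56, -126]

# Casting to a wider type first avoids the overflow.
safe = a.astype(np.int16) + b.astype(np.int16)
print(safe.tolist())                   # [200, 130]

# Casting floats to integers truncates toward zero.
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated.tolist())              # [1, -1]
```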
Converting Data to Arrays
Real-world data for analysis is often stored in formats like CSV, JSON, Excel, SQL databases, etc. NumPy reads delimited text files directly; data from the other sources is first loaded with another library and then converted to an array:
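import json
import numpy as np
import pandas as pd
# From CSV
arr = np.genfromtxt('data.csv', delimiter=',')
# From JSON (json_data is assumed to be a JSON string holding a list of numbers)
arr = np.array(json.loads(json_data))
# From SQL databases (cursor is assumed to be an open DB-API cursor, query a SQL string)
cursor.execute(query)
arr = np.array(cursor.fetchall())
# From Excel (pandas reads the sheet, then the DataFrame is converted)
arr = pd.read_excel('data.xlsx').to_numpy()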
For delimited text files, np.loadtxt() and np.genfromtxt() are useful. Both accept dtype and delimiter parameters; header lines are skipped with skiprows in np.loadtxt() and with skip_header in np.genfromtxt(). np.genfromtxt() can additionally handle missing values.
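A minimal sketch of both text loaders follows. The CSV content is invented for the example, and io.StringIO stands in for a real file so that the snippet runs without any data on disk.

```python
import io
import numpy as np

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "a,b,c\n1,2,3\n4,,6\n"

# genfromtxt skips the header with skip_header and fills the empty field with nan.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data)
print(int(np.isnan(data).sum()))   # 1

# loadtxt uses skiprows instead and expects every field to be present.
clean_text = "a,b,c\n1,2,3\n4,5,6\n"
clean = np.loadtxt(io.StringIO(clean_text), delimiter=",", skiprows=1)
print(clean.shape)                 # (2, 3)
```

Missing values show up as nan in a float array, so they can be located with np.isnan() before further processing.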
Dates can be converted to datetime64 arrays:
dates = ['2023-01-01', '2023-01-02']
arr = np.array(dates, dtype='datetime64')
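Once dates are stored as datetime64, they support arithmetic. The sketch below assumes day resolution ('datetime64[D]') and uses np.timedelta64, which is not covered above.

```python
import numpy as np

dates = np.array(['2023-01-01', '2023-01-02'], dtype='datetime64[D]')

# Subtracting two dates yields a timedelta64 value.
gap = dates[1] - dates[0]
print(gap)                          # 1 days

# Adding a timedelta shifts every date in the array.
next_week = dates + np.timedelta64(7, 'D')
print(next_week)                    # ['2023-01-08' '2023-01-09']
```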
In summary, many options exist to import real-world data into NumPy arrays for computation. Care should be taken to handle missing data, heterogeneous types and formatting issues.
Indexing, Slicing and Iterating Arrays
NumPy arrays support vectorized operations that apply functions to entire arrays. But elements can still be accessed individually using indexing and slicing syntax similar to Python lists:
arr = np.array([1, 2, 3, 4])
# Indexing
print(arr[0]) # 1
# Slicing
print(arr[1:3]) # [2 3]
# Iterate through array
for x in arr:
    print(x)
# 1
# 2
# 3
# 4
2D arrays are indexed using tuple notation arr[i, j]. Using a colon (:) in place of an index selects an entire row or column as a 1D array:
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
# First row
print(arr2d[0, :]) # [1 2 3]
# Second column
print(arr2d[:, 1]) # [2 5]
NumPy arrays provide efficient access to elements without needing to loop through each one, enabling fast vector computations.
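Selecting elements without a loop is commonly done with boolean masks and integer-array indexing. The following sketch, with arbitrary sample values, shows both.

```python
import numpy as np

arr = np.array([5, 12, 7, 20, 3])

# A comparison produces a boolean array of the same shape.
mask = arr > 6
print(mask.tolist())          # [False, True, True, True, False]

# Indexing with the mask keeps only the elements where it is True.
selected = arr[mask]
print(selected.tolist())      # [12, 7, 20]

# Integer-array ("fancy") indexing picks elements by position.
ends = arr[[0, -1]]
print(ends.tolist())          # [5, 3]

# Aggregations also run over the whole array without a Python loop.
print(int(selected.sum()))    # 39
```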
Reshaping and Flattening Arrays
The shape of an array can be changed with reshape() as long as the total number of elements stays the same. Where possible, the result is a view of the original data rather than a copy:
arr = np.array([1, 2, 3, 4, 5, 6])
arr.reshape(3, 2)
# [[1 2]
# [3 4]
# [5 6]]
Flattening converts a multidimensional array into a 1D array using flatten() or ravel(). flatten() always returns a copy, while ravel() returns a view when possible:
arr = np.array([[1, 2], [3, 4]])
flattened = arr.flatten()
# [1 2 3 4]
flattened = arr.ravel()
# [1 2 3 4]
Reshaping and flattening change the structure of arrays for various computations. reshape() and ravel() reuse the same underlying data when they can, whereas flatten() always copies it.
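Whether a reshaped or flattened array shares memory with the original can be checked directly. This sketch uses np.shares_memory, which is not introduced in the text, on a small contiguous array.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

flat_copy = arr.flatten()   # always a new allocation
flat_view = arr.ravel()     # a view here, because arr is contiguous
reshaped = arr.reshape(4)   # also a view for a contiguous array

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True
print(np.shares_memory(arr, reshaped))   # True

# Writing through the view changes the original array.
flat_view[0] = 100
print(arr[0, 0])            # 100
print(flat_copy[0])         # 1
```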
Stack and Concatenate Arrays
NumPy provides functions like np.stack, np.vstack, np.hstack and np.concatenate to combine multiple arrays:
Stack: Join arrays along a new axis:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr_stacked = np.stack((arr1, arr2))
# [[1 2 3]
# [4 5 6]]
Concatenate: Join arrays along an existing axis:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
arr_concat = np.concatenate((arr1, arr2), axis=0)
# [[1 2]
# [3 4]
# [5 6]]
Vertical Stack: Stack arrays vertically (along first axis):
v_stacked = np.vstack((arr1, arr2))
# [[1 2]
# [3 4]
# [5 6]]
Horizontal Stack: Stack arrays horizontally (along second axis). The arrays must have the same number of rows, so arr2 is transposed to shape (2, 1) first:
h_stacked = np.hstack((arr1, arr2.T))
# [[1 2 5]
# [3 4 6]]
Stacking and concatenating arrays enable combining data from different sources into unified data structures.
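The difference between adding a new axis and extending an existing one is easiest to see from the resulting shapes. Here is a short sketch with two arbitrary 1D arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# np.stack adds a new axis; its position is chosen with `axis`.
rows = np.stack((a, b), axis=0)
cols = np.stack((a, b), axis=1)
print(rows.shape)   # (2, 3)
print(cols.shape)   # (3, 2)

# np.concatenate keeps the number of dimensions and extends an existing axis.
joined = np.concatenate((a, b))
print(joined.shape) # (6,)
print(joined)       # [1 2 3 4 5 6]
```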
Splitting Arrays
Large arrays can be split into smaller sub-arrays using np.split, np.hsplit and np.vsplit:
Split: Split array along specified axis and positions:
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, [3, 5])
# [array([1, 2, 3]), array([4, 5]), array([6])]
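Horizontal Split: Split array column-wise. An integer argument requires an equal division, so the 3 columns are split at column index 1 instead:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
hsplit_arr = np.hsplit(arr, [1])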
# [array([[1], [4], [7]]),
# array([[2, 3], [5, 6], [8, 9]])]
Vertical Split: Split array row-wise, here at row index 1:
vsplit_arr = np.vsplit(arr, [1])
# [array([[1, 2, 3]]),
# array([[4, 5, 6],
# [7, 8, 9]])]
Splitting arrays is useful for dividing up data for parallel processing or storing parts separately.
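When data is divided for parallel processing, the number of elements is often not evenly divisible by the number of chunks. One option, not covered above, is np.array_split. This is a sketch of how it differs from np.split.

```python
import numpy as np

arr = np.arange(7)   # [0 1 2 3 4 5 6]

# np.split requires an equal division and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_failed = False
except ValueError as exc:
    split_failed = True
    print("np.split failed:", exc)

# np.array_split tolerates the remainder: earlier chunks get one extra element.
chunks = np.array_split(arr, 3)
print([c.tolist() for c in chunks])   # [[0, 1, 2], [3, 4], [5, 6]]
```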
Copies and Views
When operating on arrays, it is important to understand how NumPy handles memory allocation.
Copy: The original array data is copied to a new allocation:
arr = np.array([1, 2, 3])
arr_copy = arr.copy()
arr_copy[0] = 0
print(arr)
# [1 2 3]
print(arr_copy)
# [0 2 3]
View: A new array object references the same data in memory:
arr = np.array([1, 2, 3])
arr_view = arr.view()
arr_view[0] = 0
print(arr)
# [0 2 3]
Views can lead to unexpected changes in the original array. Generally, use .copy() when you need an array that can be modified without affecting the original.
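Views are not created only by view(). Basic slicing also returns a view, while indexing with a list of positions returns a copy. The sketch below uses the base attribute and np.shares_memory, neither of which appears in the text above, to tell the two apart.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])

# Basic slicing returns a view: its .base points at the original array.
sliced = arr[1:3]
print(sliced.base is arr)                 # True
sliced[0] = -1
print(arr.tolist())                       # [10, -1, 30, 40]

# Indexing with a list of positions returns a copy.
picked = arr[[0, 3]]
picked[0] = 999
print(arr.tolist())                       # [10, -1, 30, 40]
print(np.shares_memory(arr, picked))      # False

# An explicit copy owns its data, so .base is None.
independent = arr[1:3].copy()
print(independent.base is None)           # True
```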
Conclusion
In this guide, we looked at how to generate NumPy arrays from sequences using np.array(), understand array shape, dimension and dtype attributes, index and slice array elements, modify shapes via stacking/splitting/reshaping operations, and properly handle copies versus views.
NumPy’s fast n-dimensional arrays enable efficient vectorized computations. By leveraging tools like np.array(), shape/dtype properties and array transformations, we can build effective data structures for data analysis, scientific workloads and numeric programming.
The examples provided here illustrate the key aspects of NumPy arrays. For more advanced techniques, refer to the official NumPy documentation and other resources to continue enhancing your array programming skills.