NumPy: Creating Arrays with np.array(), shape, and dtype

Updated: at 03:55 AM

NumPy is a fundamental Python package for scientific computing and data analysis. It provides support for large, multi-dimensional arrays and matrices as well as a large library of high-level mathematical functions to operate on these arrays.

One of the core features of NumPy is its n-dimensional array object, or ndarray. The np.array() function creates these arrays, which offer significant advantages over Python’s built-in lists: efficient storage, vectorized operations, and broadcasting.
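As a rough illustration of vectorized operations and broadcasting, the sketch below compares a plain list with an ndarray. The variable names (prices, grid, offsets) are made up for this example.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# With a plain list, element-wise arithmetic needs an explicit loop.
doubled_list = [p * 2 for p in prices]

# With an ndarray, the same operation is a single vectorized expression.
prices_arr = np.array(prices)
doubled_arr = prices_arr * 2

# Broadcasting: adding a (3,) array to a (2, 3) array applies it to every row.
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]
```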

This comprehensive guide will examine how to create NumPy arrays using np.array(), understand the shape and dtype attributes of arrays, and leverage these tools to build effective data structures for data analysis and scientific applications. Code examples are provided to illustrate key concepts. By the end, you will have a solid grasp of how to generate and manipulate NumPy arrays.

Creating Arrays with np.array()

The np.array() function creates a new NumPy array from an existing sequence like a Python list or tuple. The basic syntax is:

import numpy as np

array = np.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

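The parameters are:

object: the input data, such as a list, tuple, nested sequence, or another array.
dtype: the desired element type; if None, NumPy infers it from the input.
copy: whether to copy the input data (True by default).
order: the memory layout of the result ('K', 'A', 'C', or 'F').
subok: if True, subclasses of ndarray are passed through; otherwise the result is a base-class array.
ndmin: the minimum number of dimensions the result should have.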

Let’s look at some examples:

import numpy as np

# From list
mylist = [1, 2, 3]
arr = np.array(mylist)
print(arr)
# [1 2 3]

# From tuple
mytuple = (8, 9, 10)
arr = np.array(mytuple)
print(arr)
# [ 8  9 10]

# 2D array from list of lists
list2d = [[11, 12, 13], [21, 22, 23]]
arr = np.array(list2d)
print(arr)
# [[11 12 13]
#  [21 22 23]]

We can explicitly define the data type using the dtype parameter:

float_arr = np.array(mylist, dtype=np.float64)
print(float_arr)
# [1. 2. 3.]

bool_arr = np.array(mylist, dtype=bool)
print(bool_arr)
# [ True  True  True]

To create an array with a minimum number of dimensions, we can pass the ndmin argument:

arr = np.array([1, 2, 3], ndmin=5)
print(arr.shape)
# (1, 1, 1, 1, 3)

This creates a 5D array with shape (1, 1, 1, 1, 3) by prepending axes of length 1 until the array has ndmin dimensions.

In summary, np.array() provides a flexible way to generate new NumPy arrays from sequences. The dtype and number of dimensions can be explicitly defined.
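The following sketch pulls these options together. It also shows two behaviors worth knowing: mixed numeric input is upcast to a common type, and np.array() copies its input by default. The variable names are illustrative only.

```python
import numpy as np

# Mixed ints and floats are upcast to a common floating-point type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# dtype and ndmin can be combined in one call.
padded = np.array([[1, 2], [3, 4]], dtype=np.float32, ndmin=3)
print(padded.shape)  # (1, 2, 2)

# np.array() copies by default, so changing the array leaves the list alone.
source = [1, 2, 3]
arr = np.array(source)
arr[0] = 99
print(source)  # [1, 2, 3]
```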

Array Attributes: Shape and Dimension

NumPy arrays expose attributes such as shape and ndim that describe the array's layout: the size along each dimension and the number of dimensions.

The shape of an array is a tuple with each element representing the size of that dimension. For a 2D array with 3 rows and 4 columns, the shape attribute would be (3, 4).

The number of dimensions, ndim, is indicated by the length of the shape tuple. A 1D array has a shape of (n,) while a 2D array is (n, m).

Let’s see some examples of shape and dimension:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)
# (2, 3)

print(arr.ndim)
# 2

arr = np.array([1, 2, 3, 4, 5])
print(arr.shape)
# (5,)

print(arr.ndim)
# 1
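The relationship between shape, ndim, and the total element count can be checked directly. This is a small sketch using the size attribute, which is not covered above; it holds the total number of elements.

```python
import math

import numpy as np

# A 3D array: 3 blocks, each with 2 rows and 2 columns.
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]])

print(arr.shape)  # (3, 2, 2)
print(arr.ndim)   # 3
print(arr.size)   # 12

# ndim is the length of the shape tuple; size is the product of its entries.
ndim_matches = arr.ndim == len(arr.shape)
size_matches = arr.size == math.prod(arr.shape)
print(ndim_matches, size_matches)  # True True
```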

We can also reshape an existing array to a new shape using arr.reshape():

arr = np.array([1, 2, 3, 4, 5, 6])
arr.reshape(3, 2)
# array([[1, 2],
#        [3, 4],
#        [5, 6]])
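A reshape only succeeds when the new shape holds exactly the same number of elements. The sketch below shows that constraint, along with the -1 placeholder that lets NumPy infer one dimension. The names two_rows and error_message are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the total element count.
two_rows = arr.reshape(2, -1)
print(two_rows.shape)  # (2, 3)

# 6 elements cannot fill a (4, 2) array, so reshape raises ValueError.
error_message = None
try:
    arr.reshape(4, 2)
except ValueError as exc:
    error_message = str(exc)
    print("reshape failed:", error_message)
```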

Understanding shape and dimension is crucial for performing subsequent mathematical operations on arrays. Many NumPy operations, such as slicing, iteration, and stacking, depend on the shape and ndim of the arrays involved.

Array dtype: Specifying Data Types

The data type or dtype of a NumPy array describes the type and size of its elements. It is specified when an array is created. If not explicitly defined, NumPy chooses a type based on the input data.

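Some common data types are:

np.int64 (also int8, int16, int32): signed integers of the given bit width.
np.float64 (also float32): floating-point numbers.
np.complex128: complex numbers.
bool: True/False values.
np.str_ and np.bytes_: fixed-width text and byte strings.
object: arbitrary Python objects.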

Let’s see examples of creating arrays with different data types:

import numpy as np

int_arr = np.array([1, 2, 3], dtype=np.int64)
float_arr = np.array([1.5, 2.1, 3.7], dtype=np.float64)
complex_arr = np.array([1+2j, 3-4j])
bool_arr = np.array([True, False, True])
str_arr = np.array(['Python', 'NumPy'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0
obj_arr = np.array([np.nan, 0, 1], dtype=object)  # the np.object alias was removed in NumPy 1.24

We can check an array’s dtype using the dtype attribute:

int_arr.dtype
# dtype('int64')

np.issubdtype(int_arr.dtype, np.integer)
# True

Casting arrays from one dtype to another is done with arr.astype(<newtype>):

# astype() returns a new array; the original is left unchanged
float32_arr = int_arr.astype(np.float32)
float32_arr.dtype
# dtype('float32')

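Choosing an appropriate dtype when creating an array controls how much memory it uses and how fast computations run. A poorly chosen dtype can also produce wrong results without raising an error: for example, integer arrays wrap around on overflow, and casting floats to integers discards the fractional part.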
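To make the memory and correctness trade-offs concrete, here is a minimal sketch. It uses the nbytes attribute (total bytes used by the array's data) and deliberately small dtypes; the variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3, 4]

# The same four values take 1 byte each as int8 and 8 bytes each as int64.
small = np.array(values, dtype=np.int8)
large = np.array(values, dtype=np.int64)
print(small.nbytes, large.nbytes)  # 4 32

# Integer arrays wrap around silently on overflow.
near_max = np.array([127], dtype=np.int8)
wrapped = near_max + np.array([1], dtype=np.int8)
print(wrapped)  # [-128]

# Casting floats to integers discards the fractional part.
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)  # [ 1 -1]
```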

Converting Data to Arrays

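Real-world data for analysis is often stored in formats like CSV, JSON, Excel, and SQL databases. NumPy reads delimited text files directly; for the other formats, load the data with the appropriate library (json, a database driver, pandas) and pass the result to np.array():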

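import json

import numpy as np
import pandas as pd

# From CSV (NumPy reads delimited text directly)
arr = np.genfromtxt('data.csv', delimiter=',')

# From JSON (json_data is assumed to be a JSON string holding a list of numbers)
arr = np.array(json.loads(json_data))

# From SQL databases (cursor and query are assumed to come from a DB-API connection)
cursor.execute(query)
arr = np.array(cursor.fetchall())

# From Excel (pandas reads the file; to_numpy() converts the DataFrame)
arr = pd.read_excel('data.xlsx').to_numpy()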

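For delimited text files, np.loadtxt() and np.genfromtxt() are useful. Both produce a homogeneous array and accept dtype and delimiter parameters; header rows are skipped with skiprows in np.loadtxt() and skip_header in np.genfromtxt(). np.genfromtxt() can additionally handle missing values.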
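The two text readers can be tried without a file on disk by wrapping a string in io.StringIO. The sketch below uses made-up CSV content with one header row; in the second case a field is left empty to show how np.genfromtxt() fills it with nan.

```python
import io

import numpy as np

csv_text = "x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# np.loadtxt skips the header row with skiprows.
loaded = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(loaded.shape)  # (2, 3)

# np.genfromtxt uses skip_header and fills the empty field with nan.
gappy_text = "x,y,z\n1.0,2.0,3.0\n4.0,,6.0\n"
parsed = np.genfromtxt(io.StringIO(gappy_text), delimiter=",", skip_header=1)
print(np.isnan(parsed[1, 1]))  # True
```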

Dates can be converted to datetime64 arrays:

dates = ['2023-01-01', '2023-01-02']
arr = np.array(dates, dtype='datetime64')
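Once dates are stored as datetime64, date arithmetic becomes vectorized as well. This is a small sketch with an explicit day unit ([D]); the date values and variable names are invented for illustration.

```python
import numpy as np

dates = np.array(['2023-01-01', '2023-01-02', '2023-01-05'], dtype='datetime64[D]')

# Differences between consecutive dates come back as timedelta64 values.
gaps = np.diff(dates)
print(gaps.dtype)  # timedelta64[D]

# Adding a timedelta shifts every date in the array at once.
next_day = dates + np.timedelta64(1, 'D')
print(next_day[0])  # 2023-01-02
```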

In summary, many options exist to import real-world data into NumPy arrays for computation. Care should be taken to handle missing data, heterogeneous types and formatting issues.

Indexing, Slicing and Iterating Arrays

NumPy arrays support vectorized operations that apply functions to entire arrays. But elements can still be accessed individually using indexing and slicing syntax similar to Python lists:

arr = np.array([1, 2, 3, 4])

# Indexing
print(arr[0]) # 1

# Slicing
print(arr[1:3]) # [2 3]

# Iterate through array
for x in arr:
  print(x)

# 1
# 2
# 3
# 4

2D arrays are indexed using tuple notation arr[i, j]. Using a colon (:) in place of an index selects every position along that axis, so an entire row or column comes back as a 1D array:

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# First row
print(arr2d[0, :]) # [1 2 3]

# Second column
print(arr2d[:, 1]) # [2 5]

NumPy arrays provide efficient access to elements without needing to loop through each one, enabling fast vector computations.
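To show what "without needing to loop" means in practice, the sketch below computes a sum of squares twice, once with nested loops and once with a single vectorized expression, then takes a sub-block with slices on both axes. The variable names are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# Loop version: visit every element explicitly.
total_loop = 0
for row in arr2d:
    for value in row:
        total_loop += value * value

# Vectorized version: one expression over the whole array.
total_vec = (arr2d ** 2).sum()
print(total_loop, total_vec)  # 91 91

# Slices on both axes select a sub-block: all rows, columns 1 onward.
block = arr2d[:, 1:]
print(block.shape)  # (2, 2)
```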

Reshaping and Flattening Arrays

reshape() changes the shape of an array without changing its elements or their total count. Where possible, it returns a view of the same data rather than a copy:

arr = np.array([1, 2, 3, 4, 5, 6])

reshaped = arr.reshape(3, 2)
print(reshaped)
# [[1 2]
#  [3 4]
#  [5 6]]

Flattening converts a multidimensional array into a 1D array using flatten() or ravel():

arr = np.array([[1, 2], [3, 4]])

flattened = arr.flatten()
# [1 2 3 4]

flattened = arr.ravel()
# [1 2 3 4]

Reshaping and flattening change the structure of an array to suit a computation. reshape() and ravel() reuse the underlying data when they can, while flatten() always returns a copy.
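The difference between the two flattening methods shows up when checking memory sharing. A minimal sketch, using np.shares_memory() to test whether two arrays overlap in memory:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

flat_copy = arr.flatten()  # always a new, independent array
flat_view = arr.ravel()    # a view here, because arr is contiguous

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True

# Writing through the view changes the original array.
flat_view[0] = 100
print(arr[0, 0])     # 100
print(flat_copy[0])  # 1
```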

Stack and Concatenate Arrays

NumPy provides functions like np.stack, np.vstack, np.hstack and np.concatenate to combine multiple arrays:

Stack: Join arrays along a new axis:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

arr_stacked = np.stack((arr1, arr2))
# [[1 2 3]
#  [4 5 6]]

Concatenate: Join arrays along an existing axis:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

arr_concat = np.concatenate((arr1, arr2), axis=0)
# [[1 2]
#  [3 4]
#  [5 6]]

Vertical Stack: Stack arrays vertically (along first axis):

v_stacked = np.vstack((arr1, arr2))
# [[1 2]
#  [3 4]
#  [5 6]]

Horizontal Stack: Stack arrays horizontally (along second axis):

# hstack needs matching row counts, so transpose arr2 from (1, 2) to (2, 1)
h_stacked = np.hstack((arr1, arr2.T))
# [[1 2 5]
#  [3 4 6]]

Stacking and concatenating arrays enable combining data from different sources into unified data structures.
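The practical difference between stacking and concatenating is easiest to see in the result shapes. The sketch below joins the same two 1D arrays three ways; the axis values chosen are just for illustration.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# stack adds a new axis: the inputs become rows (axis=0) or columns (axis=1).
as_rows = np.stack((arr1, arr2), axis=0)
as_cols = np.stack((arr1, arr2), axis=1)
print(as_rows.shape, as_cols.shape)  # (2, 3) (3, 2)

# concatenate joins along an existing axis, so the result stays 1D.
joined = np.concatenate((arr1, arr2))
print(joined.shape)  # (6,)
```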

Splitting Arrays

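Large arrays can be split into smaller sub-arrays using np.split, np.hsplit, and np.vsplit. Each accepts either an integer number of equal sections or a list of indices at which to split: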

Split: Split array along specified axis and positions:

arr = np.array([1, 2, 3, 4, 5, 6])

split_arr = np.split(arr, [3, 5])
# [array([1, 2, 3]), array([4, 5]), array([6])]

Horizontal Split: Split array horizontally:

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Three columns cannot be divided into 2 equal parts, so split at column index 1
hsplit_arr = np.hsplit(arr, [1])
# [array([[1], [4], [7]]),
#  array([[2, 3], [5, 6], [8, 9]])]

Vertical Split: Split array vertically:

# Split at row index 1 (an integer count of 2 would fail for 3 rows)
vsplit_arr = np.vsplit(arr, [1])
# [array([[1, 2, 3]]),
#  array([[4, 5, 6],
#         [7, 8, 9]])]

Splitting arrays is useful for dividing up data for parallel processing or storing parts separately.
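When an integer section count does not divide the axis evenly, the split functions raise an error. One option in that case is np.array_split(), which is not covered above and allows uneven pieces. A minimal sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Three rows divide evenly into 3 sections.
thirds = np.vsplit(arr, 3)
print(len(thirds))  # 3

# Three rows do not divide evenly into 2 sections, so vsplit raises ValueError.
split_failed = False
try:
    np.vsplit(arr, 2)
except ValueError:
    split_failed = True

# array_split tolerates the uneven division; earlier pieces get the extra row.
uneven = np.array_split(arr, 2, axis=0)
print([part.shape for part in uneven])  # [(2, 3), (1, 3)]
```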

Copies and Views

When operating on arrays, it is important to understand how NumPy handles memory allocation.

Copy: The original array data is copied to a new allocation:

arr = np.array([1, 2, 3])
arr_copy = arr.copy()
arr_copy[0] = 0

print(arr)
# [1 2 3]

print(arr_copy)
# [0 2 3]

View: A new array object references the same data in memory:

arr = np.array([1, 2, 3])
arr_view = arr.view()

arr_view[0] = 0
print(arr)
# [0 2 3]

Because a view shares memory with its source, writing through the view also changes the original array. Basic slicing returns views too, so call .copy() when you need an array that is independent of the original.
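Slices are the most common way to end up with a view by accident. The sketch below shows a slice modifying its source, and the base attribute, which points at the array that owns the data. The names window and safe are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# A basic slice is a view onto the same memory.
window = arr[1:4]
print(window.base is arr)  # True

window[0] = 20
print(arr)  # [ 1 20  3  4  5]

# Copying the slice breaks the link to the original.
safe = arr[1:4].copy()
safe[0] = -1
print(arr[1])  # 20
```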

Conclusion

In this guide, we looked at how to generate NumPy arrays from sequences using np.array(), understand array shape, dimension and dtype attributes, index and slice array elements, modify shapes via stacking/splitting/reshaping operations, and properly handle copies versus views.

NumPy’s fast n-dimensional arrays enable efficient vectorized computations. By leveraging tools like np.array(), shape/dtype properties and array transformations, we can build effective data structures for data analysis, scientific workloads and numeric programming.

The examples provided here illustrate the key aspects of NumPy arrays. For more advanced techniques, refer to the official NumPy documentation and other resources to continue enhancing your array programming skills.