NumPy is a fundamental Python package for scientific computing and data analysis. It provides support for large, multi-dimensional arrays and matrices as well as a large library of high-level mathematical functions to operate on these arrays.

One of the core features of NumPy is its n-dimensional array object, or ndarray. The np.array() function creates these arrays, which offer significant advantages over Python’s built-in lists, such as efficient storage, vectorized operations and broadcasting.
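As a minimal sketch of that difference (the variable names here are illustrative), multiplying a list repeats it, while multiplying an array works element by element, and broadcasting combines arrays of different but compatible shapes:

```python
import numpy as np

prices = [1.5, 2.0, 3.5]          # plain Python list
price_arr = np.array(prices)      # same data as an ndarray

doubled_list = prices * 2         # list repetition: 6 elements
doubled_arr = price_arr * 2       # vectorized: each element is doubled

# Broadcasting: a (2, 1) column combined with a (3,) row gives a (2, 3) grid
grid = np.array([[10], [20]]) + np.array([1, 2, 3])

print(doubled_list)   # [1.5, 2.0, 3.5, 1.5, 2.0, 3.5]
print(doubled_arr)    # [3. 4. 7.]
print(grid.shape)     # (2, 3)
```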

This comprehensive guide will examine how to create NumPy arrays using np.array(), understand the shape and dtype attributes of arrays, and leverage these tools to build effective data structures for data analysis and scientific applications. Code examples are provided to illustrate key concepts. By the end, you will have a solid grasp of how to generate and manipulate NumPy arrays.

## Creating Arrays with np.array()

The np.array() function creates a new NumPy array from an existing sequence like a Python list or tuple. The basic syntax is:

```
import numpy as np
array = np.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
```

The parameters are:

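- **object** (required): The data to convert to an array. This can be a list, tuple, another array or any other (possibly nested) sequence.
- **dtype**: The data type of the array. By default, it is inferred from the input data. Some common types are `float`, `int` and `bool`.
- **copy**: Controls memory allocation. If True (default), the input data is always copied. With `copy=None` in NumPy 2.x (or `copy=False` in NumPy 1.x), a copy is made only if necessary; in NumPy 2.x, `copy=False` raises an error when a copy cannot be avoided.
- **order**: Memory layout of the result: row-major (`'C'`) or column-major (`'F'`). The default `'K'` keeps the layout of an array input as closely as possible and produces row-major order for ordinary Python sequences.
- **subok**: If True, sub-classes are passed through. If False (default), the result is forced to be a base-class array.
- **ndmin**: Specifies the minimum number of dimensions. Length-1 dimensions are prepended to the shape as needed.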

Let’s look at some examples:

```
import numpy as np
# From list
mylist = [1, 2, 3]
arr = np.array(mylist)
print(arr)
# [1 2 3]
# From tuple
mytuple = (8, 9, 10)
arr = np.array(mytuple)
print(arr)
# [ 8  9 10]
# 2D array from list of lists
list2d = [[11, 12, 13], [21, 22, 23]]
arr = np.array(list2d)
print(arr)
# [[11 12 13]
# [21 22 23]]
```

We can explicitly define the data type using the `dtype` parameter:

```
float_arr = np.array(mylist, dtype=np.float64)
print(float_arr)
# [1. 2. 3.]
bool_arr = np.array(mylist, dtype=bool)
print(bool_arr)
# [ True  True  True]
```

To create an array with a minimum number of dimensions, we can pass the `ndmin` argument:

```
arr = np.array([1, 2, 3], ndmin=5)
print(arr.shape)
# (1, 1, 1, 1, 3)
```

This creates a 5D array with shape (1, 1, 1, 1, 3) by prepending four length-1 dimensions.

In summary, `np.array()` provides a flexible way to generate new NumPy arrays from sequences. The dtype and number of dimensions can be explicitly defined.
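When `dtype` is left out, the inferred type depends on the values passed in. The sketch below (variable names are illustrative) shows how mixing value types upcasts the whole array, since every element of an ndarray shares one dtype:

```python
import numpy as np

ints = np.array([1, 2, 3])              # all ints -> an integer dtype
mixed = np.array([1, 2.5, 3])           # int + float -> upcast to float64
with_complex = np.array([1, 2.5, 3j])   # any complex value -> complex128
nested = np.array([[1, 2], [3, 4]], dtype=float)   # explicit dtype wins

print(ints.dtype.kind)      # i  (the exact integer width is platform dependent)
print(mixed.dtype)          # float64
print(with_complex.dtype)   # complex128
print(nested.dtype, nested.shape)   # float64 (2, 2)
```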

## Array Attributes: Shape and Dimension

NumPy arrays have attributes like `shape` and `ndim` that describe the size of each dimension and the number of dimensions in the array.

The **shape** of an array is a tuple with each element representing the size of that dimension. For a 2D array with 3 rows and 4 columns, the shape attribute would be `(3, 4)`.

The **number of dimensions**, `ndim`, equals the length of the shape tuple. A 1D array has a shape of `(n,)` while a 2D array has a shape of `(n, m)`.

Let’s see some examples of shape and dimension:

```
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)
# (2, 3)
print(arr.ndim)
# 2
arr = np.array([1, 2, 3, 4, 5])
print(arr.shape)
# (5,)
print(arr.ndim)
# 1
```

We can also reshape an existing array to a new shape using `arr.reshape()`, which returns the reshaped array and leaves `arr` itself unchanged:

```
arr = np.array([1, 2, 3, 4, 5, 6])
arr.reshape(3, 2)
# array([[1, 2],
# [3, 4],
# [5, 6]])
```

Understanding shape and dimension is crucial for performing subsequent mathematical operations on arrays. Many NumPy operations, such as slicing, iteration and stacking, depend on an array’s shape and number of dimensions.
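A reshape is only valid when the new shape holds the same number of elements as the original. The following sketch illustrates this with the `size` attribute and with `-1`, which asks NumPy to infer one dimension (the array contents are arbitrary):

```python
import numpy as np

arr = np.arange(12)              # 12 elements: 0..11

grid = arr.reshape(3, 4)         # 3 * 4 == 12, so this is valid
auto = arr.reshape(-1, 6)        # -1 lets NumPy infer the row count

print(grid.shape, grid.ndim, grid.size)   # (3, 4) 2 12
print(auto.shape)                         # (2, 6)
print(arr.shape)                          # (12,)  reshape did not modify arr

try:
    arr.reshape(5, 3)            # 5 * 3 != 12
except ValueError as exc:
    print("reshape failed:", exc)
```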

## Array dtype: Specifying Data Types

The **data type** or `dtype` of a NumPy array describes the type and size of its elements. It is specified when an array is created. If not explicitly defined, NumPy chooses a type based on the input data.

Some common data types are:

- `int`: for integer values
- `float`: for floating point values
- `complex`: for complex numbers
- `bool`: for Boolean values True/False
- `object`: for Python objects
- `str`: for strings
- `datetime64`: for date & time values

Let’s see examples of creating arrays with different data types:

```
import numpy as np
int_arr = np.array([1, 2, 3], dtype=np.int64)
float_arr = np.array([1.5, 2.1, 3.7], dtype=np.float64)
complex_arr = np.array([1+2j, 3-4j])
bool_arr = np.array([True, False, True])
str_arr = np.array(['Python', 'NumPy'], dtype=np.str_)
obj_arr = np.array([np.nan, 0, 1], dtype=object)
```

We can check an array’s dtype using the `dtype` attribute:

```
int_arr.dtype
# dtype('int64')
np.issubdtype(int_arr.dtype, np.integer)
# True
```

Casting arrays from one dtype to another is done with `arr.astype(<newtype>)`:

```
# astype returns a new array; int_arr itself keeps its int64 dtype
float32_arr = int_arr.astype(np.float32)
float32_arr.dtype
# dtype('float32')
```

Choosing an appropriate data type when creating arrays matters for both memory use and correctness: each element occupies a fixed number of bytes, and values that do not fit the dtype can be silently truncated or can overflow.
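The sketch below illustrates three consequences of the dtype choice: memory footprint, truncation when casting floats to integers, and wraparound in fixed-width integers. The specific values are arbitrary and chosen only to make the effects visible:

```python
import numpy as np

# Memory: same values, different element sizes
small = np.array([1, 2, 3], dtype=np.int8)
large = np.array([1, 2, 3], dtype=np.int64)
print(small.nbytes, large.nbytes)    # 3 24

# Casting float -> int truncates toward zero
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated.tolist())            # [1, -1]

# Fixed-width integers wrap around on overflow (200 + 100 does not fit in uint8)
a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)
wrapped = a + b
print(wrapped.tolist())              # [44]
```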

## Converting Data to Arrays

Real-world data for analysis is often stored in formats like CSV, JSON, Excel, SQL databases, etc. NumPy reads delimited text directly; for the other sources, the data is typically loaded with another library and then passed to `np.array()`:

```
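import json
import numpy as np
import pandas

# From CSV (assumes a numeric file named data.csv exists)
arr = np.genfromtxt('data.csv', delimiter=',')
# From JSON (json_data is assumed to be a JSON string of nested lists)
arr = np.array(json.loads(json_data))
# From SQL databases (cursor is assumed to be an open DB-API cursor,
# query a SELECT statement)
cursor.execute(query)
arr = np.array(cursor.fetchall())
# From Excel (pandas needs an Excel engine such as openpyxl installed)
arr = np.array(pandas.read_excel('data.xlsx'))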
```

For delimited text files, `np.loadtxt()` and `np.genfromtxt()` are useful. They parse the data into arrays according to parameters such as `dtype`, `delimiter` and `skiprows` (called `skip_header` in `np.genfromtxt()`); `np.genfromtxt()` can additionally handle missing values.
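Both functions accept file-like objects, so their behavior can be tried without a file on disk. This sketch uses `io.StringIO` with made-up CSV text; the column names and values are purely illustrative:

```python
import io
import numpy as np

csv_text = "x,y\n1,2\n3,\n5,6\n"   # the second data row has a missing y value

# genfromtxt: skip the header line; missing fields become nan
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data.shape)            # (3, 2)
print(np.isnan(data[1, 1]))  # True

# loadtxt: stricter parsing, so it is given only complete rows here
clean_text = "x,y\n1,2\n5,6\n"
clean = np.loadtxt(io.StringIO(clean_text), delimiter=",", skiprows=1)
print(clean.dtype)           # float64
```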

Dates can be converted to `datetime64` arrays:

```
dates = ['2023-01-01', '2023-01-02']
arr = np.array(dates, dtype='datetime64')
```
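Once dates are stored as `datetime64`, date arithmetic becomes vectorized as well. A small sketch with an explicit day unit (`'datetime64[D]'`) and arbitrary example dates:

```python
import numpy as np

dates = np.array(['2023-01-01', '2023-01-02', '2023-01-05'],
                 dtype='datetime64[D]')

gaps = np.diff(dates)                        # differences are timedelta64 values
next_week = dates + np.timedelta64(7, 'D')   # shift every date by 7 days

print(gaps.astype(int).tolist())   # [1, 3]
print(next_week[0])                # 2023-01-08
```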

In summary, many options exist to import real-world data into NumPy arrays for computation. Care should be taken to handle missing data, heterogeneous types and formatting issues.

## Indexing, Slicing and Iterating Arrays

NumPy arrays support vectorized operations that apply functions to entire arrays. But elements can still be accessed individually using indexing and slicing syntax similar to Python lists:

```
arr = np.array([1, 2, 3, 4])
# Indexing
print(arr[0]) # 1
# Slicing
print(arr[1:3]) # [2 3]
# Iterate through array
for x in arr:
    print(x)
# 1
# 2
# 3
# 4
```

2D arrays are indexed using tuple notation `arr[i, j]`. Using `:` in place of an index selects an entire row or column as a 1D array:

```
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
# First row
print(arr2d[0, :]) # [1 2 3]
# Second column
print(arr2d[:, 1]) # [2 5]
```

NumPy arrays provide efficient access to elements without needing to loop through each one, enabling fast vector computations.
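Beyond single indices and slices, elements can be selected with boolean masks or lists of indices, and reductions such as `sum` accept an `axis` argument. The following is a brief sketch of these loop-free forms of access, reusing the small 2D array from above:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

mask = arr2d > 2                 # element-wise comparison -> boolean array
selected = arr2d[mask]           # 1D array of the values that passed
col_totals = arr2d.sum(axis=0)   # sum down each column, no explicit loop
picked = arr2d[[0, 1], [2, 0]]   # integer-array indexing: (0, 2) and (1, 0)

print(selected)     # [3 4 5 6]
print(col_totals)   # [5 7 9]
print(picked)       # [3 4]
```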

## Reshaping and Flattening Arrays

`reshape()` changes the shape of an array without changing the number of elements. Where possible, it returns a view of the same data rather than a copy:

```
arr = np.array([1, 2, 3, 4, 5, 6])
arr.reshape(3, 2)
# [[1 2]
# [3 4]
# [5 6]]
```

Flattening converts a multidimensional array into a 1D array using `flatten()` or `ravel()`:

```
arr = np.array([[1, 2], [3, 4]])
flattened = arr.flatten()
# [1 2 3 4]
flattened = arr.ravel()
# [1 2 3 4]
```

Reshaping and flattening enable modifying the structure of arrays for various computations. `reshape()` and `ravel()` reuse the same underlying data where possible, while `flatten()` always returns a copy.
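Whether two arrays share a buffer can be checked with `np.shares_memory`. This sketch contrasts `ravel()` and `flatten()` on a small contiguous array (for non-contiguous inputs, `ravel()` may have to copy as well):

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

raveled = arr.ravel()      # view of the same buffer for a contiguous array
flat = arr.flatten()       # always an independent copy

print(np.shares_memory(arr, raveled))  # True
print(np.shares_memory(arr, flat))     # False

raveled[0] = 100           # writes through to arr
flat[1] = -1               # leaves arr untouched
print(arr.tolist())        # [[100, 2], [3, 4]]
```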

## Stack and Concatenate Arrays

NumPy provides functions like `np.stack`, `np.vstack`, `np.hstack` and `np.concatenate` to combine multiple arrays:

**Stack:** Join arrays along a new axis:

```
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr_stacked = np.stack((arr1, arr2))
# [[1 2 3]
# [4 5 6]]
```

**Concatenate:** Join arrays along an existing axis:

```
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
arr_concat = np.concatenate((arr1, arr2), axis=0)
# [[1 2]
# [3 4]
# [5 6]]
```

**Vertical Stack:** Stack arrays vertically (along first axis):

```
v_stacked = np.vstack((arr1, arr2))
# [[1 2]
# [3 4]
# [5 6]]
```

**Horizontal Stack:** Stack arrays horizontally (along second axis):

```
# arr2 has shape (1, 2); transpose it to (2, 1) so the row counts match
h_stacked = np.hstack((arr1, arr2.T))
# [[1 2 5]
# [3 4 6]]
```

Stacking and concatenating arrays enable combining data from different sources into unified data structures.
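The key distinction is whether a new axis is created. As a short sketch with two 1D arrays, `np.stack` adds an axis at the position given by `axis`, while `np.concatenate` extends an axis that already exists:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.stack((a, b), axis=0)      # new leading axis  -> shape (2, 3)
cols = np.stack((a, b), axis=1)      # new trailing axis -> shape (3, 2)
joined = np.concatenate((a, b))      # existing axis 0   -> shape (6,)

print(rows.shape, cols.shape, joined.shape)   # (2, 3) (3, 2) (6,)
print(cols.tolist())                          # [[1, 4], [2, 5], [3, 6]]
```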

## Splitting Arrays

Large arrays can be split into smaller sub-arrays using `np.split`, `np.hsplit` and `np.vsplit`:

**Split:** Split array along specified axis and positions:

```
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, [3, 5])
# [array([1, 2, 3]), array([4, 5]), array([6])]
```

**Horizontal Split:** Split array horizontally:

```
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Split before column 1; an integer argument would require equal-width parts
hsplit_arr = np.hsplit(arr, [1])
# [array([[1], [4], [7]]),
# array([[2, 3], [5, 6], [8, 9]])]
```

**Vertical Split:** Split array vertically:

```
# Split before row 1; an integer argument would require equal-height parts
vsplit_arr = np.vsplit(arr, [1])
# [array([[1, 2, 3]]),
# array([[4, 5, 6],
# [7, 8, 9]])]
```

Splitting arrays is useful for dividing up data for parallel processing or storing parts separately.
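The second argument to these functions is either a list of split positions or an integer number of sections. With an integer, `np.split` requires an equal division and raises an error otherwise; `np.array_split` is the variant that tolerates uneven sections, as this sketch shows:

```python
import numpy as np

arr = np.arange(7)                    # 7 elements cannot form 3 equal parts

try:
    np.split(arr, 3)
except ValueError as exc:
    print("np.split refused:", exc)

parts = np.array_split(arr, 3)        # allows uneven sections
print([p.tolist() for p in parts])    # [[0, 1, 2], [3, 4], [5, 6]]
```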

## Copies and Views

When operating on arrays, it is important to understand how NumPy handles memory allocation.

**Copy:** The original array data is copied to a new allocation:

```
arr = np.array([1, 2, 3])
arr_copy = arr.copy()
arr_copy[0] = 0
print(arr)
# [1 2 3]
print(arr_copy)
# [0 2 3]
```

**View:** A new array object references the same data in memory:

```
arr = np.array([1, 2, 3])
arr_view = arr.view()
arr_view[0] = 0
print(arr)
# [0 2 3]
```

Views can lead to unexpected changes in the original array. Generally, use `.copy()` when you need an array that can be modified without affecting the original.
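Views are not only produced by `.view()`: basic slicing returns a view as well, whereas indexing with a list of integers returns a copy. A minimal sketch of how to tell the two apart, using the `base` attribute and `np.shares_memory`:

```python
import numpy as np

arr = np.array([10, 20, 30, 40])

window = arr[1:3]            # basic slice: a view onto arr
chosen = arr[[1, 2]]         # integer-array indexing: a copy

print(window.base is arr)               # True
print(np.shares_memory(arr, chosen))    # False

window[0] = -1               # also changes arr[1]
chosen[1] = 999              # arr is unaffected
print(arr.tolist())          # [10, -1, 30, 40]
```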

## Conclusion

In this guide, we looked at how to generate NumPy arrays from sequences using `np.array()`, understand array shape, dimension and `dtype` attributes, index and slice array elements, modify shapes via stacking/splitting/reshaping operations, and properly handle copies versus views.

NumPy’s fast n-dimensional arrays enable efficient vectorized computations. By leveraging tools like `np.array()`, shape/dtype properties and array transformations, we can build effective data structures for data analysis, scientific workloads and numeric programming.

The examples provided here illustrate the key aspects of NumPy arrays. For more advanced techniques, refer to the official NumPy documentation and other resources to continue enhancing your array programming skills.