Arrays
NumPy arrays are ubiquitous for representing numerical data. Like lists, arrays are mutable and can represent collections of objects. Unlike the elements of a list, however, the elements of an array must all be of the same (typically numeric) type. In this section, we learn how to create and manipulate basic arrays. Throughout this book, we will assume that the NumPy package is loaded with the following statement:
import numpy as np
Creating Arrays
To construct a basic array (i.e., an object of class np.ndarray), we often use the function np.array(). Although many types of objects can be passed, a list will often do, as follows:
x = np.array([0.29, 0.55, -0.31, -0.84, 0.97])
The shape attribute of an np.ndarray object is a tuple of integers giving the size (i.e., length) of each of its dimensions. For instance, the shape of the x above is printed with
print(x.shape)
This prints (5,), which indicates that the array has a single dimension, called an axis in NumPy, of size 5.
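Besides shape, an np.ndarray carries a few related attributes that are handy for quick checks. The following minimal sketch prints ndim (the number of axes) and size (the total number of elements) for the same x; note that the built-in len() returns only the size of the first axis.

```python
import numpy as np

x = np.array([0.29, 0.55, -0.31, -0.84, 0.97])
print(x.shape)  # (5,): a tuple with one entry per axis
print(x.ndim)   # 1: the number of axes, i.e., len(x.shape)
print(x.size)   # 5: the total number of elements
print(len(x))   # 5: the size of the first axis only
```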
In NumPy, 1D arrays are called vectors, 2D arrays are called matrices, and higher-dimensional arrays are called tensors. The mathematical objects with the same names (i.e., vectors, matrices, and tensors) are usually represented by arrays with the corresponding number of dimensions. A matrix can be created as follows:
A = np.array([
    [0, 1, 2, 3],  # First row
    [4, 5, 6, 7],  # Second row
    [8, 9, 10, 11],  # Third row
])
So A.shape is (3, 4), and A represents a 3 × 4 matrix (3 rows and 4 columns).
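The same attributes distinguish vectors, matrices, and tensors. As a sketch (the variable names v, M, and T3 are only illustrative), the following builds one array of each kind, using np.zeros() to fill a 3D array with zeros, and reads the number of rows and columns of the matrix directly from its shape.

```python
import numpy as np

v = np.array([1, 2, 3])  # vector: 1 axis
M = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])  # matrix: 2 axes
T3 = np.zeros((2, 3, 4))  # tensor: 3 axes

for arr in (v, M, T3):
    print(arr.ndim, arr.shape)  # 1 (3,), then 2 (3, 4), then 3 (2, 3, 4)

n_rows, n_cols = M.shape  # Unpack the sizes of the two axes
print(n_rows, n_cols)  # 3 4
```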
We often need to create an array with a specific shape and populate it later. Perhaps the best way to do so is a function call like
T = np.full(shape=(5, 3), fill_value=np.nan)
This creates a (5, 3) matrix with the special nan (i.e., "not a number") float as each element. The related functions np.zeros() and np.ones() can also be used to create arrays of arbitrary shape filled with 0 and 1 values, respectively. However, for an array that is to be populated subsequently, we prefer np.full() with nan elements because it is easier to notice if parts of the array have been mistakenly left unpopulated.
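To see why nan is the safer placeholder, consider a minimal sketch in which a loop meant to fill all five rows deliberately skips the last one (the range(4) is an intentional bug). With np.zeros() the missing row blends in with legitimate values, whereas with np.full() and nan it propagates into results and is easy to detect.

```python
import numpy as np

Z = np.zeros((5, 3))                          # placeholder value 0.0
T = np.full(shape=(5, 3), fill_value=np.nan)  # placeholder value nan

for i in range(4):  # Deliberate bug: row 4 is never populated
    Z[i, :] = i + 1
    T[i, :] = i + 1

print(Z.sum())                  # 30.0: looks like a plausible result
print(T.sum())                  # nan: the missing row propagates
print(np.isnan(T).any(axis=1))  # [False False False False  True]
```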
The np.arange() function is similar to the built-in range() function. An array of sequential numbers with integer spacing can easily be created with np.arange(). For instance,
np.arange(start=0, stop=10)
np.arange(0, 10)
np.arange(stop=10)
np.arange(10)
All these statements yield an array that prints as follows (printed arrays look like lists):
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Of course, the start argument is necessary for an array that starts at a value other than 0. There is also a step argument for np.arange(), which is useful for integer steps other than the default 1. However, for non-integer steps, we prefer a different function altogether: np.linspace(). For instance, the following creates a 1D array of 11 evenly spaced values from 0 to 3:
np.linspace(start=0, stop=3, num=11)
This array can be printed to show the following:
[0., 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.]
Note that, by default, stop is included as the last sample; endpoint=False would exclude it.
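One reason to prefer np.linspace() for non-integer spacing is that np.arange() computes its number of elements by dividing (stop - start) by step in floating point and rounding up, so whether a value near stop appears can hinge on rounding error. The sketch below contrasts the two; the values 1.0, 1.3, and 0.1 are only illustrative, and retstep is a standard np.linspace() parameter that also returns the spacing used.

```python
import numpy as np

# One might expect [1.0, 1.1, 1.2], since stop is excluded as with range()
r = np.arange(1.0, 1.3, 0.1)
print(len(r))  # 4: (1.3 - 1.0) / 0.1 is slightly above 3, rounded up

# np.linspace() takes the number of samples directly
s = np.linspace(start=1.0, stop=1.3, num=4)                  # includes 1.3
t = np.linspace(start=1.0, stop=1.3, num=3, endpoint=False)  # excludes 1.3
print(s)
print(t)

# retstep=True returns the array together with its spacing
u, step = np.linspace(start=0, stop=3, num=11, retstep=True)
print(step)  # 0.3
```

With np.linspace() the length is fixed by num, so the only question left is whether stop itself is included, which endpoint controls explicitly.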
We make extensive use of np.linspace() and regular use of the similar np.logspace(), which creates a 1D array of values logarithmically spaced between two powers of the base provided. For instance, in the default base 10, the following creates six values from 1 (i.e., 10**0) to 1000 (i.e., 10**3):
np.logspace(start=0, stop=3, num=6)
This array can be printed to show the following:
[1., 3.98107171, 15.84893192, 63.09573445, 251.18864315, 1000.]
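One way to read np.logspace() is as base ** np.linspace(start, stop, num). The following sketch checks that equivalence for the example above and then uses the base parameter to generate powers of 2.

```python
import numpy as np

L = np.logspace(start=0, stop=3, num=6)            # base 10 by default
print(np.allclose(L, 10 ** np.linspace(0, 3, 6)))  # True

B = np.logspace(start=0, stop=4, num=5, base=2)    # 2**0 through 2**4
print(B)  # [ 1.  2.  4.  8. 16.]
```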
Most of the time, we use numeric values (i.e., dtypes of int, float, complex, and bool types) in NumPy arrays. It is also possible to create an array of Python objects, with dtype "O" (for object); for instance:
A = np.array([{"foo": "bar"}, {"bar": "baz"}])  # An object array
Printing the A.dtype attribute reveals that it is type object. It is occasionally advantageous to use object arrays instead of lists, primarily for the convenience of NumPy's array manipulation capabilities.
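When no dtype is given, np.array() infers one that can hold all of the elements. The following sketch shows this inference for the numeric kinds mentioned above and for the object array; because the exact integer size (e.g., int32 or int64) can depend on the platform and NumPy version, it checks only the kind of each dtype.

```python
import numpy as np

print(np.array([1, 2, 3]).dtype.kind)       # 'i': signed integer
print(np.array([1, 2.5]).dtype.kind)        # 'f': ints upcast to float
print(np.array([1, 2j]).dtype.kind)         # 'c': complex
print(np.array([True, False]).dtype.kind)   # 'b': Boolean
print(np.array([1, 2], dtype=float).dtype)  # float64: explicit dtype

A = np.array([{"foo": "bar"}, {"bar": "baz"}])  # An object array
print(A.dtype, A.shape)  # object (2,)
print(A[::-1])           # Slicing works on the dict elements as usual
```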
Accessing, Slicing, and Assigning Elements
Array elements can be accessed via indices in the same way as with lists. For instance,
A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])  # 3x4
A[0, 0]  # => 0
A[0, 3]  # => 3
A[1, 0]  # => 4
A[2, 3]  # => 11
A[1]  # => [4, 5, 6, 7]
Similarly, array slicing has the same syntax as list slicing. For instance,
A[0:2]  # => [[0, 1, 2, 3], [4, 5, 6, 7]] (view)
A[:-1]  # => [[0, 1, 2, 3], [4, 5, 6, 7]] (view)
A[:, 1]  # => [1, 5, 9] (view)
A[:, 0:2]  # => [[0, 1], [4, 5], [8, 9]] (view)
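Negative indices and slice steps also carry over from lists, applied independently along each axis. A short sketch with the same 3 × 4 matrix:

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])  # 3x4
print(A[-1, -1])   # 11: last row, last column
print(A[::2])      # Every other row: rows 0 and 2
print(A[:, ::-1])  # All rows, columns in reverse order
print(A[1:, 1:3])  # Rows 1-2, columns 1-2
```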
An important difference between list and array slicing is that, whereas in list slicing the returned list is a copy of a portion of the original list, in array slicing the returned value is a view of a portion of the original array. An array view object, just as with dict view objects (see section 1.8), uses the same data as the original object. Therefore, mutating a view object mutates its original object and vice versa. For instance, using the same A matrix from above,
a = A[:, 0]  # => [0, 4, 8] (first column view)
a[1] = 6  # Assign a new value to view element (second row)
A[0, 0] = 2  # Assign a new value to the original array
print(a)
print(A)
This prints
[2 6 8]
[[ 2  1  2  3]
 [ 6  5  6  7]
 [ 8  9 10 11]]
In other words, A and a share the same data. To create a copy instead of a view from a slice, call the copy() method on the slice. For instance, the following array b is a copy of a portion of A, so its data are independent:
b = A[:, 0].copy()  # => [2, 6, 8] (first column copy)
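Whether two arrays share data can be checked directly with np.shares_memory(). The sketch below, which starts again from the original 3 × 4 matrix, confirms that a basic slice is a view (its base attribute refers to the array it views) while a slice followed by copy() is independent.

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
a = A[:, 0]         # View of the first column
b = A[:, 0].copy()  # Independent copy of the first column

print(np.shares_memory(A, a))  # True
print(np.shares_memory(A, b))  # False
print(a.base is A)             # True: the view refers back to A

b[0] = 100      # Mutating the copy ...
print(A[0, 0])  # 0: ... leaves A unchanged
```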
It is often useful to find the elements of an array (or their indices) that meet some condition. Applying a comparison operator to an array (e.g., A > 4) returns an array of Boolean values, one for each element, that can be used as an index for the array. For instance,
A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])  # 3x4
print(A > 4)
prints
[[False, False, False, False]
[False, True, True, True]
[ True, True, True, True]]
This can be used as an index to select those elements that meet the condition. For instance,
A[A > 4]
returns those elements greater than 4, as follows:
[ 5, 6, 7, 8, 9, 10, 11]
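The Boolean array selects values; when the positions themselves are needed, np.nonzero() returns one index array per axis and np.argwhere() returns one (row, column) pair per match. A sketch with the same A:

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])  # 3x4
rows, cols = np.nonzero(A > 4)  # One index array per axis
print(rows)  # [1 1 1 2 2 2 2]
print(cols)  # [1 2 3 0 1 2 3]
print(np.argwhere(A > 4))  # 7x2 array of (row, column) pairs

# The index arrays select the same elements as the Boolean mask
print(np.array_equal(A[rows, cols], A[A > 4]))  # True
```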
The use of the Boolean-valued array resulting from the expression A > 4 as an index is a type of advanced indexing, which NumPy applies when the index is an array of Boolean or integer data type, a non-tuple sequence, or a tuple with at least one sequence or array object. Unlike basic slicing, which returns a view of the original array, advanced indexing always returns a copy.
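The copy behavior of advanced indexing matters when the result is later mutated. In this sketch (the names fancy and basic are only illustrative), integer-array indexing and a stepped basic slice select the same two rows, but only the basic slice writes through to A.

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])  # 3x4
fancy = A[[0, 2]]  # Advanced (integer-array) indexing: a copy
basic = A[0:3:2]   # Basic slicing of the same rows: a view

fancy[0, 0] = -1
after_fancy = A[0, 0]
print(after_fancy)  # 0: A is unchanged
basic[0, 0] = -1
print(A[0, 0])      # -1: the view writes through to A
```

The copy arises only when the indexed result is stored and then modified; an assignment with an advanced index on the left-hand side, such as A[[0, 2]] = 0, does write into A.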
Because arrays are mutable, elements can be replaced just as with lists. For instance, the statements
A[0, 0] = 2
A[:, 2] = 2
mutate A such that it now prints as
[[ 2, 1, 2, 3],
[ 4, 5, 2, 7],
[ 8, 9, 2, 11]]
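Assignment broadcasts a scalar over a slice, as in A[:, 2] = 2, and also accepts a sequence of matching length. Assigned values are cast to the array's existing dtype, so a float assigned into an integer array is silently truncated. A minimal sketch using a separate, illustrative matrix M:

```python
import numpy as np

M = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])  # int array
M[1, :] = [40, 50, 60, 70]  # Assign a whole row from a list
M[0, 0] = 2.7               # Cast to the int dtype: stored as 2
print(M)
print(M.dtype.kind)         # 'i': the dtype does not change
```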
Combining the conditional indexing from above with assignment, we can make assignments based on a condition. For instance, working with the same A array, we can coerce values above 5 to 5:
A[A > 5] = 5
Now A prints as
[[2, 1, 2, 3],
[4, 5, 2, 5],
[5, 5, 2, 5]]
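When the original array should be kept, the same capping can be done without mutation. As a sketch, np.where() chooses element-wise between two values, while np.clip() (with no lower bound) and np.minimum() express an upper bound directly; all three return new arrays.

```python
import numpy as np

A = np.array([[2, 1, 2, 3], [4, 5, 2, 7], [8, 9, 2, 11]])
capped_where = np.where(A > 5, 5, A)  # 5 where A > 5, else A
capped_clip = np.clip(A, None, 5)     # Upper bound only
capped_min = np.minimum(A, 5)         # Element-wise minimum with 5

print(capped_where)
print(np.array_equal(capped_where, capped_clip))  # True
print(A[2, 3])  # 11: A itself is unchanged
```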
Appending To and Concatenating Arrays
Appending an element to an array is possible with the np.append() function (there is no append() method), but its use in loops is discouraged because it creates a new copy of the array at every call. However, in some cases it is just the right function, and it works as shown in the following code:
a = np.array([0, 1, 2])
np.append(a, 3)  # => [0, 1, 2, 3]
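Two details of np.append() are worth checking: it returns a new array and leaves its argument unchanged, and without an axis argument it flattens its inputs before appending. A short sketch:

```python
import numpy as np

a = np.array([0, 1, 2])
c = np.append(a, 3)
print(a, c)  # [0 1 2] [0 1 2 3]: a itself is unchanged

M = np.array([[0, 1], [2, 3]])
print(np.append(M, [[4, 5]]))          # Flattened: [0 1 2 3 4 5]
print(np.append(M, [[4, 5]], axis=0))  # Appended as a new row (3x2)
```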
When the elements of an array must be constructed in a loop, it is vastly more efficient to initialize the array with np.full() or a similar function (see subsection 3.1.1) before beginning the loop and then to populate it with index assignment. For instance,
a = np.full((5,), np.nan)  # Initialize with nans
for i in range(0, len(a)):
    if i == 0:
        a[i] = 1
    else:
        a[i] = (a[i - 1] + 1) ** 2
print(f"It is {np.any(np.isnan(a))} there are nans in a:\n{a}")
prints
It is False there are nans in a:
[1.00000e+00 4.00000e+00 2.50000e+01 6.76000e+02 4.58329e+05]
The statement np.any(np.isnan(a)) is a nice idiom for detecting whether any nans remain in the array. This is a good check that we have in fact replaced all elements of the initialized array with numbers.
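For comparison, the following sketch computes the same sequence twice: once with preallocation and index assignment, and once by growing an array with np.append(). Both give the same values; the difference is cost, since each np.append() call allocates a new array and copies every existing element, so the total copying grows quadratically with the number of elements.

```python
import numpy as np

n = 5
# Preallocate with nans, then assign by index (preferred)
pre = np.full((n,), np.nan)
for i in range(n):
    pre[i] = 1 if i == 0 else (pre[i - 1] + 1) ** 2

# Grow with np.append(): every call allocates and copies a new array
grown = np.array([])
for i in range(n):
    nxt = 1 if i == 0 else (grown[i - 1] + 1) ** 2
    grown = np.append(grown, nxt)

print(np.array_equal(pre, grown))  # True
print(np.any(np.isnan(pre)))       # False: every element was populated
```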
Array concatenation is the joining of arrays, in order, along an existing axis. The np.concatenate() function returns the concatenation of the arrays given as a tuple in its first argument. For instance,
a = np.array([[0, 1], [2, 3]])  # 2x2
b = np.array([[4, 5]])  # 1x2
np.concatenate((a, b))  # => [[0, 1], [2, 3], [4, 5]] (3x2)
The optional axis argument, 0 by default, determines the dimension along which the arrays are concatenated. For instance, with the same a and b from above,
np.concatenate((a, b), axis=0)  # => [[0, 1], [2, 3], [4, 5]] (3x2)
np.concatenate((a, b.T), axis=1)  # => [[0, 1, 4], [2, 3, 5]] (2x3)
Here we have used the transpose array attribute T, which returns a view of the array with its axes swapped (see subsection 3.2.1). The arrays to be concatenated must have matching sizes in every dimension except the axis dimension.
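For 2D arrays like these, np.vstack() and np.hstack() are convenient shorthands for concatenating along the first and second axes, respectively. The sketch below checks them against np.concatenate() and shows the ValueError raised when the sizes outside the axis dimension do not match.

```python
import numpy as np

a = np.array([[0, 1], [2, 3]])  # 2x2
b = np.array([[4, 5]])          # 1x2

print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))      # True
print(np.array_equal(np.hstack((a, b.T)), np.concatenate((a, b.T), axis=1)))  # True

try:
    np.concatenate((a, b), axis=1)  # 2 rows vs. 1 row: sizes do not match
except ValueError as err:
    print("ValueError:", err)
```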
Online Resources for Section 3.1
No online resources.