School of Engineering
Laboratory Session 2
Course : Diploma in Engineering with Business
Module : EGM271 Statistics & Data Analytics
Title : The NumPy Library (Part 1)
What is NumPy?
NumPy is a foundational package for scientific computing with Python, and especially for
data analysis. In fact, this library is the basis of a large number of mathematical and
scientific Python packages, among them the pandas library.
This library, specialized for data analysis, is fully developed using the concepts introduced
by NumPy. In fact, the built-in tools provided by the standard Python library could be too
simple or inadequate for most of the calculations in data analysis. Having knowledge of
the NumPy library is important to being able to use all scientific Python packages, and
particularly, to use and understand the pandas library.
Note that throughout all the lab sheets in this module, we are using Spyder:
- Lines with "In:" are the code you type in the console. You can run them
simply by pressing "Enter".
- Lines with "Out:" are the output shown in the console.
- Lines without "In" or "Out" are the code you type in the editor. You need to
click the Run button to run it.
To start using NumPy, type this in your Spyder editor and run the program:
import numpy as np
Ndarray: The Heart of the Library
The NumPy library is based on one main object: ndarray (which stands for N-dimensional
array). This object is a multidimensional homogeneous array with a predetermined
number of items:
• Homogeneous because virtually all the items in it are of the same type and the
same size. In fact, the data type is specified by another NumPy object called dtype
(data-type); each ndarray is associated with only one type of dtype.
• The number of dimensions and items in an array is defined by its shape, a
tuple of N positive integers that specifies the size of each dimension. The
dimensions are called axes and the number of axes is called the rank.
• Another peculiarity of NumPy arrays is that their size is fixed, that is, once you
define their size at the time of creation, it remains unchanged. This behaviour is
different from Python lists, which can grow or shrink in size.
The easiest way to define a new ndarray is to use the array() function, passing a Python
list containing the elements to be included in it as an argument.
In: a = np.array([1, 2, 3])
In: a
Out: array([1, 2, 3])
You can easily check that a newly created object is an ndarray by passing the new
variable to the type() function.
In: type(a)
Out: numpy.ndarray
You can also refer to the Variable Explorer for information about your array.
There, the name is a and the data type is int32 (on other platforms or NumPy versions it
may be int64). The Size (3,) indicates that the array has rank 1 (a single axis) and
contains 3 elements.
What you have just seen is the simplest case of a one-dimensional array. But the use of
arrays can be easily extended to several dimensions. For example, you can define a
two-dimensional 2×2 array:
In: b = np.array([[1.3, 2.4],[0.3, 4.1]])
This array has rank 2, since it has two axes: two rows, each of length 2.
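To check the rank and shape yourself rather than through the Variable Explorer, you can read a few attributes of the array. The following is a minimal sketch that reuses the arrays a and b defined above; ndim, shape and size are standard ndarray attributes.

```python
import numpy as np

a = np.array([1, 2, 3])                     # one-dimensional array
b = np.array([[1.3, 2.4], [0.3, 4.1]])      # two-dimensional 2x2 array

# ndim is the rank (number of axes), shape gives the length of each axis,
# and size is the total number of items.
for name, arr in (("a", a), ("b", b)):
    print(name, "ndim =", arr.ndim, "shape =", arr.shape, "size =", arr.size)
```

For a this should report one axis of length 3, and for b two axes of length 2 each.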
Types of Data
So far you have seen only simple integer and float numeric values, but NumPy arrays are
designed to contain a wide variety of data types. For example, you can use the data type
string:
In: g = np.array([['a', 'b'],['c', 'd']])
In: g
Out: array([['a', 'b'],
            ['c', 'd']], dtype='<U1')
You can search the Internet for the data types supported by NumPy, but in this
module, most of the time we only use integer, float and string.
The dtype Option
Each ndarray object is associated with a dtype object that uniquely defines the type of
data that will occupy each item in the array. By default, the array() function can associate
the most suitable type according to the values contained in the sequence of lists or tuples.
You can explicitly define the data type using the dtype option as argument of the function.
For example, if you want to define an array with complex values, you can use the dtype
option as follows:
In: f = np.array([[1, 2, 3],[4, 5, 6]], dtype=complex)
In: f
Out: array([[1.+0.j, 2.+0.j, 3.+0.j],
            [4.+0.j, 5.+0.j, 6.+0.j]])
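The difference between an inferred and an explicitly requested data type can be seen in a short sketch. The variable names below are illustrative only; the exact integer width reported for the first array depends on your platform, so the example prints only its kind.

```python
import numpy as np

ints = np.array([1, 2, 3])                   # dtype inferred from integer values
mixed = np.array([1, 2, 3.5])                # one float makes every item a float
floats = np.array([1, 2, 3], dtype=float)    # dtype requested explicitly

print(ints.dtype.kind)    # 'i' means some integer type (width is platform dependent)
print(mixed.dtype)        # float64
print(floats.dtype)       # float64
print(floats.tolist())    # [1.0, 2.0, 3.0]
```

Because an ndarray is homogeneous, the array with mixed values stores all of its items as floats instead of keeping the integers as integers.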
Intrinsic Creation of an Array
The NumPy library provides a set of functions that generate ndarrays with initial content,
created with different values depending on the function. They allow a single line of code
to generate large amounts of data.
The zeros() function, for example, creates a full array of zeros with dimensions defined
by the shape argument. For example, to create a two-dimensional array 3×3, you can use:
In: np.zeros((3, 3))
Out: array([[0., 0., 0.],
            [0., 0., 0.],
            [0., 0., 0.]])
The ones() function creates an array full of ones in a very similar way.
In: np.ones((3, 3))
Out: array([[1., 1., 1.],
            [1., 1., 1.],
            [1., 1., 1.]])
A feature that will be particularly useful is arange(). This function generates NumPy arrays
with numerical sequences that respond to particular rules depending on the passed
arguments. Some examples are shown below:
In: np.arange(0, 10)
Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In: np.arange(4, 10)
Out: array([4, 5, 6, 7, 8, 9])
In: np.arange(0, 12, 3)
Out: array([0, 3, 6, 9])
In: np.arange(0, 6, 0.6)
Out: array([0. , 0.6, 1.2, 1.8, 2.4, 3. , 3.6, 4.2, 4.8, 5.4])
By looking at the examples above, can you explain what each of the arguments of
arange() means?
To generate two-dimensional arrays you can still continue to use the arange() function
but combined with the reshape() function. This function divides a linear array in different
parts in the manner specified by the shape argument. For example, to generate a 3×4 2D
array:
In: np.arange(0, 12).reshape(3, 4)
Out: array([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11]])
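Note that reshape() can only rearrange the items that already exist, so the new shape must hold exactly the same number of items. The sketch below illustrates this under that assumption; the flag reshape_failed is a hypothetical name used only for the demonstration.

```python
import numpy as np

seq = np.arange(0, 12)       # 12 items in a one-dimensional array
grid = seq.reshape(3, 4)     # 3 * 4 == 12, so the reshape succeeds
print(grid.shape)            # (3, 4)

reshape_failed = False
try:
    seq.reshape(5, 3)        # 5 * 3 == 15, which does not match 12 items
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```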
Another function very similar to arange() is linspace(). This function still takes as its first
two arguments the initial and end values of the sequence, but the third argument, instead
of specifying the distance between one element and the next, defines the number of
elements into which we want the interval to be split.
In: np.linspace(0,10,5)
Out: array([ 0. ,  2.5,  5. ,  7.5, 10. ])
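To make the contrast between the two functions concrete, the following sketch builds a sequence over the same interval both ways. The variable names are illustrative. With linspace() the end value is included by default, so the spacing works out to (end - start) / (number of elements - 1).

```python
import numpy as np

by_step = np.arange(0, 10, 2.5)     # third argument is the step; 10 is excluded
by_count = np.linspace(0, 10, 5)    # third argument is the count; 10 is included

print(by_step.tolist())     # [0.0, 2.5, 5.0, 7.5]
print(by_count.tolist())    # [0.0, 2.5, 5.0, 7.5, 10.0]

spacing = (10 - 0) / (5 - 1)
print(spacing)              # 2.5
```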
Finally, another method to obtain arrays already containing values is to fill them with
random values. This is possible using the random() function of the numpy.random module.
This function will generate an array with as many elements as specified in the argument.
In: np.random.random(3)
Out: array([0.63016916, 0.65787052, 0.15795233])
In: np.random.random((3,3))
Out:
array([[0.32913055, 0.84398299, 0.97548948],
       [0.46833406, 0.63764554, 0.38420612],
       [0.4752704 , 0.90156781, 0.57395037]])
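Because the values are random, your output will differ from the numbers shown above. If you need the same values on every run, one option (not covered in this lab sheet) is to fix the seed with np.random.seed() first. The sketch below assumes that approach and also checks that every value lies in the interval from 0 (included) to 1 (excluded).

```python
import numpy as np

np.random.seed(0)                     # fix the seed so the run is repeatable
first = np.random.random((2, 3))

np.random.seed(0)                     # same seed, so the same sequence again
second = np.random.random((2, 3))

print(first.shape)                                 # (2, 3)
print(np.array_equal(first, second))               # True
print(bool((first >= 0).all() and (first < 1).all()))   # True
```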
Indexing
Array indexing always uses square brackets ([]) to index the elements of the array so
that the elements can then be referred to individually for various uses, such as extracting
a value, selecting items, or even assigning a new value. When you create a new array, an
appropriate scale index is also automatically created (see Fig. 1).
Fig. 1: Indexing a 1d ndarray
In order to access a single element of an array, you can refer to its index.
In: a = np.arange(10, 16)
In: a
Out: array([10, 11, 12, 13, 14, 15])
In: a[4]
Out: 14
In: a[-1]
Out: 15
In: a[0]
Out: 10
In: a[-6]
Out: 10
To select multiple items at once, you can pass an array of indexes in square brackets.
In: a[[1, 3, 4]]
Out: array([11, 13, 14])
Moving on to the two-dimensional case, namely the matrices, they are represented as
rectangular arrays consisting of rows and columns. Indexing in this case is represented
by a pair of values: the first value is the index of the row and the second is the index of
the column. Therefore, if you want to access the values or select elements in the matrix,
you will still use square brackets, but this time there are two values [row index, column
index] (see Fig. 2).
Fig. 2: Indexing a 2d array
In: A = np.arange(10, 19).reshape((3, 3))
In: A
Out: array([[10, 11, 12],
            [13, 14, 15],
            [16, 17, 18]])
In: A[1, 2]
Out: 15
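Indexing can also be used on the left-hand side of an assignment to change an element in place. The following is a small sketch using the same matrix A; the value 99 is arbitrary.

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))

A[1, 2] = 99          # row index 1, column index 2
print(A[1, 2])        # 99
print(A[-1, -1])      # 18: negative indexes count from the end of each axis
print(A[0, 0])        # 10: the other elements are unchanged
```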
Slicing
Slicing allows you to extract portions of an array to generate new arrays. Depending on
the portion of the array that you want to extract, you must use the slice syntax; that is,
you will use a sequence of numbers separated by colons (:) within square brackets. If you
want to extract a portion of the array, for example one that goes from the second to the
sixth element, you have to insert the index of the starting element, that is 1, and the index
of the final element, that is 5, separated by (:).
In: a = np.arange(10, 16)
In: a
Out: array([10, 11, 12, 13, 14, 15])
In: a[1:5]
Out: array([11, 12, 13, 14])
You can use a third number that defines the gap in the sequence of the elements. For
example, with a value of 2, the array will take the elements in an alternating fashion.
In: a[1:5:2]
Out: array([11, 13])
If you omit the first number, NumPy implicitly interprets this number as 0 (i.e., the initial
element of the array). If you omit the second number, the slice runs to the end of the
array, so the last element is included; and if you omit the last number, it is interpreted
as 1, so every element in the range is taken with no gaps.
In: a[::2]
Out: array([10, 12, 14])
In: a[:5:2]
Out: array([10, 12, 14])
In: a[:5:]
Out: array([10, 11, 12, 13, 14])
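One way to convince yourself of these defaults is to write each shortened slice out in full and compare the results. This sketch uses np.array_equal() for the comparison and the same array a as above.

```python
import numpy as np

a = np.arange(10, 16)

# Each pair should select exactly the same elements.
print(np.array_equal(a[::2], a[0:len(a):2]))   # True
print(np.array_equal(a[:5:], a[0:5:1]))        # True
print(np.array_equal(a[2:], a[2:len(a):1]))    # True
```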
In the case of a two-dimensional array, the slicing syntax still applies, but it is separately
defined for the rows and columns. For example, if you want to extract only the first row:
In: A = np.arange(10, 19).reshape((3, 3))
In: A
Out: array([[10, 11, 12],
            [13, 14, 15],
            [16, 17, 18]])
In: A[0,:]
Out: array([10, 11, 12])
As you can see in the second index, if you leave only the colon without defining a number,
you will select all the columns. Instead, if you want to extract all the values of the first
column, you have to write the inverse.
In: A[:,0]
Out: array([10, 13, 16])
Instead, if you want to extract a smaller matrix, you need to explicitly define all intervals
with indexes that define them.
In: A[0:2, 0:2]
Out: array([[10, 11],
            [13, 14]])
If the indexes of the rows or columns to be extracted are not contiguous, you can specify
an array of indexes.
In: A[[0,2], 0:2]
Out: array([[10, 11],
            [16, 17]])
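A detail worth noticing in the examples above is the shape of each result. As a rough sketch: an integer index removes that axis from the result, whereas a slice keeps it, even when the slice selects only one row.

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))

row = A[0, :]         # integer row index: the row axis is dropped
row_2d = A[0:1, :]    # slice over one row: the row axis is kept

print(row.shape)               # (3,)
print(row_2d.shape)            # (1, 3)
print(A[[0, 2], 0:2].shape)    # (2, 2)
```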
Iterating an Array
To iterate the elements in a 1d array, we just need to use the for construct:
import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
To iterate the elements in a 2d array, we use the nested for construct:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
    for y in x:
        print(y)
If you want to launch an aggregate function that returns a value calculated for every single
column or on every single row, there is an optimal way that leaves it to NumPy to manage
the iteration: the apply_along_axis() function. This function takes three arguments: the
aggregate function, the axis on which to apply the iteration, and the array. If the option
axis equals 0, then the iteration evaluates the elements column by column, whereas if
axis equals 1 then the iteration evaluates the elements row by row. For example, you can
calculate the average values first by column and then by row:
In: A = np.arange(10, 19).reshape((3, 3))
In: A
Out: array([[10, 11, 12],
            [13, 14, 15],
            [16, 17, 18]])
In: np.apply_along_axis(np.mean, axis=0, arr=A)
Out: array([13., 14., 15.])
In: np.apply_along_axis(np.mean, axis=1, arr=A)
Out: array([11., 14., 17.])
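For common aggregates such as the mean, many NumPy functions also accept an axis argument directly. As an alternative sketch (not the method used in this lab sheet), the same column and row averages can be obtained with np.mean() alone:

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))

col_means = np.mean(A, axis=0)    # one mean per column
row_means = np.mean(A, axis=1)    # one mean per row
print(col_means.tolist())         # [13.0, 14.0, 15.0]
print(row_means.tolist())         # [11.0, 14.0, 17.0]

# Both approaches should agree.
same = np.array_equal(col_means, np.apply_along_axis(np.mean, axis=0, arr=A))
print(same)                       # True
```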
Besides built-in NumPy functions such as np.mean, the apply_along_axis() function
also accepts user-defined functions.
In: def foo(x):
        return x/2
In: np.apply_along_axis(foo, axis=1, arr=A)
Out: array([[5. , 5.5, 6. ],
            [6.5, 7. , 7.5],
            [8. , 8.5, 9. ]])
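A user-defined function can also act as an aggregate, returning a single value for each column or row. The sketch below uses a hypothetical helper named spread(), which is not part of NumPy, to compute the difference between the largest and smallest value along each axis.

```python
import numpy as np

def spread(values):
    """Hypothetical helper: largest value minus smallest value."""
    return values.max() - values.min()

A = np.arange(10, 19).reshape((3, 3))

col_spread = np.apply_along_axis(spread, axis=0, arr=A)   # column by column
row_spread = np.apply_along_axis(spread, axis=1, arr=A)   # row by row
print(col_spread.tolist())    # [6, 6, 6]
print(row_spread.tolist())    # [2, 2, 2]
```

Because spread() returns one number per column or row, the result is a one-dimensional array, unlike foo() above, which returns a whole row and therefore produces a matrix.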
Exercise
a) Write a NumPy program to convert a list of numeric values into a one-dimensional
NumPy array.
Expected Output:
Out: Original List: [12.23, 13.32, 100, 36.32]
Out: One-dimensional NumPy array: [12.23 13.32 100. 36.32]
b) Write a NumPy program to create a 3×3 matrix with values ranging from 2 to 10.
c) Write a NumPy program to create a null vector of size 10 and update the sixth value
to 11.
d) Write a NumPy program to create an array with values ranging from 12 to 37, then
reverse the array (first element becomes last).
Expected Output:
Original array:
[12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37]
Reverse array:
[37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12]
e) Write a NumPy program to create a 5×5 array with 1 on the border and 0 inside.
Expected Output:
Original array:
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
1 on the border and 0 inside in the array
[[1. 1. 1. 1. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 0. 0. 0. 1.]
[1. 1. 1. 1. 1.]]
f) Write a NumPy program to append values to the end of an array.
Expected Output:
Original array:
[10, 20, 30]
After append values to the end of the array:
[10 20 30 40 50 60 70 80 90]
g) Create a 5X2 integer array from a range between 100 to 200 such that the
difference between each element is 10.
Expected Output:
Creating 5X2 array using np.arange
[[100 110]
[120 130]
[140 150]
[160 170]
[180 190]]
h) Print the array of items in the third column from all rows of the input array.
Expected Output:
Printing Input Array
[[11 22 33]
[44 55 66]
[77 88 99]]
Printing array of items in the third column from all rows
[33 66 99]
i) Return the array of odd rows and even columns from the NumPy array below.
Expected Output:
Printing Input Array
[[3 6 9 12]
[15 18 21 24]
[27 30 33 36]
[39 42 45 48]
[51 54 57 60]]
Printing array of odd rows and even columns
[[6 12]
[30 36]
[54 60]]
j) Create an 8X3 integer array from a range between 10 and 34 such that the difference
between each element is 1, and then split the array into four equal-sized sub-arrays.
(Hint: Make use of the function, np.split)
Expected Output:
Creating 8X3 array using numpy.arange
[[10 11 12]
[13 14 15]
[16 17 18]
[19 20 21]
[22 23 24]
[25 26 27]
[28 29 30]
[31 32 33]]
Dividing 8X3 array into 4 sub array
[array([[10, 11, 12],[13, 14, 15]]),
array([[16, 17, 18],[19, 20, 21]]),
array([[22, 23, 24],[25, 26, 27]]),
array([[28, 29, 30],[31, 32, 33]])]
End of Lab 2