NumPy: The Data Type Object dtype

As mentioned before, NumPy has many data types, and each of them is represented by a dtype (numpy.dtype) object. Today we'll take a closer look at the dtype object.

First, let's look at how dtype is defined:

class numpy.dtype(obj, align=False, copy=False)

It converts the object obj into a dtype object.

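It takes two optional parameters:

align: whether to pad the fields to match the layout a C compiler would produce for a similar C struct.
copy: whether to make a new copy of the dtype object; if False, the result may just be a reference to a built-in dtype object.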
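A minimal sketch of what align does, assuming NumPy is installed and that int32 has the usual 4-byte alignment: the same two-field layout is built once packed and once C-aligned, and the field offsets and total sizes are compared.

```python
import numpy as np

# A 1-byte int followed by a 4-byte int, described with a comma-separated string.
packed = np.dtype("i1, i4")               # fields laid out back to back
aligned = np.dtype("i1, i4", align=True)  # padded like a C struct

# fields[name] is (dtype, byte offset); itemsize is the total size in bytes.
print(packed.itemsize, packed.fields["f1"][1])    # 5 1
print(aligned.itemsize, aligned.fields["f1"][1])  # 8 4
print(aligned.isalignedstruct)                    # True
```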

A dtype describes the kind of data (int, float, Python object, and so on), the size of the data, its byte order (little-endian or big-endian), and more.
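These properties can be read back from standard numpy.dtype attributes. An illustrative sketch using a big-endian 32-bit integer:

```python
import numpy as np

dt = np.dtype(">i4")  # big-endian 32-bit signed integer

print(dt.kind)      # 'i': signed integer
print(dt.itemsize)  # 4: size in bytes
print(dt.str)       # '>i4': byte order, kind and size in one string
print(dt.name)      # 'int32': the name does not include byte order

# Other kinds of data, e.g. floats and Python objects:
print(np.dtype(float).kind, np.dtype(object).kind)  # f O
```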

The obj argument can be of many different kinds; let's go through them one by one.

If obj is already a dtype object, it is used as-is.

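If obj is None, the default data type float_ (float64) is returned. This matches the default dtype of array-creation functions such as np.zeros and np.ones, which create float arrays unless told otherwise.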
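A quick illustrative check of both cases (a dtype in gives the same dtype out; None in gives float64 out):

```python
import numpy as np

existing = np.dtype("int16")
print(np.dtype(existing) == existing)  # True: a dtype converts to itself
print(np.dtype(None))                  # float64: the default
print(np.zeros(3).dtype)               # float64: np.zeros defaults to float too
```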

Built-in array scalar types are converted to the corresponding data-type object.

In a previous article we covered what array scalar types are: the data types accessible as np.<type>, such as np.int32 and np.complex128.

Let's see how array scalar types are converted:

In [85]: np.dtype(np.int32)
Out[85]: dtype('int32')

In [86]: np.dtype(np.complex128)
Out[86]: dtype('complex128')

For the built-in array scalar types prefixed with np., see my earlier article "NumPy: Data Types".

Note that array scalar types are not dtype objects, although an array scalar type can be used in many places where a dtype object is expected.
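The distinction shows up with isinstance: np.int32 is a Python class (a scalar type), while np.dtype(np.int32) is a dtype instance that points back to that class through dtype.type. A short illustrative sketch:

```python
import numpy as np

print(isinstance(np.int32, np.dtype))            # False: a scalar *type*
print(isinstance(np.dtype(np.int32), np.dtype))  # True: a dtype instance

# A scalar type is still accepted wherever a dtype is expected.
arr = np.zeros(3, dtype=np.int32)
print(arr.dtype == np.dtype("int32"))            # True

# The dtype remembers which scalar type it corresponds to.
print(np.dtype(np.int32).type is np.int32)       # True
```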

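Some generic type objects are converted to the corresponding dtype (note that this conversion of generic types has been deprecated since NumPy 1.19):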

Generic type	dtype
`number`, `inexact`, `floating`	float
`complexfloating`	`cfloat`
`integer`, `signedinteger`	`int_`
`unsignedinteger`	`uint`
`character`	`string`
`generic`, `flexible`	`void`

Some built-in Python types are equivalent to array scalar types and can also be converted to a dtype:

Python type	dtype
int	`int_`
bool	`bool_`
float	`float_`
complex	`cfloat`
bytes	`bytes_`
str	`str_`
buffer	`void`
(all others)	`object_`

Some examples of converting built-in Python types:

In [82]: np.dtype(float)
Out[82]: dtype('float64')

In [83]: np.dtype(int)
Out[83]: dtype('int64')

In [84]:  np.dtype(object)
Out[84]: dtype('O')
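A sketch looping over several built-in Python types. The result for int is platform dependent (for example, it was int32 on Windows before NumPy 2.0), so where the size can vary only the kind of the dtype is worth relying on.

```python
import numpy as np

# Convert several built-in Python types and show the resulting dtype.
for py_type in (bool, int, float, complex, str, bytes, object):
    print(py_type.__name__, "->", repr(np.dtype(py_type)))
```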

Any type object with a dtype attribute can also be converted: the attribute is accessed and used directly, provided it is convertible to a dtype.
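A minimal sketch of this rule. The class Celsius is hypothetical; its dtype attribute is an actual np.dtype instance, because NumPy 1.20 and later emit a DeprecationWarning when the attribute is merely something convertible, such as a string.

```python
import numpy as np

# Hypothetical class: any type object exposing a `dtype` attribute will do.
class Celsius:
    dtype = np.dtype(np.float32)

print(np.dtype(Celsius))  # float32: NumPy reads Celsius.dtype
```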

Each built-in data type has a corresponding character code, and these codes can be used for the conversion as well:

In [134]: np.dtype('b')  # byte, native byte order
Out[134]: dtype('int8')

In [135]: np.dtype('>H')  # big-endian unsigned short
Out[135]: dtype('>u2')

In [136]: np.dtype('<f') # little-endian single-precision float
Out[136]: dtype('float32')

In [137]: np.dtype('d') # double-precision floating-point number
Out[137]: dtype('float64')
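Going the other way, dtype.char returns the character code of a dtype, so codes and dtypes round-trip. A small sketch using types whose codes do not vary across platforms:

```python
import numpy as np

for name in ("int8", "uint16", "float32", "float64", "complex128", "bool"):
    dt = np.dtype(name)
    # Print the code and check that converting the code gives the same dtype.
    print(name, repr(dt.char), np.dtype(dt.char) == dt)
```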

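Array-protocol type strings can be used as well. NumPy arrays expose such a string, called typestr, through their array interface (__array_interface__).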

typestr describes the type and the size of the data stored in the array.

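A typestr has three parts. The first gives the byte order: < means little-endian, > means big-endian, and | means byte order is not relevant.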

The second is a character code giving the basic type of the array elements:

Code	Description
`t`	Bit field (following integer gives the number of bits in the bit field).
`b`	Boolean (integer type where all values are only True or False)
`i`	Integer
`u`	Unsigned integer
`f`	Floating point
`c`	Complex floating point
`m`	Timedelta
`M`	Datetime
`O`	Object (i.e. the memory contains a pointer to PyObject)
`S`	String (fixed-length sequence of char)
`U`	Unicode (fixed-length sequence of Py_UNICODE)
`V`	Other (void * – each item is a fixed-size chunk of memory)

The last part is an integer giving the number of bytes the type uses.
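The sketch below reads typestr from an array's __array_interface__ and compares it with dtype.str, which uses the same three-part format. Explicit byte orders are used so the output does not depend on the host machine.

```python
import numpy as np

big = np.zeros(2, dtype=">f8")
print(big.__array_interface__["typestr"])  # '>f8': big-endian, float, 8 bytes

print(np.dtype("<i4").str)  # '<i4': little-endian, integer, 4 bytes
print(np.dtype("u1").str)   # '|u1': byte order is irrelevant for 1-byte types
print(np.dtype("?").str)    # '|b1': booleans use the 'b' kind in typestr
```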

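np.dtype accepts type strings built from the following characters. The first character gives the kind of data and the remaining digits give the number of bytes per item, except for 'U', where they give the number of characters: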

Code	Description
`'?'`	boolean
`'b'`	(signed) byte
`'B'`	unsigned byte
`'i'`	(signed) integer
`'u'`	unsigned integer
`'f'`	floating-point
`'c'`	complex-floating point
`'m'`	timedelta
`'M'`	datetime
`'O'`	(Python) objects
`'S'`, `'a'`	zero-terminated bytes (not recommended)
`'U'`	Unicode string
`'V'`	raw data (`void`)

Some examples:

In [137]: np.dtype('d')
Out[137]: dtype('float64')

In [138]: np.dtype('i4')
Out[138]: dtype('int32')

In [139]: np.dtype('f8')
Out[139]: dtype('float64')

In [140]:  np.dtype('c16')
Out[140]: dtype('complex128')

In [141]: np.dtype('a25')
Out[141]: dtype('S25')

In [142]: np.dtype('U25')
Out[142]: dtype('<U25')
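To make the size rule concrete, this sketch prints itemsize for several type strings. The digits after 'U' count characters, each stored in 4 bytes, and a size with no matching type is rejected.

```python
import numpy as np

# For most type characters, the digits give the size in bytes.
for spec in ("i4", "f8", "c16", "S25"):
    print(spec, np.dtype(spec).itemsize)

# For 'U' the digits count characters; each character takes 4 bytes.
print("U25", np.dtype("U25").itemsize)  # 100

# A size with no matching built-in type raises TypeError.
try:
    np.dtype("i3")
except TypeError as exc:
    print("rejected:", exc)
```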

A comma-separated string can be used to describe a structured data type.

Such a string is converted to a structured dtype with one field per comma-separated item, named f0, f1, ..., f(N-1). For example:

In [143]: np.dtype("i4, (2,3)f8, f4")
Out[143]: dtype([('f0', '<i4'), ('f1', '<f8', (2, 3)), ('f2', '<f4')])

In the example above, f0 holds a 32-bit integer, f1 holds a 2 x 3 array of 64-bit floats, and f2 holds a 32-bit float.

Another example:

In [144]: np.dtype("a3, 3u8, (3,4)a10")
Out[144]: dtype([('f0', 'S3'), ('f1', '<u8', (3,)), ('f2', 'S10', (3, 4))])
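A sketch of using such a dtype: the generated names f0, f1, f2 index the fields, and a field's subarray shape is appended to the array's own shape. The stored values are arbitrary.

```python
import numpy as np

dt = np.dtype("i4, (2,3)f8, f4")
print(dt.names)     # ('f0', 'f1', 'f2')
print(dt.itemsize)  # 56 = 4 + 2*3*8 + 4, fields packed with no padding

rec = np.zeros(2, dtype=dt)  # two records with this layout
rec["f0"] = [1, 2]           # fields are addressed by their generated names
print(rec["f1"].shape)       # (2, 2, 3): the subarray shape is appended
```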

Any string that is a key of numpy.sctypeDict can be converted to a dtype. The exact set of keys depends on the NumPy version and platform:

In [146]: np.sctypeDict.keys()
Out[146]: dict_keys(['?', 0, 'byte', 'b', 1, 'ubyte', 'B', 2, 'short', 'h', 3, 'ushort', 'H', 4, 'i', 5, 'uint', 'I', 6, 'intp', 'p', 7, 'uintp', 'P', 8, 'long', 'l', 'L', 'longlong', 'q', 9, 'ulonglong', 'Q', 10, 'half', 'e', 23, 'f', 11, 'double', 'd', 12, 'longdouble', 'g', 13, 'cfloat', 'F', 14, 'cdouble', 'D', 15, 'clongdouble', 'G', 16, 'O', 17, 'S', 18, 'unicode', 'U', 19, 'void', 'V', 20, 'M', 21, 'm', 22, 'bool8', 'Bool', 'b1', 'float16', 'Float16', 'f2', 'float32', 'Float32', 'f4', 'float64', 'Float64', 'f8', 'float128', 'Float128', 'f16', 'complex64', 'Complex32', 'c8', 'complex128', 'Complex64', 'c16', 'complex256', 'Complex128', 'c32', 'object0', 'Object0', 'bytes0', 'Bytes0', 'str0', 'Str0', 'void0', 'Void0', 'datetime64', 'Datetime64', 'M8', 'timedelta64', 'Timedelta64', 'm8', 'int64', 'uint64', 'Int64', 'UInt64', 'i8', 'u8', 'int32', 'uint32', 'Int32', 'UInt32', 'i4', 'u4', 'int16', 'uint16', 'Int16', 'UInt16', 'i2', 'u2', 'int8', 'uint8', 'Int8', 'UInt8', 'i1', 'u1', 'complex_', 'int0', 'uint0', 'single', 'csingle', 'singlecomplex', 'float_', 'intc', 'uintc', 'int_', 'longfloat', 'clongfloat', 'longcomplex', 'bool_', 'unicode_', 'object_', 'bytes_', 'str_', 'string_', 'int', 'float', 'complex', 'bool', 'object', 'str', 'bytes', 'a'])

Examples:

In [147]: np.dtype('uint32')
Out[147]: dtype('uint32')

In [148]: np.dtype('float64')
Out[148]: dtype('float64')
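Many of these keys are aliases for the same dtype. A short illustrative check:

```python
import numpy as np

# Different spellings can name the same dtype.
aliases = ["float64", "double", "f8", "d"]
print(all(np.dtype(a) == np.dtype("float64") for a in aliases))  # True

print(np.dtype("single"), np.dtype("half"))  # float32 float16
```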

New dtypes can also be built from tuples that contain a dtype.

The tuple can take several forms.

(flexible_dtype, itemsize): for a flexible dtype, whose size is not fixed, you can specify the item size:

In [149]: np.dtype((np.void, 10))
Out[149]: dtype('V10')

In [150]: np.dtype(('U', 10))
Out[150]: dtype('<U10')

(fixed_dtype, shape): for a fixed-size dtype, you can specify a shape, which makes each item a subarray:

In [151]:  np.dtype((np.int32, (2,2)))
Out[151]: dtype(('<i4', (2, 2)))

In [152]: np.dtype(('i4, (2,3)f8, f4', (2,3)))
Out[152]: dtype(([('f0', '<i4'), ('f1', '<f8', (2, 3)), ('f2', '<f4')], (2, 3)))
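A sketch checking both tuple forms: the itemsize of the flexible types, and how an array created with a subarray dtype absorbs the subarray shape into its own shape.

```python
import numpy as np

print(np.dtype((np.void, 10)).itemsize)  # 10 bytes of raw data
print(np.dtype(("U", 10)).itemsize)      # 40: 10 characters x 4 bytes

sub = np.dtype((np.int32, (2, 2)))
print(sub.base, sub.shape)               # int32 (2, 2)

arr = np.zeros(3, dtype=sub)
print(arr.shape, arr.dtype)              # (3, 2, 2) int32
```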

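A list of fields, [(field_name, field_dtype, field_shape), ...], describes a structured dtype. Each field is a tuple of 2 or 3 elements: the field name, the field dtype, and optionally the field shape.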

If field_name is '', a default name based on the field's position (f0, f1, ...) is used. field_name can also be a 2-tuple made of a title and a name.

field_dtype is the dtype of the field.

field_shape is optional; give it when the field is a subarray, i.e. each value of the field is an array of field_dtype with that shape.

In [153]: np.dtype([('big', '>i4'), ('little', '<i4')])
Out[153]: dtype([('big', '>i4'), ('little', '<i4')])

This dtype has two fields: a big-endian 32-bit int and a little-endian 32-bit int.
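The effect of the two byte orders is visible in the raw bytes. This sketch stores the value 1 in both fields:

```python
import numpy as np

dt = np.dtype([("big", ">i4"), ("little", "<i4")])
rec = np.array([(1, 1)], dtype=dt)  # one record; both fields hold 1

# Big-endian puts the most significant byte first, little-endian last.
print(rec.tobytes())  # b'\x00\x00\x00\x01\x01\x00\x00\x00'
```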

In [154]: np.dtype([('R','u1'), ('G','u1'), ('B','u1'), ('A','u1')])
Out[154]: dtype([('R', 'u1'), ('G', 'u1'), ('B', 'u1'), ('A', 'u1')])

Four fields, each an unsigned 8-bit integer.
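The examples above only use 2-element fields. This sketch also exercises an empty name, a (title, name) pair, and a third shape element; the field names and the title are illustrative.

```python
import numpy as np

dt = np.dtype([
    ("", "i4"),                    # empty name: NumPy assigns 'f0'
    (("Red channel", "r"), "u1"),  # (title, name) pair
    ("pos", "f8", (2,)),           # third element: the field's shape
])
print(dt.names)        # ('f0', 'r', 'pos')
print(dt.fields["r"])  # (dtype('uint8'), 4, 'Red channel'): dtype, offset, title
print(dt.itemsize)     # 21 = 4 + 1 + 2*8
```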

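A dict of the form {'names': ..., 'formats': ..., 'offsets': ..., 'titles': ..., 'itemsize': ...} is also accepted. It lets you give a list of names and a list of formats: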

In [157]: np.dtype({'names': ['r','g','b','a'], 'formats': [np.uint8, np.uint8, np.uint8, np.uint8]})
Out[157]: dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])

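offsets gives the byte offset of each field, titles gives each field's title, and itemsize is the size of the whole dtype in bytes. offsets, titles and itemsize are optional.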

In [158]: np.dtype({'names': ['r','b'], 'formats': ['u1', 'u1'],
     ...:                'offsets': [0, 2],
     ...:                'titles': ['Red pixel', 'Blue pixel']})
     ...:
Out[158]: dtype({'names':['r','b'], 'formats':['u1','u1'], 'offsets':[0,2], 'titles':['Red pixel','Blue pixel'], 'itemsize':3})
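The sketch below reads the offsets and titles back from dtype.fields, and shows itemsize reserving trailing bytes. The second dtype is an illustrative variation, not part of the original example.

```python
import numpy as np

dt = np.dtype({"names": ["r", "b"], "formats": ["u1", "u1"],
               "offsets": [0, 2],
               "titles": ["Red pixel", "Blue pixel"]})
print(dt.fields["b"])  # (dtype('uint8'), 2, 'Blue pixel'): stored at byte 2
print(dt.itemsize)     # 3: byte 1 is an unnamed gap

padded = np.dtype({"names": ["r", "b"], "formats": ["u1", "u1"],
                   "offsets": [0, 2], "itemsize": 4})
print(padded.itemsize)  # 4: one extra byte reserved at the end
```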

A tuple (base_dtype, new_dtype) reinterprets a basic dtype as a structured dtype:

In [159]: np.dtype((np.int32,{'real':(np.int16, 0),'imag':(np.int16, 2)}))
Out[159]: dtype([('real', '<i2'), ('imag', '<i2')])

The 32-bit int is interpreted as two 16-bit ints: real at byte offset 0 and imag at byte offset 2.

In [161]: np.dtype(('i4', [('r','u1'),('g','u1'),('b','u1'),('a','u1')]))
Out[161]: dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])

The 32-bit int is interpreted as four unsigned 8-bit integers.
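A sketch of reading raw bytes through such dtypes with np.frombuffer. The first dtype is the real/imag example with explicit little-endian codes added (an assumption, so the result does not depend on the host machine); the second is the RGBA example, whose 1-byte fields have no byte-order issue.

```python
import numpy as np

# real/imag view of one 32-bit int, little-endian throughout.
dt = np.dtype(("<i4", {"real": ("<i2", 0), "imag": ("<i2", 2)}))
item = np.frombuffer(bytes([1, 0, 2, 0]), dtype=dt)  # one 4-byte item
print(item["real"], item["imag"])  # [1] [2]

# The same 4 bytes seen as separate color channels.
rgba = np.dtype(("i4", [("r", "u1"), ("g", "u1"), ("b", "u1"), ("a", "u1")]))
px = np.frombuffer(bytes([10, 20, 30, 255]), dtype=rgba)
print(px["r"][0], px["a"][0])      # 10 255
```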

This article is included at http://www.flydean.com/04-python-numpy-datatype-obj/

Follow my WeChat official account: 「程序那些事」.

Published in: python, 2021-04-30
