About Python: Python-IO

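Opening and closing a file comes down to two calls: the built-in open function and the close method of the object it returns.

The signature of open: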

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

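open returns a file-like object, but the concrete type of that object is not fixed; it changes with the mode the file is opened in.

Opening a file in text mode ('w', 'r', 'wt', 'rt', and so on) returns a TextIOWrapper.
Opening a file in binary mode returns a different type, depending on the mode.
In binary read mode, it returns a BufferedReader.
In binary write mode and binary append mode, it returns a BufferedWriter.
In binary read/write mode, it returns a BufferedRandom.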
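
The mapping from mode to return type can be checked directly. The following is a minimal sketch, assuming default buffering; the helper opened_type and the file name demo.txt are made up for illustration.

```python
import os
import tempfile


def opened_type(path, mode):
    """Open path with the given mode and return the class name of the result."""
    with open(path, mode) as f:
        return type(f).__name__


# Work in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("hello\n")

    # One text mode and three binary modes.
    types_by_mode = {m: opened_type(path, m) for m in ("r", "rb", "ab", "r+b")}

for mode, name in types_by_mode.items():
    print(mode, name)
```

All four classes live in the io module, so isinstance checks against io.TextIOWrapper and friends work as well.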

In [1]: f = open('./hello.py')    # calling open directly; raises FileNotFoundError if the file does not exist
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-b6df97277b77> in <module>()
----> 1 f = open('./hello.py')

FileNotFoundError: [Errno 2] No such file or directory: './hello.py'

In [2]: f = open('./hello.py')    # once the file has been created it can be opened; returns a file-like object

In [3]: f.read()    # read the entire contents of the file
Out[3]: "#!/usr/bin/env python\n# coding=utf-8\nprint('hello world')\n"

In [4]: f.close()    # close the file

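Reading and writing files mainly means read, write, and their variants. Which of them are allowed depends on the mode argument of open.

The modes have the following meanings: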

'r' open for reading (default)
'w' open for writing, truncating the file first
'x' create a new file and open it for writing
'a' open for writing, appending to the end of the file if it exists
'b' binary mode
't' text mode (default)
'+' open a disk file for updating (reading and writing)
'U' universal newline mode (deprecated)

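Notes:

With mode='x', a FileExistsError is raised if the file already exists.
With mode='w', the file is truncated as soon as it is opened, even if nothing is written to it.
When mode contains +, the missing direction is added: a read-only mode also becomes writable, and a write-only mode also becomes readable. The + does not change any other behavior.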
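
The three notes above can be tried out in a temporary directory. This is a small sketch rather than an exhaustive test; the file name notes.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # hypothetical file name

    # 'x' creates the file, but only if it does not exist yet.
    with open(path, "x") as f:
        f.write("first version\n")

    try:
        open(path, "x")
        second_x_failed = False
    except FileExistsError:
        second_x_failed = True

    # 'w' truncates on open, even though nothing is written here.
    open(path, "w").close()
    size_after_w = os.path.getsize(path)

    # 'r+' adds writing to read mode without truncating the file.
    with open(path, "w") as f:
        f.write("abc")
    with open(path, "r+") as f:
        f.write("X")  # overwrites the first character only
    with open(path) as f:
        content = f.read()

print(second_x_failed, size_after_w, content)
```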

mode=t operates on characters (str)
mode=b operates on bytes

In [1]: f = open('./hello.py', mode='rt')    # with mode=t the content is read as a string

In [2]: s = f.read()

In [3]: s
Out[3]: "#!/usr/bin/env python\n# coding=utf-8\nprint('hello world')\n"

In [4]: type(s)    # s is of type str
Out[4]: str

In [5]: f.close()

In [6]: f = open('./hello.py', mode='rb')    # with mode=b the content is read as bytes

In [7]: s = f.read()

In [8]: s
Out[8]: b"#!/usr/bin/env python\n# coding=utf-8\nprint('hello world')\n"

In [9]: type(s)
Out[9]: bytes

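When a file is opened, the interpreter keeps a pointer to a position in the file. Reads and writes always start at that pointer and move it forward. With mode='r' the pointer starts at 0 (the beginning of the file); with mode='a' it points to EOF (the end of the file).

Two methods deal with the file pointer: tell and seek.

The tell method

Returns the current position of the stream. For a file, that is the position of the file pointer.

The seek method

Changes the position of the stream and returns the new absolute position.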

seek(cookie, whence=0, /) method of _io.TextIOWrapper instance

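Summary of the file pointer

Seeking past the end of the file raises no exception, and tell then reports a position past the end. Outside append mode, a subsequent write starts at that position; on POSIX systems the gap reads back as zero bytes.

The file pointer counts bytes in both text mode and binary mode (in text mode, the value returned by tell is best treated as an opaque number to pass back to seek)
tell returns the current position of the file pointer
seek moves the file pointer
whence: SEEK_SET (0) moves offset bytes from position 0; SEEK_CUR (1) moves offset bytes from the current position; SEEK_END (2) moves offset bytes from EOF
offset is an integer
In text mode, when whence is SEEK_CUR or SEEK_END, offset must be 0
The file pointer cannot be negative
Reads start at the file pointer (pos) and move forward
Writes start at the file pointer (pos) as well, except in append mode
In append mode, writes always start at EOF, wherever the file pointer is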
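
The pointer rules can be observed with a small binary file. The sketch below assumes a POSIX-like filesystem for the behavior of the gap after seeking past EOF; the file name pointer.bin is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pointer.bin")  # hypothetical file name

    with open(path, "wb") as f:
        f.write(b"abc")
        pos_after_write = f.tell()   # pointer sits right after the data
        f.seek(10)                   # past EOF, no exception
        pos_after_seek = f.tell()
        f.write(b"d")                # lands at offset 10, leaving a gap

    with open(path, "rb") as f:
        gap_content = f.read()

    with open(path, "ab") as f:
        f.seek(0)                    # has no effect on where append writes go
        f.write(b"!")

    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)      # one byte before EOF
        last_byte = f.read()

print(pos_after_write, pos_after_seek, len(gap_content), last_byte)
```

Negative offsets with SEEK_END work here because the file is opened in binary mode; in text mode the same call would be rejected.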

File buffering is controlled by the buffering argument of open. Its default value is -1, which means that both text mode and binary mode use the default buffering policy.

buffering=-1

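Binary mode: DEFAULT_BUFFER_SIZE
Text mode: DEFAULT_BUFFER_SIZE

buffering=0

Binary mode: unbuffered
Text mode: not allowed

buffering=1

Binary mode: line buffering is not supported, so the default buffer size is used (Python 3.8 and later emit a RuntimeWarning)
Text mode: line buffering

buffering>1

Binary mode: a buffer of buffering bytes
Text mode: DEFAULT_BUFFER_SIZE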

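Summary

Binary mode: before each write, check whether the remaining buffer space can hold the incoming bytes. If not, flush first and then put the incoming bytes into the buffer; if the incoming bytes are larger than the whole buffer, write them straight through.
Text mode: with line buffering, flush on every newline. Without line buffering, if the incoming bytes plus the bytes already buffered exceed the buffer size, flush the buffer and the incoming bytes together.
flush and close both force the buffer to be written out.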
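
One way to see buffering at work is to compare the size of the file on disk before and after a flush. The following sketch assumes the write is much smaller than the default buffer; the file name buffered.bin is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffered.bin")  # hypothetical file name

    f = open(path, "wb")                       # default buffering
    f.write(b"hello")
    size_before_flush = os.path.getsize(path)  # data is still in the buffer
    f.flush()
    size_after_flush = os.path.getsize(path)
    f.close()

    raw = open(path, "wb", buffering=0)        # unbuffered, binary mode only
    raw.write(b"hi")
    size_unbuffered = os.path.getsize(path)    # already on disk without flush
    raw.close()

    try:
        open(path, "w", buffering=0)           # text mode cannot be unbuffered
        text_unbuffered_allowed = True
    except ValueError:
        text_unbuffered_allowed = False

    line = open(path, "w", buffering=1)        # line buffering, text mode
    line.write("no newline yet")
    size_before_newline = os.path.getsize(path)
    line.write("\n")                           # the newline triggers a flush
    size_after_newline = os.path.getsize(path)
    line.close()

print(size_before_flush, size_after_flush, size_unbuffered)
print(text_unbuffered_allowed, size_before_newline, size_after_newline)
```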

A context manager closes the file automatically on exit, but it does not open a new scope.

In [1]: with open('./hello.py') as f:
    ...:     pass
    ...: 

In [2]: f.readable()    # after leaving the context manager the file is closed, so no further I/O is possible
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-97a5eee249a2> in <module>()
----> 1 f.readable()

ValueError: I/O operation on closed file    

In [3]: f
Out[3]: <_io.TextIOWrapper name='./hello.py' mode='r' encoding='UTF-8'>

In [4]: f.closed    # f is already closed
Out[4]: True

Besides the with open('./hello.py') as f: form, a context manager can also be written another way:

In [21]: f = open('./hello.py')

In [22]: with f:
    ...:     pass
    ...:
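
What the context manager buys is that the file is closed even when the block raises. The sketch below uses an in-memory StringIO as a stand-in for a real file, which is an assumption made only to keep the example self-contained.

```python
import io

stream = io.StringIO("some text")  # stands in for a real file

try:
    with stream:
        raise RuntimeError("something went wrong inside the block")
except RuntimeError:
    pass

closed_after_error = stream.closed  # the with block still closed it

# Roughly what the with statement does for a file object:
other = io.StringIO("more text")
try:
    data = other.read()
finally:
    other.close()

# No new scope: names bound inside the block stay visible afterwards.
with io.StringIO("x") as f:
    inner = f.read()
still_visible = (inner, f.closed)

print(closed_after_error, data, still_visible)
```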

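Objects such as the one returned by open(), which have a read() method, are collectively called file-like objects in Python. Besides files on disk, they can be in-memory byte streams, network streams, custom streams, and so on. StringIO and BytesIO are the common ones.

StringIO, as the name suggests, reads and writes str in memory.

To write str into a StringIO, first create a StringIO object, then write to it and read from it just like a file. StringIO supports almost every operation that a file supports.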

In [1]: from io import StringIO

In [2]: help(StringIO)


In [3]: sio = StringIO()    # create a StringIO object; it can also be initialized with a str

In [4]: sio.write('hello world')
Out[4]: 11

In [5]: sio.write(' !')
Out[5]: 2

In [6]: sio.getvalue()    # getvalue() returns the str written so far
Out[6]: 'hello world !'

In [7]: sio.closed
Out[7]: False

In [8]: sio.readline()
Out[8]: ''

In [9]: sio.seekable()
Out[9]: True

In [10]: sio.seek(0, 0)    # seek is supported
Out[10]: 0

In [11]: sio.readline()
Out[11]: 'hello world !'

To read from a StringIO, initialize it with a str and then read it like a file:

In [1]: from io import StringIO

In [2]: sio = StringIO('I\nlove\npython!')

In [3]: for line in sio.readlines():
   ...:     print(line.strip())
   ...:     
I
love
python!

StringIO can only handle str. To work with binary data, use BytesIO.

BytesIO reads and writes bytes in memory. Create a BytesIO and write some bytes to it:

In [1]: from io import BytesIO

In [2]: bio = BytesIO()

In [3]: bio.write(b'abcd')
Out[3]: 4

In [4]: bio.seek(0)
Out[4]: 0

In [5]: bio.read()
Out[5]: b'abcd'

In [6]: bio.getvalue()    # getvalue returns the entire contents at once, wherever the file pointer is
Out[6]: b'abcd'

As with StringIO, a BytesIO can be initialized with bytes and then read like a file:

In [1]: from io import BytesIO

In [2]: bio = BytesIO(b'abcd')

In [3]: bio.read()
Out[3]: b'abcd'
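
The practical benefit of file-like objects is that code written against read, seek, and iteration does not care where the data lives. A minimal sketch, with two hypothetical helpers (count_lines and read_header) that accept any such object:

```python
from io import BytesIO, StringIO


def count_lines(stream):
    """Count lines in any object that can be iterated like a text file."""
    return sum(1 for _ in stream)


def read_header(stream, size=4):
    """Read the first `size` bytes, then restore the previous position."""
    pos = stream.tell()
    header = stream.read(size)
    stream.seek(pos)
    return header


text_lines = count_lines(StringIO("I\nlove\npython!"))

bio = BytesIO(b"\x89PNG\r\n\x1a\n rest of data")
header = read_header(bio)
position_after = bio.tell()

print(text_lines, header, position_after)
```

The same two functions work unchanged on objects returned by open(), which makes StringIO and BytesIO convenient in tests.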

There are two ways to work with paths: os.path and pathlib.

os.path manipulates paths as strings: import os
pathlib provides object-oriented filesystem paths: import pathlib
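
To make the difference concrete, here is the same small task done both ways. This is only an illustrative sketch; the path components are made up, and PurePath is used so that nothing touches the filesystem.

```python
import os.path
import pathlib

# os.path: plain strings in, plain strings out.
joined = os.path.join("workspace", "subworkspace", "hello.py")
base, ext = os.path.splitext(os.path.basename(joined))

# pathlib: the same information as attributes of a path object.
p = pathlib.PurePath("workspace") / "subworkspace" / "hello.py"

print(base, ext)
print(p.stem, p.suffix, p.parts)
```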

pathlib has been part of the standard library since Python 3.4. On Python 2.7 it has to be installed before use:

pip install pathlib

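The source code of the pathlib module is in Lib/pathlib.py.

Basic use of pathlib centers on the Path class of the module.

In [1]: import pathlib    # import the pathlib module

In [2]: cwd = pathlib.Path('.')    # create a Path for the current directory; the argument is a path segment (a str or another path object)

In [3]: cwd    # the result is a PosixPath; on Windows it would be a WindowsPath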
Out[3]: PosixPath('.')

help(pathlib.Path) lists the methods of the Path class.

Help on class Path in module pathlib:

class Path(PurePath)
 |  PurePath represents a filesystem path and offers operations which
 |  don't imply any actual filesystem I/O.  Depending on your system,
 |  instantiating a PurePath will return either a PurePosixPath or a
 |  PureWindowsPath object.  You can also instantiate either of these classes
 |  directly, regardless of your system.
 |  
 |  Method resolution order:
 |      Path
 |      PurePath
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __enter__(self)
 |  
 |  __exit__(self, t, v, tb)
 |  
...

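Several methods for working with directories:

is_dir(self): whether the path is a directory
iterdir(self): returns a generator over all entries (files and directories) under the path; it does not yield '.' or '..'
mkdir(self, mode=511, parents=False, exist_ok=False): creates the directory; mode can be specified (the default 511 is 0o777)
rmdir(self): removes the directory, which must be empty, otherwise an error is raised

Usage examples: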

In [4]: cwd.is_dir()
Out[4]: True

In [5]: cwd.iterdir()    # iterdir returns a generator
Out[5]: <generator object Path.iterdir at 0x7f6727d926d0>

In [6]: for f in cwd.iterdir():    # '.' and '..' are not yielded
   ...:     print(type(f))
   ...:     print(f)
   ...:     
<class 'pathlib.PosixPath'>
hello.py
<class 'pathlib.PosixPath'>
aa.py

In [7]: cwd.mkdir('abc')    # mkdir is a method of the path object itself, so 'abc' is taken as mode and the call fails
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-3b48dd61eb0f> in <module>()
----> 1 cwd.mkdir('abc')

/home/clg/.pyenv/versions/3.5.2/lib/python3.5/pathlib.py in mkdir(self, mode, parents, exist_ok)
   1212         if not parents:
   1213             try:
-> 1214                 self._accessor.mkdir(self, mode)
   1215             except FileExistsError:
   1216                 if not exist_ok or not self.is_dir():

/home/clg/.pyenv/versions/3.5.2/lib/python3.5/pathlib.py in wrapped(pathobj, *args)
    369         @functools.wraps(strfunc)
    370         def wrapped(pathobj, *args):
--> 371             return strfunc(str(pathobj), *args)
    372         return staticmethod(wrapped)
    373 

TypeError: an integer is required (got type str)

In [8]: d = pathlib.Path('./abc')

In [9]: d.exists()
Out[9]: False

In [10]: d.mkdir(755)     # creates the directory, but 755 is not the same as 0o755 (octal)

In [11]: %ls
aa.py  abc/  hello.py

In [12]: %ls -ld ./abc
d-wxrw---t. 2 clg clg 6 Feb 13 21:01 ./abc/    # the mode was given incorrectly, so the permissions are wrong

In [13]: d.rmdir()

In [14]: d.exists()
Out[14]: False

In [15]: d.mkdir(0o755)    # specify mode in octal

In [16]: %ls -ld ./abc
drwxr-xr-x. 2 clg clg 6 Feb 13 21:03 ./abc/
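
The session above does not use the parents and exist_ok arguments of mkdir. The following sketch shows them in a temporary directory, together with the decimal-versus-octal pitfall; the directory names are hypothetical.

```python
import pathlib
import tempfile

# 755 in decimal is a different bit pattern from 0o755.
decimal_mode_as_octal = oct(755)

with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    nested = base / "abc" / "def"  # hypothetical directory names

    nested.mkdir(mode=0o755, parents=True)      # creates abc and abc/def
    nested.mkdir(parents=True, exist_ok=True)   # no error on the second call

    try:
        (base / "abc").rmdir()                  # not empty: it contains def
        removed_non_empty = True
    except OSError:
        removed_non_empty = False

    nested.rmdir()                              # empty, so this succeeds
    (base / "abc").rmdir()
    remaining = [p.name for p in base.iterdir()]

print(decimal_mode_as_octal, removed_non_empty, remaining)
```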

These are mostly general-purpose path operations:

In [17]: f = pathlib.Path('./ab/cd/a.txt')

In [18]: f.exists()
Out[18]: False

In [19]: f.is_file()
Out[19]: False

In [20]: f.is_absolute()
Out[20]: False

In [21]: f = pathlib.Path('./hello.py')

In [22]: f.is_file()
Out[22]: True

In [23]: f.is_absolute()
Out[23]: False

In [24]: f.absolute()    # get the absolute path
Out[24]: PosixPath('/home/clg/workspace/subworkspace/hello.py')

In [25]: f.chmod(0o755)    # change the permissions of the path

In [26]: %ls -ld ./hello.py
-rwxr-xr-x. 1 clg clg 58 Feb  8 13:32 ./hello.py*

In [27]: f.cwd()    # returns a new path pointing to the current working directory
Out[27]: PosixPath('/home/clg/workspace/subworkspace')

In [28]: f.home()
Out[28]: PosixPath('/home/clg')

In [29]: pathlib.Path('~').expanduser()    # expand ~ to an absolute path
Out[29]: PosixPath('/home/clg')

In [30]: f.name()    # name is an attribute, not a method
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-f0ea48ccc8ff> in <module>()
----> 1 f.name()

TypeError: 'str' object is not callable

In [31]: f.name    # gives the base name
Out[31]: 'hello.py'

In [32]: f.home().name
Out[32]: 'clg'

In [33]: f.owner()    # get the owner
Out[33]: 'clg'

In [34]: f.home().parent
Out[34]: PosixPath('/home')

In [35]: f.parts
Out[35]: ('hello.py',)

In [36]: f.absolute().parts    # split the path into its components
Out[36]: ('/', 'home', 'clg', 'workspace', 'subworkspace', 'hello.py')

In [37]: f.root    # the root; for a relative path such as './hello.py' it is the empty string
Out[37]: ''

In [38]: f.home().root    # the root of an absolute path
Out[38]: '/'

In [39]: f.suffix    # the file suffix
Out[39]: '.py'

In [40]: f.stat()    # like os.stat(), returns information about the path
Out[40]: os.stat_result(st_mode=33261, st_ino=34951327, st_dev=64768, st_nlink=1, st_uid=1000, st_gid=1000, st_size=58, st_atime=1486531928, st_mtime=1486531926, st_ctime=1486995977)

In [41]: f.stat().st_mode    # individual fields of the stat() result are accessed with '.'
Out[41]: 33261

In [42]: d = pathlib.Path('..')

In [43]: for x in d.glob(*.py):    # the pattern passed to glob must be a string, so this fails
  File "<ipython-input-43-3fdfb8e408ac>", line 1
    for x in d.glob(*.py):
                     ^
SyntaxError: invalid syntax


In [44]: for x in d.glob('*.py'):    # files matching the pattern directly under the path
    ...:     print(x)
    ...:     
../judge.py
../progress.py
../zipperMethod.py
../decorator.py

In [45]: for x in d.rglob('*.py'):    # matching files under the path and all of its subdirectories (recursive)
    ...:     print(x)
    ...:     
../judge.py
../progress.py
../zipperMethod.py
../decorator.py
../subworkspace/hello.py
../subworkspace/aa.py
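
The difference between glob and rglob can be reproduced without depending on a particular working directory. A minimal sketch, with a made-up file layout created in a temporary directory:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "sub").mkdir()
    # Hypothetical layout: two scripts and one unrelated text file.
    for rel in ("judge.py", "notes.txt", "sub/hello.py"):
        (root / rel).write_text("# placeholder\n")

    # glob only looks directly under root.
    top_level = sorted(p.name for p in root.glob("*.py"))

    # rglob descends into subdirectories as well.
    recursive = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.py")
    )

print(top_level)
print(recursive)
```

The results are sorted because the order in which glob and rglob yield entries is not guaranteed.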

To copy, move, and delete files, use the shutil module:

import shutil

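shutil.copyfileobj # operates on file objects
shutil.copyfile # copies the contents only
shutil.copymode # copies the permission bits only
shutil.copystat # copies the metadata only
shutil.copy # copies contents and permissions: copyfile + copymode
shutil.copy2 # copies contents and metadata: copyfile + copystat
shutil.copytree # recursively copies a directory
shutil.rmtree # recursively removes a directory
shutil.move # the implementation depends on the situation: if the destination is on the same filesystem, os.rename is used; otherwise the source is copied (copytree for a directory) and then removed (rmtree for a directory)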
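
A short end-to-end run of the most common calls, as a sketch in a temporary directory. The directory and file names (src, backup, archive, a.txt, b.txt) are hypothetical.

```python
import pathlib
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    src_dir = root / "src"
    src_dir.mkdir()
    (src_dir / "a.txt").write_text("data")

    # Copy one file, keeping its metadata.
    shutil.copy2(str(src_dir / "a.txt"), str(src_dir / "b.txt"))

    # Copy the whole directory, rename the copy, then delete the original.
    shutil.copytree(str(src_dir), str(root / "backup"))
    shutil.move(str(root / "backup"), str(root / "archive"))
    shutil.rmtree(str(src_dir))

    remaining = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))

print(remaining)
```

Paths are converted with str() only to stay compatible with older Python 3 releases; recent versions accept Path objects directly.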

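Serialization: converting an object into data
Deserialization: converting data back into an object

pickle is a serialization protocol specific to Python.

The pickle source code is in lib/python3.5/pickle.py.

Main functions

dumps exports an object as data, i.e. serializes it
loads loads data back into an object, i.e. deserializes it; the class of the object must be available when it is deserialized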

In [1]: import pickle

In [2]: class A:    # define a class A
   ...:     def print(self):
   ...:         print('aaaa')
   ...:         

In [3]: a = A()    # create an instance a of class A

In [4]: pickle.dumps(a)    # export the object as data
Out[4]: b'\x80\x03c__main__\nA\nq\x00)\x81q\x01.'

In [5]: b = pickle.dumps(a)

In [6]: pickle.loads(b)    # load the data back into an object
Out[6]: <__main__.A at 0x7f5dcdc71dd8>

In [7]: a
Out[7]: <__main__.A at 0x7f5dcdd28be0>    # the two objects have different addresses, but their contents are the same

In [8]: aa = pickle.loads(b)

In [9]: a.print()    # print method of the original object
aaaa

In [10]: aa.print()    # print method of the deserialized object
aaaa
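
Besides dumps and loads, pickle also has dump and load, which work on binary file-like objects. The sketch below uses a BytesIO in place of a real file and a plain dict in place of a custom class; both are choices made only to keep the example self-contained.

```python
import io
import pickle

record = {"name": "hello.py", "sizes": [58, 120], "tags": ("script", "demo")}

buffer = io.BytesIO()
pickle.dump(record, buffer)      # dump writes to a binary file-like object
buffer.seek(0)
restored = pickle.load(buffer)   # load reads the object back

same_content = restored == record                        # equal values
same_object = restored is record                         # but a new object
tuple_preserved = isinstance(restored["tags"], tuple)    # Python types survive

print(same_content, same_object, tuple_preserved)
```

Unpickling can execute arbitrary code, so pickle.load and pickle.loads should only be used on data from a trusted source.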

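JSON supports the following data types:

Type	Description
Number	double-precision floating-point format, as in JavaScript
String	double-quoted Unicode with backslash escaping; corresponds to str in Python
Boolean	true or false
Array	an ordered sequence of values; corresponds to list in Python
Value	a string, a number, true or false, null, and so on
Object	an unordered collection of key/value pairs; corresponds to dict in Python
Whitespace	may be used between any pair of tokens
null	empty

Usage example: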

In [1]: import json

In [2]: d = {'a': 1, 'b': [1, 2, 3]}

In [3]: json.dumps(d)
Out[3]: '{"a": 1, "b": [1, 2, 3]}'

In [4]: json.loads('{"a": 1, "b": [1, 2, 3]}')
Out[4]: {'a': 1, 'b': [1, 2, 3]}
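
Because JSON has fewer types than Python, a round trip does not always return the original object. The following sketch shows the usual conversions; the sample data is made up.

```python
import io
import json

data = {"a": 1, "b": (1, 2, 3), 3: None, "ok": True}

text = json.dumps(data)
round_trip = json.loads(text)

# dump and load work on text file-like objects, here a StringIO.
buffer = io.StringIO()
json.dump(data, buffer, indent=2)
buffer.seek(0)
from_stream = json.load(buffer)

print(text)
print(round_trip)
```

The tuple comes back as a list, the integer key comes back as the string "3", and None and True pass through null and true.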

JSON reference: JSON data format


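Opening and closing files

Reading and writing files

The mode argument of open

mode=t and mode=b

The file pointer

File buffering

Context managers

File-like objects

StringIO

BytesIO

Path operations with pathlib

Directory operations

General operations

Copying, moving, and deleting files

Serialization and deserialization

pickle, the Python-specific protocol

json, the general-purpose protocol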