Everything You Need to Know About Reading Files in Python

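Files are everywhere. Whatever programming language we use, working with files is an essential skill for every programmer.

File handling is the process of creating files, writing data to them, and reading data from them. Python ships with a rich set of packages for handling different file types, which makes file-handling tasks much easier.

Outline:

  • Opening files with a context manager
  • File read modes in Python
  • Reading text files
  • Reading CSV files
  • Reading JSON files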

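Opening a File

Before we can access a file's contents, we need to open the file. Python provides a built-in function, open(), that opens files in different modes. It takes two basic arguments: the file name and the mode.

The default mode is 'r', which opens the file read-only. The mode determines how we can access the file and manipulate its contents. open() supports several different modes, which we discuss one by one later.

We will use a file containing the Zen of Python for the examples that follow.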

f = open('zen_of_python.txt', 'r')
print(f.read())
f.close()

Output:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
...

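In the code above, open() opens the text file in read-only mode, which lets us get information from the file without changing it. On the first line, the return value of open() is assigned to f, an object representing the text file. On the second line, we use the read() method to read the entire file and print its contents. On the last line, the close() method closes the file. Note that we should always close files once we are done with them, to release system resources and avoid unexpected errors.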
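If read() raises an error, the close() call in the three-line version above is never reached. One common way to guard against that is a try/finally block. The sketch below is only an illustration: it creates its own throwaway file (sample.txt in a temporary directory, a hypothetical name) so that it can run on its own.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Beautiful is better than ugly.\n")

f = open(path, "r", encoding="utf-8")
try:
    text = f.read()
finally:
    # The finally clause runs whether or not read() raised an exception.
    f.close()

was_closed = f.closed
```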

In Python, we can use the with context manager to make sure the file is closed and its resources are released, even if an exception occurs.

with open('zen_of_python.txt') as f:
    print(f.read())

Output:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
...

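The code above uses the with statement to create a context and binds the file object to the variable f, through which all file object methods are available. The read() method reads the entire file on the second line, and print() then outputs the contents.

When execution reaches the end of the with block, Python closes the file, releasing its resources so that other programs can use the file normally. The with statement is strongly recommended whenever we work with objects that must be closed as soon as they are no longer needed, such as files, database connections, and network connections.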
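To see that the file is closed even when something goes wrong inside the block, we can raise an error on purpose. This is a minimal sketch under stated assumptions: the file name sample.txt and the RuntimeError are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Readability counts.\n")

handle = None
try:
    with open(path, encoding="utf-8") as f:
        handle = f
        first_line = f.readline()
        # Simulate a failure while the file is still open.
        raise RuntimeError("simulated failure while processing the file")
except RuntimeError:
    pass

# The with statement closed the file before the exception propagated.
closed_after_error = handle.closed
```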

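Note that the variable f is still accessible after we leave the with block, but the file itself is closed. Let's try a few file object attributes to confirm that the variable still exists and can be accessed: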

print("Filename is '{}'.".format(f.name))
if f.closed:
    print("File is closed.")
else:
    print("File isn't closed.")

Output:

Filename is 'zen_of_python.txt'.
File is closed.

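However, we can no longer read from or write to the file at this point. Once a file is closed, any attempt to access its contents raises the following error: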

f.read()

Output:

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_9828/3059900045.py in <module>
----> 1 f.read()

ValueError: I/O operation on closed file.
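If code might receive a file object that has already been closed, the error can be caught like any other exception. The following sketch assumes a throwaway file named sample.txt (hypothetical) and simply records the error message:

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Explicit is better than implicit.\n")

with open(path, encoding="utf-8") as f:
    f.read()

# f is closed here, so reading again raises ValueError.
try:
    f.read()
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```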

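File Read Modes in Python

As mentioned earlier, we need to specify a mode when opening a file. The following list shows the file modes available in Python:

Mode and description

  • 'r': opens a file for reading only
  • 'w': opens a file for writing; if the file exists it is overwritten, otherwise a new file is created
  • 'a': opens a file for appending only; the file is created if it does not exist
  • 'x': creates a new file; fails if the file already exists
  • '+': opens a file for updating (both reading and writing); it is combined with another mode, as in 'r+'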
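The modes in the list above can be tried out on a scratch file. The sketch below is an illustration only; the file name modes_demo.txt and the strings written to it are invented for the example.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'x' creates the file and fails if it already exists.
with open(path, "x", encoding="utf-8") as f:
    f.write("first\n")

# 'a' appends to the end of the existing file.
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

# A second 'x' on the same path raises FileExistsError.
try:
    with open(path, "x", encoding="utf-8"):
        pass
except FileExistsError:
    x_failed = True
else:
    x_failed = False

# 'w' discards the old contents before writing.
with open(path, "w", encoding="utf-8") as f:
    f.write("replaced\n")

# 'r+' allows reading and writing through the same file object.
with open(path, "r+", encoding="utf-8") as f:
    before = f.read()
    f.write("added via r+\n")

with open(path, encoding="utf-8") as f:
    final = f.read()
```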

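We can also specify whether to open the file in text mode 't', which is the default, or in binary mode 'b'. Let's see how to copy the image file dataquest_logo.png with a few simple statements: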

with open('dataquest_logo.png', 'rb') as rf:
    with open('data_quest_logo_copy.png', 'wb') as wf:
        for b in rf:
            wf.write(b)

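The code above copies the Dataquest logo image and stores the copy in the same directory. The 'rb' mode opens the file in binary mode for reading, and the 'wb' mode opens the file in binary mode for writing.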
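Iterating over a binary file splits it wherever a newline byte happens to occur, so the pieces can be of any size. An alternative is to read fixed-size chunks. The sketch below assumes a made-up binary payload and file names (source.bin, copy.bin) rather than the real logo image, and the chunk size of 4096 bytes is an arbitrary choice.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.bin")  # hypothetical input file
dst = os.path.join(workdir, "copy.bin")    # hypothetical output file

# Made-up binary payload standing in for an image.
payload = bytes(range(256)) * 40
with open(src, "wb") as f:
    f.write(payload)

CHUNK_SIZE = 4096  # arbitrary size chosen for the illustration

with open(src, "rb") as rf, open(dst, "wb") as wf:
    while True:
        chunk = rf.read(CHUNK_SIZE)
        if not chunk:  # read() returns b"" at the end of the file
            break
        wf.write(chunk)

with open(dst, "rb") as f:
    copied = f.read()
```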

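Reading Text Files

Python offers several ways to read text files. Below we introduce some useful methods for reading the contents of a text file.

So far we have seen that read() reads the entire contents of a file. What if we only want to read the first few characters of a text file? We can pass the number of characters to read(). In text mode the argument counts characters; it counts bytes only in binary mode. Let's try it: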

with open('zen_of_python.txt') as f:
    print(f.read(17))

Output:

The Zen of Python

The simple code above reads the first 17 characters of zen_of_python.txt and prints them.
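In text mode, read(n) counts characters, while in binary mode it counts bytes. The two only differ when the text contains non-ASCII characters. The sketch below uses a made-up string and file name (accents.txt) to show the difference:

```python
import os
import tempfile

# Hypothetical file containing two non-ASCII characters.
path = os.path.join(tempfile.mkdtemp(), "accents.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("h\u00e9llo w\u00f6rld")

# Text mode: the argument is a number of characters.
with open(path, encoding="utf-8") as f:
    first_chars = f.read(5)

# Binary mode: the argument is a number of bytes.
with open(path, "rb") as f:
    first_bytes = f.read(5)
```

Here first_chars holds five characters, whereas first_bytes holds five bytes, which cover only four characters because the accented letter takes two bytes in UTF-8.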

Sometimes it makes more sense to read a text file one line at a time. In that case, we can use the readline() method.

with open('zen_of_python.txt') as f:
    print(f.readline())

Output:

The Zen of Python, by Tim Peters

The code above returns the first line of the file. If we call the method again, it returns the second line, and so on, as shown below:

with open('zen_of_python.txt') as f:
    print(f.readline())
    print(f.readline())
    print(f.readline())
    print(f.readline())

Output:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.

Explicit is better than implicit.

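This handy method lets us read an entire file incrementally.

The following code prints the whole file by iterating line by line until the file pointer, which tracks the position at which we are reading or writing, reaches the end of the file. When readline() reaches the end of the file, it returns an empty string.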

with open('zen_of_python.txt') as f:
    line = f.readline()
    while line:
        print(line, end='')
        line = f.readline()

Output:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

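The code above reads the first line of the file outside the while loop and assigns it to the variable line. Inside the loop, it prints the string stored in line and then reads the next line. The loop repeats until readline() returns an empty string. An empty string evaluates to False in the while condition, so the iteration stops.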
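A file object is also iterable, so the same line-by-line traversal can be written with a for loop. This is a sketch of that alternative, not the approach used above; the sample file and its three lines are assumptions made so the code can run on its own.

```python
import os
import tempfile

# Hypothetical three-line sample standing in for zen_of_python.txt.
path = os.path.join(tempfile.mkdtemp(), "zen_sample.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Beautiful is better than ugly.\n")
    out.write("Explicit is better than implicit.\n")
    out.write("Simple is better than complex.\n")

collected = []
with open(path, encoding="utf-8") as f:
    # Iterating over the file yields one line at a time, like readline().
    for line in f:
        collected.append(line)
        print(line, end="")
```

The loop ends on its own at the end of the file, so there is no need to test for an empty string.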

Another useful method for reading text files is readlines(). Calling it on a file object returns a list of strings containing every line of the file.

with open('zen_of_python.txt') as f:
    lines = f.readlines()

Let's check the data type of the lines variable and then print it:

print(type(lines))
print(lines)

Output:

<class 'list'>
['The Zen of Python, by Tim Peters\n', '\n', 'Beaut...]

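It is a list of strings in which each item is one line of the text file; the \n escape character marks a line break in the file. We can also access the items of the list by indexing or slicing: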

print(lines)
print(lines[3:5])
print(lines[-1])

Output:

['The Zen of Python, by Tim Peters\n', '\n', 'Beautiful is better than ugly.\n', ... -- let's do more of those!"]
['Explicit is better than implicit.\n', 'Simple is better than complex.\n']
Namespaces are one honking great idea -- let's do more of those!
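The strings returned by readlines() keep their trailing \n. If that is not wanted, one option is to strip it with rstrip(). The sketch below assumes a small sample file (zen_sample.txt, a hypothetical name) whose last line has no trailing newline:

```python
import os
import tempfile

# Hypothetical sample file; the last line has no trailing newline.
path = os.path.join(tempfile.mkdtemp(), "zen_sample.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Flat is better than nested.\n")
    out.write("Sparse is better than dense.\n")
    out.write("Readability counts.")

with open(path, encoding="utf-8") as f:
    raw_lines = f.readlines()

# Remove the trailing newline from each line, if there is one.
clean_lines = [line.rstrip("\n") for line in raw_lines]
```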

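Reading CSV Files

So far we have learned how to work with plain text files. Sometimes, however, data comes in CSV format, and data professionals often need to retrieve information from CSV files and manipulate their contents.

Next we will use the csv module, which provides useful tools for reading the comma-separated values stored in a CSV file. Let's try it now: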

import csv
with open('chocolate.csv') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)

Output:

['Company', 'Bean Origin or Bar Name', 'REF', 'Review Date', 'Cocoa Percent', 'Company Location', 'Rating', 'Bean Type', 'Country of Origin']
['A. Morin', 'Agua Grande', '1876', '2016', '63%', 'France', '3.75', 'Â\xa0', 'Sao Tome']
['A. Morin', 'Kpime', '1676', '2015', '70%', 'France', '2.75', 'Â\xa0', 'Togo']
['A. Morin', 'Atsane', '1676', '2015', '70%', 'France', '3', 'Â\xa0', 'Togo']
['A. Morin', 'Akata', '1680', '2015', '70%', 'France', '3.5', 'Â\xa0', 'Togo']
...

Each row of the CSV file becomes a list whose items are easy to access, as shown below:

import csv
with open('chocolate.csv') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print("The {} company is located in {}.".format(row[0], row[5]))

Output:

The Company company is located in Company Location.
The A. Morin company is located in France.
The A. Morin company is located in France.
The A. Morin company is located in France.
The A. Morin company is located in France.
The Acalli company is located in U.S.A..
The Acalli company is located in U.S.A..
The Adi company is located in Fiji.
...
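The first line of the output above comes from the header row, which csv.reader treats like any other row. One way to skip it is to call next() on the reader before looping. The sketch below assumes a cut-down, two-column stand-in for chocolate.csv (chocolate_sample.csv, a hypothetical file) and passes newline='' to open(), as the csv documentation recommends:

```python
import csv
import os
import tempfile

# Hypothetical two-column sample built from values shown in the output above.
path = os.path.join(tempfile.mkdtemp(), "chocolate_sample.csv")
rows = [
    ["Company", "Company Location"],
    ["A. Morin", "France"],
    ["Acalli", "U.S.A."],
    ["Adi", "Fiji"],
]
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

sentences = []
with open(path, newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # consume the header row before the loop
    for company, location in reader:
        sentences.append("The {} company is located in {}.".format(company, location))
```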

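It is often more convenient to refer to columns by name rather than by index. In that case, instead of csv.reader() we use csv.DictReader(), which yields each row as a dictionary keyed by the column names.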

import csv
with open('chocolate.csv') as f:
    dict_reader = csv.DictReader(f, delimiter=',')
    for row in dict_reader:
        print("The {} company is located in {}.".format(row['Company'], row['Company Location']))

Output:

The A. Morin company is located in France.
The A. Morin company is located in France.
The A. Morin company is located in France.
The A. Morin company is located in France.
The Acalli company is located in U.S.A..
The Acalli company is located in U.S.A..
The Adi company is located in Fiji.
...
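DictReader takes the column names from the first row and exposes them through its fieldnames attribute, which is why the header no longer shows up in the output. The sketch below again assumes a two-column stand-in for chocolate.csv (chocolate_sample.csv, a hypothetical file):

```python
import csv
import os
import tempfile

# Hypothetical two-column sample built from values shown in the output above.
path = os.path.join(tempfile.mkdtemp(), "chocolate_sample.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Company", "Company Location"])
    writer.writerow(["A. Morin", "France"])
    writer.writerow(["Acalli", "U.S.A."])

with open(path, newline="", encoding="utf-8") as f:
    dict_reader = csv.DictReader(f)
    columns = dict_reader.fieldnames  # read from the header row
    # Map each company name to its location using the column names.
    locations = {row["Company"]: row["Company Location"] for row in dict_reader}
```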

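Reading JSON Files

Another popular file format, used mainly for storing and exchanging data, is JSON. JSON stands for JavaScript Object Notation and lets us store data as comma-separated key-value pairs.

Next we will load a JSON file and work with it as a JSON object rather than as a text file. To do that, we need to import the json module. Then, inside the with block, we use the json module's load() function, which parses the contents of the file and returns them as a dictionary that we store in the content variable.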

import json
with open('movie.json') as f:
    content = json.load(f)
    print(content)

Output:

{'Title': 'Bicentennial Man', 'Release Date': 'Dec 17 1999', 'MPAA Rating': 'PG', 'Running Time min': 132, 'Distributor': 'Walt Disney Pictures', 'Source': 'Based on Book/Short Story', 'Major Genre': 'Drama', 'Creative Type': 'Science Fiction', 'Director': 'Chris Columbus', 'Rotten Tomatoes Rating': 38, 'IMDB Rating': 6.4, 'IMDB Votes': 28827}

Let's check the data type of the content variable:

print(type(content))

Output:

<class 'dict'>

Its data type is a dictionary, so we can easily extract data from it:

print('{} directed by {}'.format(content['Title'], content['Director']))

Output:

Bicentennial Man directed by Chris Columbus
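Two situations come up often when reading JSON: a key may be missing, and the text may not be valid JSON at all. The sketch below shows one way to handle both. It assumes a trimmed-down copy of the movie data written to a temporary file (movie_sample.json, a hypothetical name); the 'Producer' key and the malformed string are made up for the demonstration.

```python
import json
import os
import tempfile

# Hypothetical, trimmed-down version of the movie data shown above.
movie = {"Title": "Bicentennial Man", "Director": "Chris Columbus", "IMDB Rating": 6.4}
path = os.path.join(tempfile.mkdtemp(), "movie_sample.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(movie, f)

with open(path, encoding="utf-8") as f:
    content = json.load(f)

# dict.get() returns a default instead of raising KeyError for a missing key.
rating = content.get("IMDB Rating")
producer = content.get("Producer", "unknown")  # 'Producer' is not in the data

# Malformed JSON raises json.JSONDecodeError.
try:
    json.loads("{not valid json}")
except json.JSONDecodeError:
    parse_failed = True
else:
    parse_failed = False
```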

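Summary

Today we discussed file handling in Python, focusing on reading file contents. We covered the open() built-in function, the with context manager, and how to read common file types such as text, CSV, and JSON.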

That's all for today. If you enjoyed this article, please give it a like!

This article was published on multiple platforms via mdnice.
