About Python: An Eye-Opener, You Can Traverse Files in Python This Way

Python generally offers two ways to traverse folders or files. The first is to use the built-in os.walk function directly:

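import os

# os.walk visits the start directory and every directory below it
for root, dirs, files in os.walk("/Users/cxhuan/Downloads/globtest/hello"):
    for d in dirs:  # "d" avoids shadowing the built-in dir()
        print(os.path.join(root, d))
    for name in files:
        print(os.path.join(root, name))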

Running the code above produces the following output:

/Users/cxhuan/Downloads/globtest/hello/world
/Users/cxhuan/Downloads/globtest/hello/.DS_Store
/Users/cxhuan/Downloads/globtest/hello/hello3.txt
/Users/cxhuan/Downloads/globtest/hello/hello2.txt
/Users/cxhuan/Downloads/globtest/hello/hello1.txt
/Users/cxhuan/Downloads/globtest/hello/world/world1.txt
/Users/cxhuan/Downloads/globtest/hello/world/world3.txt
/Users/cxhuan/Downloads/globtest/hello/world/world2.txt

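For each directory it visits, os.walk yields three values: the current path root, the list of subdirectory names dirs, and the list of file names files. The program loops over the two lists and prints every entry. os.path.join concatenates the path with a directory or file name to form a complete path.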
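To make the shape of those three values concrete, here is a minimal sketch that records what os.walk yields for a tiny tree. The tree is created in a temporary directory purely for illustration; its file names are assumptions and not part of the original example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout: hello/hello1.txt and hello/world/world1.txt
    os.makedirs(os.path.join(base, "hello", "world"))
    open(os.path.join(base, "hello", "hello1.txt"), "w").close()
    open(os.path.join(base, "hello", "world", "world1.txt"), "w").close()

    visited = []
    for root, dirs, files in os.walk(os.path.join(base, "hello")):
        # Store the path relative to base, with "/" on every platform
        rel = os.path.relpath(root, base).replace(os.sep, "/")
        visited.append((rel, sorted(dirs), sorted(files)))

for entry in visited:
    print(entry)
```

Each tuple corresponds to one visited directory, and by default the parent directory is reported before its children.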

The second is to use recursion, written like this:

import os

files = []  # collected file paths


def dirAll(pathname):
    """Recursively collect the paths of all files under pathname."""
    if os.path.exists(pathname):
        for f in os.listdir(pathname):
            f = os.path.join(pathname, f)
            if os.path.isdir(f):
                dirAll(f)  # descend into the subdirectory
            else:
                # f is already the directory plus separator plus file name
                files.append(f)


dirAll("/Users/cxhuan/Downloads/globtest/hello")
for f in files:
    print(f)

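Running the code above prints the same file paths as before. The world directory itself is not listed because only files are collected, and the order may differ.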

Both approaches work, but they are somewhat tedious to write, especially the second one, where a small slip can easily introduce a bug.

Today we introduce a third approach: using the glob module to traverse files.

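Introduction
glob is a module in Python's standard library for finding files, known for being concise and practical. Because its feature set is small, it is easy to pick up and use. Its main purpose is to find file paths that match a given pattern. To search with this module you only need three wildcards, *, ? and []: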

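 *: matches zero or more characters;
 ?: matches exactly one character;
 []: matches one character within the specified range, e.g. [0-9] matches a digit.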
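The three wildcards can be tried without touching the file system. The sketch below uses fnmatch.fnmatchcase from the standard library, which applies the same shell-style pattern rules to plain strings. The file names are hypothetical and chosen only for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, not taken from the original article
names = ["hello1.txt", "hello2.txt", "hello10.txt", "world1.txt", "notes.md"]


def matching(pattern):
    """Return the names that match a shell-style pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


star = matching("hello*.txt")      # * spans any run of characters
question = matching("hello?.txt")  # ? stands for exactly one character
bracket = matching("*[0-1].txt")   # [] matches one character from a set

print(star)
print(question)
print(bracket)
```

Note how hello10.txt matches the * pattern but not the ? pattern, because ? never matches more than one character.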

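The glob.glob method
glob.glob returns a list of all file paths that match a pattern. Its main parameter, pathname, defines the matching pattern and can be either an absolute or a relative path. It also accepts optional keyword arguments such as recursive.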
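Since the article is about traversal, it may help to see how glob.glob can descend through every level at once. This is a minimal sketch, assuming Python 3.5 or later, where the "**" pattern combined with recursive=True matches any number of directory levels. The temporary tree and its file names are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout built only for this example
    os.makedirs(os.path.join(base, "hello", "world"))
    for rel in ("hello/hello1.txt", "hello/world/world1.txt"):
        open(os.path.join(base, *rel.split("/")), "w").close()

    # "**" matches zero or more directory levels when recursive=True
    pattern = os.path.join(base, "**", "*.txt")
    found = sorted(
        os.path.relpath(p, base).replace(os.sep, "/")
        for p in glob.glob(pattern, recursive=True)
    )

print(found)
```

Without recursive=True, "**" behaves like a plain "*" and matches only a single level.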

Matching with *
We can use * to match zero or more characters.

import glob

for p1 in glob.glob('/Users/cxhuan/Downloads/globtest/*'):
    print(p1)

Running the code above prints the only directory inside the globtest folder:

/Users/cxhuan/Downloads/globtest/hello

We can also traverse files or folders at a specific depth by specifying the levels in the pattern:

for p in glob.glob('/Users/cxhuan/Downloads/globtest/*/*'):
    print(p)

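The code above matches the entries exactly one level below each subfolder of globtest and prints the paths of those files and folders. Hidden entries such as .DS_Store are absent because * does not match names that start with a dot: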

/Users/cxhuan/Downloads/globtest/hello/world
/Users/cxhuan/Downloads/globtest/hello/hello3.txt
/Users/cxhuan/Downloads/globtest/hello/hello2.txt
/Users/cxhuan/Downloads/globtest/hello/hello1.txt
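The listing above does not contain .DS_Store, even though os.walk reported it earlier. The sketch below reproduces that behavior in a temporary directory (the two file names are placeholders): a pattern of "*" skips names that begin with a dot, while a pattern that itself starts with a dot matches them.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # One hidden and one regular file, created only for this example
    for name in (".DS_Store", "hello1.txt"):
        open(os.path.join(base, name), "w").close()

    # "*" ignores names that start with a dot
    visible = [os.path.basename(p) for p in glob.glob(os.path.join(base, "*"))]
    # A pattern with a literal leading dot picks them up
    hidden = [os.path.basename(p) for p in glob.glob(os.path.join(base, ".*"))]

print(visible)
print(hidden)
```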

We can also filter files or folders:

for p in glob.glob('/Users/cxhuan/Downloads/globtest/hello/*3.txt'):
    print(p)

The code above matches only the txt files in the hello directory whose names end with '3'. The output is:

/Users/cxhuan/Downloads/globtest/hello/hello3.txt

Matching with ?
We can use a question mark (?) to match any single character.

for p in glob.glob('/Users/cxhuan/Downloads/globtest/hello/hello?.txt'):
    print(p)

The code above prints the txt files in the hello directory whose names consist of 'hello' followed by exactly one character. The output is:

/Users/cxhuan/Downloads/globtest/hello/hello3.txt
/Users/cxhuan/Downloads/globtest/hello/hello2.txt
/Users/cxhuan/Downloads/globtest/hello/hello1.txt

Matching with []
We can use [] to match a range of characters:

for p in glob.glob('/Users/cxhuan/Downloads/globtest/hello/*[0-2].*'):
    print(p)

Here we want the files in the hello directory whose names end with a digit in the range 0 to 2. Running the code above gives:

/Users/cxhuan/Downloads/globtest/hello/hello2.txt
/Users/cxhuan/Downloads/globtest/hello/hello1.txt
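A range can also be negated by placing ! right after the opening bracket. The following sketch recreates three files like those above in a temporary directory (an assumption made so the example runs anywhere) and compares the two patterns. The results are sorted because glob does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Stand-ins for hello1.txt, hello2.txt and hello3.txt
    for i in (1, 2, 3):
        open(os.path.join(base, f"hello{i}.txt"), "w").close()

    # Last character before the dot is in the range 0-2
    in_range = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(base, "*[0-2].*"))
    )
    # [!0-2] matches any single character outside that range
    out_of_range = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(base, "*[!0-2].*"))
    )

print(in_range)
print(out_of_range)
```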

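The glob.iglob method
glob.glob collects every matching path in a folder and returns them as a list. glob.iglob, in contrast, yields only one matching path at a time. Here is a simple example that shows the difference: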

p = glob.glob('/Users/cxhuan/Downloads/globtest/hello/hello?.*')
print(p)

print('----------------------')

p = glob.iglob('/Users/cxhuan/Downloads/globtest/hello/hello?.*')
print(p)

Running the code above returns:

['/Users/cxhuan/Downloads/globtest/hello/hello3.txt', '/Users/cxhuan/Downloads/globtest/hello/hello2.txt', '/Users/cxhuan/Downloads/globtest/hello/hello1.txt']
----------------------
<generator object _iglob at 0x1040d8ac0>

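The difference is easy to see from this output: the former returns a list, while the latter returns a generator, that is, an iterator that produces paths on demand.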

Let's pull a couple of values from this iterator:

p = glob.iglob('/Users/cxhuan/Downloads/globtest/hello/hello?.*')
print(next(p))  # next() is the idiomatic way to advance an iterator
print(next(p))

The output is:

/Users/cxhuan/Downloads/globtest/hello/hello3.txt
/Users/cxhuan/Downloads/globtest/hello/hello2.txt

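As we can see, the iterator hands back one element at a time. The benefit is lower memory use: if a path contains a very large number of folders or files, the iterator lets us fetch results gradually instead of loading all of them into memory at once.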
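In practice the iterator is usually consumed with a loop or with helpers from itertools rather than by calling next by hand. Here is a minimal sketch, assuming a temporary directory filled with 100 placeholder files, that takes only the first three matches and then counts the rest without ever building the full list of paths.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # 100 empty placeholder files, created only for this example
    for i in range(100):
        open(os.path.join(base, f"file{i}.txt"), "w").close()

    matches = glob.iglob(os.path.join(base, "*.txt"))

    # Take just the first three paths from the iterator
    first_three = list(itertools.islice(matches, 3))

    # The iterator remembers its position, so this counts what is left
    remaining = sum(1 for _ in matches)

print(len(first_three), remaining)
```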

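Summary
The module covered today is simple, but it is more than enough for traversing files or directories, and it is easy to understand, so it is well worth using regularly.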


Posted in: python

2022-06-10
