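Overview of Regular Expressions
A regular expression (regex for short) is a string made up of ordinary characters and special symbols that describes a pattern, such as a repetition or a set of alternative characters.
A regular expression can therefore match a whole family of strings that share the same shape.
In other words, one pattern may match many different strings.
Regular expression syntax differs between languages; this article covers Python's regular expressions.
Most of the example code is taken from Automate the Boring Stuff with Python.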
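Writing Regular Expressions
A regular expression is just a string. Unlike an ordinary string, it may contain zero or more pattern symbols and special characters; see Section 1.2 of Core Python Programming for details.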


'hing'
r'\wing'
'123456'
r'\d\d\d\d\d\d'
'regex.py'
r'.*\.py'
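To make the pairing above concrete, here is a minimal sketch that checks which strings a pattern accepts. The candidate words are made up for illustration, and fullmatch() is used so that the whole string has to fit the pattern.

```python
import re

# r'\wing': one word character (letter, digit, or underscore) followed by 'ing'.
pattern = re.compile(r'\wing')

candidates = ['hing', 'ring', '_ing', 'ing', 'h-ing']
matched = [word for word in candidates if pattern.fullmatch(word)]
print(matched)  # ['hing', 'ring', '_ing']

# The dot must be escaped to mean a literal '.', otherwise it matches any character.
file_pattern = re.compile(r'.*\.py')
print(bool(file_pattern.fullmatch('regex.py')))  # True
print(bool(file_pattern.fullmatch('regexxpy')))  # False
```

One pattern covers several strings, which is the point of the paired examples: each literal string is one of the many strings its pattern matches.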

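Creating Regex Objects
A pattern string on its own does not match anything. To match it against target text, it is normally compiled into a regex object (module-level functions such as re.search() compile the pattern for you behind the scenes). The pattern is usually passed to compile() as a raw string, that is, r'…..'

The compile() function of the re module returns (creates) a Regex pattern object.

import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')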
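As a small sketch of why the raw-string form is preferred and how the compiled object is then used (the variable names and sample sentences here are illustrative):

```python
import re

# A raw string keeps backslashes as they are typed,
# so r'\d' and '\\d' are the same two characters.
assert r'\d' == '\\d'

phone_num_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

mo = phone_num_regex.search('My number is 415-555-4242.')
found = mo.group()
print(found)  # 415-555-4242

# search() returns None when nothing matches, so check before calling group().
no_match = phone_num_regex.search('No phone number here.')
print(no_match is None)  # True
```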
Common Regular Expression Patterns
4.1 Grouping with parentheses

>>> Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')  # create a Regex object
>>> mo = Regex.search('My number is 415-555-4242.')  # returns a Match object
>>> mo.group()  # group() on the Match object returns the entire matched text
'415-555-4242'
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
>>> mo.group(0)
'415-555-4242'
>>> mo.groups()
('415', '555-4242')
>>> a, b = mo.groups()  # groups() returns a tuple with one value per group
>>> a
'415'
>>> b
'555-4242'

4.2 Matching one of several alternatives with the pipe

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'
>>> mo2 = heroRegex.search('Tina Fey and Batman.')
>>> mo2.group()
'Tina Fey'
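The pipe can also be placed inside parentheses so that the alternatives share a common prefix. The following is a minimal sketch; the pattern and sample sentence are illustrative rather than part of the section above.

```python
import re

# 'Bat' followed by exactly one of the four alternatives.
bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')

mo = bat_regex.search('Batmobile lost a wheel')
whole = mo.group()    # the full matched text
suffix = mo.group(1)  # only the alternative inside the parentheses
print(whole)   # Batmobile
print(suffix)  # mobile
```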

4.3 Optional matching with the question mark

>>> batRegex = re.compile(r'Bat(wo)?man')  # without the parentheses, ? would apply only to the 'o'
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'

4.4 Matching zero or more times with the star

>>> batRegex = re.compile(r'Bat(wo)*man')  # to match a literal '*', write \*
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
>>> mo3 = batRegex.search('The Adventures of Batwowowowoman')
>>> mo3.group()
'Batwowowowoman'

4.5 Matching one or more times with the plus

>>> batRegex = re.compile(r'Bat(wo)+man')  # to match a literal '+', write \+
>>> mo1 = batRegex.search('The Adventures of Batwoman')
>>> mo1.group()
'Batwoman'
>>> mo2 = batRegex.search('The Adventures of Batwowowowoman')
>>> mo2.group()
'Batwowowowoman'
>>> mo3 = batRegex.search('The Adventures of Batman')
>>> mo3 is None
True
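The three quantifiers from 4.3 to 4.5 differ only in how many repetitions of the group they allow. A small side-by-side sketch (the dictionary layout is just one way to arrange the comparison):

```python
import re

samples = ['Batman', 'Batwoman', 'Batwowowowoman']
patterns = {
    '?': re.compile(r'Bat(wo)?man'),  # zero or one 'wo'
    '*': re.compile(r'Bat(wo)*man'),  # zero or more 'wo'
    '+': re.compile(r'Bat(wo)+man'),  # one or more 'wo'
}

# For each quantifier, record whether each sample string contains a match.
results = {
    symbol: [regex.search(text) is not None for text in samples]
    for symbol, regex in patterns.items()
}
for symbol, row in results.items():
    print(symbol, row)
# ? [True, True, False]
# * [True, True, True]
# + [False, True, True]
```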

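4.6 Matching a specific number of times with curly braces
The "?" in the code below requests non-greedy matching. A question mark can mean two things in a regular expression: it can mark a quantifier as non-greedy, or it can mark a group as optional. The two meanings are entirely unrelated.

>>> greedyHaRegex = re.compile(r'(Ha){3,5}')  # to match a literal '{', write \{
>>> mo1 = greedyHaRegex.search('HaHaHaHaHa')
>>> mo1.group()
'HaHaHaHaHa'
>>> nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
>>> mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
>>> mo2.group()
'HaHaHa'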
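Besides the {3,5} range form, the braces also accept an exact count or an open-ended bound. A brief sketch on a made-up string of six repetitions:

```python
import re

text = 'HaHaHaHaHaHa'  # six repetitions of 'Ha'

exactly_3 = re.search(r'(Ha){3}', text).group()      # exactly three
at_least_3 = re.search(r'(Ha){3,}', text).group()    # three or more, greedy
at_most_5 = re.search(r'(Ha){,5}', text).group()     # zero to five, greedy
lazy_3_to_5 = re.search(r'(Ha){3,5}?', text).group() # three to five, non-greedy

print(exactly_3)    # HaHaHa
print(at_least_3)   # HaHaHaHaHaHa
print(at_most_5)    # HaHaHaHaHa
print(lazy_3_to_5)  # HaHaHa
```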

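Greedy and Non-greedy Matching
Non-greedy matching is typically used to stop the wildcard (.*) from swallowing text beyond the first closing character you intended to match, as in the code below. Section 4.6 is also an example of non-greedy matching.

>>> nongreedyRegex = re.compile(r'<.*?>')
>>> mo = nongreedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man>'
>>> greedyRegex = re.compile(r'<.*>')
>>> mo = greedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man> for dinner.>'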
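The same contrast shows up clearly when several delimited items sit on one line. A minimal sketch, assuming a made-up HTML-like snippet:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Non-greedy: each match stops at the nearest '>'.
lazy_tags = re.findall(r'<.*?>', html)
# Greedy: a single match runs from the first '<' to the last '>'.
greedy_tags = re.findall(r'<.*>', html)

print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
```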

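Common Methods of Regex Objects
As described above, compile() creates a Regex object. The commonly used methods of the Regex object, and of the Match object that search() returns, are as follows.
6.1 search(), group(), groups()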

>>> Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')  # create a Regex object
>>> mo = Regex.search('My number is 415-555-4242.')  # returns a Match object
>>> mo.group()  # group() on the Match object returns the entire matched text
'415-555-4242'
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
>>> mo.group(0)
'415-555-4242'
>>> mo.groups()
('415', '555-4242')
>>> a, b = mo.groups()  # groups() returns a tuple with one value per group
>>> a
'415'
>>> b
'555-4242'

6.2 findall()
When called on a regex with no groups, findall() returns a list of the matched strings.
When called on a regex with groups, findall() returns a list of tuples of strings (one string per group).

>>> Regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')  # has no groups
>>> Regex.findall('Cell: 415-555-9999 Work: 212-555-0000')
['415-555-9999', '212-555-0000']
>>> Regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')  # has groups
>>> Regex.findall('Cell: 415-555-9999 Work: 212-555-0000')
[('415', '555', '9999'), ('212', '555', '0000')]
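Two related behaviours are worth knowing alongside the rules above: with exactly one group, findall() returns a plain list of that group's strings, and finditer() yields Match objects when both the full text and the groups are needed. A short sketch using the same sample text:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# Exactly one group: a list of strings, not a list of tuples.
area_codes = re.findall(r'(\d\d\d)-\d\d\d-\d\d\d\d', text)
print(area_codes)  # ['415', '212']

# finditer() gives Match objects, so group() and groups() are both available.
grouped_regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
full_numbers = [mo.group() for mo in grouped_regex.finditer(text)]
print(full_numbers)  # ['415-555-9999', '212-555-0000']
```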
6.3 sub()
>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'
>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.', count=1)  # replace only the first match
'CENSORED gave the secret documents to Agent Bob.'
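The replacement string passed to sub() can also reuse text captured by a group, written as \1, \2 and so on. A minimal sketch; the pattern and sentence are illustrative:

```python
import re

# Group 1 captures only the first letter of each agent's name.
agent_names_regex = re.compile(r'Agent (\w)\w*')

text = 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'
# r'\1****' keeps the captured first letter and masks the rest.
censored = agent_names_regex.sub(r'\1****', text)
print(censored)
# A**** told C**** that E**** knew B**** was a double agent.
```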

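re.IGNORECASE, re.DOTALL, and re.VERBOSE
To make a regular expression case-insensitive, pass re.IGNORECASE or re.I as the second argument to re.compile().
Passing re.DOTALL as the second argument to re.compile() makes the dot match every character, including the newline.
To add comments inside a multi-line regular expression, pass re.VERBOSE as the second argument to re.compile().
To use several of these options at once, combine them with the bitwise OR operator (|):

someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)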
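The effect of each flag is easier to see in isolation. The following sketch uses made-up patterns and sample text, one flag at a time:

```python
import re

# re.IGNORECASE: the pattern matches regardless of letter case.
robocop = re.compile(r'robocop', re.IGNORECASE)
robocop_match = robocop.search('ROBOCOP protects the innocent.').group()
print(robocop_match)  # ROBOCOP

# re.DOTALL: without it the dot stops at a newline; with it the dot crosses lines.
text = 'Serve the public trust.\nProtect the innocent.'
first_line = re.compile(r'.*').search(text).group()
all_lines = re.compile(r'.*', re.DOTALL).search(text).group()
print(first_line)         # Serve the public trust.
print(all_lines == text)  # True

# re.VERBOSE: whitespace and '#' comments inside the pattern are ignored.
phone = re.compile(r'''
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
''', re.VERBOSE)
parts = phone.search('Call 415-555-4242 now').groups()
print(parts)  # ('415', '555', '4242')
```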
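(?:…)
A group written as (?:…) groups part of the pattern without capturing it, so it does not appear in the results of groups() or findall().

>>> re.findall(r'http://(?:\w+\.)*(\w+\.com)', 'http://google.com http://www.google.com http://code.google.com')
['google.com', 'google.com', 'google.com']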
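To see what the (?:…) form changes, it helps to run the same pattern with and without it. This sketch uses a simpler, made-up phone-number pattern rather than the URL pattern above:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# Ordinary parentheses capture, so findall() returns one tuple entry per group.
capturing = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)
print(capturing)  # [('415', '555-9999'), ('212', '555-0000')]

# (?:...) still groups the area code but does not capture it,
# leaving a single capturing group and therefore a plain list of strings.
non_capturing = re.findall(r'(?:\d{3})-(\d{3}-\d{4})', text)
print(non_capturing)  # ['555-9999', '555-0000']
```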

9. Code Practice

(File reading and writing) 疯狂填词2.py (Mad Libs 2)

'''
Create a Mad Libs program that reads in a text file and lets the user add their own text
anywhere the word ADJECTIVE, NOUN, ADVERB, or VERB appears in it. For example, a text file may look like this:
The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was
unaffected by these events.
The program finds these occurrences and prompts the user to replace them.
Enter an adjective:
silly
Enter a noun:
chandelier
Enter a verb:
screamed
Enter a noun:
pickup truck
The following text file would then be created:
The silly panda walked to the chandelier and then screamed. A nearby pickup truck was unaffected by these events.
The result should be printed to the screen and saved to a new text file.
'''

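import re

def mad_libs(filename_path, save_path):
    # Read the whole template; the with block closes the file automatically,
    # so no explicit close() call is needed.
    with open(filename_path, 'r') as strings:
        words = strings.read()

    # Placeholders are whole words written entirely in capitals (two or more letters).
    Regex = re.compile(r'\b[A-Z]{2,}\b')
    finds = Regex.findall(words)
    for i in finds:
        replace = input('Enter a word to replace {}:\n'.format(i))
        Regex2 = re.compile(r'\b' + i + r'\b')
        # Replace only the first remaining occurrence, and assign the result back
        # to words so that each substitution builds on the previous one.
        # The lambda inserts the answer literally, even if it contains backslashes.
        words = Regex2.sub(lambda mo: replace, words, count=1)
    print(words)

    # Append the finished text to the output file, followed by a newline.
    with open(save_path, 'a') as txt:
        txt.write(words + '\n')

if __name__ == '__main__':
    filename_path = input('Enter the path of the txt file to fill in: ')  # e.g. '疯狂填词原始文档.txt'
    save_path = input('Enter the path to save to (including the file name): ')  # e.g. '保留疯狂填词文档.txt'
    mad_libs(filename_path, save_path)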
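As an alternative sketch of the substitution step, the whole text can be filled in a single pass by giving sub() a function instead of a string. The helper name fill_placeholders, the PLACEHOLDER pattern, and passing the answers as a list (rather than reading them with input()) are assumptions made here so the example runs without keyboard input or files.

```python
import re

# Assumption: only the four placeholder words named in the exercise are replaced.
PLACEHOLDER = re.compile(r'\b(?:ADJECTIVE|NOUN|ADVERB|VERB)\b')

def fill_placeholders(text, answers):
    """Replace each placeholder, left to right, with the next item from answers."""
    remaining = iter(answers)
    # sub() calls the function once per match and inserts its return value
    # literally, so backslashes in an answer are not treated as escapes.
    return PLACEHOLDER.sub(lambda mo: next(remaining), text)

story = ('The ADJECTIVE panda walked to the NOUN and then VERB. '
         'A nearby NOUN was unaffected by these events.')
result = fill_placeholders(story, ['silly', 'chandelier', 'screamed', 'pickup truck'])
print(result)
# The silly panda walked to the chandelier and then screamed. A nearby pickup truck was unaffected by these events.
```

Because every match is handled in one left-to-right pass, an answer that happens to contain a placeholder word is not replaced again later.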
