Python Regular Expressions: A Detailed Guide

I. Regular Expressions: Metacharacters

The re module gives Python full regular expression support.

1. Quantifiers

    import re

    # Extract words made of mixed upper- and lower-case letters.
    a = 'Excel 12345Word23456PPT12Lr'
    r = re.findall('[a-zA-Z]{3,5}', a)  # runs of 3 to 5 letters
    print(r)
    # ['Excel', 'Word', 'PPT']

    # Greedy vs. non-greedy (Python quantifiers are greedy by default).
    # Greedy:     '[a-zA-Z]{3,5}'
    # Non-greedy: '[a-zA-Z]{3,5}?', which here behaves like '[a-zA-Z]{3}'
    # The second form is recommended: the lazy ? is easy to confuse with
    # the "zero or one" ? quantifier shown further down.

    # * matches the preceding character zero or more times.
    a = 'exce0excell3excel3'
    r = re.findall('excel*', a)  # "exce" followed by no l, one l, or many l's
    print(r)
    # ['exce', 'excell', 'excel']
    r = re.findall('excel.*', a)  # "excel" followed by everything after it
    print(r)
    # ['excell3excel3']

    # + matches the preceding character one or more times.
    r = re.findall('excel+', a)
    print(r)
    # ['excell', 'excel']

    # ? matches the preceding character zero or one time;
    # it is handy for trimming a repeated trailing character.
    r = re.findall('excel?', a)
    print(r)
    # ['exce', 'excel', 'excel']
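To make the greedy and non-greedy distinction above concrete, here is a minimal sketch that runs both forms of the same quantifier on the sample string. The variable names `greedy` and `lazy` are only illustrative.

```python
import re

text = 'Excel 12345Word23456PPT12Lr'

# Greedy: each match takes as many letters as it can, up to five.
greedy = re.findall(r'[a-zA-Z]{3,5}', text)

# Lazy: each match stops as soon as the minimum of three letters is reached.
lazy = re.findall(r'[a-zA-Z]{3,5}?', text)

print(greedy)  # ['Excel', 'Word', 'PPT']
print(lazy)    # ['Exc', 'Wor', 'PPT']
```

After a lazy match such as 'Exc', the leftover 'el' is too short to form another match, so it is skipped.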

2. Character Matching

    import re

    line = 'xyz,xcz.xfc.xdz,xaz,xez,xec'
    # Pattern: starts with x, ends with z, with d or e in between.
    r = re.findall('x[de]z', line)
    print(r)
    # ['xdz', 'xez']
    # Pattern: starts with x, ends with z, with anything except d or e in between.
    r = re.findall('x[^de]z', line)
    print(r)
    # ['xyz', 'xcz', 'xaz']

    import re

    # \w matches Chinese characters, letters, digits and the underscore,
    # but not other special characters.
    a = 'Excel 12345Word\n23456_PPT12lr'
    r = re.findall(r'\w', a)
    print(r)
    # ['E', 'x', 'c', 'e', 'l', '1', '2', '3', '4', '5', 'W', 'o', 'r', 'd', '2', '3', '4', '5', '6', '_', 'P', 'P', 'T', '1', '2', 'l', 'r']

    # \W matches everything \w does not: special characters, spaces, \n, \t.
    r = re.findall(r'\W', a)
    print(r)
    # [' ', '\n']
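Since \w also matches Chinese characters, it can help to see how to restrict it to ASCII. The sketch below uses the standard re.ASCII flag; the sample string is made up for illustration.

```python
import re

text = 'Excel_表格 2024'

# Default (Unicode) matching: \w covers letters from any script.
unicode_words = re.findall(r'\w+', text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(unicode_words)  # ['Excel_表格', '2024']
print(ascii_words)    # ['Excel_', '2024']
```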

3. Boundary Matching

    import re

    # Extract the phone number only if the whole string is 8 to 11 digits.
    tel = '13811115888'
    r = re.findall(r'^\d{8,11}$', tel)
    print(r)
    # ['13811115888']
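The same whole-string check can also be written with re.fullmatch, which anchors both ends implicitly. This is a sketch rather than a complete phone validator, and the helper name `is_phone_number` is hypothetical.

```python
import re

def is_phone_number(text):
    """Return True when text is 8 to 11 digits and nothing else (illustrative helper)."""
    return re.fullmatch(r'\d{8,11}', text) is not None

print(is_phone_number('13811115888'))      # True
print(is_phone_number('138111158889'))     # False: 12 digits
print(is_phone_number('tel:13811115888'))  # False: extra prefix
print(is_phone_number('13811115888\n'))    # False: trailing newline
```

One difference worth knowing: `$` also matches just before a trailing newline, so the `^...$` version would accept the last input, while fullmatch rejects it.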

4. Groups

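    import re

    # Wrap abc in a group; {2} is the repeat count, so the pattern matches abcabc.
    a = 'abcabcabcxyzabcabcxyzabc'
    r = re.findall('(abc){2}', a)
    print(r)
    # ['abc', 'abc']  (findall returns the group's text, not the whole match)
    r = re.findall('(abc){3}', a)
    print(r)
    # ['abc']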
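Because findall returns the captured group rather than the full match, the results above show 'abc' instead of 'abcabc'. A non-capturing group, written (?:...), is one way to get the whole match back. A short sketch:

```python
import re

text = 'abcabcabcxyzabcabcxyzabc'

# Capturing group: findall reports only the group's last capture.
captured = re.findall(r'(abc){2}', text)

# Non-capturing group: findall reports the entire match.
whole = re.findall(r'(?:abc){2}', text)

print(captured)  # ['abc', 'abc']
print(whole)     # ['abcabc', 'abcabc']
```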

5. Flags

    import re

    # The third argument of findall is flags; re.I ignores case.
    a = 'abcFBIabcCIAabc'
    r = re.findall('fbi', a, re.I)
    print(r)
    # ['FBI']

    # Combine several flags with |.
    a = 'abcFBI\nabcCIAabc'
    # Match fbi followed by any one character, including \n (re.S).
    r = re.findall('fbi.{1}', a, re.I | re.S)
    print(r)
    # ['FBI\n']
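Flags combine the same way with re.M, which changes where ^ and $ may match. The sketch below uses an invented three-line string to show the effect.

```python
import re

text = 'FBI one\nfbi two\nCIA three'

# Without re.M, ^ matches only at the very start of the string.
without_m = re.findall(r'^fbi', text, re.I)

# With re.M, ^ also matches at the start of every line.
with_m = re.findall(r'^fbi', text, re.I | re.M)

print(without_m)  # ['FBI']
print(with_m)     # ['FBI', 'fbi']
```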

II. Functions

re.findall

Finds every match of the given pattern in the string.
Returns the matches as a list.
Returns an empty list if nothing matches.

    import re
    re.findall(pattern, string, flags=0)
    pattern.findall(string[, pos[, endpos]])

    import re

    line = "111aaabbb222小呼噜奥利奥"
    r = re.findall('[0-9]', line)
    print(r)
    # ['1', '1', '1', '2', '2', '2']
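Two details of findall are easy to miss: with several groups it returns tuples, and the compiled-pattern form accepts a start position. A minimal sketch, where the 'x=1, y=22, z=333' string is an invented sample:

```python
import re

# With two or more groups, each result is a tuple of the groups.
pairs = re.findall(r'(\w)=(\d+)', 'x=1, y=22, z=333')
print(pairs)  # [('x', '1'), ('y', '22'), ('z', '333')]

# The compiled-pattern method takes pos to start scanning at that index.
digits = re.compile(r'[0-9]')
line = "111aaabbb222小呼噜奥利奥"
tail = digits.findall(line, 3)  # skip the first three characters
print(tail)  # ['2', '2', '2']
```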

re.match

re.match tries to match a pattern at the start of the string.
If the pattern does not match at the start, match() returns None.

    re.match(pattern, string, flags=0)  # (pattern, string to match against, flags)

    import re

    print(re.match('www', 'www.xxxx.com'))
    print(re.match('www', 'www.xxxx.com').span())
    print(re.match('com', 'www.xxxx.com'))

    <re.Match object; span=(0, 3), match='www'>
    (0, 3)
    None
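Because match() returns None on failure, calling .span() directly on the result raises AttributeError when nothing matches. One way to guard against that is sketched below; the helper name `match_span` is hypothetical.

```python
import re

def match_span(pattern, text):
    """Return the span of a match at the start of text, or None (illustrative helper)."""
    m = re.match(pattern, text)
    return m.span() if m else None

print(match_span('www', 'www.xxxx.com'))  # (0, 3)
print(match_span('com', 'www.xxxx.com'))  # None
```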

group() and match objects

    import re

    a = 'life is short,i use python,i love python'
    r = re.search('life(.*)python(.*)python', a)
    print(r.group(0))        # the whole match: life is short,i use python,i love python
    print(r.group(1))        # text captured by group 1:  is short,i use
    print(r.group(2))        # text captured by group 2: ,i love
    print(r.group(0, 1, 2))  # all three as a tuple: ('life is short,i use python,i love python', ' is short,i use ', ',i love ')
    print(r.groups())        # group(1) and group(2) only: (' is short,i use ', ',i love ')

    import re

    # .      matches any single character except a newline (\n)
    # .*     matches zero or more such characters, as many as possible (greedy)
    # (.*?)  non-greedy group: captures as few characters as possible
    # re.M   multi-line mode, affects ^ and $
    # re.I   makes matching case-insensitive
    line = "Cats are smarter than dogs"
    matchObj1 = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
    matchObj2 = re.match(r'(.*) smarter (.*?) .*', line, re.M | re.I)
    matchObj3 = re.match(r'(.*) than (.*)', line, re.M | re.I)
    print(matchObj1)
    print(matchObj2)
    print(matchObj3)
    # <re.Match object; span=(0, 26), match='Cats are smarter than dogs'>
    # <re.Match object; span=(0, 26), match='Cats are smarter than dogs'>
    # <re.Match object; span=(0, 26), match='Cats are smarter than dogs'>

    for name, m in (("matchObj1", matchObj1), ("matchObj2", matchObj2), ("matchObj3", matchObj3)):
        if m:
            print(name + ".group() : ", m.group())
            print(name + ".group(1) : ", m.group(1))
            print(name + ".group(2) : ", m.group(2))
        else:
            print("No match!!")
    # matchObj1.group() :  Cats are smarter than dogs
    # matchObj1.group(1) :  Cats
    # matchObj1.group(2) :  smarter
    # matchObj2.group() :  Cats are smarter than dogs
    # matchObj2.group(1) :  Cats are
    # matchObj2.group(2) :  than
    # matchObj3.group() :  Cats are smarter than dogs
    # matchObj3.group(1) :  Cats are smarter
    # matchObj3.group(2) :  dogs

    import re

    # A dot matches a single character.
    # A star means the preceding item occurs zero or more times.
    # Dot-star therefore means any character, zero or more times.
    text = "a b a b"
    matchObj1 = re.match(r'a(.*)b', text, re.M | re.I)
    matchObj2 = re.match(r'a(.*?)b', text, re.M | re.I)
    print("matchObj1.group() : ", matchObj1.group())
    print("matchObj2.group() : ", matchObj2.group())
    # matchObj1.group() :  a b a b
    # matchObj2.group() :  a b
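Numbered groups like those above can also be given names with the (?P<name>...) syntax, which tends to be easier to read once a pattern has several groups. In this sketch the group names `subject` and `quality` are invented for illustration.

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r'(?P<subject>.*) are (?P<quality>.*?) .*', line)

if m:
    print(m.group('subject'))  # Cats
    print(m.group('quality'))  # smarter
    print(m.groupdict())       # {'subject': 'Cats', 'quality': 'smarter'}
```

Named groups still keep their numbers, so m.group(1) and m.group('subject') return the same text.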

re.search

Scans the whole string and returns the first successful match, or None if there is none.

re.search(pattern, string, flags=0)

    import re

    line = "cats are smarter than dogs"
    matchObj = re.match(r'dogs', line, re.M | re.I)
    matchObj1 = re.search(r'dogs', line, re.M | re.I)
    matchObj2 = re.match(r'(.*) dogs', line, re.M | re.I)

    if matchObj:
        print("match --> matchObj.group() : ", matchObj.group())
    else:
        print("No match!!")
    if matchObj1:
        print("match --> matchObj1.group() : ", matchObj1.group())
    else:
        print("No match!!")
    if matchObj2:
        print("match --> matchObj2.group() : ", matchObj2.group())
    else:
        print("No match!!")
    # No match!!
    # match --> matchObj1.group() :  dogs
    # match --> matchObj2.group() :  cats are smarter than dogs
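Since search() stops at the first hit, later occurrences need re.finditer or re.findall. The sketch below extends the sample sentence (the extra clause is invented) to show the difference.

```python
import re

line = "cats are smarter than dogs, and dogs know it"

# search() reports only the first occurrence.
first = re.search(r'dogs', line)
print(first.span())  # (22, 26)

# finditer() yields one match object per occurrence.
spans = [m.span() for m in re.finditer(r'dogs', line)]
print(spans)  # [(22, 26), (32, 36)]
```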

re.compile

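re.compile converts a regular expression into a pattern object.
This makes matching more efficient: once a pattern has been compiled, it can be reused without being converted again.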
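As a minimal sketch of that idea, the pattern below is compiled once and then reused on several strings. The name `word_pattern` and the sample list are assumptions made for this example.

```python
import re

# Compile once, then reuse the pattern object.
word_pattern = re.compile(r'[a-zA-Z]{3,5}')

samples = ['Excel 12345', 'Word23456', 'PPT12Lr']
results = [word_pattern.findall(sample) for sample in samples]

print(results)  # [['Excel'], ['Word'], ['PPT']]
```

The module-level functions such as re.findall also cache recently used patterns internally, so the speed gain is usually modest; the main benefit is often a named, reusable pattern.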

III. Search and Replace

re.sub: replacing substrings

    re.sub('text to replace', 'replacement', a)

    import re

    # Replace FBI with BBQ.
    a = 'abcFBIabcCIAabc'
    r = re.sub('FBI', 'BBQ', a)
    print(r)
    # abcBBQabcCIAabc

    # The fourth parameter, count, limits the number of replacements:
    # 1 replaces only the first occurrence; the default 0 replaces all of them.
    a = 'abcFBIabcFBIaFBICIAabc'
    r = re.sub('FBI', 'BBQ', a, count=1)
    print(r)
    # abcBBQabcFBIaFBICIAabc

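    import re

    # Pass a function as the replacement argument of sub so that the function
    # decides what each match becomes, for example turning FBI into $FBI$.
    a = 'abcFBIabcFBIaFBICIAabc'

    def wrap_in_dollars(match):
        matched_text = match.group()  # group() returns the matched text, here FBI
        return '$' + matched_text + '$'

    r = re.sub('FBI', wrap_in_dollars, a)
    print(r)
    # abc$FBI$abc$FBI$a$FBI$CIAabc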
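For a simple wrap like this, a replacement string with a backreference may be enough: \g<0> stands for the whole match. The sketch below also shows re.subn, which returns the number of replacements alongside the new string.

```python
import re

text = 'abcFBIabcFBIaFBICIAabc'

# \g<0> in the replacement refers to the entire matched text.
wrapped = re.sub(r'FBI', r'$\g<0>$', text)
print(wrapped)  # abc$FBI$abc$FBI$a$FBI$CIAabc

# subn returns a (new_string, number_of_replacements) tuple.
replaced, count = re.subn(r'FBI', 'BBQ', text)
print(replaced, count)  # abcBBQabcBBQaBBQCIAabc 3
```

A function is still the better fit when the replacement depends on logic that a template string cannot express.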

Summary

That is all for this article. I hope it helps, and I hope you will keep following 脚本之家 for more content.
