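[Python Web Scraping Journey, Day 8]: Regular Expressions

So far we have covered the lxml and BeautifulSoup parsing tools; today we move on to regular expressions, which are somewhat harder.
In Python, a regular expression is, put simply, a way to pull the substring you want out of a larger string.
import re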
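# Part 1: Matching a single character

# 1. Match a literal string
text = "heooo"
ret = re.match("he", text)
print(ret.group())  # he
# 2. A dot (.) matches any one character, except a newline
text = "heooo"
ret = re.match(".", text)
print(ret.group())  # h
# 3. \d matches a digit 0-9 (patterns are written as raw strings, r"...", so the backslash reaches re unchanged)
text = '2'
ret = re.match(r"\d", text)
print(ret.group())  # 2
# 4. \D matches any non-digit, the exact opposite of \d
text = 'd'
ret = re.match(r"\D", text)
print(ret.group())  # d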
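# 5. \s matches a whitespace character: \n, \r, \t, or a space
# 6. \w matches a-z, A-Z, digits, and the underscore
# 7. \W matches the opposite of \w
# 8. Combined matching with []: any character listed inside [] can match
text = '2332-23212112'
ret = re.match(r"[\d\-]+", text)
print(ret.group())  # 2332-23212112
# 8.1. [] as a replacement for \d
# ret = re.match("[0-9]", text)
# 8.2. [] as a replacement for \D, using "^"
text = '='
ret = re.match("[^0-9]", text)
print(ret.group())  # =
# 8.3. [] as a replacement for \w (the underscore must be listed too)
text = 'D'
ret = re.match("[a-zA-Z0-9_]", text)
print(ret.group())  # D
# 8.4. [] as a replacement for \W
text = '--'
ret = re.match("[^a-zA-Z0-9_]+", text)
print(ret.group())  # --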
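
Items 5 to 7 above (\s, \w, \W) are listed without an example. The following is a minimal sketch of how each behaves; the sample strings are made up for illustration.

```python
import re

ws = re.match(r"\s", "\tindented")    # \s: one whitespace character (a tab here)
word = re.match(r"\w", "_name")       # \w: one word character; "_" counts
non_word = re.match(r"\W", "#tag")    # \W: one character that is not a word character

print(repr(ws.group()))    # '\t'
print(word.group())        # _
print(non_word.group())    # #
```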

# Part 2: Matching multiple characters

# 1. * matches zero or more characters
text = '98342'
ret = re.match(r"\d*", text)
print(ret.group())  # 98342
# 2. + matches one or more characters
text = '98342'
ret = re.match(r"\d+", text)
print(ret.group())  # 98342
# 3. ? matches zero or one character
text = 'asd'
ret = re.match(r"\w?", text)
print(ret.group())  # a
# 4. {m} matches exactly m characters
text = 'asdsds'
ret = re.match(r"\w{4}", text)
print(ret.group())  # asds
# 5. {m,n} matches m to n characters, taking as many as it can (here the most out of 2, 3, 4, 5)
text = 'asdj33g'
ret = re.match(r"\w{2,5}", text)
print(ret.group())  # asdj3
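
With the input '98342', * and + give the same result, so the difference is easy to miss. A small sketch, using a made-up string with no leading digit, shows where they part ways: * is satisfied with zero characters, while + fails and re.match returns None.

```python
import re

text = "abc123"   # hypothetical input that does not start with a digit

star = re.match(r"\d*", text)   # zero digits is acceptable, so this matches ""
plus = re.match(r"\d+", text)   # needs at least one digit at the start: no match

print(repr(star.group()))   # ''
print(plus)                 # None

# A failed match returns None, so check before calling .group()
if plus is None:
    print("no leading digits")
```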

####### Small examples #########

# 1. Match a mobile phone number
text = '13837389987'
ret = re.match(r"1[345789]\d{9}", text)
print(ret.group())
# 2. Match an email address
text = 'user@example.com'  # placeholder address
ret = re.match(r"\w+@[a-z0-9]+\.[a-z]+", text)
print(ret.group())
# 3. Match a URL
text = 'https://www.bilibili.com/'
ret = re.match(r"(http|https|ftp)://[^\s]+", text)
print(ret.group())
# 4. Match an ID card number
text = '12432318881149884X'  # 17 digits + x/X
ret = re.match(r"\d{17}[\dxX]", text)
print(ret.group())
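
One caveat about these examples: re.match only anchors the start of the string, so trailing junk after a valid prefix still matches. If the goal is validation rather than extraction, re.fullmatch is one option. The helper name is_phone below is hypothetical; the pattern is the phone pattern from example 1.

```python
import re

PHONE = r"1[345789]\d{9}"   # phone pattern from example 1

def is_phone(text):
    """Hypothetical helper: True only if the whole string is an 11-digit mobile number."""
    return re.fullmatch(PHONE, text) is not None

print(is_phone("13837389987"))      # True
print(is_phone("13837389987abc"))   # False: trailing characters are rejected

# match() only anchors the start, so the same input still matches
print(re.match(PHONE, "13837389987abc") is not None)   # True
```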
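# Part 3: Start, end, and alternation syntax (inside [], "^" means negation instead)
# 1. ^ marks the start of the string (the caret)
text = 'fffs'
ret = re.search(r"^\w+", text)
print(ret.group())  # fffs
# 2. $ marks the end of the string
text = 'someone@qq.com'  # placeholder address
ret = re.match(r"\w+@qq\.com$", text)
print(ret.group())
# 3. "|" matches any one of several strings or expressions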
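
Item 3 has no example of its own, so here is a minimal sketch of alternation. The URLs are made up; the group accepts any one of three schemes, and anything else fails to match.

```python
import re

urls = ["https://example.com", "ftp://example.com", "ssh://example.com"]   # hypothetical inputs

schemes = []
for url in urls:
    m = re.match(r"(http|https|ftp)://", url)   # alternatives are tried left to right
    schemes.append(m.group(1) if m else None)

print(schemes)   # ['https', 'ftp', None]
```

Wrapping the alternatives in parentheses limits the "or" to the scheme; without them, | would split the whole pattern into separate alternatives.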
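# 4. Greedy and non-greedy mode
text = '14232984'
ret = re.match(r"\d+", text)   # greedy: 14232984
ret = re.match(r"\d+?", text)  # non-greedy: 1
print(ret.group())
text = '<h1>hehehe</h1>'
ret = re.match("<.+>", text)   # greedy: <h1>hehehe</h1>

ret = re.match("<.+?>", text)  # non-greedy: <h1>

# 4.1. Match an integer from 0 to 100
text = '12'
ret = re.match(r"0$|[1-9]\d?$|100$", text)
# the question mark means the preceding \d appears either once or not at all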
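
To see that the 0-100 pattern does what it claims, it can be checked against a range of numbers. This is only a quick sanity check; the helper name in_range is hypothetical.

```python
import re

PATTERN = r"0$|[1-9]\d?$|100$"   # the 0-100 pattern from 4.1

def in_range(text):
    """Hypothetical helper: True if text is an integer from 0 to 100 without leading zeros."""
    return re.match(PATTERN, text) is not None

accepted = [n for n in range(200) if in_range(str(n))]
print(accepted == list(range(101)))        # True: exactly 0..100 are accepted
print(in_range("007"), in_range("101"))    # False False
```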

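# Part 4: Escape characters and raw strings
text = 'money is $222'
ret = re.search("\\$\\d+", text)
print(ret.group())  # $222
Putting a \ in front of a special character such as $ strips its special meaning, so it becomes an ordinary character.
Alternatively, write the pattern as a raw string: r"\$\d+".
Both regular expressions and Python treat "\" as an escape character, so matching a literal "\" in ordinary text takes four backslashes in a normal string literal; a raw string solves this problem.
text = 'money is \\c'
ret = re.search("\\\\c", text)
print(ret.group())
ret = re.search(r"\\c", text)  # raw string
print(ret.group())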
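
The "four backslashes" rule is easier to trust once the two spellings are compared directly. A small sketch, with a made-up sample string:

```python
import re

normal = "\\\\d"   # normal literal: four backslashes in the source, two in the string
raw = r"\\d"       # raw literal: what is typed is what re receives

print(normal == raw)   # True
print(len(raw))        # 3: two backslashes and the letter d

# Both spell the regex \\d, i.e. a literal backslash followed by "d"
found = re.search(raw, "path\\dir").group()
print(found)           # \d
```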
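# Part 5: match, search, and groups
1. match: matches from the start of the string.
2. search: scans the whole string, but stops as soon as it finds the first match.
3. Groups: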
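
The three points above can be sketched as follows. The sample sentence and pattern are assumptions chosen for illustration, not the original example.

```python
import re

text = "apple price is $99, orange price is $10"   # hypothetical sample sentence

print(re.match(r"\$\d+", text))            # None: the string does not start with "$"
print(re.search(r"\$\d+", text).group())   # $99: search stops at the first match

# Parentheses create groups, numbered from 1 in order of their opening parenthesis
ret = re.search(r"(\$\d+)\D+(\$\d+)", text)
print(ret.group())     # $99, orange price is $10  (the whole match, same as group(0))
print(ret.group(1))    # $99
print(ret.group(2))    # $10
print(ret.groups())    # ('$99', '$10')
```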


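# Part 6: re is Python's regular-expression library; its commonly used functions are introduced below.

# findall returns every match as a list
text = 'the app is $33 and the sun is $23'
ret = re.findall(r'\$\d+', text)
print(ret)  # a list, so no group() here: ['$33', '$23']
# sub replaces each part matched by the pattern with the second argument
text = 'the app is $33 and the sun is $23'
ret = re.sub(r'\$\d+', "2", text)
print(ret)  # the app is 2 and the sun is 2
# A small example:
text = '''<dd class="job_bt">
        <h3 class="description">Job description:</h3>
        <div class="job-detail">
        <p>[Responsibilities]<br> &nbsp;1. &nbsp;Manage the department's Python team;<br> &nbsp;2. &nbsp;Deliver high-quality design and code, and tackle key and difficult technical problems;</p>
        </div>
    </dd>'''
ret = re.sub('<.+?>', '', text)
print(ret)
# split cuts the string into a list at every match of the pattern
text = 'he is a do%g'
ret = re.split('[^a-zA-Z]', text)
print(ret)  # ['he', 'is', 'a', 'do', 'g']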
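
Besides a fixed replacement string, re.sub also accepts a function that receives each match object and returns the replacement text. A minimal sketch on the same price sentence; the helper double_price is hypothetical.

```python
import re

text = 'the app is $33 and the sun is $23'   # the sentence from the sub example

def double_price(match):
    """Hypothetical helper: double the number captured in group 1."""
    return "$" + str(int(match.group(1)) * 2)

ret = re.sub(r'\$(\d+)', double_price, text)
print(ret)   # the app is $66 and the sun is $46
```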
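# compile
# 1. Precompile the pattern to improve efficiency
text = 'he is a  98.32'
r = re.compile(r'\d+\.?\d*')
ret = re.search(r, text)
print(ret.group())  # 98.32
# 2. It also allows a more readable, commented way of writing the pattern
text = 'he is a  98.32'
r = re.compile(r'''
    \d+  # before the decimal point
    \.?  # the decimal point
    \d*  # after the decimal point
   ''', re.VERBOSE)
ret = re.search(r, text)
print(ret.group())  # 98.32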
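
Precompiling pays off mainly when one pattern is applied to many strings. A compiled pattern also exposes search, findall, and the other functions as methods. A short sketch, with made-up input lines:

```python
import re

NUMBER = re.compile(r'\d+\.?\d*')   # same number pattern, compiled once

lines = ['he is a  98.32', 'no digits here', 'pi is 3.14 and e is 2.718']   # hypothetical inputs

results = [NUMBER.findall(line) for line in lines]
print(results)   # [['98.32'], [], ['3.14', '2.718']]

first = NUMBER.search(lines[2])   # method form of re.search(NUMBER, lines[2])
print(first.group())              # 3.14
```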
