Groovy Tip 29: Regular Expressions, Part 3


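This tip looks at capturing groups and non-capturing groups, along with a few concepts related to them.

Capturing groups are an important feature, especially for text processing. We often run into words or numbers mixed in with punctuation and need to pull those words or numbers out from among the symbols. That is where capturing groups come in.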

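Let's start with a simple example. Suppose we have the following email address:

fgp@sina.com

We want to extract "fgp", "sina", and "com" from it. With the split method, we would have to call split twice to get there.

With capturing groups, a single match is enough:

def amail = 'fgp@sina.com'
// Each pair of parentheses is one capturing group; \. matches a literal dot
def re = /(.*)@(.*)\.(.*)/
def matcher = (amail =~ re)
// matcher[0] holds the whole match followed by the three groups
println matcher[0]

The output is:

["fgp@sina.com", "fgp", "sina", "com"]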
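As a rough sketch of how the captured pieces might be used afterwards, the snippet below repeats the match, unpacks the groups with multiple assignment, and shows the two-pass split version for comparison. The variable names (user, host, tld, and so on) are illustrative and not part of the original example.

```groovy
def email = 'fgp@sina.com'

// One match with three capturing groups
def matcher = (email =~ /(.*)@(.*)\.(.*)/)

// matcher[0] is a list: the whole match, then groups 1 to 3
def (whole, user, host, tld) = matcher[0]
assert whole == 'fgp@sina.com'
assert [user, host, tld] == ['fgp', 'sina', 'com']

// A single group can also be read by index
assert matcher[0][2] == 'sina'

// The same result with split needs two passes
def parts = email.split('@')
def hostParts = parts[1].split(/\./)
assert [parts[0], hostParts[0], hostParts[1]] == ['fgp', 'sina', 'com']
```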

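Here is a slightly more practical example. Suppose we have the following price list, where each line gives a product name, its price, and the discount it is eligible for.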

computer 3000￥ 10%

mouse 50￥ 0%

memory 200￥ 20%

Now we want to extract the product name, the price, and the discount separately.

The code using capturing groups looks like this:

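def goods = """computer 3000￥ 10%
mouse 50￥ 0%
memory 200￥ 20%"""
def groups = {
    // Groups 1 to 3 capture the name, the price, and the discount
    def re = /(.*) (.*)￥ (.*)%/
    def matcher = (it =~ re)
    println matcher[0]
}
// Split on the newline character and match each line in turn
goods.split('\n').each(groups)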

Running the code above prints:

["computer 3000￥ 10%", "computer", "3000", "10"]

["mouse 50￥ 0%", "mouse", "50", "0"]

["memory 200￥ 20%", "memory", "200", "20"]
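The groups come back as strings, so a natural next step is to turn each line into a small record. The following is a minimal sketch under that assumption: the map keys (name, price, discount) and the conversion to integers are illustrative choices, not something the original code does.

```groovy
def goods = """computer 3000￥ 10%
mouse 50￥ 0%
memory 200￥ 20%"""

// Turn each line into a map; the key names are illustrative
def rows = goods.split('\n').collect { line ->
    def m = (line =~ /(.*) (.*)￥ (.*)%/)
    // m[0] is the whole match followed by the three groups
    def (whole, name, price, discount) = m[0]
    [name: name, price: price.toInteger(), discount: discount.toInteger()]
}

assert rows*.name == ['computer', 'mouse', 'memory']
assert rows*.price == [3000, 50, 200]
assert rows*.discount == [10, 0, 20]
```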

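By comparison, non-capturing groups are a little more involved. Besides the non-capturing group itself, a few related concepts need explaining.

The first is the distinction between greedy ("maximal") matching and lazy ("minimal") matching. In regular expressions, quantifiers such as "?", "*", and "+" are greedy by default: they match as much as possible. To make one lazy, so that it matches as little as possible, append a "?" to it.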

A classic example makes this clear. Suppose we have the following HTML fragment:

<td>abc</td>

First, we run this match:

def html = '<td>abc</td>'
// Greedy: .* takes as many characters as it can
def re = /<.*>/
def matcher = (html =~ re)
println matcher[0]

Then we run this one:

def html = '<td>abc</td>'
// Lazy: .*? takes as few characters as it can
def re = /<.*?>/
def matcher = (html =~ re)
println matcher[0]

The first snippet performs a greedy match, and its output is:

<td>abc</td>

The second snippet performs a lazy match, and its output is:

<td>
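The two behaviours can be checked side by side in one script. This is a small sketch rather than part of the original tip: it adds findAll to show that the lazy form picks out each tag separately, and it applies the same rule to "+" and "?" on a made-up string of three letters.

```groovy
def html = '<td>abc</td>'

// Greedy: .* runs on to the last '>' in the string
def greedy = (html =~ /<.*>/)
assert greedy[0] == '<td>abc</td>'

// Lazy: .*? stops at the first '>' it reaches
def lazy = (html =~ /<.*?>/)
assert lazy[0] == '<td>'

// findAll returns every non-overlapping match, so the lazy form yields each tag
assert html.findAll(/<.*?>/) == ['<td>', '</td>']

// The same rule applies to + and ?
def plusGreedy = ('aaa' =~ /a+/)
def plusLazy = ('aaa' =~ /a+?/)
def optGreedy = ('aaa' =~ /a?/)
def optLazy = ('aaa' =~ /a??/)
assert plusGreedy[0] == 'aaa'
assert plusLazy[0] == 'a'
assert optGreedy[0] == 'a'
assert optLazy[0] == ''
```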

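A non-capturing group covers the case where a string contains parts we want to extract and parts we have to match but do not want to keep. The parts we want are handled by the capturing groups described above. So what do we do with the parts we do not want?

Writing a non-capturing group is simple: use parentheses, and start the contents with "?:". Here is an example.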
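Before the larger example, a minimal sketch of the difference may help. The input string 'abab-123' is made up for illustration. The only change between the two patterns is the "?:" at the start of the first group.

```groovy
def text = 'abab-123'

// With plain parentheses, (ab) becomes group 1 and the digits become group 2
def capturing = (text =~ /(ab)+-(\d+)/)
assert capturing[0] == ['abab-123', 'ab', '123']

// With (?:...), the repeated "ab" is still matched but is not stored
def nonCapturing = (text =~ /(?:ab)+-(\d+)/)
assert nonCapturing[0] == ['abab-123', '123']
assert nonCapturing.groupCount() == 1
```

Both patterns match the same text. The non-capturing version simply leaves fewer entries in the result, so the remaining groups keep stable, meaningful indexes.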

Take the price list again, but this time in the following form:

computer Intel CUP 3000￥ 10%

mouse made in China mainland 50￥ 0%

memory made in Taiwan 200￥ 20%

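This price list is more complex than the previous one, because a product description sits in the middle of each line. We still want only the product name, the price, and the discount, not the description.

This is where a non-capturing group helps. The code is as follows: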

def goods = """computer Intel CUP 3000￥ 10%
mouse made in China mainland 50￥ 0%
memory made in Taiwan 200￥ 20%"""
def groups = {
    // (.*?) lazily captures the name; (?: .+)+ matches the description without capturing it
    def matcher = (it =~ /(.*?)(?: .+)+ (.*)￥ (.*)%/)
    if (matcher.matches()) {
        println matcher[0]
    }
}
goods.split('\n').each(groups)

The output is:

["computer Intel CUP 3000￥ 10%", "computer", "3000", "10"]

["mouse made in China mainland 50￥ 0%", "mouse", "50", "0"]

["memory made in Taiwan 200￥ 20%", "memory", "200", "20"]

In the code above, "(?: .+)+" is the non-capturing group. Note that the expression begins with "(.*?)", which uses lazy matching. What happens if we remove the question mark and make it greedy?

def goods = """computer Intel CUP 3000￥ 10%
mouse made in China mainland 50￥ 0%
memory made in Taiwan 200￥ 20%"""
def groups = {
    // (.*) is now greedy, so the first group swallows part of the description
    def matcher = (it =~ /(.*)(?: .+)+ (.*)￥ (.*)%/)
    if (matcher.matches()) {
        println matcher[0]
    }
}
goods.split('\n').each(groups)

The output is:

["computer Intel CUP 3000￥ 10%", "computer Intel", "3000", "10"]

["mouse made in China mainland 50￥ 0%", "mouse made in China", "50", "0"]

["memory made in Taiwan 200￥ 20%", "memory made in", "200", "20"]

As you can see, this is no longer the result we wanted.
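One possible way to make the pattern less sensitive to the greedy versus lazy choice is to narrow what each group may contain. The sketch below is an alternative, not the approach used above: it assumes the product name is a single word with no spaces and that the price and discount are plain digits. The names pattern and rows are illustrative.

```groovy
def goods = """computer Intel CUP 3000￥ 10%
mouse made in China mainland 50￥ 0%
memory made in Taiwan 200￥ 20%"""

// \S+ cannot cross a space, so the name stops at the first word;
// (?: .+)? skips an optional description without capturing it;
// \d+ accepts digits only for the price and the discount
def pattern = /(\S+)(?: .+)? (\d+)￥ (\d+)%/

def rows = goods.split('\n').collect { line ->
    def m = (line =~ pattern)
    // Keep the groups only when the whole line matches
    m.matches() ? m[0] : null
}

assert rows.collect { it[1] } == ['computer', 'mouse', 'memory']
assert rows.collect { it[2] } == ['3000', '50', '200']
assert rows.collect { it[3] } == ['10', '0', '20']
```

Because the description group is optional here, the same pattern also matches lines from the simpler price list that have no description, such as "mouse 50￥ 0%".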
