A First Look at Regular Expressions (Java String regex, Grok)

Introduction

What is a regular expression? Definitions differ slightly from site to site, so here I quote the Wikipedia version: In theoretical computer science and formal language theory, a regular expression (sometimes called a rational expression) is a sequence of characters that define a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. Put simply: a regular expression is a sequence of characters that defines a search pattern.
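To make "a sequence of characters that defines a search pattern" concrete, here is a minimal sketch using java.util.regex. The class name SearchPatternDemo and the sample sentence are made up for illustration: a single pattern, colou?r, describes both spellings of a word and drives both halves of a find-and-replace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchPatternDemo {
    // "colou?r": the "?" makes the preceding "u" optional, so one pattern covers both spellings
    static final Pattern COLOR = Pattern.compile("colou?r");

    // "find": collect every match together with the index where it starts
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = COLOR.matcher(text);
        while (m.find()) {
            hits.add(m.group() + "@" + m.start());
        }
        return hits;
    }

    // "replace": substitute every match with the given text
    static String replace(String text, String replacement) {
        return COLOR.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "color or colour? Either colour works.";
        System.out.println(find(text));           // [color@0, colour@9, colour@24]
        System.out.println(replace(text, "hue")); // hue or hue? Either hue works.
    }
}
```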

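Many programming languages have built-in regex (short for regular expression) support (the algorithms behind it were written by experts; the rest of us just need to learn how to use them), and the syntax varies slightly from language to language. I first learned regular expressions in Java, so everything here is based on Java regular expressions.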
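Since syntax details differ between languages, one Java-specific point is worth a sketch right away: a Java regex lives inside a string literal, so every backslash the regex needs is doubled in the source code. That is why the examples in this article write \\w and \\s+. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

public class JavaEscapingDemo {
    // The regex \d+ (one or more digits) is written "\\d+" in Java source
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // A literal backslash is \\ in regex syntax, so the Java literal needs four: "\\\\"
    static boolean containsBackslash(String s) {
        return Pattern.compile("\\\\").matcher(s).find();
    }

    // Pattern.quote turns text into a literal pattern, so "." no longer means "any character"
    static String[] splitOnDots(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("5288"));            // true
        System.out.println(containsBackslash("C:\\dev1"));  // true (the string is C:\dev1)
        System.out.println(splitOnDots("15.06.11").length); // 3
        System.out.println("15.06.11".split(".").length);   // 0
    }
}
```

The last line shows why quoting matters: "15.06.11".split(".") returns an empty array, because . matches every character, every piece between matches is an empty string, and split discards trailing empty strings.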

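Some useful links:
A quick-start guide to regular expressions
A collection of commonly used regular expressions
A Chinese-language site for learning Java regular expressions
An English-language site for learning Java regular expressions
An online Grok pattern tester
Learning Grok patterns
The BM (Boyer-Moore) algorithm explained: the legendary Ctrl + F?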


talk is cheap, show me the code

Regex in String

String has 4 methods that take a regex: matches(), split(), replaceFirst(), replaceAll()

package regextest;

public class RegexTestStrings
{
    public final static String EXAMPLE_TEST = 
    "This is my small example string which I'm going to use for pattern matching   .";

    public static void main(String[] args)
    {
        // check whether the WHOLE string matches: a 'word character' followed by anything
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));

        // split the string on runs of whitespace and return the pieces as a String array
        String[] splitString = EXAMPLE_TEST.split("\\s+");
        System.out.println(splitString.length);
        for (String string : splitString)
        {
            System.out.println(string);
        }

        // replace only the FIRST match of "\\s+" with "才"
        System.out.println(EXAMPLE_TEST.replaceFirst("\\s+", "才"));

        // replace EVERY match of "\\s+" with "才"
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "才"));
    }
}

Output:

true
15
This
is
my
small
example
string
which
I'm
going
to
use
for
pattern
matching
.
This才is my small example string which I'm going to use for pattern matching   .
This才is才my才small才example才string才which才I'm才going才to才use才for才pattern才matching才.
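One detail the output hides: matches() is true only when the entire string fits the pattern, which is why the first check uses \\w.* rather than just \\w. Matcher.find(), by contrast, looks for the pattern anywhere in the string. A small sketch contrasting the two (the helper names are illustrative):

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // String.matches(regex) behaves like Pattern.matches(regex, s): the WHOLE string must match
    static boolean wholeMatch(String s, String regex) {
        return s.matches(regex);
    }

    // Matcher.find() succeeds if the pattern occurs ANYWHERE in the string
    static boolean occursSomewhere(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "This is my small example";
        System.out.println(wholeMatch(s, "\\w"));        // false: \w covers one character only
        System.out.println(wholeMatch(s, "\\w.*"));      // true, as in the article
        System.out.println(occursSomewhere(s, "\\w"));   // true
        System.out.println(occursSomewhere(s, "^\\w$")); // false: ^ and $ anchor to the whole input
    }
}
```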

java.util.regex

Import java.util.regex.Matcher and java.util.regex.Pattern; together they offer many more methods to work with.

package regextest;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{

    public static void main(String[] args)
    {
        String line = "The price for iPhone is 5288, which is a little expensive.";
        // extract the only number in the string: parentheses form capture groups,
        // and ^ inside [] means "not", so [^\\d] is any non-digit character
        String regex = "(.*[^\\d])(\\d+)(.*)";

        // create the Pattern object
        Pattern pattern = Pattern.compile(regex);

        // create the Matcher object
        Matcher matcher = pattern.matcher(line);

        if (matcher.find())
        {
            System.out.println("Found value: " + matcher.group(2));
        }
        else
        {
            System.out.println("NO MATCH");
        }
    }

}

Output:

Found value: 5288
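The three-group regex works here because the sentence contains exactly one number. Since the first .* is greedy, a line with several numbers would leave only the last one in group 2, and a number at the very start of the line would not match at all, because [^\\d] needs a character in front of it. A sketch of a simpler alternative, assuming you want every number: search for \\d+ alone and call find() in a loop. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractNumbers {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern THREE_GROUPS = Pattern.compile("(.*[^\\d])(\\d+)(.*)");

    // Every run of digits, in order: each find() resumes after the previous match
    static List<String> allNumbers(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(line);
        while (matcher.find()) {
            found.add(matcher.group()); // group() is the whole match, i.e. group(0)
        }
        return found;
    }

    // The article's three-group regex, for comparison; null when there is no match
    static String groupTwo(String line) {
        Matcher matcher = THREE_GROUPS.matcher(line);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        String two = "iPhone costs 5288, the case costs 99.";
        System.out.println(allNumbers(two));               // [5288, 99]
        System.out.println(groupTwo(two));                 // 99: the greedy .* skips past 5288
        System.out.println(groupTwo("5288 is the price")); // null: no non-digit before the number
    }
}
```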

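Grok: a more powerful regex

Grok builds on Matcher and Pattern (note that the imports below are com.google.code.regexp.Matcher and com.google.code.regexp.Pattern, not the java.util.regex classes) and pulls in many more packages. The result is an upgrade with more methods to call and more power.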

import com.google.code.regexp.Matcher;
import com.google.code.regexp.Pattern;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.commons.lang3.StringUtils;

One website defines Grok as follows:
Java Grok is simple tool that allows you to easily parse logs and other files (single line). With Java Grok, you can turn unstructured log and event data into structured data (JSON).

Java Grok program is a great tool for parsing log data and program output. You can match any number of complex patterns on any number of inputs (processes and files) and have custom reactions.
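To get a feel for how Grok turns named patterns into structured fields, here is a toy expander built only on java.util.regex. It is NOT java-grok's implementation: the class MiniGrok, its methods, and the simplified MONTH, MONTHDAY, TIME, YEAR and IPV4 definitions are all assumptions made for illustration. The idea it shows is that each %{NAME} is replaced, recursively, by the regex registered under that name, wrapped in a named capture group, so the values can be read back by name. It reuses the pattern string from the Grok example that follows.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A toy, Grok-style pattern expander (hypothetical; NOT java-grok's implementation)
public class MiniGrok {
    // A %{NAME} reference; Java group names may contain only letters and digits
    private static final Pattern REF = Pattern.compile("%\\{([A-Za-z][A-Za-z0-9]*)\\}");

    private final Map<String, String> definitions = new HashMap<>();

    public void addPattern(String name, String regex) {
        definitions.put(name, regex);
    }

    // Recursively replaces each %{NAME} with (?<NAME>definition); no cycle detection
    public String expand(String pattern) {
        Matcher m = REF.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String body = definitions.get(name);
            if (body == null) {
                throw new IllegalArgumentException("unknown pattern: " + name);
            }
            String group = "(?<" + name + ">" + expand(body) + ")";
            // quoteReplacement stops the regex's backslashes being read as replacement escapes
            m.appendReplacement(out, Matcher.quoteReplacement(group));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The article's log pattern, built from simplified stand-in definitions
    static Pattern logPattern() {
        MiniGrok grok = new MiniGrok();
        grok.addPattern("MONTH", "[A-Z][a-z]{2}");
        grok.addPattern("MONTHDAY", "\\d{1,2}");
        grok.addPattern("TIME", "\\d{2}:\\d{2}:\\d{2}");
        grok.addPattern("YEAR", "\\d{4}");
        // lookarounds: the address may not start or end in the middle of a number
        grok.addPattern("IPV4", "(?<!\\d)\\d{1,3}(?:\\.\\d{1,3}){3}(?!\\d)");
        grok.addPattern("fromIP", "%{IPV4}");
        return Pattern.compile(grok.expand(
            ".*%{MONTH}\\s+%{MONTHDAY}\\s+%{TIME}\\s+%{YEAR}.*%{fromIP}.* to 123.40.222.170:sip"));
    }

    // "YEAR MONTH MONTHDAY TIME fromIP", or null if the line does not match
    static String parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return m.group("YEAR") + " " + m.group("MONTH") + " " + m.group("MONTHDAY")
            + " " + m.group("TIME") + " " + m.group("fromIP");
    }

    public static void main(String[] args) {
        Pattern p = logPattern();
        // prints: 2015 Nov 9 06:47:33 88.150.240.169
        System.out.println(parse(p,
            "Mon Nov  9 06:47:33 2015; UDP; eth1; 461 bytes; from 88.150.240.169:tag-pm to 123.40.222.170:sip"));
    }
}
```

The lookarounds in the stand-in IPV4 definition matter: without (?<!\\d), the greedy .* in front of %{fromIP} would push the match as far right as it can go and capture 8.150.240.169, dropping the first digit.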

A simple example: read data from a log file and extract the information we want, namely the timestamp and the source IP.

Input:

Mon Nov  9 06:47:33 2015; UDP; eth1; 461 bytes; from 88.150.240.169:tag-pm to 123.40.222.170:sip
Mon Nov  9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:49208 to 123.40.222.170:sip
Mon Nov  9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:54159 to 123.40.222.170:sip
Mon Nov  9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:53640 to 123.40.222.170:sip
Mon Nov  9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:52483 t

package com.yz.utils.grok.api;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

public class GrokTest
{

    public static void main(String[] args)
    {
        FileInputStream   fiStream = null;
        InputStreamReader iStreamReader = null;
        // BufferedReader wraps the InputStreamReader: it buffers characters (the InputStreamReader
        // Javadoc recommends this for efficiency) and provides readLine()
        BufferedReader    bReader = null;

        try
        {
            String line = "";
            // obtain the raw bytes of a file in the file system
            fiStream = new FileInputStream("C:\\dev1\\javagrok\\javagrok\\iptraf_eth1_15.06.11");

            // InputStreamReader is the bridge from byte streams to character streams
            iStreamReader = new InputStreamReader(fiStream);

            // BufferedReader reads text from the character stream, wrapping the InputStreamReader
            bReader = new BufferedReader(iStreamReader);

            Grok grok = new Grok();
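            // Grok ships with many ready-made patterns that can be used directly or combined into new ones
            grok.addPatternFromFile("c:\\dev1\\cloudshield\\patterns\\patterns");

            // a new pattern, fromIP, defined in terms of the existing IPV4 pattern
            grok.addPattern("fromIP", "%{IPV4}");
            // compile the pattern; whitespace tripped me up here: "Nov  9" has two spaces, so use \\s+
            grok.compile(".*%{MONTH}\\s+%{MONTHDAY}\\s+%{TIME}\\s+%{YEAR}.*%{fromIP}.* to 123.40.222.170:sip");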
            Match match = null;

            while ((line = bReader.readLine()) != null)  // mind the parentheses: assign first, then compare with null (this bit me once)
            {
                match = grok.match(line);
                match.captures();
                if(!match.isNull())
                {
                    System.out.print(match.toMap().get("YEAR").toString() + " ");
                    System.out.print(match.toMap().get("MONTH").toString() + " ");
                    System.out.print(match.toMap().get("MONTHDAY").toString() + " ");
                    System.out.print(match.toMap().get("TIME").toString() + " ");
                    System.out.print(match.toMap().get("fromIP").toString() + "\n");
                }
                else
                {
                    System.out.println("NO MATCH");
                }
            }

        }
        catch (FileNotFoundException fnfe)
        {
            System.out.println("file not found exception");
            fnfe.printStackTrace();
        }
        catch (IOException ioe)
        {
            System.out.println("input/output exception");
            ioe.printStackTrace();
        }
        catch (Exception e)
        {
            System.out.println("unknown exception");
            e.printStackTrace();
        }
        finally
        {
            try
            {
                if(bReader!=null)   
                {
                    bReader.close();
                    bReader=null;
                }
                if(iStreamReader!=null)
                {
                    iStreamReader.close();
                    iStreamReader=null;
                }
                if(fiStream!=null)
                {
                    fiStream.close();
                    fiStream=null;
                }
            }
            catch(IOException ioe)
            {
                System.out.println("input/output exception");
                ioe.printStackTrace();
            }
        }
    }

}

Output:

2015 Nov 9 06:47:33 88.150.240.169
2015 Nov 9 06:47:34 88.150.240.169
2015 Nov 9 06:47:34 88.150.240.169
2015 Nov 9 06:47:34 88.150.240.169
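NO MATCH

The fifth input line shown above ends at "52483 t", without the " to 123.40.222.170:sip" that the pattern requires, which is why the last result is NO MATCH.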
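Two of the traps flagged in the Grok code comments are easy to reproduce without Grok or the log file. The sketch below (the class name, method name and second log line are hypothetical) reads lines from a String with the standard while ((line = reader.readLine()) != null) loop; without the inner parentheses, != would bind first, line would be assigned a boolean, and the code would not compile. It also shows why \\s+ is safer than a literal space between date fields: in these logs a single-digit day follows two spaces ("Nov  9"), while a two-digit day presumably follows one. It uses try-with-resources (Java 7+), which closes the reader automatically and could replace the finally block in the Grok example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

public class ReadLoopSketch {
    // Counts the lines of text in which the pattern occurs
    static int countMatches(String text, Pattern p) {
        int count = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            // assign first, then compare: the inner parentheses are required
            while ((line = reader.readLine()) != null) {
                if (p.matcher(line).find()) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // first line shortened from the article's input; the second is a hypothetical two-digit day
        String log = "Mon Nov  9 06:47:33 2015; UDP; eth1; 461 bytes\n"
                   + "Tue Nov 10 06:47:34 2015; UDP; eth1; 463 bytes\n";
        Pattern oneSpace = Pattern.compile("Nov \\d{1,2} \\d{2}:");
        Pattern anySpace = Pattern.compile("Nov\\s+\\d{1,2}\\s+\\d{2}:");
        System.out.println(countMatches(log, oneSpace)); // 1: "Nov  9" has two spaces
        System.out.println(countMatches(log, anySpace)); // 2
    }
}
```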