LeetCode 0010 Regular Expression Matching

The General Solution

It helps to first understand the general solution to string matching problems.
The general solution is a Nondeterministic Finite Automaton (NFA) or a Deterministic Finite Automaton (DFA).
If you don't know these two concepts or have forgotten them, don't panic: spend a little time on compiler construction and you will learn a lot.

The difference between an NFA and a DFA is that a DFA is in exactly one state at any time, while an NFA may be in several states at once, so matching with a DFA is faster than matching with an NFA.
But note that building a DFA from an NFA takes extra time.

Since we perform only one match per query, building an NFA and matching against it directly is generally faster than first converting that NFA into a DFA and then matching against the DFA.

Check the NFA solution for more implementation details.

Time & Space Complexity

Say the length of string s is $n$ and the length of string p is $m$.
Building an NFA needs $O(m)$ time and $O(m)$ space.
Answering a query needs $O(n \cdot m)$ time in the worst case, with $O(n)$ space if we deal with the GC perfectly.

Simpler DP Solution

With the idea of the NFA in mind, we can derive a simpler solution, since complex data structures are of little use in interview coding.
(But they are still very important in daily development!)

Imagine how the NFA processes its input:

  1. Feed the empty string $\varepsilon$ to the initial state of the NFA.
  2. Generate the new set of states from the previous states and the input.
  3. Feed the next character of string s to the current states of the NFA.
  4. Repeat steps 2 and 3 until all characters of string s have been processed.
  5. Check whether the current states include the final state of the NFA.
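
The steps above can be sketched directly in code. This is only a minimal illustration, not the NFA solution referred to earlier: the class name `NfaSketch`, the helper `closure`, and the choice to encode a state as "number of pattern characters already matched" are all assumptions made for this example, and the pattern is assumed to be well formed (no leading `*`).

```java
final class NfaSketch {
    // Assumption for this sketch: state k means "the first k characters of p
    // have been matched", so state p.length() is the final state.
    static boolean isMatch(String s, String p) {
        int m = p.length();
        boolean[] current = new boolean[m + 1];
        current[0] = true;        // step 1: start in the initial state
        closure(current, p);      // ... and follow the empty-input moves
        for (int i = 0; i < s.length(); i++) {   // steps 3 and 4
            char c = s.charAt(i);
            boolean[] next = new boolean[m + 1];
            for (int k = 0; k < m; k++) {        // step 2: derive the new states
                if (!current[k]) continue;
                boolean matches = p.charAt(k) == '.' || p.charAt(k) == c;
                if (!matches) continue;
                boolean starred = k + 1 < m && p.charAt(k + 1) == '*';
                // "x*" stays in the same state; a plain character advances by one.
                next[starred ? k : k + 1] = true;
            }
            closure(next, p);
            current = next;
        }
        return current[m];        // step 5: is the final state active?
    }

    // Empty-input moves: "x*" may match nothing, so state k can skip to k + 2.
    private static void closure(boolean[] states, String p) {
        for (int k = 0; k + 1 < p.length(); k++) {
            if (states[k] && p.charAt(k + 1) == '*') {
                states[k + 2] = true;
            }
        }
    }
}
```

Each input character touches every pattern position at most once, which is where the $O(n \cdot m)$ worst-case matching time comes from. For example, `NfaSketch.isMatch("aab", "c*a*b")` walks through the state sets {0, 2, 4}, {2, 4}, {2, 4}, {5} and ends in the final state.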

Since each NFA state corresponds to a position in the pattern p, we do not need to construct the NFA explicitly.
An index into p can be used as the NFA state directly, and an index into s records how much input has been consumed.

So we can write the following DP equation:

  • if $P[j]$ is a repeatable character (it ends with *), then $f_{i,j} = f_{i,j-1} \lor \bigl(\mathrm{isCharacterMatch}(S_i, P_j) \land (f_{i-1,j-1} \lor f_{i-1,j})\bigr)$

  • if $P[j]$ is not a repeatable character, then $f_{i,j} = \mathrm{isCharacterMatch}(S_i, P_j) \land f_{i-1,j-1}$

$f_{i,j}$ is the result of matching the string $S[0 \ldots i]$ against the NFA built from $P[0 \ldots j]$.

The simpler DP solution then follows:

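// s and p are assumed to be char[] fields holding the input string and the pattern.
// isMatch(i, j) reports whether s[0..i] matches p[0..j]; a negative index denotes an empty prefix.
boolean isMatch(int i, int j) {
    if (j < 0) {
        // An empty pattern matches only an empty string.
        return i < 0;
    }
    if (i < 0) {
        // An empty string matches only if the rest of the pattern is a run of "x*" groups.
        return '*' == p[j] && isMatch(i, j - 2);
    }
    if ('*' == p[j]) {
        // Repeatable: p[realJ] is the character that the '*' applies to.
        int realJ = j - 1;
        return isMatch(i, realJ - 1) // zero repetitions
            || (isCharacterMatch(s[i], p[realJ])
                && (isMatch(i - 1, realJ - 1) || isMatch(i - 1, j))); // one or more repetitions
    } else {
        return isCharacterMatch(s[i], p[j]) && isMatch(i - 1, j - 1);
    }
}

boolean isCharacterMatch(char chS, char chP) {
    // '.' matches any single character.
    return '.' == chP || chS == chP;
}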
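
The recursion above can reach the same (i, j) pair along several paths, so a memo table is needed to evaluate each pair only once. The following is a minimal sketch of that idea, not the author's code: the class name `MemoizedMatcher`, the `matches()` entry point, and the `Boolean[][]` table shifted by one (so that index -1 fits) are assumptions made for this example, and the pattern is assumed to be well formed.

```java
final class MemoizedMatcher {
    private final char[] s;
    private final char[] p;
    // memo[i + 1][j + 1] caches isMatch(i, j); null means "not computed yet".
    private final Boolean[][] memo;

    MemoizedMatcher(String s, String p) {
        this.s = s.toCharArray();
        this.p = p.toCharArray();
        this.memo = new Boolean[this.s.length + 1][this.p.length + 1];
    }

    boolean matches() {
        return isMatch(s.length - 1, p.length - 1);
    }

    private boolean isMatch(int i, int j) {
        if (j < 0) {
            return i < 0;                 // empty pattern matches only empty input
        }
        if (memo[i + 1][j + 1] != null) {
            return memo[i + 1][j + 1];    // reuse an earlier result
        }
        boolean result;
        if (i < 0) {
            // Empty input: the remaining pattern must be a run of "x*" groups.
            result = p[j] == '*' && isMatch(i, j - 2);
        } else if (p[j] == '*') {
            int realJ = j - 1;            // the character the '*' applies to
            result = isMatch(i, realJ - 1)
                || (isCharacterMatch(s[i], p[realJ])
                    && (isMatch(i - 1, realJ - 1) || isMatch(i - 1, j)));
        } else {
            result = isCharacterMatch(s[i], p[j]) && isMatch(i - 1, j - 1);
        }
        memo[i + 1][j + 1] = result;
        return result;
    }

    private static boolean isCharacterMatch(char chS, char chP) {
        return chP == '.' || chS == chP;
    }
}
```

A call such as `new MemoizedMatcher("aab", "c*a*b").matches()` builds the table for one query. A fresh instance is needed for each (s, p) pair, because the cached results are tied to those two strings.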

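Time & Space Complexity

If the result for each (i, j) pair is memoized, this needs $O(n \cdot m)$ time and space in the worst case. Without memoization, the recursion can evaluate the same (i, j) pair many times.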
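
To make the bound concrete, the same recurrence can be filled in bottom-up, one table cell per (i, j) pair. This is a sketch under stated assumptions rather than part of the original solution: the class name `RegexDpTable` is hypothetical, and the table is indexed by prefix length, so `f[i][j]` here covers the first i characters of s and the first j characters of p (one more than the 0-based indices used in the recursive version).

```java
final class RegexDpTable {
    static boolean isMatch(String s, String p) {
        int n = s.length();
        int m = p.length();
        // f[i][j]: do the first i characters of s match the first j characters of p?
        boolean[][] f = new boolean[n + 1][m + 1];
        f[0][0] = true; // empty string against empty pattern
        for (int i = 0; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                char pc = p.charAt(j - 1);
                if (pc == '*') {
                    if (j < 2) {
                        continue; // malformed pattern with a leading '*': leave false
                    }
                    char repeated = p.charAt(j - 2);
                    boolean zeroTimes = f[i][j - 2];
                    boolean oneOrMore = i > 0
                        && isCharacterMatch(s.charAt(i - 1), repeated)
                        && (f[i - 1][j - 2] || f[i - 1][j]);
                    f[i][j] = zeroTimes || oneOrMore;
                } else {
                    f[i][j] = i > 0
                        && isCharacterMatch(s.charAt(i - 1), pc)
                        && f[i - 1][j - 1];
                }
            }
        }
        return f[n][m];
    }

    private static boolean isCharacterMatch(char chS, char chP) {
        return chP == '.' || chS == chP;
    }
}
```

The two nested loops visit (n + 1) * m cells and do constant work in each, and the table itself holds (n + 1) * (m + 1) booleans, which matches the stated time and space.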
