LeetCode 0010 Regular Expression Matching

Original

2020-05-12 05:50

The General Solution

I think we had better understand the general solution to string-matching problems first.
The general approach is to use a Nondeterministic Finite Automaton (NFA) or a Deterministic Finite Automaton (DFA).
If you don’t know these two concepts or have forgotten them, don’t panic: spend a little time on compiler construction and you will learn a lot.

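The difference is that a DFA is a restricted form of NFA: it has exactly one transition for each state and input character, so matching only ever tracks a single current state, which makes a DFA run faster than an NFA.
Note, however, that building a DFA from an NFA (the subset construction) takes extra time, and in the worst case the resulting DFA can have exponentially many states.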

Since each query matches the pattern against a string only once, building an NFA and simulating it directly is faster than first converting the NFA into a DFA and then running the DFA.

Check the NFA solution for more implementation details.

Time & Space Complexity

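Let the length of string s be $n$ and the length of string p be $m$.
Building an NFA takes $O(m)$ time and $O(m)$ space.
Answering a query takes $O(n \cdot m)$ time in the worst case, plus $O(m)$ space for the current set of active NFA states, provided the state sets from earlier steps are discarded (garbage-collected) promptly.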

Simpler DP Solution

Borrowing the idea of the NFA, we can derive a simpler solution, since elaborate data structures are overkill in interview coding.
(They are still very important in day-to-day development, though!)

Imagine how the NFA runs:

1. Feed the empty input $\varepsilon$ to the initial state of the NFA.
2. Generate the new states from the previous states and the input (including states reachable through $\varepsilon$-transitions).
3. Feed the next character of string s to the current states of the NFA.
4. Repeat steps 2 and 3 until all characters of string s have been processed.
5. Check whether the current states include the final state of the NFA.
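The steps above can be sketched as a direct NFA simulation in Java. This is only a minimal sketch under my own assumptions, not the NFA solution mentioned earlier: the class name `NfaMatcher` and the helper `closure` are hypothetical, NFA state $k$ means "the first $k$ pattern tokens have been consumed" (a token is one character, optionally followed by `*`), and the $\varepsilon$-transitions simply skip over a token that ends with `*`. Only two boolean arrays of size (number of tokens + 1) are alive at any time.

```java
class NfaMatcher {
  // Hypothetical sketch: NFA state k means "the first k pattern tokens are consumed".
  // Token t is the character chs[t] ('.' = any) plus rep[t] = "followed by '*'".
  static boolean isMatch(String s, String p) {
    char[] chs = new char[p.length()];
    boolean[] rep = new boolean[p.length()];
    int m = 0;                                   // number of tokens
    for (int j = 0; j < p.length(); j++) {       // build the NFA in O(m)
      boolean star = j + 1 < p.length() && p.charAt(j + 1) == '*';
      chs[m] = p.charAt(j);
      rep[m] = star;
      m++;
      if (star) j++;                             // skip the '*' itself
    }
    boolean[] cur = new boolean[m + 1];
    cur[0] = true;                               // step 1: initial state ...
    closure(cur, rep, m);                        // ... plus its epsilon moves
    for (int i = 0; i < s.length(); i++) {       // steps 2-4: one round per char
      char c = s.charAt(i);
      boolean[] next = new boolean[m + 1];
      for (int k = 0; k < m; k++) {
        if (cur[k] && (chs[k] == '.' || chs[k] == c)) {
          // a repeatable token loops on itself; a plain token advances
          next[rep[k] ? k : k + 1] = true;
        }
      }
      closure(next, rep, m);
      cur = next;                                // the old set can be collected
    }
    return cur[m];                               // step 5: final state active?
  }

  // Epsilon moves: from state k, an "x*" token may be skipped (zero repetitions).
  // One forward pass suffices because every epsilon edge goes from k to k + 1.
  private static void closure(boolean[] states, boolean[] rep, int m) {
    for (int k = 0; k < m; k++) {
      if (states[k] && rep[k]) {
        states[k + 1] = true;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(isMatch("aab", "c*a*b"));
    System.out.println(isMatch("mississippi", "mis*is*p*."));
  }
}
```

For example, `NfaMatcher.isMatch("aab", "c*a*b")` returns `true`: after the last character only the final state 3 is active.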

Since each NFA state corresponds to a position in the pattern p, we do not need to construct the NFA explicitly.
The indices of p can serve as NFA states directly, while the characters of string s drive the transitions.

So we can write the following DP equations:

If $P_j$ is a repeatable character (i.e., followed by $*$), then $f_{i,j} = f_{i,j-1} \lor \big(\mathit{isCharacterMatch}(S_i, P_j) \land (f_{i-1,j-1} \lor f_{i-1,j})\big)$.
If $P_j$ is not a repeatable character, then $f_{i,j} = \mathit{isCharacterMatch}(S_i, P_j) \land f_{i-1,j-1}$.

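Here $f_{i,j}$ is the result of running the NFA built from $P[0 \dots j]$ on the string $S[0 \dots i]$, where $j$ indexes pattern tokens (a character together with its optional trailing $*$).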
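The equations leave the base cases implicit. Reading them off the recursive code, with index $-1$ standing for an empty prefix and $P_j$ still denoting a token, gives the following (my own summary of what the code does):

$$
\begin{aligned}
f_{-1,-1} &= \text{true} \\
f_{i,-1} &= \text{false} && (i \ge 0) \\
f_{-1,j} &= (P_j \text{ is repeatable}) \land f_{-1,j-1} && (j \ge 0)
\end{aligned}
$$

For example, with $S = \text{aab}$ and $P = \text{c*a*b}$ (tokens $\text{c*}, \text{a*}, \text{b}$), we get $f_{-1,0} = f_{-1,1} = \text{true}$ but $f_{-1,2} = \text{false}$, since an empty string cannot match the plain token $\text{b}$.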

This gives the following simpler DP solution:

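class Solution {
  private char[] s, p;
  // memo[i + 1][j + 1] caches isMatch(i, j); index -1 means an empty prefix
  private Boolean[][] memo;

  public boolean isMatch(String str, String pattern) {
    s = str.toCharArray();
    p = pattern.toCharArray();
    memo = new Boolean[s.length + 1][p.length + 1];
    return isMatch(s.length - 1, p.length - 1);
  }

  // Does s[0..i] match p[0..j]?
  private boolean isMatch(int i, int j) {
    if (j < 0) {
      return i < 0;
    }
    if (i < 0) {
      // an empty string can only be matched by "x*" tokens repeated zero times
      return '*' == p[j] && isMatch(i, j - 2);
    }
    if (memo[i + 1][j + 1] != null) {
      return memo[i + 1][j + 1];
    }
    boolean result;
    if ('*' == p[j]) {
      // repeatable: p[realJ] followed by '*'
      int realJ = j - 1;
      result = isMatch(i, realJ - 1)                                // zero occurrences
          || isCharacterMatch(s[i], p[realJ])
             && (isMatch(i - 1, realJ - 1) || isMatch(i - 1, j));   // one, or more
    } else {
      result = isCharacterMatch(s[i], p[j]) && isMatch(i - 1, j - 1);
    }
    memo[i + 1][j + 1] = result;
    return result;
  }

  private boolean isCharacterMatch(char chS, char chP) {
    return '.' == chP || chS == chP;
  }
}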
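The same recurrence can also be filled in bottom-up, which avoids deep recursion and makes the $O(n \cdot m)$ bound easy to see. The sketch below is my own illustration, not the author's code: the class name `BottomUpMatcher` and helper `charMatch` are hypothetical, and `f[i][j]` covers the first `i` characters of s and the first `j` characters of p, so row/column 0 play the role of index $-1$ in the recursive version. It assumes a valid pattern in which every `*` has a preceding character.

```java
class BottomUpMatcher {
  // f[i][j] == true iff the first i chars of s match the first j chars of p.
  // Row/column 0 stand for the empty prefix (index -1 in the recursion).
  static boolean isMatch(String s, String p) {
    int n = s.length(), m = p.length();
    boolean[][] f = new boolean[n + 1][m + 1];
    f[0][0] = true;
    for (int i = 0; i <= n; i++) {
      for (int j = 1; j <= m; j++) {
        if (p.charAt(j - 1) == '*') {
          // "x*" occupies p[j-2..j-1]; assumes j >= 2 (valid pattern)
          boolean zero = f[i][j - 2];
          boolean more = i > 0 && charMatch(s.charAt(i - 1), p.charAt(j - 2))
              && (f[i - 1][j - 2] || f[i - 1][j]);
          f[i][j] = zero || more;
        } else {
          f[i][j] = i > 0 && charMatch(s.charAt(i - 1), p.charAt(j - 1))
              && f[i - 1][j - 1];
        }
      }
    }
    return f[n][m];
  }

  static boolean charMatch(char chS, char chP) {
    return chP == '.' || chS == chP;
  }

  public static void main(String[] args) {
    System.out.println(isMatch("aa", "a*"));
    System.out.println(isMatch("ab", ".*c"));
  }
}
```

Each cell depends only on cells to its left in the same row or in the previous row, so filling rows top to bottom and columns left to right is enough.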

Time & Space Complexity

With memoization, each pair $(i, j)$ is evaluated at most once, so the solution needs $O(n \cdot m)$ time and space in the worst case.
