[Leetcode] - Longest Substring Without Repeating Characters

Original

2018-09-01 00:32

Given a string, find the length of the longest substring without repeating characters. For example, the longest substring without repeating letters for "abcabcbb" is "abc", which has length 3. For "bbbbb" the longest substring is "b", with length 1.

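I have spent the last couple of days working through various substring problems, so I dug up this one on Leetcode to revisit how I approached it at the time. The idea is as follows. Use a HashMap to record each character seen so far together with the position (index) at which it appears in the string, so the map is declared as HashMap<Character, Integer>. Two variables, start and end, hold the start and end positions of the current NRCS (Non-Repeating Character Substring); in other words, they form a window. start begins at 0, end is the variable incremented by the loop, and the loop stops once end equals the length of the string. Finally, a variable len records the length of the longest NRCS found so far. It can be initialized to 0 (or to any value below 1), because the NRCS of a non-empty string has length at least 1.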

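With these in place, we can walk through the string to compute the answer. For each character, if it is not in the HashMap, put it in along with its index. If the HashMap already contains that key, we need to do the following: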

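1. First, fetch the index at which this character previously appeared from the HashMap and store it in a variable index.

2. Next, compare end - start with len and update len if necessary. (Note: end - start is the length of the current NRCS, which spans positions start through end - 1.)

3. Then, go through every character of the string from start to index (inclusive) and remove its key-value pair from the HashMap.

4. Finally, set start to index + 1, then record the current character under its new index as usual.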

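One more thing to watch out for: if the string contains no repeated characters, the procedure above never updates len. More generally, the window that is still open when the loop ends has never been measured. So after the loop finishes, compare end - start with len once more and update len if needed.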

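Putting it all together, the time complexity is O(n): although step 3 is an inner loop, start only ever moves forward, so each position is removed from the map at most once over the whole run. Because a HashMap is used, the space complexity is O(d), where d is the number of distinct characters in the string.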

The code is as follows:

import java.util.HashMap;

public class Solution {
    public int lengthOfLongestSubstring(String s) {
        if(s==null || s.length()==0) return 0;
        if(s.length()==1) return 1;
        
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        int start=0, end=1, len=Integer.MIN_VALUE;
        map.put(s.charAt(start), 0);
        while(end < s.length()) {
            if(map.containsKey(s.charAt(end))) {    // find duplicate characters
                len = Math.max(len, end-start);     // update the maximum length if necessary
                int index = map.get(s.charAt(end));
                for(int k=start; k<=index; k++) {
                    map.remove(s.charAt(k));
                }
                start = index+1;
            }
            map.put(s.charAt(end), end);
            end++;
        }
        len = Math.max(len, end-start);     // in case there is no duplicate in this string
        return len;
    }
}
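
A common variant of the window idea above, sketched here for comparison rather than as the method used in this post, keeps stale entries in the map instead of deleting them and simply ignores any recorded index that lies before start. The class name LastSeenWindow and the method name longestUniqueRun are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LastSeenWindow {
    // Hypothetical helper: returns the length of the longest substring
    // of s without repeating characters (0 for null or empty input).
    static int longestUniqueRun(String s) {
        if (s == null) return 0;
        Map<Character, Integer> lastSeen = new HashMap<>();
        int start = 0, best = 0;
        for (int end = 0; end < s.length(); end++) {
            char c = s.charAt(end);
            Integer prev = lastSeen.get(c);
            // Only an earlier occurrence inside the current window
            // forces start forward; older entries are simply ignored.
            if (prev != null && prev >= start) {
                start = prev + 1;
            }
            lastSeen.put(c, end);
            // Here the window is inclusive: positions start..end.
            best = Math.max(best, end - start + 1);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestUniqueRun("abcabcbb")); // 3
        System.out.println(longestUniqueRun("bbbbb"));    // 1
        System.out.println(longestUniqueRun("abba"));     // 2
    }
}
```

Because the window length is measured on every iteration, this version needs no extra comparison after the loop. The trade-off is that the map may hold entries for characters that are no longer in the window.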

While we are at it, let's review the definitions of substring, subsequence, prefix, and suffix. Here is a passage from Wikipedia:

A substring of a string $S$ is another string $S'$ that occurs "in" $S$. For example, "the best of" is a substring of "It was the best of times". This is not to be confused with subsequence, which is a generalization of substring. For example, "Itwastimes" is a subsequence of "It was the best of times", but not a substring.

Prefix and suffix are refinements of substring. A prefix of a string $S$ is a substring of $S$ that occurs at the beginning of $S$. A suffix of a string $S$ is a substring that occurs at the end of $S$.
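
To make these four notions concrete, here is a small sketch that checks the examples from the quoted passage. String.contains, startsWith, and endsWith are standard Java methods; isSubsequence is a hypothetical helper written for this illustration, since the standard library has no subsequence test.

```java
class StringNotions {
    // Hypothetical helper: true if the characters of t appear in s
    // in the same order, with gaps allowed between them.
    static boolean isSubsequence(String t, String s) {
        int i = 0; // next character of t still to be matched
        for (int j = 0; j < s.length() && i < t.length(); j++) {
            if (t.charAt(i) == s.charAt(j)) i++;
        }
        return i == t.length();
    }

    public static void main(String[] args) {
        String s = "It was the best of times";
        System.out.println(s.contains("the best of"));      // true: substring
        System.out.println(s.contains("Itwastimes"));       // false: not a substring
        System.out.println(isSubsequence("Itwastimes", s)); // true: subsequence
        System.out.println(s.startsWith("It was"));         // true: prefix
        System.out.println(s.endsWith("of times"));         // true: suffix
    }
}
```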

Not including the empty substring, the number of substrings of a string of length $n$ where symbols only occur once, is the number of ways to choose two distinct places between symbols to start/end the substring. Including the very beginning and very end of the string, there are $n+1$ such places. So there are $\tbinom{n+1}{2} = \tfrac{n(n+1)}{2}$ non-empty substrings.
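
The count can be checked by brute force. The sketch below (class and method names are made up for this illustration) collects every non-empty substring into a set, so it also shows why the formula needs the symbols to be distinct: repeated symbols make some substrings coincide.

```java
import java.util.HashSet;
import java.util.Set;

class SubstringCount {
    // Hypothetical helper: number of distinct non-empty substrings of s,
    // found by enumerating every pair of cut positions i < j.
    static int countDistinctSubstrings(String s) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            for (int j = i + 1; j <= s.length(); j++) {
                seen.add(s.substring(i, j));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // n = 4 distinct symbols: 4 * 5 / 2 = 10 substrings.
        System.out.println(countDistinctSubstrings("abcd")); // 10
        // Repeated symbols collapse: only "a", "aa", "aaa".
        System.out.println(countDistinctSubstrings("aaa"));  // 3
    }
}
```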