An analysis of split pitfalls

Java's String.split has quite a few pitfalls, so be careful when using it!

Java code

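import org.apache.commons.lang.StringUtils; // Apache Commons Lang 2.x

class SplitDemo { // illustrative wrapper class
    public static void main(String[] args) {
        System.out.println(":ab:cd:ef::".split(":").length); // trailing "" dropped, leading "" kept
        System.out.println(":ab:cd:ef::".split(":", -1).length); // no "" dropped
        System.out.println(StringUtils.split(":ab:cd:ef::", ":").length); // leading, trailing and adjacent separators all skipped (apache commons)
        System.out.println(StringUtils.splitPreserveAllTokens(":ab:cd:ef::", ":").length); // no separator ignored (apache commons)
    }
}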

Output:

4
6
3
6
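The output above only shows array lengths. To see which tokens are actually empty, here is a small JDK-only sketch that prints the two String.split results with Arrays.toString. The class name SplitContents and the helper show are made up for illustration, and the Apache Commons calls are left out so the sketch needs no extra library.

```java
import java.util.Arrays;

// Illustrative class: prints split results so that empty strings become visible.
class SplitContents {
    // Hypothetical helper: joins the pieces with ", " inside brackets, e.g. "[, ab, cd, ef]".
    static String show(String s, String regex, int limit) {
        return Arrays.toString(s.split(regex, limit));
    }

    public static void main(String[] args) {
        String s = ":ab:cd:ef::";
        System.out.println(show(s, ":", 0));  // limit 0 is what s.split(":") uses: [, ab, cd, ef]
        System.out.println(show(s, ":", -1)); // nothing dropped: [, ab, cd, ef, , ]
    }
}
```

The leading "" survives in both cases. Only the trailing empty strings depend on the limit.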

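Looking at the public String[] split(String regex, int limit) method of the JDK String class, I realized why I rarely use this overload. When splitting with a regex, a match on the last character produces an empty string after it, so "o".split("o", 5) and "o".split("o", -2) both return two empty strings, "" and "" (the part marked with a red box in the figure below). That is why I normally use split(String regex), which is equivalent to split(regex, 0) and discards trailing empty strings.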
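A short JDK-only sketch of the limit behaviour just described. The class name SplitLimitDemo is illustrative. The commented results are what Arrays.toString prints, where "[, ]" means an array holding two empty strings.

```java
import java.util.Arrays;

// Illustrative class demonstrating how String.split treats its limit argument.
class SplitLimitDemo {
    public static void main(String[] args) {
        // A match on the last character leaves an empty string after it.
        System.out.println(Arrays.toString("o".split("o", 5)));  // [, ]
        System.out.println(Arrays.toString("o".split("o", -2))); // [, ]
        // limit 0 (used by split(regex)) drops trailing empty strings, here all of them.
        System.out.println("o".split("o").length);               // 0
        // A positive limit n caps the array at n elements; the last one keeps the unsplit rest.
        System.out.println(Arrays.toString("a:b:c".split(":", 2))); // [a, b:c]
    }
}
```

Any positive limit keeps trailing empty strings, just like a negative one. The only difference is that a positive limit also caps the number of elements.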

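The argument to String.split is a regular expression. That makes it powerful but sometimes error-prone, and String offers no plain-text alternative. org.apache.commons.lang.StringUtils.split changes that: its separator argument is an ordinary string rather than a regex, and each character in that string is treated as a separator on its own. StringTokenizer is the JDK class with similar functionality, and the internal splitWorker method compares itself to it in a comment: "Direct code is quicker than StringTokenizer." According to its authors, then, this is the faster tool.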
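To illustrate why a regex parameter is easy to get wrong, here is a JDK-only sketch (the class name RegexSplitPitfall is illustrative). Splitting on "." does not split on dots, because "." is a regex metacharacter that matches any character. Escaping it, or quoting it with Pattern.quote, restores the intended behaviour.

```java
import java.util.regex.Pattern;

// Illustrative class showing a common regex pitfall with String.split.
class RegexSplitPitfall {
    public static void main(String[] args) {
        String s = "a.b.c";
        // "." matches every character, so every piece is empty and limit 0 drops them all.
        System.out.println(s.split(".").length);                // 0
        // Escaping the dot, or quoting the separator, splits on a literal ".".
        System.out.println(s.split("\\.").length);              // 3
        System.out.println(s.split(Pattern.quote(".")).length); // 3
    }
}
```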

In StringUtils, both split and splitPreserveAllTokens are implemented on top of splitWorker.
Let's walk through the two private splitWorker methods in turn:

Java code

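private static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens) {
    // Performance tuned for 2.0 (JDK1.4)
    if (str == null) {
        return null;
    }
    int len = str.length();
    if (len == 0) {
        return ArrayUtils.EMPTY_STRING_ARRAY; // "" yields an empty array, not {""}
    }
    List list = new ArrayList();
    int i = 0, start = 0;
    boolean match = false;     // non-separator text seen since the current token started
    boolean lastMatch = false; // the last processed char was a separator that closed a token
    while (i < len) {
        if (str.charAt(i) == separatorChar) {
            if (match || preserveAllTokens) {
                // emit the token before this separator ("" when preserving an empty token)
                list.add(str.substring(start, i));
                match = false;
                lastMatch = true;
            }
            start = ++i; // the next token starts after the separator
            continue;
        }
        lastMatch = false;
        match = true;
        i++;
    }
    // emit the final token: trailing text, or "" after a trailing separator when preserving
    if (match || (preserveAllTokens && lastMatch)) {
        list.add(str.substring(start, i));
    }
    return (String[]) list.toArray(new String[list.size()]);
}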

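This is the core splitting method. separatorChar is the separator, and the boolean preserveAllTokens decides whether empty tokens are kept. When it is true, a separator at the start or end of the string produces a "" at the start or end of the result, and adjacent separators in the middle produce a "" between them. The method does the following:
If the string is null, it returns null.
If the string is "", it returns an empty array (ArrayUtils.EMPTY_STRING_ARRAY).
The index i walks through the string. match means non-separator text has been seen since the current token started; lastMatch means the most recently processed character was a separator that closed a token.
If the very first character is separatorChar, the outcome depends on preserveAllTokens: if it is true, a "" is added to the result; otherwise the separator is simply skipped. Whenever a non-separator character is seen, match is set to true and lastMatch to false, meaning there is pending content to emit.
When separatorChar is encountered and match or preserveAllTokens is true, the substring from start up to i is added to the result, start moves to just past i, match becomes false and lastMatch becomes true. Otherwise only start moves past the separator.
After the loop, the final segment is added if match is true (the string ends with non-separator text), or if preserveAllTokens and lastMatch are both true (the last character was separatorChar). In the latter case that segment is "".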
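The walkthrough above can be checked with a JDK-only transcription of this method that uses generics. The class name CharSplitSketch is illustrative, and an empty array stands in for ArrayUtils.EMPTY_STRING_ARRAY so Apache Commons is not needed. Passing false mimics what split does and true what splitPreserveAllTokens does, as I understand the library's delegation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative JDK-only transcription of the char splitWorker shown above.
class CharSplitSketch {
    static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens) {
        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return new String[0]; // stands in for ArrayUtils.EMPTY_STRING_ARRAY
        }
        List<String> list = new ArrayList<>();
        int i = 0, start = 0;
        boolean match = false;     // non-separator text seen since the current token started
        boolean lastMatch = false; // the last processed char was a separator that closed a token
        while (i < len) {
            if (str.charAt(i) == separatorChar) {
                if (match || preserveAllTokens) {
                    list.add(str.substring(start, i));
                    match = false;
                    lastMatch = true;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i));
        }
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] inputs = {":ab:", "a::b", ":ab:cd:ef::"};
        for (String s : inputs) {
            // result without preserveAllTokens, then with it
            System.out.println(s + " -> " + Arrays.toString(splitWorker(s, ':', false))
                    + " / " + Arrays.toString(splitWorker(s, ':', true)));
        }
    }
}
```

The "a::b" input shows that preserveAllTokens matters in the middle of the string too. It yields [a, b] without the flag and [a, , b] with it. For ":ab:cd:ef::" the two results have 3 and 6 elements, matching the Apache Commons output at the top of this post.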

Java code

private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
    // Performance tuned for 2.0 (JDK1.4)
    // Direct code is quicker than StringTokenizer.
    // Also, StringTokenizer uses isSpace() not isWhitespace()
    if (str == null) {
        return null;
    }
    int len = str.length();
    if (len == 0) {
        return ArrayUtils.EMPTY_STRING_ARRAY;
    }
    List list = new ArrayList();
    int sizePlus1 = 1;
    int i = 0, start = 0;
    boolean match = false;
    boolean lastMatch = false;
    if (separatorChars == null) {
        // Null separator means use whitespace
        while (i < len) {
            if (Character.isWhitespace(str.charAt(i))) {
                if (match || preserveAllTokens) {
                    lastMatch = true;
                    if (sizePlus1++ == max) {
                        i = len;
                        lastMatch = false;
                    }
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
    } else if (separatorChars.length() == 1) {
        // Optimise 1 character case
        char sep = separatorChars.charAt(0);
        while (i < len) {
            if (str.charAt(i) == sep) {
                if (match || preserveAllTokens) {
                    lastMatch = true;
                    if (sizePlus1++ == max) {
                        i = len;
                        lastMatch = false;
                    }
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
    } else {
        // standard case
        while (i < len) {
            if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                if (match || preserveAllTokens) {
                    lastMatch = true;
                    if (sizePlus1++ == max) {
                        i = len;
                        lastMatch = false;
                    }
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
    }
    if (match || (preserveAllTokens && lastMatch)) {
        list.add(str.substring(start, i));
    }
    return (String[]) list.toArray(new String[list.size()]);
}

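This is the other core splitting method. It differs from the previous one in two ways. First, the separator is a String whose characters form a set of separator characters. Second, there is an extra max parameter giving the maximum number of elements in the returned array; a value of zero or less means no limit. As in the first method, preserveAllTokens affects leading, trailing and adjacent separators. Because every character of separatorChars acts as a separator on its own, a run such as ":;" (with separatorChars ":;") also yields a "" between its two characters when preserveAllTokens is true. The more separator characters there are, the more "" entries the result can contain. The method does the following:
If the string is null, it returns null.
If the string is "", it returns an empty array.
The rest splits into three cases: separatorChars is null, in which case whitespace is the separator; separatorChars has length 1; or separatorChars is a general string. The three cases differ only in how the current character is tested:
    1. Character.isWhitespace decides whether each character is whitespace (any whitespace character, not only " ").
    2. The single separator character is taken with charAt(0), after which the loop works like the previous splitWorker.
    3. indexOf checks whether the current character occurs in separatorChars, after which the loop works like the previous splitWorker.
    Note that when the element about to be added is the max-th one, i is moved straight to the end of the string. That element therefore contains everything from start to the end, separators included, and the loop exits at the next check. Because match and lastMatch are both false afterwards, no trailing "" is added.
After the loop, the final segment is added if match is true (the string ends with non-separator text), or if preserveAllTokens and lastMatch are both true (the last character is in separatorChars). In the latter case that segment is "".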
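Below is a JDK-only sketch of this second method for experimentation. It is a simplification, not the library's code. The class name CharSetSplitSketch and the helper isSeparator are hypothetical, and the three loops are folded into one isSeparator test; the library keeps them separate, and its comment marks the one-character loop as an optimisation. The max handling is copied unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative JDK-only sketch of the String-separator splitWorker above.
class CharSetSplitSketch {
    // Hypothetical helper: null means "any whitespace", otherwise any char of separatorChars.
    static boolean isSeparator(char c, String separatorChars) {
        return separatorChars == null ? Character.isWhitespace(c) : separatorChars.indexOf(c) >= 0;
    }

    static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return new String[0];
        }
        List<String> list = new ArrayList<>();
        int sizePlus1 = 1; // 1-based number of the element about to be added
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        while (i < len) {
            if (isSeparator(str.charAt(i), separatorChars)) {
                if (match || preserveAllTokens) {
                    lastMatch = true;
                    if (sizePlus1++ == max) {
                        // max-th element: take the rest of the string and stop
                        i = len;
                        lastMatch = false;
                    }
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i));
        }
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWorker("a:;b", ":;", -1, false)));   // [a, b]
        System.out.println(Arrays.toString(splitWorker("a:;b", ":;", -1, true)));    // [a, , b]
        System.out.println(Arrays.toString(splitWorker("a:b:c:d", ":", 2, false)));  // [a, b:c:d]
        System.out.println(Arrays.toString(splitWorker("a:b:", ":", 2, true)));      // [a, b:]
        System.out.println(Arrays.toString(splitWorker(" a \tb ", null, -1, false))); // [a, b]
    }
}
```

The "a:b:" case shows the max rule in action. The second element absorbs the trailing separator, and no extra "" follows even though preserveAllTokens is true.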

Reposted from: http://yinny.iteye.com/blog/1750210

Posted by: 傳說中的大神