A Short Summary of Java String Serialization

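Strings are very familiar to us because they are everywhere, and because a String can describe almost anything in this world. Even an exact numeric value needs a String to represent it (there is always a small gap between the binary a computer sees and the decimal a human sees). Familiar things feel simple and are easy to overlook. Today I am noting down two easily overlooked issues about String.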

  • String reuse: saving memory

Because there are so many strings, reusing them can save a lot of memory. First look at the following example:

        String string1 = "HELLOHELLO";
        String string2 = "HELLO" + "HELLO";

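How many strings does this create, 1 or 2? The second one looks as if it is built dynamically, but its content is already known at compile time, so the constant expression "HELLO" + "HELLO" can be folded into a single literal. My guess was one instance, i.e. the same char array, and a heap dump confirms that there is indeed only one.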

        String string3 = args[0] + args[1];

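Pass HELLO HELLO as the arguments: how many strings are there now? Right, there are now two HELLOHELLO strings, and the heap dump confirms it. (A heap dump is not even necessary; in a debugger you can see that the char[] fields of string1 and string3 point to different addresses.)

By extension, strings obtained through Java deserialization are distinct instances as well. For example: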
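The difference between the two cases can also be checked with a reference comparison instead of a heap dump. The following is a minimal sketch; the class name `StringIdentityDemo` and the helper `concatAtRuntime` are made up for illustration and stand in for the `args[0] + args[1]` case.

```java
class StringIdentityDemo {
    /** Concatenates at run time, so the result cannot be folded into a compile-time constant. */
    static String concatAtRuntime(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        String string1 = "HELLOHELLO";
        String string2 = "HELLO" + "HELLO";                  // constant expression
        String string3 = concatAtRuntime("HELLO", "HELLO");  // built at run time

        System.out.println(string1 == string2);      // true: one pooled instance
        System.out.println(string1 == string3);      // false: a separate object on the heap
        System.out.println(string1.equals(string3)); // true: the contents are the same
    }
}
```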

    // Requires: import java.io.*; import java.util.ArrayList; import java.util.List;
    public final static void main(String[] args) throws Exception {
        new StringDeserialized().testDeserialized();
    }

    public void testDeserialized() throws Exception {
        String testString = "HELLOHELLO";
        // Write the same String 1000 times into one stream.
        ObjectOutputStream dataOutStream = new ObjectOutputStream(new FileOutputStream("./stringdeserialized.data"));
        for (int i = 0; i < 1000; i++)
            dataOutStream.writeObject(testString);
        dataOutStream.close();

        // Re-open the file 100 times and read the first String from each new stream.
        List<String> readAgainList = new ArrayList<String>(100);
        for (int i = 0; i < 100; i++) {
            ObjectInputStream dataInputStream = new ObjectInputStream(new FileInputStream("./stringdeserialized.data"));
            readAgainList.add((String) dataInputStream.readObject());
            dataInputStream.close();
        }
        // Keep the JVM alive so that a heap dump can be taken.
        Thread.sleep(Integer.MAX_VALUE);
    }

 

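The heap dump shows 101 instances of HELLOHELLO (the one literal plus the 100 deserialized copies), occupying more than 8080 bytes in total. For details on JVM memory usage, see http://www.javamex.com/tutorials/memory/object_memory_usage.shtml

Here is the problem: the data a system maintains is mostly string information, for example in configserver, and much of it is the same string. Deserializing it from the network again and again takes up a lot of heap. One could of course write a weak hash map to maintain and reuse these strings. But the JVM already has a String pool, and using it is undoubtedly the best option. Looking at the String source, the comment on intern() reads: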
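The same effect can be reproduced in memory, without a file or a heap dump. This is only a sketch: the class name `DeserializedCopiesDemo` and its two helper methods are invented here, and byte-array streams replace the file streams used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class DeserializedCopiesDemo {
    /** Serializes one String with default Java serialization. */
    static byte[] serialize(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Opens a fresh stream over the payload and reads one String back. */
    static String deserialize(byte[] payload) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return (String) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = serialize("HELLOHELLO");
        String first = deserialize(payload);
        String second = deserialize(payload);
        System.out.println(first.equals(second)); // equal contents
        System.out.println(first == second);      // but each read produced its own object
    }
}
```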

     * When the intern method is invoked, if the pool already contains a
     * string equal to this <code>String</code> object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this <code>String</code> object is added to the
     * pool and a reference to this <code>String</code> object is returned.

So one line of the code above is changed to:

readAgainList.add(((String) dataInputStream.readObject()).intern());

Analysing a fresh heap dump after this change also shows that a String holding 10 characters occupies 80 bytes of heap.

  • String serialization speed
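For comparison, the "weak hash map" alternative mentioned earlier could look roughly like the sketch below. The class `WeakStringPool` and its method `canonicalize` are hypothetical names, not part of the JDK or of the system described here; the idea is only that equal strings are mapped to one shared instance, while entries can still be garbage collected once nobody uses that instance.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakStringPool {
    // Key and value refer to the same String; both references are weak,
    // so the entry disappears once the canonical instance is unreachable.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns the shared instance equal to value, registering value if none exists yet. */
    synchronized String canonicalize(String value) {
        WeakReference<String> ref = pool.get(value);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        pool.put(value, new WeakReference<>(value));
        return value;
    }

    public static void main(String[] args) {
        WeakStringPool pool = new WeakStringPool();
        String a = new String("HELLOHELLO"); // two distinct objects with equal contents
        String b = new String("HELLOHELLO");
        System.out.println(pool.canonicalize(a) == pool.canonicalize(b)); // true
    }
}
```

Compared with intern(), such a pool is under the application's control, at the cost of extra code and synchronization.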

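To support data of so-called arbitrary type, CS currently uses a trick: a Swizzle holds the bytes produced by Java serialization, so the server can store data of any type without deserializing it. This has two drawbacks: general-purpose Java serialization is not efficient, and the protocol is not portable, so other languages are poorly supported. Since almost all of the data is currently of type String, String data could be handled specially and represented by the String's byte array (UTF-8), which is also easier for other languages to parse. Adding support for publish(String) is worth considering. So I ran the following test to compare the speed and size of different ways of serializing and deserializing a String.

The result: writeUTF is the smallest and the fastest. For a 100-char String the gap is an order of magnitude, which is quite significant, even though Swizzle uses a trick: when the same Swizzle instance is transmitted several times, it does not have to be serialized again.

PS: Put simply, a Swizzle wraps a piece of information and caches its serialized byte stream, so that if the same information has to be pushed or sent N times, N-1 serializations are saved.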
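To make the size difference concrete, here is a small sketch that encodes the same string once with writeObject and once as a length-prefixed UTF string. Note the assumption: it uses a plain `DataOutputStream` for writeUTF rather than the `ObjectOutputStream` used in the test below, as one guess at what a language-neutral format might look like; the class name `WireFormatDemo` is made up. Also note that writeUTF emits Java's modified UTF-8, which differs from standard UTF-8 for the NUL character and for supplementary characters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class WireFormatDemo {
    /** Default Java serialization: stream header, type tag, length, then the characters. */
    static byte[] viaWriteObject(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Plain data stream: a 2-byte length followed by the modified UTF-8 bytes. */
    static byte[] viaDataWriteUTF(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(value);
        }
        return bytes.toByteArray();
    }

    /** Reads back a string written by viaDataWriteUTF. */
    static String readDataUTF(byte[] payload) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "HELLOHELLO";
        byte[] utf = viaDataWriteUTF(data);
        System.out.println("writeObject bytes: " + viaWriteObject(data).length);
        System.out.println("writeUTF bytes:    " + utf.length); // 12 = 2-byte length + 10 ASCII bytes
        System.out.println("round trip ok:     " + data.equals(readDataUTF(utf)));
    }
}
```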
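The real Swizzle class is not shown in this post, so the following is only a guess at the caching idea described above. The class `CachedPayload`, its method `toBytes` and the counter are all hypothetical names introduced for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CachedPayload {
    private final Serializable value;
    private byte[] cachedBytes;      // filled the first time the payload is sent
    private int serializationCount;  // only here to make the saving visible

    CachedPayload(Serializable value) {
        this.value = value;
    }

    /** Returns the serialized form, serializing the value only on the first call. */
    synchronized byte[] toBytes() throws IOException {
        if (cachedBytes == null) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            cachedBytes = buffer.toByteArray();
            serializationCount++;
        }
        return cachedBytes; // shared array: callers must not modify it
    }

    synchronized int serializationCount() {
        return serializationCount;
    }

    public static void main(String[] args) throws IOException {
        CachedPayload payload = new CachedPayload("HELLOHELLO");
        for (int i = 0; i < 5; i++) {
            payload.toBytes(); // stands in for "send to one more subscriber"
        }
        System.out.println(payload.serializationCount()); // 1: four serializations saved
    }
}
```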

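import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

// Swizzle is the project's own wrapper class and is not shown here.
public class CompareSerialization {

    // Builds a random string of the requested length from the chars 0..126.
    public String generateTestData(int stringLength) {
        Random random = new Random();
        StringBuilder builder = new StringBuilder(stringLength);
        for (int j = 0; j < stringLength; j++) {
            builder.append((char) random.nextInt(127));
        }
        return builder.toString();
    }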

    // Round-trips the String through default Java object serialization.
    public int testJavaDefault(String data) throws Exception {
        ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
        try (ObjectOutputStream outputStream = new ObjectOutputStream(byteArray)) {
            outputStream.writeObject(data);
            outputStream.flush();
            try (ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()))) {
                inputStream.readObject();
            }
            return byteArray.size();
        }
    }

    // Writes the low byte of each char (writeBytes) and decodes the bytes again.
    public int testJavaDefaultBytes(String data) throws Exception {
        ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
        try (ObjectOutputStream outputStream = new ObjectOutputStream(byteArray)) {
            outputStream.writeBytes(data);
            outputStream.flush();
            try (ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()))) {
                // writeBytes emits one byte per char, so read exactly data.length()
                // bytes into the buffer that is decoded on the next line.
                byte[] bytes = new byte[data.length()];
                inputStream.readFully(bytes);
                new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1);
            }
            return byteArray.size();
        }
    }

    // Round-trips a Swizzle, which carries the already serialized payload.
    public int testSwizzle(Swizzle data) throws Exception {
        ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
        try (ObjectOutputStream outputStream = new ObjectOutputStream(byteArray)) {
            outputStream.writeObject(data);
            outputStream.flush();
            try (ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()))) {
                inputStream.readObject();
            }
            return byteArray.size();
        }
    }

    // Round-trips the String with writeUTF/readUTF (length-prefixed modified UTF-8).
    public int testStringUTF(String data) throws Exception {
        ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
        try (ObjectOutputStream outputStream = new ObjectOutputStream(byteArray)) {
            outputStream.writeUTF(data);
            outputStream.flush();
            try (ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(byteArray.toByteArray()))) {
                inputStream.readUTF();
            }
            return byteArray.size();
        }
    }

    // Usage: java CompareSerialization <stringLength>
    public final static void main(String[] args) throws Exception {
        CompareSerialization compare = new CompareSerialization();
        String data = compare.generateTestData(Integer.parseInt(args[0]));
        Swizzle swizzle = new Swizzle(data);

        // Serialized sizes of the four variants.
        System.out.println("testJavaDefault size on networking:" + compare.testJavaDefault(data));
        System.out.println("testJavaDefaultBytes size on networking:" + compare.testJavaDefaultBytes(data));
        System.out.println("testStringUTF size on networking:" + compare.testStringUTF(data));
        System.out.println("testSwizzle size on networking:" + compare.testSwizzle(swizzle));

        // warm up
        for (int i = 0; i < 100; i++) {
            compare.testJavaDefault(data);
            compare.testJavaDefaultBytes(data);
            compare.testStringUTF(data);
            compare.testSwizzle(swizzle);
        }

        // Time 10000 round trips of each variant, in milliseconds.
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testJavaDefault(data);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("testJavaDefault using time:" + (endTime - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testJavaDefaultBytes(data);
        }
        endTime = System.currentTimeMillis();
        System.out.println("testJavaDefaultBytes using time:" + (endTime - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testStringUTF(data);
        }
        endTime = System.currentTimeMillis();
        System.out.println("testStringUTF using time:" + (endTime - startTime));

        startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            compare.testSwizzle(swizzle);
        }
        endTime = System.currentTimeMillis();
        System.out.println("testSwizzle using time:" + (endTime - startTime));
    }
}
