VARCHAR2 in AL32UTF8, NCLOB in AL16UTF16

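Environment:

Operating system: Windows XP (Chinese edition)

Database: Oracle9i R2

Java source file encoding: UTF-8; JSP page encoding: UTF-8

The original database used character set AL32UTF8 and national character set UTF8, and the export was done with the client character set also set to UTF8. The newly created database uses character set AL32UTF8 and national character set AL16UTF16. The import was done with the client character set set to UTF8, and everything looked fine afterwards.

The problem: a record a is read from table A and its NCLOB column content is inserted into the NCLOB column of table B, while Chinese text is inserted into a VARCHAR2 column of table B in the same operation. This produces a record b. Querying record b in table B shows that the VARCHAR2 column is fine, but the NCLOB content is garbled.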

First, look at record a in table A. Part of its NCLOB column content is:

-----------------------------------------------------------------------------------------------------------

<table cellspacing="0" cellpadding="5" width="100%" align="center" border="0">
    <tbody>
        <tr>
            <td align="center" height="48"><strong><font face="黑體"

-----------------------------------------------------------------------------------------------------------

Dumping it gives (dump1):

-----------------------------------------------------------------------------------------------------------

Typ=1 Len=2000: 0,60,0,116,0,97,0,98,0,108,0,101,0,32,0,99,

0,101,0,108,0,108,0,115,0,112,0,97,0,99,0,105,0,110,0,103,0,

61,0,34,0,48,0,34,0,32,0,99,0,101,0,108,0,108,0,112,0,97,0,

100,0,100,0,105,0,110,0,103,0,61,0,34,0,53,0,34,0,32,0,119,

0,105,0,100,0,116,0,104,0,61,0,34,0,49,0,48,0,48,0,37,0,34,

0,32,0,97,0,108,0,105,0,103,0,110,0,61,0,34,0,99,0,101,0,110,

0,116,0,101,0,114,0,34,0,32,0,98,0,111,0,114,0,100,0,101,0,

114,0,61,0,34,0,48,0,34,0,62,0,13,0,10,0,32,0,32,0,32,0,32,0,

60,0,116,0,98,0,111,0,100,0,121,0,62,0,13,0,10,0,32,0,32,0,

32,0,32,0,32,0,32,0,32,0,32,0,60,0,116,0,114,0,62,0,13,0,10,

0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,0,32,

0,60,0,116,0,100,0,32,0,97,0,108,0,105,0,103,0,110,0,61,0,

34,0,99,0,101,0,110,0,116,0,101,0,114,0,34,0,32,0,104,0,101,

0,105,0,103,0,104,0,116,0,61,0,34,0,52,0,56,0,34,0,62,0,60,

0,115,0,116,0,114,0,111,0,110,0,103,0,62,0,60,0,102,0,111,

0,110,0,116,0,32,0,102,0,97,0,99,0,101,0,61,0,34,158,209      ==> the two bytes of 黑 in UTF-16, in decimal

-----------------------------------------------------------------------------------------------------------

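The Unicode encodings of the "<" character are:

UTF-8: 3C

UTF-16: 003C

Here 3C is hexadecimal; in decimal it is 3*16+12=60.

Likewise, the character 黑 in the font name 黑體 is E9 BB 91 = 233 187 145 in UTF-8 and 9E D1 = 158 209 in UTF-16.

So the content of record a really is stored in UTF-16.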
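The byte values above can be checked from Java. The following is a minimal sketch; the class and method names (`EncodingBytes`, `decimalBytes`) are made up for illustration, and UTF-16BE is used because dump1 shows the high byte first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Returns the bytes of s in the given charset as comma-separated
    // unsigned decimal values, similar in style to the dump output above.
    static String decimalBytes(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(b & 0xFF); // mask so the value is 0..255, not a signed byte
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hei = "\u9ED1"; // the character 黑
        System.out.println(decimalBytes("<", StandardCharsets.UTF_8));    // 60
        System.out.println(decimalBytes("<", StandardCharsets.UTF_16BE)); // 0,60
        System.out.println(decimalBytes(hei, StandardCharsets.UTF_8));    // 233,187,145
        System.out.println(decimalBytes(hei, StandardCharsets.UTF_16BE)); // 158,209
    }
}
```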

 

Next, query record b in table B. Part of its content is:

-----------------------------------------------------------------------------------------------------------

㱴慢汥⁣敬汳灡捩湧㴢〢⁣敬汰慤摩湧㴢㔢⁷楤瑨㴢㄰〥∠慬楧渽≣敮瑥爢⁢潲摥爽∰∾††㱴扯摹㸠†††‼瑲㸠†††††‼瑤⁡汩杮㴢捥湴敲∠橋楧桴㴢㐸∾㱳瑲潮朾㱦潮琠晡捥㴢釤붓∠獩穥㴢㘢㻧뺎꿥뢸蓨꾾駥궦뻨꺡ꇦ鶿ㄼ⽦潮琾㰯獴牯湧㸼⽴搾††††㰯瑲㸠†††‼瑲㸠†††††‼瑤⁡汩杮㴢捥湴敲∠橋楧桴㴢ㄵ∾駥뢈㨼甾♮扳瀻♮扳瀻♮扳瀻♮扳瀻♮

-----------------------------------------------------------------------------------------------------------

Dumping it gives (dump2):

-----------------------------------------------------------------------------------------------------------

Typ=1 Len=2000: 60,116,97,98,108,101,32,99,101,108,108,115,112,97,

99,105,110,103,61,34,48,34,32,99,101,108,108,112,97,100,100,105,110,

103,61,34,53,34,32,119,105,100,116,104,61,34,49,48,48,37,34,32,97,108,

105,103,110,61,34,99,101,110,116,101,114,34,32,98,111,114,100,101,

114,61,34,48,34,62,32,32,32,32,60,116,98,111,100,121,62,32,32,32,32,

32,32,32,32,60,116,114,62,32,32,32,32,32,32,32,32,32,32,32,32,60,116,

100,32,97,108,105,103,110,61,34,99,101,110,116,101,114,34,32,104,101,

105,103,104,116,61,34,52,56,34,62,60,115,116,114,111,110,103,62,60,

102,111,110,116,32,102,97,99,101,61,34,233,187,145         ==> the three bytes of 黑 in UTF-8, in decimal

-----------------------------------------------------------------------------------------------------------

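Comparing the two carefully, the ASCII part of dump2 is essentially dump1 with the 0 bytes dropped (the 13,10 line breaks are missing as well). The NCLOB column, however, is read as UTF-16,

so when it is queried for display, the first character is taken from the two bytes 60,116 = 3C,74.

In UTF-16 these two bytes form the single code unit U+3C74, which is the CJK character 㱴 rather than "<t", and that is what gets displayed.

The underlying reason is that record b in table B was actually saved in UTF-8.

Take the first Chinese character, 黑, in the content of record a. In dump1 its value is 158 209, while dump2 shows 233 187 145 at the same position,

which confirms that 黑 was stored in its UTF-8 encoding, E9 BB 91.

So the crux of the problem is that the data is written as UTF-8 but read as UTF-16, which naturally produces garbled text.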
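The effect can be reproduced outside the database. This is a small sketch, not the application's code: it encodes a string as UTF-8 and then deliberately decodes the same bytes as UTF-16BE. The names `MojibakeDemo` and `storeUtf8ReadUtf16` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text as UTF-8, then (incorrectly) decodes those bytes as
    // big-endian UTF-16, pairing every two bytes into one code unit.
    static String storeUtf8ReadUtf16(String text) {
        byte[] stored = text.getBytes(StandardCharsets.UTF_8);
        return new String(stored, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // "<table" is 6 bytes in UTF-8: 3C 74 61 62 6C 65.
        // Read as UTF-16BE they become U+3C74, U+6162, U+6C65.
        System.out.println(storeUtf8ReadUtf16("<table")); // 㱴慢汥
    }
}
```

The three characters produced match the beginning of the garbled content of record b.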

 

Why is the data written as UTF-8?

Looking at the code that performs the save, note what the API documentation of PreparedStatement.setString says:

void java.sql.PreparedStatement.setString(int parameterIndex, String x) throws SQLException

Sets the designated parameter to the given Java String value. The driver converts this to an SQL VARCHAR or LONGVARCHAR

value (depending on the argument's size relative to the driver's limits on VARCHAR values) when it sends it to the database.

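So the driver converts the String x into an SQL VARCHAR or LONGVARCHAR value, depending on its size, before sending it to the database. Here the database character set is AL32UTF8 while the national character set is AL16UTF16. The long string is therefore sent as a VARCHAR-type value, that is, encoded in the database character set AL32UTF8, and it ends up in the NCLOB column in that form. That is what produces the garbled text.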
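One possible remedy, sketched here under assumptions and not taken from the original code, is to bind the NCLOB parameter through the national-character API. JDBC 4.0 added PreparedStatement.setNString, which is documented to convert the value to an SQL NCHAR, NVARCHAR or LONGNVARCHAR. This assumes a driver that implements JDBC 4.0; with older Oracle drivers, the Oracle-specific OraclePreparedStatement.setFormOfUse with FORM_NCHAR plays a similar role. The table and column names below are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class NclobInsertSketch {
    // Hypothetical table and column names, for illustration only.
    static final String SQL =
        "INSERT INTO table_b (title, content) VALUES (?, ?)";

    // Inserts one row, binding the VARCHAR2 column with setString and
    // the NCLOB column with setNString (requires a JDBC 4.0 driver).
    static int insert(Connection conn, String title, String content)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, title);    // sent in the database character set
            ps.setNString(2, content); // sent in the national character set
            return ps.executeUpdate();
        }
    }
}
```

Whether very long strings can be bound this way depends on the driver version, so this should be verified against the driver actually in use.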

 

 

 

 
