MySQL utf8與utf8mb4

報錯信息:

java.lang.RuntimeException: java.sql.SQLException: 
Incorrect string value: '\\xF0\\x9F\\x9A\\xB9\\xE4\\xBC...' for column 'xxx' ...

引起的原因:
在varchar類型的字段裏面插入的text包含emoj表情。
MySQL 5.5.3 版本以下,CHARACTER SET 爲 utf8 的字段最長只有3個字節。
查看MySQL版本可用以下命令:

mysql> status;
--------------
mysql  Ver 14.14 Distrib 5.7.17, for Linux (x86_64) using  EditLine wrapper

Connection id:      5
Current database:   
Current user:       root@localhost
SSL:            Not in use
Current pager:      stdout
Using outfile:      ''
Using delimiter:    ;
Server version:     5.7.17-0ubuntu0.16.04.1 (Ubuntu)
Protocol version:   10
Connection:     Localhost via UNIX socket
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
UNIX socket:        /var/run/mysqld/mysqld.sock
Uptime:         1 hour 28 min 58 sec

Threads: 1  Questions: 28  Slow queries: 0  Opens: 116  Flush tables: 1  Open tables: 29  Queries per second avg: 0.005
--------------

查看相關字符集設置信息(下面是我修改後的,修改前utf8mb4處都爲utf8):

mysql>  show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

查看所有字符集信息:

mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset           | binary              |      1 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
+----------+---------------------------------+---------------------+--------+

由上表可見,utf8最長3個字節。

在5.5.3版本以後,增加了 [utf8mb4] (https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html) 字符集。
這是官方原文:

The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplementary characters:
* For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
* For a supplementary character, utf8 cannot store the character at all, whereas utf8mb4 requires four bytes to store it. Because utf8 cannot store the character at all, you have no supplementary characters in utf8 columns and need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.

因此,只需要將字符集由utf8修改爲utf8mb4即可支持4字節的BPM等數據。

如何修改:
修改MySQL配置文件,修改mysql的客戶端默認連接字符集,服務端以及數據庫的默認字符集:

sudo vim /etc/mysql/my.cnf

在末尾或者已有的中括號裏的塊中加上如下配置:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'

重啓MySQL服務:

sudo service mysql restart

重啓服務後,通過以下SQL可以查看結果:

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

MySQL方面修改好後,需要修改已有的數據庫或者表或者字段:
修改表的字符集:

ALTER TABLE tablename charset=utf8mb4;

修改表字段:

ALTER TABLE tablename MODIFY COLUMN columnname VARCHAR(255) CHARACTER SET utf8mb4;

修改表的字符集和修改字段可以用一條SQL完成。

如果使用的java連接MySQL,需要修改原有的連接設置:
原jdbcUrl:jdbc:mysql:///dbname?useUnicode=true&characterEncoding=UTF-8
需要去掉編碼參數:
jdbcUrl:jdbc:mysql:///dbname

** 此處不要想當然的將參數useUnicode=true&characterEncoding=UTF-8修改爲useUnicode=true&characterEncoding=utf8mb4
,否則會報出以下錯誤:

Could not get JDBC Connection; nested exception is com.MySQL.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: 
Unknown character set: 'utf8mb4'

我們在MySQL的配置文件裏已經配置了[client]的默認訪問字符集了,所以,此處將參數去掉即可。

吃水不忘打井人:
http://blog.csdn.net/secretx/article/details/21253559
http://stackoverflow.com/questions/7814293/how-to-insert-utf-8-mb4-characteremoji-in-ios5-in-mysql

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章