MySQL Study Notes: Character Sets

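      Character values consist of letters, digits, and special symbols. Before a character value can be stored, its letters, digits, and symbols must be converted to numeric codes. That requires a conversion table containing the numeric code of every relevant character. Such a table is called a character set, and is sometimes also called a code character set or a character encoding.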

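      For a computer to process characters, a mapping from characters to numbers is not enough. You also have to decide how those numbers are stored, and that is where the idea of an encoding scheme comes from. Should storage be fixed-length or variable-length? One byte per character, or several? Opinions differ, and different needs have produced many encoding schemes. Unicode alone has UTF-8, UTF-16, and UTF-32.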
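The distinction can be made concrete with a small Python sketch. Python is used here only for illustration and is not part of MySQL. For a few characters, the sketch prints the Unicode code point (the mapping part of a character set) and the number of bytes that same code point occupies under UTF-8, UTF-16, and UTF-32 (the encoding-scheme part). The big-endian codec names keep Python from adding a byte-order mark.

```python
def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies under each Unicode encoding scheme."""
    # "-be" (big-endian) variants are used so Python does not prepend a byte-order mark.
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    for ch in ("A", "é", "字", "😀"):
        # ord() gives the code point assigned by the character set (the mapping);
        # encoded_sizes() shows the storage cost under each encoding scheme.
        print(ch, hex(ord(ch)), encoded_sizes(ch))
```

UTF-8 is variable-length (1 to 4 bytes for these characters). UTF-32 is fixed at 4 bytes. UTF-16 uses 2 bytes for most characters but 4 for characters outside the Basic Multilingual Plane, such as the emoji.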

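      In MySQL, the terms character set and encoding scheme are treated as synonyms: a character set is the combination of a conversion table and an encoding scheme. A collation addresses the order in which characters sort and how they are grouped. Sorting and grouping both require comparing characters, so a collation defines the ordering relationship (less than, equal to, greater than) between characters.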
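You can think of a collation as a comparison rule layered on top of a character set. The Python sketch below is only an analogy and does not reproduce any real MySQL collation. It sorts the same list of strings under two rules. The first is a binary-style rule that compares raw code points. The second is a case-insensitive rule that compares case-folded forms. The function names are hypothetical.

```python
def sort_binary(words):
    """Binary-style rule: compare raw code points, so 'B' (66) sorts before 'a' (97)."""
    return sorted(words)


def sort_case_insensitive(words):
    """Case-insensitive rule: compare case-folded forms; ties keep input order (stable sort)."""
    return sorted(words, key=str.casefold)


if __name__ == "__main__":
    words = ["banana", "Apple", "cherry", "apple", "Banana"]
    print(sort_binary(words))
    print(sort_case_insensitive(words))
```

Under the case-insensitive rule, "Apple" and "apple" compare as equal. That is also why such a collation would put them in the same group for GROUP BY or DISTINCT.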

Show the available character sets

SHOW CHARACTER SET;
or
SELECT CHARACTER_SET_NAME, DESCRIPTION, DEFAULT_COLLATE_NAME, MAXLEN
FROM INFORMATION_SCHEMA.CHARACTER_SETS;


Show the collations available for the utf8 character set

SHOW COLLATION LIKE 'utf8%';
or
SELECT *
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE 'utf8%';

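Garbled text in the database or in the client is very often caused by an incorrect character set setting. Displaying utf8 data with the latin1 character set, for instance, is bound to go wrong. When this happens, check that the character sets of the database, tables, and columns are the ones you intended, and that the client character set is appropriate.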
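The Python sketch below reproduces the typical symptom. Bytes written as UTF-8 and read back as latin1 turn into mojibake. Re-encoding with the wrong character set and then decoding with the right one recovers the text. Python's "latin-1" codec stands in for MySQL's latin1 here, which is an assumption. MySQL's latin1 is actually closer to Windows cp1252, so the exact garbled characters MySQL shows can differ.

```python
def show_as_latin1(utf8_bytes: bytes) -> str:
    """Misinterpret UTF-8 bytes as latin1, as a client with the wrong character set would."""
    # latin-1 maps every byte 0x00-0xFF to a character, so this never raises; it just garbles.
    return utf8_bytes.decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misinterpretation: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    stored = "中文".encode("utf-8")   # what a utf8 column actually holds
    garbled = show_as_latin1(stored)
    print(repr(garbled))
    print(repair(garbled))
```

Nothing was lost in this case. The bytes were only interpreted with the wrong character set, which is why fixing the client setting is usually enough.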

The system variables related to character sets and collations are:

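System variable            Description
CHARACTER_SET_CLIENT       Character set of statements sent from the client to the server
CHARACTER_SET_CONNECTION   Character set of the client/server connection; string literals without a character set introducer are interpreted in it
CHARACTER_SET_DATABASE     Default character set of the current database. It changes every time a USE statement "jumps" to another database. If there is no current database, its value equals CHARACTER_SET_SERVER
CHARACTER_SET_RESULTS      Character set in which the server sends the final results of SELECT statements to the client, including column values, result metadata such as column names, and error messages
CHARACTER_SET_SERVER       Default character set of the server
CHARACTER_SET_SYSTEM       System character set, used for the names of database objects (such as tables and columns) and for the names of functions stored in the catalog tables. Its value is always utf8
CHARACTER_SETS_DIR         Directory in which the character set files are installed
COLLATION_CONNECTION       Collation of the current connection
COLLATION_DATABASE         Default collation of the current database. It changes every time a USE statement "jumps" to another database
COLLATION_SERVER           Default collation of the server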

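Character sets of database objects are inherited along this chain:

Server -> Database -> Table -> Column

That is, if a lower level does not explicitly specify a character set, it uses the character set of the level above it.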

Server Character Set and Collation

MySQL Server has a server character set and a server collation. These can be set at server startup on the command line or in an option file and changed at runtime.

Initially, the server character set and collation depend on the options that you use when you start mysqld. You can use --character-set-server for the character set. Along with it, you can add --collation-server for the collation. If you don't specify a character set, that is the same as saying --character-set-server=latin1. If you specify only a character set (for example, latin1) but not a collation, that is the same as saying --character-set-server=latin1 --collation-server=latin1_swedish_ci because latin1_swedish_ci is the default collation for latin1. Therefore, the following three commands all have the same effect:

shell> mysqld
shell> mysqld --character-set-server=latin1
shell> mysqld --character-set-server=latin1 \
           --collation-server=latin1_swedish_ci
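The rule that a character set given without a collation implies that character set's default collation can be sketched in Python as a table lookup. The table below is a hypothetical hard-coded excerpt holding only the two defaults mentioned in these notes. A real server reports the full list through SHOW CHARACTER SET or INFORMATION_SCHEMA.CHARACTER_SETS.DEFAULT_COLLATE_NAME. The function name is also hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical excerpt of the server's charset -> default collation table.
DEFAULT_COLLATION = {
    "latin1": "latin1_swedish_ci",
    "utf8": "utf8_general_ci",
}


def resolve_server_defaults(charset: Optional[str] = None,
                            collation: Optional[str] = None) -> Tuple[str, str]:
    """Model how mysqld fills in --character-set-server / --collation-server.

    Simplification: the case where only a collation is given is not handled.
    """
    if charset is None:
        charset = "latin1"  # the default stated in the text above
    if collation is None:
        collation = DEFAULT_COLLATION[charset]
    return charset, collation


if __name__ == "__main__":
    # The three mysqld invocations above all resolve to the same pair.
    print(resolve_server_defaults())
    print(resolve_server_defaults("latin1"))
    print(resolve_server_defaults("latin1", "latin1_swedish_ci"))
```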

The server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. They have no other purpose.

The current server character set and collation can be determined from the values of the character_set_server and collation_server system variables. These variables can be changed at runtime.

Database Character Set and Collation

Every database has a database character set and a database collation. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

CREATE DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]

ALTER DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]

The keyword SCHEMA can be used instead of DATABASE.

The database character set and collation are used as default values for table definitions if the table character set and collation are not specified in CREATE TABLE statements. The database character set also is used by LOAD DATA INFILE. The character set and collation have no other purposes.

The character set and collation for the default database can be determined from the values of the character_set_database and collation_database system variables. The server sets these variables whenever the default database changes. If there is no default database, the variables have the same value as the corresponding server-level system variables, character_set_server and collation_server.

Table Character Set and Collation

Every table has a table character set and a table collation. The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying the table character set and collation:

CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

The table character set and collation are used as default values for column definitions if the column character set and collation are not specified in individual column definitions. The table character set and collation are MySQL extensions; there are no such things in standard SQL.

Column Character Set and Collation

Every “character” column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and a column collation. Column definition syntax for CREATE TABLE and ALTER TABLE has optional clauses for specifying the column character set and collation:

col_name {CHAR | VARCHAR | TEXT} (col_length)
    [CHARACTER SET charset_name]
    [COLLATE collation_name]

These clauses can also be used for ENUM and SET columns:

col_name {ENUM | SET} (val_list)
    [CHARACTER SET charset_name]
    [COLLATE collation_name]

Examples:

CREATE TABLE t1
(
    col1 VARCHAR(5)
      CHARACTER SET latin1
      COLLATE latin1_german1_ci
);

ALTER TABLE t1 MODIFY
    col1 VARCHAR(5)
      CHARACTER SET latin1
      COLLATE latin1_swedish_ci;

If you use ALTER TABLE to convert a column from one character set to another, MySQL attempts to map the data values, but if the character sets are incompatible, there may be data loss.
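Python's codec error handlers can illustrate what "incompatible" means here. In the sketch below, characters that latin1 cannot represent are replaced by "?". In non-strict SQL modes MySQL typically stores "?" for unconvertible characters as well, but treat that parallel as approximate: depending on the SQL mode, MySQL may warn or raise an error instead. The function name is hypothetical.

```python
def convert_to_latin1(text: str) -> bytes:
    """Transcode to latin1; characters latin1 cannot represent become b'?'."""
    return text.encode("latin-1", errors="replace")


if __name__ == "__main__":
    original = "café 中文"
    converted = convert_to_latin1(original)
    print(converted)                    # 'é' survives, the CJK characters do not
    print(converted.decode("latin-1"))
```

The loss is permanent. Converting the result back to utf8 cannot restore the replaced characters.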

Notes on converting character sets:

ALTER [IGNORE] TABLE tbl_name
    CONVERT TO CHARACTER SET charset_name [COLLATE collation_name]
  | [DEFAULT] CHARACTER SET charset_name [COLLATE collation_name]

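The CONVERT clause can cause problems with your data. Before using it, make sure you have a backup, and check the converted data before you finish. One trap deserves particular care. A column may be declared with one character set while the stored values actually use another, incompatible one, for example a latin1 column that really holds utf8 bytes. In that case CONVERT TO transcodes those bytes and corrupts them. Instead, first change the column to a binary large object (BLOB) type, then change it to the desired data type and character set. This works because BLOB data has no character set, so no conversion takes place when converting to or from BLOB.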
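The following Python sketch shows why the BLOB detour helps. It assumes a column declared latin1 whose stored bytes are really UTF-8, a common result of a misconfigured client. A CONVERT TO-style conversion treats the bytes as latin1 and transcodes them, which double-encodes the text. Going through a binary type leaves the bytes untouched and only changes how they are labelled. Both function names are hypothetical stand-ins for the two ALTER TABLE paths, and Python's "latin-1" codec stands in for MySQL's latin1.

```python
def convert_to_utf8(stored: bytes, declared: str = "latin-1") -> bytes:
    """Like CONVERT TO CHARACTER SET utf8: decode with the declared charset, re-encode as UTF-8."""
    return stored.decode(declared).encode("utf-8")


def via_blob_to_utf8(stored: bytes) -> bytes:
    """Like changing the column to BLOB and then to a utf8 TEXT type: the bytes are kept as-is."""
    return bytes(stored)


if __name__ == "__main__":
    stored = "中".encode("utf-8")  # bytes actually sitting in the "latin1" column
    print(convert_to_utf8(stored).decode("utf-8"))   # double-encoded mojibake
    print(via_blob_to_utf8(stored).decode("utf-8"))  # 中
```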

Character String Literal Character Set and Collation

Every character string literal has a character set and a collation.

A character string literal may have an optional character set introducer and COLLATE clause:

[_charset_name]'string' [COLLATE collation_name]

Examples:

SELECT 'string';
SELECT _latin1'string';
SELECT _latin1'string' COLLATE latin1_danish_ci;



For the simple statement SELECT 'string', the string has the character set and collation defined by the character_set_connection and collation_connection system variables.

The _charset_name expression is formally called an introducer. It tells the parser, “the string that is about to follow uses character set X.” Because this has confused people in the past, we emphasize that an introducer does not change the string to the introducer character set like CONVERT() would do. It does not change the string's value, although padding may occur. The introducer is just a signal. An introducer is also legal before standard hex literal and numeric hex literal notation (x'literal' and 0xnnnn), or before bit-field literal notation (b'literal' and 0bnnnn).
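The difference can be modelled by treating a string literal as a pair of (bytes, character set label). In the Python sketch below, an introducer only swaps the label, while a CONVERT()-style function decodes the bytes and re-encodes them. The class and function names are made up for illustration, and Python codec names stand in for MySQL character set names.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    data: bytes
    charset: str  # Python codec name standing in for a MySQL character set


def introduce(lit: Literal, charset: str) -> Literal:
    """Introducer (_charset'...'): same bytes, new label. The value is not transcoded."""
    return Literal(lit.data, charset)


def convert(lit: Literal, charset: str) -> Literal:
    """CONVERT(... USING charset): decode with the old label, re-encode with the new one."""
    return Literal(lit.data.decode(lit.charset).encode(charset), charset)


if __name__ == "__main__":
    lit = Literal("é".encode("latin-1"), "latin-1")  # b'\xe9'
    print(introduce(lit, "utf-8"))  # data is still b'\xe9'; only the label changed
    print(convert(lit, "utf-8"))    # data is now b'\xc3\xa9'; the bytes were transcoded
```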


National Character Set

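Standard SQL uses NCHAR, NVARCHAR, and similar types to denote a national character set. MySQL has no separate national-character types. It accepts these names, but treats them as CHAR and VARCHAR with a predefined character set, which in MySQL is utf8. To store other characters, you choose an appropriate character set.

For example, these data type declarations are equivalent: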

CHAR(10) CHARACTER SET utf8
NATIONAL CHARACTER(10)
NCHAR(10)

As are these:

VARCHAR(10) CHARACTER SET utf8
NATIONAL VARCHAR(10)
NCHAR VARCHAR(10)
NATIONAL CHARACTER VARYING(10)
NATIONAL CHAR VARYING(10)

You can use N'literal' (or n'literal') to create a string in the national character set. These statements are equivalent:

SELECT N'some text';
SELECT n'some text';
SELECT _utf8'some text';

Connection Character Sets and Collations

Two statements affect the connection-related character set variables as a group:

  • SET NAMES 'charset_name' [COLLATE 'collation_name']

    SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'cp1251' tells the server, “future incoming messages from this client are in character set cp1251.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)

    A SET NAMES 'x' statement is equivalent to these three statements:

    SET character_set_client = x;
    SET character_set_results = x;
    SET character_set_connection = x;
    

    Setting character_set_connection to x also implicitly sets collation_connection to the default collation for x. It is unnecessary to set that collation explicitly. To specify a particular collation, use the optional COLLATE clause:

    SET NAMES 'charset_name' COLLATE 'collation_name'
    
  • SET CHARACTER SET charset_name

    SET CHARACTER SET is similar to SET NAMES but sets character_set_connection and collation_connection to character_set_database and collation_database. A SET CHARACTER SET x statement is equivalent to these three statements:

    SET character_set_client = x;
    SET character_set_results = x;
    SET collation_connection = @@collation_database;
    

    Setting collation_connection also implicitly sets character_set_connection to the character set associated with the collation (equivalent to executing SET character_set_connection = @@character_set_database). It is unnecessary to set character_set_connection explicitly.
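The two statements can be compared by modelling the session variables as a Python dictionary. This is a hypothetical sketch, not MySQL code. The default-collation table is a hard-coded excerpt that a real server would supply itself.

```python
from typing import Optional

# Hypothetical excerpt of default collations; a real server supplies these.
DEFAULT_COLLATION = {
    "utf8": "utf8_general_ci",
    "latin1": "latin1_swedish_ci",
    "cp1251": "cp1251_general_ci",
}


def set_names(session: dict, charset: str, collation: Optional[str] = None) -> None:
    """Model of SET NAMES: client, results, and connection all become `charset`."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = charset
    # Without COLLATE, the connection collation becomes the charset's default collation.
    session["collation_connection"] = collation or DEFAULT_COLLATION[charset]


def set_character_set(session: dict, charset: str) -> None:
    """Model of SET CHARACTER SET: the connection follows the current database instead."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["collation_connection"] = session["collation_database"]
    # Setting the collation implies the matching connection character set.
    session["character_set_connection"] = session["character_set_database"]


if __name__ == "__main__":
    session = {"character_set_database": "latin1", "collation_database": "latin1_swedish_ci"}
    set_names(session, "utf8")
    print(session)
    set_character_set(session, "cp1251")
    print(session)
```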

Note

ucs2, utf16, and utf32 cannot be used as a client character set, which means that they do not work for SET NAMES or SET CHARACTER SET.


The MySQL client programs mysql, mysqladmin, mysqlcheck, mysqlimport, and mysqlshow determine the default character set to use as follows:

  • In the absence of other information, the programs use the compiled-in default character set, usually latin1.

  • The programs can autodetect which character set to use based on the operating system setting, such as the value of the LANG or LC_ALL locale environment variable on Unix systems or the code page setting on Windows systems. For systems on which the locale is available from the OS, the client uses it to set the default character set rather than using the compiled-in default. For example, setting LANG to ru_RU.KOI8-R causes the koi8r character set to be used. Thus, users can configure the locale in their environment for use by MySQL clients.

    The OS character set is mapped to the closest MySQL character set if there is no exact match. If the client does not support the matching character set, it uses the compiled-in default. For example, ucs2 is not supported as a connection character set.

    C applications that wish to use character set autodetection based on the OS setting can invoke the following mysql_options() call before connecting to the server:

    mysql_options(mysql,
                  MYSQL_SET_CHARSET_NAME,
                  MYSQL_AUTODETECT_CHARSET_NAME);

  • The programs support a --default-character-set option, which enables users to specify the character set explicitly to override whatever default the client otherwise determines.
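The locale-based step above can be sketched in Python. The sketch reads LC_ALL or LANG, takes the codeset after the dot, and maps it to a MySQL character set name, falling back to the compiled-in default when nothing matches. The mapping table is a small hypothetical excerpt: only the KOI8-R to koi8r entry comes from the text above. The lookup rules are simplified as well, since LC_CTYPE and the Windows code page are ignored. The real clients use a more complete table.

```python
import os
from typing import Mapping, Optional

# Hypothetical excerpt of an OS codeset -> MySQL character set table.
CODESET_TO_MYSQL = {
    "koi8-r": "koi8r",
    "utf-8": "utf8",
    "utf8": "utf8",
    "iso-8859-1": "latin1",
}

COMPILED_IN_DEFAULT = "latin1"


def detect_client_charset(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess the client character set from the locale (simplified model of the clients)."""
    env = os.environ if env is None else env
    # LC_ALL overrides LANG on POSIX systems; LC_CTYPE is ignored in this sketch.
    locale_name = env.get("LC_ALL") or env.get("LANG") or ""
    # "ru_RU.KOI8-R" -> "koi8-r"; an "@modifier" suffix, if any, is dropped.
    codeset = locale_name.partition(".")[2].split("@")[0].lower()
    return CODESET_TO_MYSQL.get(codeset, COMPILED_IN_DEFAULT)


if __name__ == "__main__":
    print(detect_client_charset({"LANG": "ru_RU.KOI8-R"}))
    print(detect_client_charset())  # whatever the current environment implies
```

An explicit --default-character-set option would simply bypass this guess.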

Note

Before MySQL 5.5, in the absence of other information, the MySQL client programs used the compiled-in default character set, usually latin1. An implication of this difference is that if your environment is configured to use a non-latin1 locale, MySQL client programs will use a different connection character set than previously, as though you had issued an implicit SET NAMES statement. If the previous behavior is required, start the client with the --default-character-set=latin1 option.

When a client connects to the server, it sends the name of the character set that it wants to use. The server uses the name to set the character_set_client, character_set_results, and character_set_connection system variables. In effect, the server performs a SET NAMES operation using the character set name.

With the mysql client, if you want to use a character set different from the default, you could explicitly execute SET NAMES every time you start up. However, to accomplish the same result more easily, you can add the --default-character-set option setting to your mysql command line or in your option file. For example, the following option file setting sets the three connection-related character set variables to koi8r each time you invoke mysql:

[mysql]
default-character-set=koi8r

To see the values of the character set and collation system variables that apply to your connection, use these statements:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

If you change the default character set or collation for a database, stored routines that use the database defaults must be dropped and re-created so that they use the new defaults. (In a stored routine, variables with character data types use the database defaults if the character set or collation is not specified explicitly.)

Collation Names

MySQL collation names follow these rules:

  • A name ending in _ci indicates a case-insensitive collation.

  • A name ending in _cs indicates a case-sensitive collation.

  • A name ending in _bin indicates a binary collation. Character comparisons are based on character binary code values.
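The naming rules can be turned into a tiny Python classifier. The function name is hypothetical. A few collations, such as the one named plain binary, have no suffix and fall outside this pattern.

```python
def collation_kind(name: str) -> str:
    """Classify a MySQL collation name by its suffix: _ci, _cs, or _bin."""
    for suffix, kind in (("_ci", "case-insensitive"),
                         ("_cs", "case-sensitive"),
                         ("_bin", "binary")):
        if name.endswith(suffix):
            return kind
    # Names such as "binary" do not follow the suffix convention.
    return "unknown"


if __name__ == "__main__":
    for name in ("latin1_swedish_ci", "latin1_general_cs", "utf8_bin", "binary"):
        print(name, collation_kind(name))
```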


Nonbinary strings have PADSPACE behavior for all collations, including _bin collations. Trailing spaces are insignificant in comparisons; that is, spaces at the end of a string have no effect:

mysql> SET NAMES utf8 COLLATE utf8_bin;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
|          1 |
+------------+
1 row in set (0.00 sec)

For binary strings, all characters are significant in comparisons, including trailing spaces:

mysql> SET NAMES binary;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
|          0 |
+------------+
1 row in set (0.00 sec)
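The two results above can be modelled in Python. The first function approximates a nonbinary comparison under utf8_bin, which is case-sensitive but ignores trailing spaces (PADSPACE). The second compares binary strings byte for byte. Both function names are hypothetical, and stripping trailing spaces is treated as equivalent to PADSPACE for equality tests only, not for ordering.

```python
def eq_padspace(a: str, b: str) -> bool:
    """Nonbinary comparison under a _bin collation: case counts, trailing spaces do not."""
    return a.rstrip(" ") == b.rstrip(" ")


def eq_binary(a: bytes, b: bytes) -> bool:
    """Binary-string comparison: every byte counts, including trailing spaces."""
    return a == b


if __name__ == "__main__":
    print(eq_padspace("a ", "a"))  # like SET NAMES utf8 COLLATE utf8_bin; SELECT 'a ' = 'a'
    print(eq_binary(b"a ", b"a"))  # like SET NAMES binary; SELECT 'a ' = 'a'
```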

The BINARY Operator

The BINARY operator casts the string following it to a binary string. This is an easy way to force a comparison to be done byte by byte rather than character by character. BINARY also causes trailing spaces to be significant.

mysql> SELECT 'a' = 'A';
        -> 1
mysql> SELECT BINARY 'a' = 'A';
        -> 0
mysql> SELECT 'a' = 'a ';
        -> 1
mysql> SELECT BINARY 'a' = 'a ';
        -> 0

BINARY str is shorthand for CAST(str AS BINARY).

The BINARY attribute in character column definitions has a different effect. A character column defined with the BINARY attribute is assigned the binary collation of the column character set. Every character set has a binary collation. For example, the binary collation for the latin1 character set is latin1_bin, so if the table default character set is latin1, these two column definitions are equivalent:

CHAR(10) BINARY
CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin

Collation and INFORMATION_SCHEMA Searches

String columns in INFORMATION_SCHEMA tables have a collation of utf8_general_ci, which is case insensitive. However, searches in INFORMATION_SCHEMA string columns are also affected by file system case sensitivity. For values that correspond to objects that are represented in the file system, such as names of databases and tables, searches may be case sensitive if the file system is case sensitive. This section describes how to work around this issue if necessary; see also Bug #34921.

Suppose that a query searches the SCHEMATA.SCHEMA_NAME column for the test database. On Linux, file systems are case sensitive, so comparisons of SCHEMATA.SCHEMA_NAME with 'test' match, but comparisons with 'TEST' do not:

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME = 'test';
+-------------+
| SCHEMA_NAME |
+-------------+
| test        |
+-------------+
1 row in set (0.01 sec)

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME = 'TEST';
Empty set (0.00 sec)

On Windows or Mac OS X where file systems are not case sensitive, comparisons match both 'test' and 'TEST':

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME = 'test';
+-------------+
| SCHEMA_NAME |
+-------------+
| test        |
+-------------+
1 row in set (0.00 sec)

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME = 'TEST';
+-------------+
| SCHEMA_NAME |
+-------------+
| TEST        |
+-------------+
1 row in set (0.00 sec)

The value of the lower_case_table_names system variable makes no difference in this context.

This behavior occurs because the utf8_general_ci collation is not used for INFORMATION_SCHEMA queries when searching the file system for database objects. It is a result of optimizations implemented for INFORMATION_SCHEMA searches in MySQL. For information about these optimizations, see Section 7.2.4, “Optimizing INFORMATION_SCHEMA Queries”.

Searches in INFORMATION_SCHEMA string columns for values that refer to INFORMATION_SCHEMA itself do use the utf8_general_ci collation because INFORMATION_SCHEMA is a “virtual” database and is not represented in the file system. For example, comparisons with SCHEMATA.SCHEMA_NAME match 'information_schema' or 'INFORMATION_SCHEMA' regardless of platform:

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME = 'information_schema';
+--------------------+
| SCHEMA_NAME        |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME = 'INFORMATION_SCHEMA';
+--------------------+
| SCHEMA_NAME        |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)

If the result of a string operation on an INFORMATION_SCHEMA column differs from expectations, a workaround is to use an explicit COLLATE clause to force a suitable collation (Section 9.1.7.2, “Using COLLATE in SQL Statements”). For example, to perform a case-insensitive search, use COLLATE with the INFORMATION_SCHEMA column name:

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME COLLATE utf8_general_ci = 'test';
+-------------+
| SCHEMA_NAME |
+-------------+
| test        |
+-------------+
1 row in set (0.00 sec)

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
    -> WHERE SCHEMA_NAME COLLATE utf8_general_ci = 'TEST';
+-------------+
| SCHEMA_NAME |
+-------------+
| test        |
+-------------+
1 row in set (0.00 sec)

You can also use the UPPER() or LOWER() function:

WHERE UPPER(SCHEMA_NAME) = 'TEST'
WHERE LOWER(SCHEMA_NAME) = 'test'

For full details on MySQL character sets, see the reference manual: http://dev.mysql.com/doc/refman/5.5/en/globalization.html


Reposted from: http://www.cnblogs.com/freewater/archive/2011/12/17/2289431.html
