How to get UTF-8 working in Java webapps?

Translated from: How to get UTF-8 working in Java webapps?

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic characters like ЦжФ for special cases.

My setup is the following:

  • Development environment: Windows XP
  • Production environment: Debian

Database used: MySQL 5.x

Users mainly use Firefox 2, but Opera 9.x, Firefox 3, IE7 and Google Chrome are also used to access the site.

How to achieve this?


Reply #1

Reference: https://stackoom.com/question/a96/如何讓UTF-在Java-webapps中運行


Reply #2

In my case of displaying Unicode characters from message bundles, I don't need to apply the "JSP page encoding" section to display Unicode on my JSP page. All I need is the "CharsetFilter" section.


Reply #3

Answering myself as the FAQ of this site encourages it. This works for me:

Characters such as äåö are mostly not a problem, because the default character set used by browsers and by Tomcat/Java for webapps is latin1, i.e. ISO-8859-1, which "understands" those characters.

To get UTF-8 working under Java + Tomcat + Linux/Windows + MySQL requires the following:

Configuring Tomcat's server.xml

The connector has to be configured to use UTF-8 when decoding URL (GET request) parameters:

<Connector port="8080" maxHttpHeaderSize="8192"
 maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
 enableLookups="false" redirectPort="8443" acceptCount="100"
 connectionTimeout="20000" disableUploadTimeout="true" 
 compression="on" 
 compressionMinSize="128" 
 noCompressionUserAgents="gozilla, traviata" 
 compressableMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/x-javascript,application/javascript"
 URIEncoding="UTF-8"
/>

The key part in the above example is URIEncoding="UTF-8". This guarantees that Tomcat handles all incoming GET parameters as UTF-8 encoded. As a result, when the user writes the following to the address bar of the browser:

 https://localhost:8443/ID/Users?action=search&name=*ж*

the character ж is handled as UTF-8 and is encoded (usually by the browser, before the request even reaches the server) as %D0%B6.

POST requests are not affected by this.
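To check the %D0%B6 value yourself, a minimal sketch along these lines should do. It is not part of the original answer: the class and method names are made up for illustration, and it uses java.net.URLEncoder, which applies form-style encoding (so a space would become "+"). The character is written as a Unicode escape so the result does not depend on the source file encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class PercentEncodingDemo {

    // Percent-encodes a query parameter value using the named charset.
    public static String encodeParam(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String zhe = "\u0436"; // Cyrillic letter zhe (ж)
        System.out.println(encodeParam(zhe, "UTF-8"));             // %D0%B6
        System.out.println(encodeParam("*" + zhe + "*", "UTF-8")); // *%D0%B6*
    }
}
```

The asterisks pass through unchanged because URLEncoder treats "*" as a safe character.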

CharsetFilter

Then it's time to force the Java webapp to handle all requests and responses as UTF-8 encoded. This requires that we define a character set filter like the following:

package fi.foo.filters;

import javax.servlet.*;
import java.io.IOException;

public class CharsetFilter implements Filter {

    private String encoding;

    public void init(FilterConfig config) throws ServletException {
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null) encoding = "UTF-8";
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException {
        // Respect the client-specified character encoding
        // (see HTTP specification section 3.4.1)
        if (null == request.getCharacterEncoding()) {
            request.setCharacterEncoding(encoding);
        }

        // Set the default response content type and encoding
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");

        next.doFilter(request, response);
    }

    public void destroy() {
    }
}

This filter makes sure that if the browser hasn't set the encoding used in the request, it is set to UTF-8.

The other thing this filter does is set the default response encoding, i.e. the encoding of the returned HTML (or whatever else is returned). The alternative is to set the response encoding etc. in each controller of the application.
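To illustrate why the request-encoding fallback matters, here is a small sketch (not from the original answer, with made-up class and method names) of what happens when UTF-8 bytes are decoded as ISO-8859-1. That is the charset a servlet container typically assumes for the request body when nothing else has been set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class MojibakeDemo {

    // Simulates a browser sending UTF-8 bytes that the server reads as ISO-8859-1.
    public static String decodeAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Same bytes, decoded with the charset the browser actually used.
    public static String decodeAsUtf8(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00e4"; // the letter ä
        // Wrong charset: one character turns into two (U+00C3 followed by U+00A4).
        System.out.println(decodeAsLatin1(aUmlaut).length()); // 2
        // Right charset: the original string comes back unchanged.
        System.out.println(decodeAsUtf8(aUmlaut).equals(aUmlaut)); // true
    }
}
```

The two-character result is the familiar "Ã¤" garbage that shows up in form data when the encodings do not match.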

This filter has to be added to web.xml, the deployment descriptor of the webapp:

 <!--CharsetFilter start--> 

  <filter>
    <filter-name>CharsetFilter</filter-name>
    <filter-class>fi.foo.filters.CharsetFilter</filter-class>
      <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
      </init-param>
  </filter>

  <filter-mapping>
    <filter-name>CharsetFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

The instructions for making this filter can be found at the Tomcat wiki ( http://wiki.apache.org/tomcat/Tomcat/UTF-8 ).

JSP page encoding

In your web.xml, add the following:

<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>

Alternatively, all JSP pages of the webapp would need to have the following at the top:

 <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

If some kind of layout with different JSP fragments is used, then this is needed in all of them.

HTML meta tags

JSP page encoding tells the JVM to handle the characters in the JSP page in the correct encoding. Then it's time to tell the browser which encoding the HTML page uses:

This is done with the following at the top of each XHTML page produced by the webapp:

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi">
   <head>
   <meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
   ...

JDBC connection

When using a database, the connection has to be defined to use UTF-8 encoding. This is done in context.xml, or wherever the JDBC connection is defined, as follows:

      <Resource name="jdbc/AppDB" 
        auth="Container"
        type="javax.sql.DataSource"
        maxActive="20" maxIdle="10" maxWait="10000"
        username="foo"
        password="bar"
        driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/ID_development?useUnicode=true&amp;characterEncoding=UTF-8"
    />
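If the connection is opened from plain Java code rather than from context.xml, the same settings go into the JDBC URL. The sketch below is an assumption-laden illustration, not the original author's code: the class and method names are invented, and it relies on the MySQL Connector/J properties useUnicode and characterEncoding. Note that the separator is a plain "&" in Java, whereas XML files need "&amp;".

```java
// Hypothetical helper class, for illustration only.
class JdbcUrlDemo {

    // Builds a MySQL JDBC URL that asks the driver to talk UTF-8 to the server.
    public static String utf8JdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Host, port and database name are taken from the example configuration above.
        System.out.println(utf8JdbcUrl("localhost", 3306, "ID_development"));
    }
}
```

The resulting string can be passed to DriverManager.getConnection together with the user name and password.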

MySQL database and tables

The database must use UTF-8 encoding. This is achieved by creating the database with the following:

   CREATE DATABASE `ID_development` 
   /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_swedish_ci */;

Then all of the tables need to be in UTF-8 as well:

   CREATE TABLE  `Users` (
    `id` int(10) unsigned NOT NULL auto_increment,
    `name` varchar(30) collate utf8_swedish_ci default NULL,
    PRIMARY KEY  (`id`)
   ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci ROW_FORMAT=DYNAMIC;

The key part is CHARSET=utf8.

MySQL server configuration

The MySQL server has to be configured as well. Typically this is done on Windows by modifying the my.ini file and on Linux by configuring the my.cnf file. In those files it should be defined that all clients connected to the server use utf8 as the default character set and that the default charset used by the server is also utf8.

   [client]
   port=3306
   default-character-set=utf8

   [mysql]
   default-character-set=utf8

MySQL procedures and functions

These also need to have the character set defined. For example:

   DELIMITER $$

   DROP FUNCTION IF EXISTS `pathToNode` $$
   CREATE FUNCTION `pathToNode` (ryhma_id INT) RETURNS TEXT CHARACTER SET utf8
   READS SQL DATA
   BEGIN

    DECLARE path VARCHAR(255) CHARACTER SET utf8;

   SET path = NULL;

   ...

   RETURN path;

   END $$

   DELIMITER ;

GET requests: latin1 and UTF-8

If and when it's defined in Tomcat's server.xml that GET request parameters are encoded in UTF-8, the following GET requests are handled properly:

   https://localhost:8443/ID/Users?action=search&name=Petteri
   https://localhost:8443/ID/Users?action=search&name=ж

Because ASCII characters are encoded in the same way in both latin1 and UTF-8, the string "Petteri" is handled correctly.

The Cyrillic character ж cannot be represented at all in latin1. Because Tomcat is instructed to handle request parameters as UTF-8, it correctly handles the character encoded as %D0%B6.

If and when browsers are instructed to read the pages in UTF-8 encoding (with HTTP headers and the HTML meta tag), at least Firefox 2/3 and other browsers from this period all encode the character themselves as %D0%B6.

The end result is that all users with the name "Petteri" are found, and all users with the name "ж" are found as well.

But what about äåö?

The HTTP specification defines that by default URLs are encoded as latin1. This results in Firefox 2, Firefox 3 etc. encoding the following

    https://localhost:8443/ID/Users?action=search&name=*Päivi*

into the encoded version

    https://localhost:8443/ID/Users?action=search&name=*P%E4ivi*

In latin1 the character ä is encoded as %E4, even though the page/request/everything is defined to use UTF-8. The UTF-8 encoded version of ä is %C3%A4.

The result of this is that it's quite impossible for the webapp to correctly handle the request parameters from GET requests, as some characters are encoded in latin1 and others in UTF-8. Notice: POST requests do work, as browsers encode all request parameters from forms completely in UTF-8 if the page is defined as being UTF-8.
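The mismatch can be reproduced outside the browser with a short sketch like the one below. It is only an illustration with made-up class and method names, and it uses java.net.URLEncoder and java.net.URLDecoder as a stand-in for what the browser and Tomcat do. It encodes "Päivi" in both charsets and then decodes each result as UTF-8, the way a connector with URIEncoding="UTF-8" would.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, for illustration only.
class MixedEncodingDemo {

    public static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static String decode(String value, String charsetName) {
        try {
            return URLDecoder.decode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String name = "P\u00e4ivi"; // "Päivi", written with an escape for the ä

        String asLatin1 = encode(name, "ISO-8859-1");
        String asUtf8 = encode(name, "UTF-8");
        System.out.println(asLatin1); // P%E4ivi
        System.out.println(asUtf8);   // P%C3%A4ivi

        // A server that assumes UTF-8 only recovers the UTF-8 encoded form.
        System.out.println(decode(asUtf8, "UTF-8").equals(name));   // true
        System.out.println(decode(asLatin1, "UTF-8").equals(name)); // false
    }
}
```

The lone byte E4 is not a valid UTF-8 sequence, so the decoder cannot turn it back into ä, and a search for "Päivi" would miss.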

Stuff to read

A very big thank you to the writers of the following for giving the answers to my problem:

  • http://tagunov.tripod.com/i18n/i18n.html
  • http://wiki.apache.org/tomcat/Tomcat/UTF-8
  • http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
  • http://dev.mysql.com/doc/refman/5.0/en/charset-syntax.html
  • http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-tomcat-jsp-etc.html
  • http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-for-mysql-tomcat.html
  • http://jeppesn.dk/utf-8.html
  • http://www.nabble.com/request-parameters-mishandle-utf-8-encoding-td18720039.html
  • http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
  • http://www.utf8-chartable.de/

Important Note

MySQL supports the Basic Multilingual Plane using 3-byte UTF-8 characters. If you need to go outside of that (certain alphabets require more than 3 bytes of UTF-8), then you either need to use a flavor of VARBINARY column type or use the utf8mb4 character set (which requires MySQL 5.5.3 or later). Just be aware that using the utf8 character set in MySQL won't work 100% of the time.
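A quick way to see where the 3-byte limit bites is to count UTF-8 bytes per character. The sketch below is an illustration with invented class and method names. It treats any string containing a code point outside the Basic Multilingual Plane as one that would need utf8mb4 (or a binary column) rather than MySQL's 3-byte utf8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Utf8WidthDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string contains a code point above U+FFFF,
    // which takes 4 bytes in UTF-8 and does not fit MySQL's 3-byte utf8.
    public static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("\u0436"));       // 2 (Cyrillic zhe)
        System.out.println(utf8Length("\u20ac"));       // 3 (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 (U+1F600, an emoji)

        System.out.println(needsUtf8mb4("P\u00e4ivi \u0436")); // false
        System.out.println(needsUtf8mb4("hi \uD83D\uDE00"));   // true
    }
}
```

Finnish and Cyrillic text stays within 2 bytes per character, so the setup described in this answer is unaffected. Emoji and some rarer scripts are not.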

Tomcat with Apache

One more thing: if you are using Apache + Tomcat + the mod_JK connector, then you also need to make the following changes:

  1. Add URIEncoding="UTF-8" to the 8009 connector in Tomcat's server.xml file; it is used by the mod_JK connector: <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8"/>
  2. Go to your Apache configuration folder, e.g. /etc/httpd/conf, and add AddDefaultCharset utf-8 to the httpd.conf file. Note: first check whether the directive already exists. If it does, you may update it with this line. You can also add this line at the bottom.

Reply #4

I think you summed it up quite well in your own answer.

In the process of UTF-8-ing(?) from end to end you might also want to make sure Java itself is using UTF-8. Use -Dfile.encoding=utf-8 as a parameter to the JVM (can be configured in catalina.bat).
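To verify which default the JVM actually picked up, something like the following can be run with and without the -Dfile.encoding flag. This is only a sketch with a made-up class name. On older JVMs the default follows the operating system locale, while JDK 18 and later default to UTF-8.

```java
import java.nio.charset.Charset;

// Hypothetical check class, for illustration only.
class DefaultCharsetCheck {

    // Name of the charset the JVM uses when none is specified explicitly.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Code that passes an explicit charset (as the filter and the JDBC settings above do) does not depend on this default, so the flag mainly protects against places where the charset was left out.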


Reply #5

This is for Greek encoding in MySQL tables when we want to access them using Java:

Use the following connection setup in your JBoss connection pool (mysql-ds.xml):

<connection-url>jdbc:mysql://192.168.10.123:3308/mydatabase</connection-url>
<driver-class>com.mysql.jdbc.Driver</driver-class>
<user-name>nts</user-name>
<password>xaxaxa!</password>
<connection-property name="useUnicode">true</connection-property>
<connection-property name="characterEncoding">greek</connection-property>

If you don't want to put this in a JNDI connection pool, you can configure it as a JDBC URL, as the next line illustrates:

jdbc:mysql://192.168.10.123:3308/mydatabase?characterEncoding=greek

For me and Nick, so we never forget it and waste time again.


Reply #6

In case you have specified this in the connection pool (mysql-ds.xml), you can open the connection in your Java code as follows:

DriverManager.registerDriver(new com.mysql.jdbc.Driver());
Connection conn = DriverManager.getConnection(
    "jdbc:mysql://192.168.1.12:3308/mydb?characterEncoding=greek",
    "Myuser", "mypass");