报错信息:
java.lang.RuntimeException: java.sql.SQLException:
Incorrect string value: '\\xF0\\x9F\\x9A\\xB9\\xE4\\xBC...' for column 'xxx' ...
引起的原因:
在varchar类型的字段里面插入的text包含emoj表情。
MySQL 5.5.3 版本以下,CHARACTER SET 为 utf8 的字段最长只有3个字节。
查看MySQL版本可用以下命令:
mysql> status;
--------------
mysql Ver 14.14 Distrib 5.7.17, for Linux (x86_64) using EditLine wrapper
Connection id: 5
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.7.17-0ubuntu0.16.04.1 (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8mb4
Db characterset: utf8mb4
Client characterset: utf8mb4
Conn. characterset: utf8mb4
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 1 hour 28 min 58 sec
Threads: 1 Questions: 28 Slow queries: 0 Opens: 116 Flush tables: 1 Open tables: 29 Queries per second avg: 0.005
--------------
查看相关字符集设置信息(下面是我修改后的,修改前utf8mb4处都为utf8):
mysql> show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
查看所有字符集信息:
mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
+----------+---------------------------------+---------------------+--------+
由上表可见,utf8最长3个字节。
在5.5.3版本以后,增加了 [utf8mb4] (https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html) 字符集。
这是官方原文:
The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplementary characters:
- For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
- For a supplementary character, utf8 cannot store the character at all, whereas utf8mb4 requires four bytes to store it. Because utf8 cannot store the character at all, you have no supplementary characters in utf8 columns and need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.
因此,只需要将字符集由utf8修改为utf8mb4即可支持4字节的BPM等数据。
如何修改:
修改MySQL配置文件,修改mysql的客户端默认连接字符集,服务端以及数据库的默认字符集:
sudo vim /etc/mysql/my.cnf
在末尾或者已有的中括号里的块中加上如下配置:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
重启MySQL服务:
sudo service mysql restart
重启服务后,通过以下SQL可以查看结果:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
MySQL方面修改好后,需要修改已有的数据库或者表或者字段:
修改表的字符集:
ALTER TABLE tablename charset=utf8mb4;
修改表字段:
ALTER TABLE tablename MODIFY COLUMN columnname VARCHAR(255) CHARACTER SET utf8mb4;
修改表的字符集和修改字段可以用一条SQL完成。
如果使用的java连接MySQL,需要修改原有的连接设置:
原jdbcUrl:jdbc:mysql:///dbname?useUnicode=true&characterEncoding=UTF-8
需要去掉编码参数:
jdbcUrl:jdbc:mysql:///dbname
** 此处不要想当然的将参数useUnicode=true&characterEncoding=UTF-8修改为useUnicode=true&characterEncoding=utf8mb4
,否则会报出以下错误:
Could not get JDBC Connection; nested exception is com.MySQL.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
Unknown character set: 'utf8mb4'
我们在MySQL的配置文件里已经配置了[client]的默认访问字符集了,所以,此处将参数去掉即可。
吃水不忘打井人:
http://blog.csdn.net/secretx/article/details/21253559
http://stackoverflow.com/questions/7814293/how-to-insert-utf-8-mb4-characteremoji-in-ios5-in-mysql