MySQL在 5.5.3 之后增加了 utf8mb4 字符编码,mb4即 most bytes 4。简单说 utf8mb4 是 utf8 的超集并完全兼容utf8,能够用四个字节存储更多的字符。
The utfmb4 character set has these characteristics:
Supports BMP and supplementary characters.
Requires a maximum of four bytes per multibyte character.
utf8mb4 contrasts with the utf8mb3 character set, which supports only BMP characters and uses a maximum of three bytes per character:
For a BMP character, utf8mb4 and utf8mb3 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8mb4 requires four bytes to store it, whereas utf8mb3 cannot store the character at all. When converting utf8mb3 columns to utf8mb4, you need not worry about converting supplementary characters because there will be none.
在Java中使用
Connector/J did not support utf8mb4 for servers 5.5.2 and newer.
Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding utf-8 passed
using characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)
意思是:java驱动会自动检测服务端 character_set_server 的配置,如果为utf8mb4,驱动在建立连接的时候设置 SET NAMES utf8mb4。然而其他语言没有依赖于这样的特性。