最近在hive建立视图的时候,发现如果column出现中文会出现乱码;
如:CREATE VIEW encoded_test AS SELECT '你好' FROM customers;
执行查询语句出现下面情况:
这里我们发现中文出现乱码,了解hive元数据的都知道肯定是hive元数据表的编码的问题。
进一步了解发现虽然hive元数据库编码设置为utf8,但是TBLS表的定义却有自己的编码,查看DDL发现
CREATE TABLE `TBLS` (
`TBL_ID` bigint(20) NOT NULL,
`CREATE_TIME` int(11) NOT NULL,
`DB_ID` bigint(20) DEFAULT NULL,
`LAST_ACCESS_TIME` int(11) NOT NULL,
`OWNER` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`RETENTION` int(11) NOT NULL,
`SD_ID` bigint(20) DEFAULT NULL,
`TBL_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`TBL_TYPE` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`VIEW_EXPANDED_TEXT` mediumtext,
`VIEW_ORIGINAL_TEXT` mediumtext,
PRIMARY KEY (`TBL_ID`),
UNIQUE KEY `UNIQUETABLE` (`TBL_NAME`,`DB_ID`),
KEY `TBLS_N50` (`SD_ID`),
KEY `TBLS_N49` (`DB_ID`),
CONSTRAINT `TBLS_FK1` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`),
CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
表的默认编码为latin1,因此找到问题所在,因此修改两个字段的编码即可;
ALTER TABLE `TBLS` MODIFY COLUMN VIEW_EXPANDED_TEXT mediumtext CHARACTER SET utf8;
ALTER TABLE `TBLS` MODIFY COLUMN VIEW_ORIGINAL_TEXT mediumtext CHARACTER SET utf8;
重新建立视图我们发现: