- Reading a GBK file with Spark
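import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], 1)
  .map(p => new String(p._2.getBytes, 0, p._2.getLength, "GBK")) // decode only the valid bytes as GBK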
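The read snippet passes `getLength` explicitly because `Text.getBytes` returns the backing array, and only the first `getLength` bytes of it are valid. The following is a minimal sketch of that pitfall in plain Scala, with no Spark or Hadoop dependency. The object name `GbkDecodeSketch`, the helper `decodeGbk`, and the sample strings are hypothetical, and it assumes the JVM ships the GBK charset.

```scala
import java.nio.charset.Charset

// Hypothetical helper object, for illustration only.
object GbkDecodeSketch {
  // Assumes the running JVM provides the GBK charset.
  private val Gbk = Charset.forName("GBK")

  // Decode only the first `length` bytes of `buffer` as GBK.
  def decodeGbk(buffer: Array[Byte], length: Int): String =
    new String(buffer, 0, length, Gbk)

  def main(args: Array[String]): Unit = {
    val longLine  = "上海浦东机场".getBytes(Gbk)
    val shortLine = "北京".getBytes(Gbk)

    // Simulate a reused buffer: a long record is overwritten by a shorter one,
    // so stale bytes from the long record remain at the end of the array.
    val buffer = new Array[Byte](longLine.length)
    System.arraycopy(longLine, 0, buffer, 0, longLine.length)
    System.arraycopy(shortLine, 0, buffer, 0, shortLine.length)

    println(decodeGbk(buffer, shortLine.length)) // valid bytes only
    println(new String(buffer, Gbk))             // also decodes the stale tail
  }
}
```

Decoding the whole array would append leftovers from an earlier, longer record, which is why the offset and length arguments matter.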
- Writing a GBK file with Spark
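import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.rdd.RDD
// Encode each line as GBK bytes and wrap them in a Text value
val result: RDD[(NullWritable, Text)] = totalData.map {
  item =>
    val line = s"${item.query}"
    (NullWritable.get(), new Text(line.getBytes("GBK")))
}
// Set the output format; the records are stored as GBK
result.saveAsNewAPIHadoopFile(path, classOf[NullWritable],
  classOf[Text], classOf[TextOutputFormat[NullWritable, Text]])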
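The write path relies on `line.getBytes("GBK")` producing the bytes that end up in the file. As a rough sketch of what that encoding step does, the plain-Scala example below compares GBK and UTF-8 byte lengths and checks whether a string can be represented in GBK at all. The names `GbkEncodeSketch`, `canEncodeAsGbk`, and `toGbkBytes` are hypothetical and not part of Spark or Hadoop, and it assumes the JVM provides the GBK charset.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper object, for illustration only.
object GbkEncodeSketch {
  // Assumes the running JVM provides the GBK charset.
  private val Gbk = Charset.forName("GBK")

  // True when every character of `line` has a GBK representation.
  def canEncodeAsGbk(line: String): Boolean =
    Gbk.newEncoder().canEncode(line)

  // The same encoding step the write snippet applies before building a Text.
  def toGbkBytes(line: String): Array[Byte] =
    line.getBytes(Gbk)

  def main(args: Array[String]): Unit = {
    val line = "中文"
    println(toGbkBytes(line).length)                      // GBK byte count
    println(line.getBytes(StandardCharsets.UTF_8).length) // UTF-8 byte count
    println(canEncodeAsGbk(line))
    println(canEncodeAsGbk("😀"))
  }
}
```

A check like `canEncodeAsGbk` may be worth running before the save step if the data can contain characters outside GBK, since `getBytes` substitutes a replacement byte for them instead of failing.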
References:
RDD Action operations (6): saveAsHadoopFile
Spark multi-file output (MultipleOutputFormat)
Hadoop multi-file output: a deep dive into MultipleOutputFormat and MultipleOutputs (part 1)