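Opening ParquetWriter, I found that most of its constructors are deprecated (@Deprecated). After some careful searching on Baidu and reading the source code, I found that a ParquetWriter object is now meant to be created through its inner Builder class by calling build().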
Example (Apache Parquet 1.9.0):
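import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.ParquetProperties;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

// file is a Hadoop Path, configuration a Hadoop Configuration, schema a Parquet MessageType
ExampleParquetWriter.Builder builder = ExampleParquetWriter
        .builder(file)
        .withWriteMode(ParquetFileWriter.Mode.CREATE)
        .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_1_0)
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .withConf(configuration)
        .withType(schema);
ParquetWriter<Group> writer = builder.build();

// The factory creates empty Group records that conform to the schema
SimpleGroupFactory groupFactory = new SimpleGroupFactory(schema);
String[] access_log = { "111111", "22222", "33333", "44444", "55555",
        "666666", "777777", "888888", "999999", "101010" };
// Write the same sample record 1000 times
for (int i = 0; i < 1000; i++) {
    writer.write(groupFactory.newGroup()
            .append("log_id", Long.parseLong(access_log[0]))
            .append("idc_id", access_log[1])
            .append("house_id", Long.parseLong(access_log[2]))
            .append("src_ip_long", Long.parseLong(access_log[3]))
            .append("dest_ip_long", Long.parseLong(access_log[4]))
            .append("src_port", Long.parseLong(access_log[5]))
            .append("dest_port", Long.parseLong(access_log[6]))
            .append("protocol_type", Integer.parseInt(access_log[7]))
            .append("url64", access_log[8])
            .append("access_time", access_log[9]));
}
// Closing the writer flushes the buffered rows and writes the file footer
writer.close();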
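The example uses a schema object without showing how it is defined. Below is a minimal sketch of what the schema text might look like. The field types are only inferred from the append calls (long as int64, int as int32, String as UTF8 binary); the message name access_log, the class name AccessLogSchema, and marking every field as required are assumptions made for illustration, not taken from the original code. The block deliberately avoids the Parquet dependency so it compiles on its own; the comment in main shows where MessageTypeParser.parseMessageType would come in.

```java
class AccessLogSchema {
    // Parquet message-type text. Types are inferred from the append(...) calls:
    // long -> int64, int -> int32, String -> binary (UTF8).
    // ASSUMPTION: the message name and the "required" repetition are illustrative.
    public static final String TEXT = String.join("\n",
            "message access_log {",
            "  required int64 log_id;",
            "  required binary idc_id (UTF8);",
            "  required int64 house_id;",
            "  required int64 src_ip_long;",
            "  required int64 dest_ip_long;",
            "  required int64 src_port;",
            "  required int64 dest_port;",
            "  required int32 protocol_type;",
            "  required binary url64 (UTF8);",
            "  required binary access_time (UTF8);",
            "}");

    public static void main(String[] args) {
        // With parquet-column on the classpath, this text would be parsed into
        // the schema object used by the writer, along the lines of:
        //   MessageType schema = MessageTypeParser.parseMessageType(TEXT);
        System.out.println(TEXT);
    }
}
```

If the real data can contain missing values, the corresponding fields would need to be declared optional instead of required, and the append call for that field would simply be skipped for those records.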
Reference: http://www.programcreek.com/java-api-examples/index.php?api=parquet.hadoop.api.WriteSupport