2.5 一个综合范例
注:这个范例就是一串实验,在安装好虚拟机后可以直接做。
【 在虚拟机测试环境下做GP操作的实验 】
# made in shenyang.liaoning. create by hcb. 20120318.
# 大纲:
#一、试验外部表(只读):
#二、实验构建STAGE层
#三、实验构建ODS层
#四、导出数据为文本
===============================================================
一、试验外部表(只读):
步骤1:启动gpfdist 服务
nohup gpfdist -d /data/etl_data -p 8082 -l gpfdist.log&
步骤2:创建只读外部表
drop external table stage.ts_mytest_r_external;
create external table stage.ts_mytest_r_external
(
caa varchar(20),
cbb varchar(20),
ccc varchar(20),
cdd varchar(2)
)
LOCATION ('gpfdist://wmd:8082/xmx.txt*')
FORMAT 'TEXT'( DELIMITER '|')
encoding 'UTF8'
;
步骤3:建一个与外部表相对应的文本文件xmx.txt(路径:/data/etl_data)
输入以下内容
aaaa|bbbbbbb|cccccccc|1
2a2a|2b2b2b2b|2c2c2c|2
3a3a|3b3b3b3b|3c3c3c|3
4a4a|4b4b4b4b|4c4c4c|4
步骤4:查询外部表
select * from stage.ts_mytest_r_external limit 10;
===============================================================
二、实验构建STAGE层
步骤1:创建stage表(采用简单的随机分布)
drop table stage.ts_mytest;
create table stage.ts_mytest
(
caa varchar(20),
cbb varchar(20),
ccc varchar(20),
cdd varchar(2)
)
DISTRIBUTED RANDOMLY
;
步骤2:创建stage函数,将数据从外部表读取上来
create or replace function stage.p_ts_mytest()
returns void as
$body$
declare
v_schemaname varchar(40) := 'stage'; --这个东西没有使用到,纯粹无聊
begin
--清空表
EXECUTE 'TRUNCATE TABLE stage.ts_mytest;';
--进数据
insert into stage.ts_mytest(caa,cbb,ccc,cdd) select caa,cbb,ccc,cdd from stage.ts_mytest_r_external;
end; $body$
language plpgsql volatile;
将以上function代码保存到/home/gpadmin/_hcb目录下
然后用gpadmin用户(只要是有权限创建function的用户就可以)执行:
psql -f /home/gpadmin/_hcb/crt_fun.sql
(也可以在GPADMIN3上或Aginity Workbench for EMC Greenplum上创建)
然后:
psql -c "select stage.p_ts_mytest()"
(也可以在GPADMIN3上或Aginity Workbench for EMC Greenplum上输入select stage.p_ts_mytest();来执行)
这样就是实现了把外部表指向的xmx.txt文件中的数据读取到了stage.ts_mytest表里面来了。
===============================================================
三、实验构建ODS层
步骤1:创建ods表
drop table ods.ods_d_mytest;
create table ods.ods_d_mytest
(
month_id varchar(6),
day_id varchar(2),
caa varchar(20),
cbb varchar(20),
ccc varchar(20),
cdd numeric
)
WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz)
DISTRIBUTED BY (caa)
PARTITION BY LIST (month_id)
SUBPARTITION BY LIST (day_id)
SUBPARTITION TEMPLATE
(
SUBPARTITION d01 VALUES ('01'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d02 VALUES ('02'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d03 VALUES ('03'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d04 VALUES ('04'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d05 VALUES ('05'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
......
SUBPARTITION d26 VALUES ('26'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d27 VALUES ('27'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d28 VALUES ('28'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d29 VALUES ('29'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d30 VALUES ('30'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz),
SUBPARTITION d31 VALUES ('31'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz)
)
(
PARTITION m201203 VALUES ('201203'::character varying) WITH (appendonly=true, compresslevel=1, orientation=column, compresstype=quicklz)
);
COMMENT ON TABLE ods.ods_d_mytest IS '学习测试表';
COMMENT ON COLUMN ods.ods_d_mytest.month_id IS '月份';
COMMENT ON COLUMN ods.ods_d_mytest.day_id IS '日期';
COMMENT ON COLUMN ods.ods_d_mytest.caa IS 'a字段';
COMMENT ON COLUMN ods.ods_d_mytest.cbb IS 'b字段';
COMMENT ON COLUMN ods.ods_d_mytest.ccc IS 'c字段';
步骤2:创建ods函数,将stage层数据读上来,同时做些加工
create or replace function ods.p_ods_d_mytest(in v_day_id character varying,out v_retcode character varying,out v_retinfo character varying)
returns record as
$body$
declare
v_month varchar(6);
v_day varchar(2);
begin
v_month := substring(v_day_id,1,6);
v_day := substring(v_day_id,7,2);
--清空表分区
execute 'alter table ods.ods_d_mytest alter partition m'|| v_month ||' truncate partition d' ||v_day ;
--将数据从stage层读上来,同时做些处理
insert into ods.ods_d_mytest(
month_id,
day_id,
--caa,
cbb,
ccc,
cdd
)
select
v_month,
v_day,
caa,
case when cbb='bbbbbbb' then '1bbbbbb' else cbb end as cbb,
ccc,
cdd::numeric
from stage.ts_mytest
;
V_RETCODE := 'SUCCESS';
V_RETINFO := 'OK';
EXCEPTION
WHEN OTHERS THEN
V_RETCODE := 'HCB_FAIL_HCB';
V_RETINFO := SQLERRM;
RAISE NOTICE 'ERROR INFORMATION is %,%', V_RETINFO, V_RETCODE;
end;
$body$
language plpgsql volatile;
执行:
select ods.p_ods_d_mytest('20120318');
这样就实现了把数据从stage 层读到ods层并做一些加工的目的.
放到SHELL里面执行:
#! /bin/bash
v_rtn=psql -atc "select ods.p_ods_d_mytest('20120318')"
echo "==========="
echo "$v_rtn"
===============================================================
四、导出数据为文本
(一)实验用命令copy实现导出:
创建shell脚本: my_copy_out.sh
#####shell脚本--start#######
#!/bin/bash
psql -d bi -h xx -p dd -t -c "\copy (select * from ods.ods_d_mytest) to /home/gpadmin/_hcb/my_data.txt"
echo "导出完成."
#####shell脚本-- end #######
给shell脚本赋予执行权限: chmod +x my_copy_out.sh
执行shell脚本: ./my_copy_out.sh
好了可以在目录 /home/gpadmin/_hcb下看到生成了文本的数据文件my_data.txt (字段默认是以TAB为分隔符)
查看数据文件: cat my_data.txt
(二)实验用可写外部表实现导出:
步骤1:建立可写外部表
drop external table ods.ods_d_mytest_w_external;
create writable external table ods.ods_d_mytest_w_external
(
month_id varchar(6),
day_id varchar(2),
caa varchar(20),
cbb varchar(20),
ccc varchar(20),
cdd varchar(20)
)
LOCATION ('gpfdist://wmd:8082/out_xmx.txt')
FORMAT 'TEXT' (delimiter '|' null '' escape 'off')
ENCODING 'UTF8'
DISTRIBUTED RANDOMLY;
步骤2:向外部表INSERT数据,实现文本文件的生成
执行SQL:
insert into ods.ods_d_mytest_w_external(month_id,day_id,caa,cbb,ccc,cdd)
select month_id,day_id,caa,cbb,ccc,cdd from ods.ods_d_mytest
where month_id='201203' and day_id='18';
查看gpfdist所指向的目录下生成文件:out_xmx.txt (路径:/data/etl_data)
查看文件内容:
cat out_xmx.txt