Greeplum 系列(四) 实战分析

访客 阅读:245 2020-02-20 19:11:47 评论:0

Greeplum 系列(四) 实战

表结构

(1) 拉链表结构

create table public.member_fatdt0 ( 
    member_id varchar(64), 
    phoneno varchar(64), 
    dw_beg_date date, 
    dw_end_date date, 
    dtype char(1), 
    dw_status char(1), 
    dw_ins_date date 
) with (appendonly=true, compresslevel=5) 
distributed by(member_id) 
partition by range (dw_end_date) 
( 
    partition p20111201 start (date '2011-12-01') inclusive, 
    partition p20111202 start (date '2011-12-02') inclusive, 
    partition p20111203 start (date '2011-12-03') inclusive, 
    partition p20111204 start (date '2011-12-04') inclusive, 
    partition p20111205 start (date '2011-12-05') inclusive, 
    partition p20111206 start (date '2011-12-06') inclusive, 
    partition p20111207 start (date '2011-12-07') inclusive, 
    partition p30001231 start (date '3000-12-31') inclusive 
    end (date '3001-01-01') exclusive 
);

(2) 增量表结构

create table public.member_delta ( 
    member_id varchar(64), 
    phoneno varchar(64), 
    action char(1), 
    dw_ins_date date 
) with (appendonly=true, compresslevel=5) 
distributed by(member_id);

(3) 临时表结构

create table public.member_tmp0 ( 
    member_id varchar(64), 
    phoneno varchar(64), 
    dw_beg_date date, 
    dw_end_date date, 
    dtype char(1), 
    dw_status char(1), 
    dw_ins_date date 
) with (appendonly=true, compresslevel=5) 
distributed by(member_id) 
partition by list (dtype) 
( 
    partition phis values('H'), 
    partition pcur values('C'), 
    default partition other 
);

(4) 临时表结构

create table public.member_tmp1 ( 
    member_id varchar(64), 
    phoneno varchar(64), 
    dw_beg_date date, 
    dw_end_date date, 
    dtype char(1), 
    dw_status char(1), 
    dw_ins_date date 
) with (appendonly=true, compresslevel=5) 
distributed by(member_id);

数据导入

1. insert

insert into public.member_delta values('mem006', '13100000006', 'I', date'2011-12-03'); 
insert into public.member_delta values('mem002', '13100000002', 'D', date'2011-12-03'); 
insert into public.member_delta values('mem003', '13100000003', 'U', date'2011-12-03');

2. insert

mem001,13100000001,2011-12-01,3000-12-31,C,I,2011-12-01 
mem002,13100000002,2011-12-01,3000-12-31,C,I,2011-12-01 
mem003,13100000003,2011-12-01,3000-12-31,C,I,2011-12-01 
mem004,13100000004,2011-12-01,3000-12-31,C,I,2011-12-01 
mem005,13100000005,2011-12-01,3000-12-31,C,I,2011-12-01
copy public.member_fatdt0_1_prt_p30001231 from '/home/gpadmin/member_his_init.dat' with delimiter ',';

3. 外部表

首先,启动 gpfdist 服务

nohup gpfdist -d /home/gpadmin/data -p 8888 -l /home/gpadmin/data/gpfdist.log &

其次,创建外部表

drop external table if exists public.member_ext; 
create external table public.member_ext ( 
    member_id varchar(64), 
    phoneno varchar(64), 
    action char(1), 
    dw_ins_date date 
)  
location ('gpfdist://localhost:8888/member_delta.dat') 
format 'text' (delimiter ',' null as '' escape 'off') 
encoding 'utf-8' 
log errors into member_err segment reject limit 2 rows;

再编辑 /home/gpadmin/data/member_delta.dat 文件

mem001,13100000001,I,2011-12-01

最后,将外部表的数据加载到内部表

insert into public.member_delta select * from public.member_ext;

4. gpload

gpload 是对外部表的一层封装,首先编写 gpload 控制文件 gpload.yml,代码如下:

---   
VERSION: 1.0.0.1   
DATABASE: test   
USER: gpadmin   
HOST: localhost  
PORT: 5432   
GPLOAD:   
  INPUT:   
    - SOURCE:   
        LOCAL_HOSTNAME:   
          - master   
        PORT: 8888   
        FILE:   
          - /home/gpadmin/data/member_delta.dat 
    - COLUMNS:   
        - member_id: varchar(64), 
        - phoneno: varchar(20), 
        - action: char(1), 
        - dw_ins_date: date 
    - FORMAT: text   
    - DELIMITER: ','   
    - ERROR_LIMIT: 2 
    - ERROR_TABLE: public.member_err  
  OUTPUT:   
    - TABLE: public.member_delta   
    - MODE: INSERT   
  SQL: 
    - BEFORE: "truncate table public.member_delta" 
    - AFTER: "analyze public.member_delta"

执行 gpload 命令,结果如下:

gpload -f gpload.yml
2018-05-18 20:53:10|INFO|gpload session started 2018-05-18 20:53:10 
2018-05-18 20:53:10|INFO|started gpfdist -p 8888 -P 8889 -f "/home/gpadmin/data/member_delta.dat" -t 30 
2018-05-18 20:53:11|INFO|running time: 0.85 seconds 
2018-05-18 20:53:11|INFO|rows Inserted          = 1 
2018-05-18 20:53:11|INFO|rows Updated           = 0 
2018-05-18 20:53:11|INFO|data formatting errors = 0 
2018-05-18 20:53:11|INFO|gpload succeeded

5. 可执行的外部表

drop external table if exists public.member_ext; 
create external web table public.member_ext ( 
    member_id varchar(64), 
    phoneno varchar(64), 
    action char(1), 
    dw_ins_date date 
)  
execute 'cat /home/gpadmin/data/member_delta.dat' on master 
format 'text' (delimiter ',' null as '' escape 'off') 
encoding 'utf-8';

create table public.test_partition_list (
member_id numeric,
city character varying(32)
) distributed by(member_id)
partition by list(city)
(
partition guangzhou values('guangzhou'),
partition hangzhou values('hangzhou'),
partition shanghai values('shanghai'),
partition beijing values('beijing'),
default partition other_city
);


每天用心记录一点点。内容也许不重要,但习惯很重要!

标签:大数据
声明

1.本站遵循行业规范,任何转载的稿件都会明确标注作者和来源;2.本站的原创文章,请转载时务必注明文章作者和来源,不尊重原创的行为我们将追究责任;3.作者投稿可能会经我们编辑修改或补充。

发表评论
搜索
排行榜
关注我们

扫一扫关注我们,了解最新精彩内容