Citus实时分析
Citus实时分析
应用程序将插入大量事件数据,并对具有亚秒级延迟要求的数据启用分析查询。
实时分析需要大规模并行的计算,协调数百个CPU核心,以便快速获得计算结果
建议分片数设置为CPU核心数的2-4倍
创建表
CREATE TABLE github_events
(
event_id bigint ,
event_type text ,
event_public boolean ,
repo_id bigint ,
payload jsonb ,
repo jsonb ,
user_id bigint ,
org jsonb ,
created_at timestamp
);
CREATE TABLE github_users
(
user_id bigint ,
url text ,
login text ,
avatar_url text ,
gravatar_id text ,
display_login text
);
CREATE INDEX event_type_index ON github_events (event_type );
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops );
设置分发列
此示例中,我们使用user_id列
SELECT create_distributed_table ('github_users' , 'user_id' );
SELECT create_distributed_table ('github_events' , 'user_id' );
运行查询
- 查询有多少用户
SELECT count (* ) FROM github_users ;
- 查询每分钟的PushEvent的事件数量并排序
SELECT date_trunc('minute', created_at) AS minute,
sum((payload->>'distinct_size')::int) AS num_commits
FROM github_events
WHERE event_type = 'PushEvent'
GROUP BY minute
ORDER BY minute;
- 找到创建最多仓库的前十个用户
SELECT login, count(*)
FROM github_events ge
JOIN github_users gu
ON ge.user_id = gu.user_id
WHERE event_type = 'CreateEvent' AND payload @> '{"ref_type": "repository"}'
GROUP BY login
ORDER BY count(*) DESC LIMIT 10;
[CitusDB中国]站主,PostgreSQL粉丝,现从事Citus研发工作
愿Citus在中国发展的越来越好