
A Study of Bitmap Data Storage in the ClickHouse OLAP Database

Bitmap File sink to ClickHouse


1. Pushing detail data and Bitmap files to ClickHouse from narrow-schema Parquet files on HDFS

1. Target file: hdfs://ns1/user/fandonglai/table1
2. Number of tags: as shown in the figure below
3. ClickHouse read queries:

select bat_ts,tag_code,tag_value,sign,bitmapToArray(bitmapSubsetLimit(offset_bitmap,toUInt32(0),toUInt32(10))) from test_narrow_bitmap_dist;

select bat_ts,tag_code,tag_value,sign,bitmapCardinality(offset_bitmap) from test_narrow_bitmap where bat_ts='2021-01-25 00:00:00'; -- single-node data coverage count

select sum(cnt) from (select bat_ts,tag_code,tag_value,sign,bitmapCardinality(offset_bitmap) as cnt from test_narrow_bitmap_dist where bat_ts='2021-01-25 00:00:00' and tag_code = 'pin004002001001') as a; -- distributed data validation for one tag enum value

select groupBitmapOrState(offset_bitmap) from test.test_narrow_bitmap where bat_ts = '2021-01-28 00:00:00' and tag_code = 'pin0001016' and tag_value = '26'; -- validate merging Bitmaps across multiple groups for one tag enum value




----------------------------------------------- Follow-up table workflow after pushing grouped Bitmaps (performance and latency still need to be tested) -----------------------------------------------


create table if not exists test.test_narrow_bitmap_final (
    `tag_code` String,
    `tag_value` Int32,
    `offset_bitmap` AggregateFunction(groupBitmap, UInt64),
    `bat_ts` DateTime,
    `sign` Int8
) engine = ReplicatedCollapsingMergeTree('/clickhouse/tables/{shard}/test_narrow_bitmap_final', '{replica}', sign)
partition by bat_ts
order by (tag_code, tag_value, bat_ts)
settings index_granularity = 8192;


insert into test.test_narrow_bitmap_final
select tag_code, tag_value, groupBitmapOrState(offset_bitmap), bat_ts, sign
from test.test_narrow_bitmap
where bat_ts = '2021-01-28 00:00:00' and tag_code = 'pin0001016' and tag_value = '26'
group by bat_ts, tag_code, tag_value, sign;


2. Stage-by-stage optimization

Optimization steps in practice


Resource allocation: --driver-cores 1 --driver-memory 10G --executor-cores 1 --executor-memory 20G; shuffle read 325.6 GB, shuffle read 223.2 GB, shuffle write 23.4 GB
Stage 1  key: [offset%20_Random.nextInt(1)]  value: [ArrayBuffer[Long]()]
mapPartitions + (offsetMap.iterator) + reduceByKey(_ ++ _); the offset arrays occupy an enormous amount of memory and cause out-of-memory failures.
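
A minimal, spark-shell style sketch of the Stage 1 shape, assuming a hypothetical input RDD rows of (tagCode, tagValue, offset) tuples parsed from the Parquet files; the salted key and the ArrayBuffer merge mirror the fragments above, and the comment marks where memory blows up:

import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Stage 1 shape (illustrative): salted numeric key -> ArrayBuffer of offsets.
def stage1(rows: RDD[(String, String, Long)]): RDD[(String, ArrayBuffer[Long])] = {
  rows
    .mapPartitions { iter =>
      iter.map { case (_, _, offset) =>
        // offset % 20 plus a random salt, matching the key layout above
        (s"${offset % 20}_${Random.nextInt(1)}", ArrayBuffer(offset))
      }
    }
    // Concatenating ArrayBuffers pulls every offset of a key into one executor's heap,
    // which is what caused the out-of-memory failures in this stage.
    .reduceByKey(_ ++ _)
}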


Stage 2  key: [offset%20_Random.nextInt(1):tagCode_tagValue]  value: [offset.mkString("\n")]
tagCode and tagValue are combined with the grouping key into a single composite key, and the value is built by string concatenation instead of array storage, which drastically shrinks the intermediate result set. However, the foreach stage takes too long: the ClickHouse write multiplier is forced to 1, so the push stage cannot exploit any concurrency.
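
For contrast, a sketch of the Stage 2 key/value construction under the same assumed input, with newline-joined strings in place of arrays:

import org.apache.spark.rdd.RDD
import scala.util.Random

// Stage 2 shape (illustrative): composite key carries tagCode/tagValue,
// values are newline-joined offset strings instead of in-memory arrays.
def stage2(rows: RDD[(String, String, Long)]): RDD[(String, String)] = {
  rows
    .map { case (tagCode, tagValue, offset) =>
      (s"${offset % 20}_${Random.nextInt(1)}:${tagCode}_$tagValue", offset.toString)
    }
    // The shuffle stays small, but each tag group is later pushed by a single
    // foreach task, so the write into ClickHouse has no parallelism.
    .reduceByKey((pre, after) => Seq(pre, after).mkString("\n"))
}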


Stage 3  key: [offset%20_Random.nextInt(1):tagCode_tagValue]  value: [encode(offset.mkString("\n"))]  stage timings 8.3 + 2.1 + 3.4 + 4 + 14 min, roughly 30 min end to end, for tag pin0001016
reduceByKey((pre, after) => Seq(pre, after).mkString("\n")); the array-serialization logic is abstracted into a function that is applied in the foreach stage to serialize each tagCode_tagValue group's offset array into ClickHouse.


Stage 4  key: [offset%20_Random.nextInt(1):tagCode_tagValue]  value: [encodeString]  15 min for tags pin004002001001, pin004002001002, pin004002001003
The upstream normalization step now records only the three fields tagCode, tagValue and offset; the sign and dt fields are no longer stored or populated there and are filled in uniformly during the ClickHouse insert stage.
Array-to-ClickHouse RBM encoding: new String(Base64.getEncoder.encode(serialize(Roaring64NavigableMap.bitmapOf(partValues: _*)).array()))
Aggregation uses aggregateByKey("", new PushBitmapPartitioner(hpMapping.size, CKPusherPO.ckMultiple))((initStr, x) => initStr + "\n" + x, (str1, str2) => Seq(str1, str2).mkString("\n"))
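
Putting the Stage 4 pieces together, here is a spark-shell style sketch that combines the salted composite key, the aggregateByKey call, and the Roaring64NavigableMap/Base64 encoding quoted above. PushBitmapPartitioner, hpMapping and CKPusherPO are the post's own names, but their internals are not shown, so the partitioner below (with shards standing in for hpMapping.size) and the byte layout produced by toBase64Rbm are assumptions rather than the original implementation; the actual JDBC insert into ClickHouse is omitted.

import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.util.Base64

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import org.roaringbitmap.longlong.Roaring64NavigableMap

import scala.util.Random

// Hypothetical stand-in for the post's PushBitmapPartitioner: it spreads the
// composite keys over (number of ClickHouse shards) * ckMultiple partitions so
// the push stage can run that many concurrent inserts.
class PushBitmapPartitioner(shards: Int, ckMultiple: Int) extends Partitioner {
  override val numPartitions: Int = shards * ckMultiple
  override def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h
  }
}

// Stand-in for the post's serialize(...) helper: encodes the offsets of one
// tagCode_tagValue group as a Base64 string of a Roaring64NavigableMap. The
// exact byte layout the original code feeds to ClickHouse is not shown, so the
// portable Roaring format used here is an assumption.
def toBase64Rbm(partValues: Array[Long]): String = {
  val bitmap = Roaring64NavigableMap.bitmapOf(partValues: _*)
  bitmap.runOptimize()
  val bos = new ByteArrayOutputStream()
  bitmap.serialize(new DataOutputStream(bos))
  new String(Base64.getEncoder.encode(bos.toByteArray))
}

// Stage 4 pipeline (illustrative): normalize to (tagCode, tagValue, offset),
// aggregate the offsets per composite key with the custom partitioner, then
// turn each group into the Base64 RBM string handed to the ClickHouse insert stage.
def stage4(rows: RDD[(String, String, Long)], shards: Int, ckMultiple: Int): RDD[(String, String)] = {
  rows
    .map { case (tagCode, tagValue, offset) =>
      (s"${offset % 20}_${Random.nextInt(1)}:${tagCode}_$tagValue", offset.toString)
    }
    .aggregateByKey("", new PushBitmapPartitioner(shards, ckMultiple))(
      (initStr, x) => initStr + "\n" + x,
      (str1, str2) => Seq(str1, str2).mkString("\n"))
    .mapValues { joined =>
      toBase64Rbm(joined.split("\n").filter(_.nonEmpty).map(_.toLong))
    }
}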