
ClickHouse Push in Practice and Cluster Deployment

Pipeline service practice and cluster tuning on a recent ClickHouse release


I. ZK Cluster Optimization: ZK Log Pressure Is No Longer the Push Bottleneck

1. The ZK cluster is isolated from the CK cluster and not co-deployed with it; it currently has 5 nodes and can later be switched to a standard 3-node ZK ensemble.

2. ZK cluster configuration adjustments

A. Transaction logs and snapshots are stored on separate disks, and the log mount point was moved from the 200 GB volume under /export to the 5.5 TB volume under /export/grid.
Transaction log path: /export/grid/01/zookeeper/data
Snapshot path: /export/grid/02/zookeeper/log
After 12 days of operation, the transaction logs occupy 60 GB and the snapshots 5.3 GB.

B. A crontab job periodically cleans the data files under the log directories, keeping 30 days of history by default:
0 0 * * * sudo find /export/grid/01/zookeeper/data/version-2 -mtime +30 -name "log.*" -exec rm -Rf {} \;
0 0 * * * sudo find /export/grid/02/zookeeper/log/version-2 -mtime +30 -name "snapshot.*" -exec rm -Rf {} \;

C. The number of retained snapshots was reduced from 5 to 2, and the purge interval from 24 h to 1 h:
autopurge.snapRetainCount=2
autopurge.purgeInterval=1

D. Other settings follow the official documentation and community experience, with values tuned for ZK serving a CK cluster:
tickTime=2000
initLimit=300
syncLimit=10
maxClientCnxns=2000
maxSessionTimeout=60000000
snapCount=3000000
preAllocSize=131072
forceSync=no
leaderServes=yes
standaloneEnabled=false
clientPort=2181
# Zookeeper Cluster
server.1=10.220.232.170:2888:3888
server.2=10.220.232.156:2888:3888
server.3=10.220.232.158:2888:3888
server.4=10.220.232.164:2888:3888
server.5=10.220.232.157:2888:3888

E. The log storage paths were adjusted consistently across the zoo.cfg, zoo_sample.cfg, and log4j.properties configuration files.

F. After deleting CK data directories, remember to also delete the corresponding ZK data paths and files via zkCli.

3. Pitfalls encountered during testing

Log directories had to be created manually; the missing myid file had to be configured on each node according to the Zookeeper Cluster server mapping; the version-2 directory had to be created when missing; and the zookeeper-server.pid file required permission changes.

4. ZK configuration changes (using three standard-N machines: 10.220.45.72, 10.220.45.73, 10.220.45.74)

log4j.properties

zookeeper.root.logger=INFO, ROLLINGFILE 
zookeeper.console.threshold=INFO
zookeeper.log.dir=/export/zookeeper/log4j_dir
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=DEBUG
zookeeper.tracelog.dir=.
zookeeper.tracelog.file=zookeeper_trace.log

zoo_sample.cfg

tickTime=2000 
initLimit=100
syncLimit=5
dataDir=/export/zookeeper/data_sample

zoo.cfg

tickTime=2000
initLimit=300
syncLimit=10
maxClientCnxns=2000
maxSessionTimeout=60000000
snapCount=3000000
preAllocSize=131072
clientPort=2181
dataDir=/export/data/zookeeper/data
dataLogDir=/export/data/zookeeper/log
autopurge.snapRetainCount=2
autopurge.purgeInterval=1
4lw.commands.whitelist=*
forceSync=no
leaderServes=yes
standaloneEnabled=false


================================== Production configuration =============================================
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=300
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10

# The directory where the snapshot is stored.
dataDir=/export/data/zookeeper/data
dataLogDir=/export/data/zookeeper/log

# The number transact log number of a snapshot
snapCount=3000000

maxSessionTimeout=60000000

# The port at which the clients will connect
clientPort=2181
# The maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=2000

preAllocSize=131072

# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=2
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
4lw.commands.whitelist=*

forceSync=no
leaderServes=yes
standaloneEnabled=false

# Zookeeper Cluster
server.1=10.220.45.74:2888:3888
server.2=10.220.45.73:2888:3888
server.3=10.220.45.72:2888:3888


#authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
#requireClientAuthScheme=sasl
#sessionRequireClientSASLAuth=true

Scheduled log cleanup script (crontab)

0 0 * * * sudo find /export/data/zookeeper/data/version-2 -mtime +30 -name "log.*" -exec rm -Rf {} \;
0 0 * * * sudo find /export/data/zookeeper/log/version-2 -mtime +30 -name "snapshot.*" -exec rm -Rf {} \;

II. CK Cluster Optimization (standard-D machines with 12 x 5.5 TB disks; the average CK push write rate holds at 50 MB/s ~ 100 MB/s, roughly 26M ~ 29M rows/s)

1. Cluster tuning details

A. CK runs two instances per machine, on ports 9000 and 13000; disks 01-06 store data for the primary port and disks 07-12 for the secondary port.

B. Each CK node has a crontab auto-restart script: sh /etc/clickhouse-server/restart_clickhouse.sh

C. CK nodes are monitored through Grafana; the deployed plugins include clickhouse_exporter_v2 and node_exporter-1.0.1.linux-amd64.tar.gz. Note that the unpacking and nohup steps must be executed as root.

D. Pitfalls encountered during testing
a. DROP TABLE failures: for CK tables in replicated mode, the "Dropping readonly replicated table" error is worked around with the DETACH command.
b. CK refusing to start after the data directory was deleted: fixed by recreating the data directories, re-granting permissions, and adjusting the startup-script settings.
c. Single-node client queries failing on large result sets: edit the ClickHouse users.xml and raise the memory limit to max_memory_usage = 40000000000.
d. The ClickHouse configuration files include config9000.xml, config13000.xml, metrika9000.xml, metrika13000.xml, users9000.xml, and users13000.xml.
e. After the underlying CK data directories are deleted, the /clickhouse/data, /clickhouse/jbod, and /clickhouse/log directories must be recreated, then re-granted with sudo chown -R clickhouse:clickhouse and sudo chmod -R 775.

E. CK configuration changes (using 40 standard-D machines: 20 primaries and 20 replicas; one instance per machine, on the default port 9000)

2. Configuration synchronized to production

<max_insert_block_size>4194304</max_insert_block_size>
<merge_tree>
<parts_to_delay_insert>600</parts_to_delay_insert>
<parts_to_throw_insert>800</parts_to_throw_insert>
<max_delay_to_insert>2</max_delay_to_insert>
<max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</merge_tree>
sudo mkdir -p /export/grid/01/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/01/clickhouse/data/9000/
sudo mkdir -p /export/grid/01/clickhouse/log/9000/
sudo mkdir -p /export/grid/02/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/02/clickhouse/tmp/9000/
sudo mkdir -p /export/grid/03/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/04/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/05/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/06/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/07/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/08/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/09/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/10/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/11/clickhouse/jbod/9000/
sudo mkdir -p /export/grid/12/clickhouse/jbod/9000/
sudo chown -R clickhouse:clickhouse /export/grid/01/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/01/clickhouse/data/9000
sudo chown -R clickhouse:clickhouse /export/grid/01/clickhouse/log/9000
sudo chown -R clickhouse:clickhouse /export/grid/02/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/02/clickhouse/tmp/9000
sudo chown -R clickhouse:clickhouse /export/grid/03/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/04/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/05/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/06/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/07/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/08/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/09/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/10/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/11/clickhouse/jbod/9000
sudo chown -R clickhouse:clickhouse /export/grid/12/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/01/clickhouse/data/9000
sudo chmod -R 775 /export/grid/01/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/01/clickhouse/log/9000
sudo chmod -R 775 /export/grid/02/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/02/clickhouse/tmp/9000
sudo chmod -R 775 /export/grid/03/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/04/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/05/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/06/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/07/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/08/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/09/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/10/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/11/clickhouse/jbod/9000
sudo chmod -R 775 /export/grid/12/clickhouse/jbod/9000
sudo cp /etc/rc.d/init.d/clickhouse-server /etc/rc.d/init.d/clickhouse-server-9000
sudo chown clickhouse:clickhouse /etc/rc.d/init.d/clickhouse-server-9000
sudo chmod 775 /etc/rc.d/init.d/clickhouse-server-9000
sudo mv /etc/clickhouse-server/clickhouse-server-9000 /etc/rc.d/init.d/clickhouse-server-9000
sudo service clickhouse-server-9000 start

config.xml

<logger> 
<level>trace</level>
<log>/export/grid/01/clickhouse/log/9000/clickhouse-server.log</log>
<errorlog>/export/grid/01/clickhouse/log/9000/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
</logger>

<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<max_connections>4096</max_connections>
<keep_alive_timeout>3</keep_alive_timeout>
<max_concurrent_queries>100</max_concurrent_queries>
<uncompressed_cache_size>8589934592</uncompressed_cache_size>
<mark_cache_size>5368709120</mark_cache_size>
<path>/export/grid/01/clickhouse/data/9000/</path>
<tmp_path>/export/grid/02/clickhouse/tmp/9000/</tmp_path>
<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
<distributed_ddl>
<!-- Path in ZooKeeper to queue with DDL queries -->
<path>/clickhouse/task_queue/ddl</path>
</distributed_ddl>

<format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>
<storage_configuration>
<disks>
<disk_one>
<path>/export/grid/01/clickhouse/jbod/9000/</path>
</disk_one>
<disk_two>
<path>/export/grid/02/clickhouse/jbod/9000/</path>
</disk_two>
<disk_three>
<path>/export/grid/03/clickhouse/jbod/9000/</path>
</disk_three>
<disk_four>
<path>/export/grid/04/clickhouse/jbod/9000/</path>
</disk_four>
<disk_five>
<path>/export/grid/05/clickhouse/jbod/9000/</path>
</disk_five>
<disk_six>
<path>/export/grid/06/clickhouse/jbod/9000/</path>
</disk_six>
<disk_seven>
<path>/export/grid/07/clickhouse/jbod/9000/</path>
</disk_seven>
<disk_eight>
<path>/export/grid/08/clickhouse/jbod/9000/</path>
</disk_eight>
<disk_nine>
<path>/export/grid/09/clickhouse/jbod/9000/</path>
</disk_nine>
<disk_ten>
<path>/export/grid/10/clickhouse/jbod/9000/</path>
</disk_ten>
<disk_eleven>
<path>/export/grid/11/clickhouse/jbod/9000/</path>
</disk_eleven>
<disk_twelve>
<path>/export/grid/12/clickhouse/jbod/9000/</path>
</disk_twelve>
</disks>
<policies>
<policy_jbod>
<volumes>
<volume_one>
<disk>disk_one</disk>
<disk>disk_two</disk>
<disk>disk_three</disk>
<disk>disk_four</disk>
<disk>disk_five</disk>
<disk>disk_six</disk>
<disk>disk_seven</disk>
<disk>disk_eight</disk>
<disk>disk_nine</disk>
<disk>disk_ten</disk>
<disk>disk_eleven</disk>
<disk>disk_twelve</disk>
</volume_one>
</volumes>
<move_factor>0.3</move_factor>
</policy_jbod>
</policies>
</storage_configuration>
<max_insert_block_size>4194304</max_insert_block_size>
<merge_tree>
<parts_to_delay_insert>600</parts_to_delay_insert>
<parts_to_throw_insert>800</parts_to_throw_insert>
<max_delay_to_insert>2</max_delay_to_insert>
<max_suspicious_broken_parts>5</max_suspicious_broken_parts>
<old_parts_lifetime>120</old_parts_lifetime>
<use_minimalistic_part_header_in_zookeeper>1</use_minimalistic_part_header_in_zookeeper>
</merge_tree>
<distributed_ddl>
<path>/clickhouse/tables/task_queue/ddl</path>
<cleanup_delay_period>60</cleanup_delay_period>
<task_max_lifetime>172800</task_max_lifetime>
<max_tasks_in_queue>500</max_tasks_in_queue>
</distributed_ddl>
<max_table_size_to_drop>0</max_table_size_to_drop>
<max_partition_size_to_drop>0</max_partition_size_to_drop>




=============================================== Production configuration =============================================================
<?xml version="1.0"?>
<yandex>
<logger>
<level>trace</level>
<log>/export/servers/data/clickhouse/9000/log/clickhouse-server.log</log>
<errorlog>/export/servers/data/clickhouse/9000/log/clickhouse-server.err.log</errorlog>
<size>100M</size>
<count>10</count>
</logger>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<mysql_port>9004</mysql_port>
<interserver_http_port>9009</interserver_http_port>
<openSSL>
<server> <!-- Used for https server AND secure tcp port -->
<certificateFile>/export/servers/data/clickhouse/9000/run/server.crt</certificateFile>
<privateKeyFile>/export/servers/data/clickhouse/9000/run/server.key</privateKeyFile>
<dhParamsFile>/export/servers/data/clickhouse/9000/run/dhparam.pem</dhParamsFile>
<verificationMode>none</verificationMode>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
</server>

<client> <!-- Used for connecting to https dictionary source -->
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>

<listen_host>0.0.0.0</listen_host>
<!-- Default values - try listen localhost on ipv4 and ipv6: -->
<!--
<listen_host>::1</listen_host>
<listen_host>127.0.0.1</listen_host>
-->
<max_connections>1024</max_connections>

<keep_alive_timeout>60</keep_alive_timeout>
<max_concurrent_queries>100</max_concurrent_queries>
<uncompressed_cache_size>8589934592</uncompressed_cache_size>
<mark_cache_size>10737418240</mark_cache_size>
<access_control_path>/export/servers/data/clickhouse/9000/access/</access_control_path>
<path>/export/servers/data/clickhouse/9000/data/</path>
<tmp_path>/export/servers/data/clickhouse/9000/tmp/</tmp_path>
<user_files_path>/export/servers/data/clickhouse/9000/user_files/</user_files_path>
<format_schema_path>/export/servers/data/clickhouse/9000/format_schemas/</format_schema_path>
<users_config>users.xml</users_config>
<default_profile>default</default_profile>
<default_database>default</default_database>
<timezone>Asia/Shanghai</timezone>
<mlock_executable>false</mlock_executable>
<remote_servers incl="clickhouse_remote_servers"/>
<include_from>/export/servers/data/clickhouse/9000/metrika.xml</include_from>
<zookeeper incl="zookeeper-servers" optional="true" />
<macros incl="macros" optional="true" />
<builtin_dictionaries_reload_interval>30</builtin_dictionaries_reload_interval>
<max_session_timeout>3600</max_session_timeout>
<default_session_timeout>60</default_session_timeout>
<query_log>
<database>system</database>
<table>query_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
<trace_log>
<database>system</database>
<table>trace_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</trace_log>
<query_thread_log>
<database>system</database>
<table>query_thread_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_thread_log>
<dictionaries_config>*_dictionary.xml</dictionaries_config>
<compression incl="clickhouse_compression">
</compression>
<merge_tree>
<parts_to_delay_insert>600</parts_to_delay_insert>
<parts_to_throw_insert>800</parts_to_throw_insert>
<max_delay_to_insert>2</max_delay_to_insert>
<max_suspicious_broken_parts>5</max_suspicious_broken_parts>
<old_parts_lifetime>120</old_parts_lifetime>
<use_minimalistic_part_header_in_zookeeper>1</use_minimalistic_part_header_in_zookeeper>
</merge_tree>
<distributed_ddl>
<path>/clickhouse/tables/task_queue/ddl</path>
<cleanup_delay_period>60</cleanup_delay_period>
<task_max_lifetime>172800</task_max_lifetime>
<max_tasks_in_queue>500</max_tasks_in_queue>
</distributed_ddl>
<max_table_size_to_drop>0</max_table_size_to_drop>
<max_partition_size_to_drop>0</max_partition_size_to_drop>
</yandex>

metrika.xml

<zookeeper-servers>
<node index="1">
<host>10.220.232.170</host>
<port>2181</port>
</node>
<node index="2">
<host>10.220.232.156</host>
<port>2181</port>
</node>
<node index="3">
<host>10.220.232.158</host>
<port>2181</port>
</node>
</zookeeper-servers>
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>

users.xml

<?xml version="1.0"?>
<yandex>
<profiles>
<default>
<max_memory_usage>68719476736</max_memory_usage>
<max_memory_usage_for_user>68719476736</max_memory_usage_for_user>
<use_uncompressed_cache>1</use_uncompressed_cache>
<load_balancing>random</load_balancing>
<log_queries>1</log_queries>
<max_threads>24</max_threads>
<max_execution_time>1000</max_execution_time>
<!-- force_index_by_date>1</force_index_by_date -->
<force_index_by_date>0</force_index_by_date>
<max_columns_to_read>200</max_columns_to_read>
<!-- max_rows_to_read>100000000000</max_rows_to_read -->
<max_rows_to_group_by>100000000000</max_rows_to_group_by>
<max_rows_to_sort>100000000000</max_rows_to_sort>
<max_rows_in_distinct>10000000000</max_rows_in_distinct>
<!-- max_rows_to_transfer>10000000000</max_rows_to_transfer -->
<skip_unavailable_shards>1</skip_unavailable_shards>
<max_bytes_before_external_group_by>60000000000</max_bytes_before_external_group_by>
<max_partitions_per_insert_block>5000</max_partitions_per_insert_block>
<distributed_aggregation_memory_efficient>1</distributed_aggregation_memory_efficient>
<max_insert_block_size>4194304</max_insert_block_size>
</default>
<readonly>
<readonly>2</readonly>
<log_queries>1</log_queries>
<max_threads>24</max_threads>
<force_index_by_date>1</force_index_by_date>
<max_execution_time>300</max_execution_time>
<max_memory_usage>68719476736</max_memory_usage>
<max_memory_usage_for_user>68719476736</max_memory_usage_for_user>
<use_uncompressed_cache>1</use_uncompressed_cache>
<max_columns_to_read>60</max_columns_to_read>
<!-- max_rows_to_read>100000000000</max_rows_to_read -->
<max_rows_to_group_by>100000000000</max_rows_to_group_by>
<max_rows_to_sort>100000000000</max_rows_to_sort>
<max_rows_in_distinct>30000000000</max_rows_in_distinct>
<!-- max_rows_to_transfer>10000000000</max_rows_to_transfer -->
<skip_unavailable_shards>1</skip_unavailable_shards>
<max_bytes_before_external_group_by>60000000000</max_bytes_before_external_group_by>
<distributed_aggregation_memory_efficient>1</distributed_aggregation_memory_efficient>
</readonly>
</profiles>
<users>
<default>
<password_sha256_hex>87ccaf4a5f72d09512b1311a1ed293b8d5ba5af6fc6b1afa326e52c8f311dcbe</password_sha256_hex>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<access_management>1</access_management>
<profile>default</profile>
<quota>default</quota>
</default>
</users>
<quotas>
<default>
<interval>
<duration>3600</duration>
<queries>0</queries>
<errors>0</errors>
<result_rows>0</result_rows>
<read_rows>0</read_rows>
<execution_time>0</execution_time>
</interval>
</default>
</quotas>
</yandex>

The whole offline-table CK push is divided into three stages:
1) RDD: fetching the Hive source data and transforming its structure (duration bounded by the source table structure, data quality, effective data volume, and the stability of the 5K Hadoop cluster, i.e. whether the NameNode is healthy).
2) Map: interval-hashing the data to be pushed, repartitioning it, and formatting the push fields into strings with StringTokenizer (governed by the maximum Spark concurrency, the repartition count, and the CK hash scheme).
3) Foreach: the actual INSERT stage (governed by the insert batch size, the thread sleep interval, the maximum Spark concurrency, the CPU headroom on CK, CK merge performance and disk IO throughput, and the push data volume: offset count + number of labels + number of label enum values).
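To make these three stages concrete, here is a minimal Spark (Scala) sketch of the skeleton, assuming the ClickHouse JDBC driver is on the classpath; the Hive table, JDBC URL, target table/column, shard count and batch size are placeholders rather than the production code, and per-shard node routing is omitted.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.SparkSession

object CkPushSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ck-push-sketch").enableHiveSupport().getOrCreate()

    // Stage 1) RDD: read the Hive source partition (placeholder table) into (offset, labels) pairs.
    val source = spark.sql("SELECT offset, label_kv FROM dm.some_wide_table WHERE dt = '2020-12-08'")
      .rdd.map(r => (r.getLong(0), r.getString(1)))

    // Stage 2) Map: interval-hash each offset to a shard, format the row, and repartition.
    val shards   = 20          // CK shard count (assumption)
    val interval = 1L << 20    // 2^20 offsets per contiguous block, as described in section III
    val prepared = source
      .map { case (offset, kv) => ((offset / interval) % shards, s"$offset\t$kv") }
      .repartition(2001)       // parallelism value used in section V

    // Stage 3) Foreach: batched INSERTs (routing each shard to its own CK node is omitted here).
    prepared.foreachPartition { rows =>
      val conn = DriverManager.getConnection("jdbc:clickhouse://ck-host:8123/default") // placeholder URL
      val stmt = conn.createStatement()
      rows.grouped(500000).foreach { batch =>
        val values = batch.map { case (_, row) => s"('$row')" }.mkString(",")
        stmt.execute(s"INSERT INTO label_local (payload) VALUES $values") // placeholder table/column
      }
      conn.close()
    }
    spark.stop()
  }
}
```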

III. Push Logic Optimization

Code-level optimizations (important)

1. Add the use_minimalistic_part_header_in_zookeeper=1 setting to the CREATE TABLE statement.
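As a hedged illustration, one way such a DDL can look; the schema, table name and ZooKeeper path below are placeholders, not the production definition.

```scala
// Prints a sample replicated-table DDL carrying the per-table setting.
object LabelTableDdl {
  val createLabelLocal: String =
    """CREATE TABLE IF NOT EXISTS label_local
      |(
      |    offset UInt64,
      |    label  String,
      |    dt     Date
      |)
      |ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/label_local', '{replica}')
      |PARTITION BY dt
      |ORDER BY offset
      |SETTINGS use_minimalistic_part_header_in_zookeeper = 1""".stripMargin

  def main(args: Array[String]): Unit = println(createLabelLocal)
}
```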

2. Add a failure queue that records the tables whose daily push failed, and automatically re-run the failed tables (the usual cause is "Merges are processing significantly slower than inserts").
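A minimal sketch of such a retry queue, assuming a hypothetical pushTable function that performs one full table push and throws on failure; names are illustrative only.

```scala
import scala.collection.mutable

object PushRetry {
  // pushTable(table) is assumed to run one full push and throw when CK rejects inserts,
  // e.g. "Merges are processing significantly slower than inserts".
  def pushWithRetry(tables: Seq[String], pushTable: String => Unit, maxRetries: Int = 1): Seq[String] = {
    val failed = mutable.Queue[String]()

    def attempt(table: String): Boolean =
      try { pushTable(table); true }
      catch { case e: Exception => println(s"push failed for $table: ${e.getMessage}"); false }

    // First pass: record every table whose push failed in the daily failure queue.
    tables.foreach(t => if (!attempt(t)) failed.enqueue(t))

    // Retry passes: automatically re-run the failed tables a bounded number of times.
    var round = 0
    while (failed.nonEmpty && round < maxRetries) {
      round += 1
      val retry = failed.toList
      failed.clear()
      retry.foreach(t => if (!attempt(t)) failed.enqueue(t))
    }
    failed.toList // tables still failing after all retries, for reporting
  }
}
```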

3. Add logic that drops historical CK partitions so disk usage stays within bounds. (For the 17 business wide tables, keeping 3 days of partitions, single-node storage without replication stays around 3.6 TB. Taking node 10.220.38.147 as an example: 01-588G, 02-692G, 03-601G, 04-614G, 05-625G, 06-605G, 07-607G, 08-595G, 09-625G, 10-597G, 11-660G, 12-622G, where disks 01-06 are the primary data disks and 07-12 hold the replicated copies.)
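A hedged sketch of what that cleanup can look like; the JDBC URL, database and table names are placeholders, and it assumes the table is partitioned so that partition ids compare as yyyymmdd strings.

```scala
import java.sql.DriverManager
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.collection.mutable.ListBuffer

object DropOldPartitions {
  def main(args: Array[String]): Unit = {
    val keepDays = 3
    val cutoff = LocalDate.now().minusDays(keepDays).format(DateTimeFormatter.ofPattern("yyyyMMdd"))
    val conn = DriverManager.getConnection("jdbc:clickhouse://ck-host:8123/default") // placeholder URL
    val stmt = conn.createStatement()

    // List active partition ids of the (hypothetical) local table older than the cutoff.
    val rs = stmt.executeQuery(
      s"""SELECT DISTINCT partition_id FROM system.parts
         | WHERE database = 'default' AND table = 'label_local' AND active
         |   AND partition_id < '$cutoff'""".stripMargin)
    val expired = ListBuffer[String]()
    while (rs.next()) expired += rs.getString(1)

    // Drop each expired partition so the 3-day retention keeps disk usage bounded.
    expired.foreach { p =>
      conn.createStatement().execute(s"ALTER TABLE label_local DROP PARTITION ID '$p'")
    }
    conn.close()
  }
}
```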

4. To adapt to wide tables of different sizes, the Spark resources are split into high, medium, and low tiers and adjusted dynamically.
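A small sketch of how such a three-tier choice could be expressed. The byte thresholds and the medium/low tier values are illustrative assumptions; the article only fixes executor-cores = 1 and, for the top tier, up to 500 executors with 20G of executor memory.

```scala
object ResourceTier {
  // Pick a resource tier from the size of the Hive source partition.
  final case class Tier(numExecutors: Int, executorMemory: String)

  def chooseTier(hiveBytes: Long): Tier = {
    val GB = 1L << 30
    if (hiveBytes > 50 * GB)      Tier(numExecutors = 500, executorMemory = "20G") // high tier
    else if (hiveBytes > 10 * GB) Tier(numExecutors = 300, executorMemory = "20G") // medium tier (assumed)
    else                          Tier(numExecutors = 100, executorMemory = "10G") // low tier (assumed)
  }

  def main(args: Array[String]): Unit = {
    val t = chooseTier(35L * (1L << 30))
    println(s"--num-executors ${t.numExecutors} --executor-cores 1 --executor-memory ${t.executorMemory}")
  }
}
```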

5. Add a global lock around the CK push, and use a sample().count() call to defeat Spark's lazy evaluation, so the window in which the push actually talks to CK is measured precisely and writes are isolated from the application-side reads (bitmap computation and estimation, profile generation).
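A minimal sketch of the sample-then-count trick, assuming an upstream RDD called prepared; the global lock itself is only represented by placeholder hooks (in production it could be a ZK node or a DB flag).

```scala
import org.apache.spark.rdd.RDD

object MaterializeBeforePush {
  // Hypothetical lock hooks standing in for the real global read/write lock.
  def acquirePushLock(): Unit = ()
  def releasePushLock(): Unit = ()

  def timedPush(prepared: RDD[String], insert: Iterator[String] => Unit): Long = {
    // Force the lazy RDD lineage (stages 1 and 2) to materialize *before* the lock is taken,
    // so the locked window only covers real CK writes. sample() keeps the trigger cheap.
    prepared.persist()
    prepared.sample(withReplacement = false, fraction = 0.001).count()

    acquirePushLock()
    val start = System.currentTimeMillis()
    prepared.foreachPartition(insert)            // stage 3: the actual INSERT phase
    val elapsed = System.currentTimeMillis() - start
    releasePushLock()

    prepared.unpersist()
    elapsed                                      // precise duration of the CK interaction window
  }
}
```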

6. Adjust the CK hash rule from a global hash x % n to an interval hash (x / y) % n, using 2^20 as the interval, to match how CK stores the bitmaps underneath. This makes per-node offsets more contiguous, reduces the sparsity of the underlying CK bitmaps, and lets the application read bitmaps over much smaller offset ranges.
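The two schemes can be written down directly; in this sketch the shard count is an assumption, the interval is the 2^20 value stated above.

```scala
object ShardHash {
  val shards: Int    = 20         // number of CK shards (assumption)
  val interval: Long = 1L << 20   // 2^20 offsets per contiguous block

  // Old scheme: global hash x % n -- adjacent offsets scatter across all shards.
  def globalShard(offset: Long): Int = (offset % shards).toInt

  // New scheme: interval hash (x / y) % n -- each 2^20-wide block of offsets lands on one shard,
  // keeping per-node offsets contiguous and the CK bitmaps dense.
  def intervalShard(offset: Long): Int = ((offset / interval) % shards).toInt

  def main(args: Array[String]): Unit = {
    val sample = Seq(0L, 1L, 1048575L, 1048576L, 2097152L)
    sample.foreach(o => println(s"offset=$o global=${globalShard(o)} interval=${intervalShard(o)}"))
  }
}
```

Running it shows that consecutive offsets stay on the same shard under the interval hash but spread over every shard under the global hash, which is exactly the bitmap-density argument above.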

7. Format the insert dataset with StringTokenizer, and replace -9999 and null values with the single character "&", which greatly reduces the IO volume and shuffle size during transfer.
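A hedged sketch of that formatting step; the tab delimiter and the exact wire format are assumptions, only the -9999/null-to-"&" substitution comes from the article.

```scala
import java.util.StringTokenizer

object RowFormat {
  // Replace empty markers (-9999 / null) with a single "&" so they shrink the shuffle,
  // then re-join the fields with the (assumed) tab delimiter.
  def formatRow(rawRow: String, delimiter: String = "\t"): String = {
    val tok = new StringTokenizer(rawRow, delimiter)
    val out = new StringBuilder()
    while (tok.hasMoreTokens) {
      val field = tok.nextToken()
      val v = if (field == "-9999" || field.equalsIgnoreCase("null")) "&" else field
      if (out.nonEmpty) out.append(delimiter)
      out.append(v)
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // Prints: 12345, &, &, A02 joined by tabs.
    println(formatRow("12345\t-9999\tnull\tA02"))
  }
}
```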

8. Introduce a thread sleep interval and keep executor-cores at 1, with at most 500 executors performing the push, to avoid the CK merge anomalies caused by inserting too fast.

9. Tune the Insert-Batch size together with the CK-multiple parameter. The Insert-Batch size strongly affects the duration of the CK Foreach stage: the larger the batch and the CK-multiple value, the shorter the runtime, but also the higher the CPU usage and the more likely the "Merges are processing significantly slower than inserts" error. Notably, the CK-multiple parameter solves the hotspot problem caused by continuously pushing data to a single node: when Spark pushes partitions in partitionID order, partitions that run close together in time are distributed to different machines instead of hammering one node, which further raises the concurrency of the CK push stage (see the sketch after the tuning note below).

With ckMultiple = 100 and Batch = 20000000, CPU utilization improved by 60% and push speed by 62%.
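To make the batch / rotation / sleep interaction concrete, here is a hedged extension of the stage-3 loop from the earlier sketch. Hosts, table and column are placeholders, and the real ckMultiple rotation is simplified to round-robin by partition id; only the sleep-between-batches and "spread consecutive partitions across nodes" ideas come from points 8 and 9.

```scala
import java.sql.DriverManager
import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD

object BatchedForeach {
  def push(prepared: RDD[String],
           ckNodes: Seq[String],          // e.g. Seq("ck01", "ck02", ...) -- placeholder hosts
           batchSize: Int = 500000,       // illustrative; the article tunes much larger batch values
           sleepMillis: Long = 0L): Unit = {
    prepared.foreachPartition { rows =>
      // Rotate target nodes by partition id so one node is not hit by a long run of
      // back-to-back partitions (the hotspot that ckMultiple addresses in the article).
      val pid  = TaskContext.getPartitionId()
      val node = ckNodes(pid % ckNodes.size)

      val conn = DriverManager.getConnection(s"jdbc:clickhouse://$node:8123/default")
      val stmt = conn.createStatement()
      rows.grouped(batchSize).foreach { batch =>
        val values = batch.map(r => s"('$r')").mkString(",")
        stmt.execute(s"INSERT INTO label_local (payload) VALUES $values") // placeholder table/column
        if (sleepMillis > 0) Thread.sleep(sleepMillis) // give CK merges time to keep up with inserts
      }
      conn.close()
    }
  }
}
```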

IV. Upstream Data Optimization (the standardization and quality of the upstream data are critical; they determine the runtime of push stages 1 and 3)

Code-level optimizations (important)

Overall conclusion: the end-to-end wide-table push service has now run stably in production for a week; the latest partition of every table is loaded into OLAP before 15:00, so CK achieves T+1 data updates.
On 2020-11-14, for example, 75% of the tables were finished by 12:00 and 87.5% by 12:30; on 2020-11-16, 81.25% were finished by 13:30 and 87.5% by 14:00.
Only three tables took longer than 60 min to push: dmxupf_width_jd_usr_pin_jdmall_all_alpha_add_s_d (126 min), dmxupf_width_biz_jd_usr_pin_csf_s_d (100 min), and dmxupf_width_biz_jd_usr_pin_pay_s_d (90 min). Recommendation: fix their row counts and enum values.
1. Within a single row, the label values for a given JDPIN must not all be null, -9999, or "-9999"; the processing logic should assign meaningful values for the specified enums.
Test result for dmxupf_width_jd_usr_pin_crd_s_d: with the row count drastically reduced (74%) and the CK push volume unchanged, the combined RDD and Map stages shrank from 24.3 min to 14 min, a 42.3% improvement.

2. Wide-table row count: keep the rows limited to genuinely active JDPINs; the data volume must not balloon with the size of the full PIN pool.
Test result for dmxupf_width_biz_jd_usr_pin_csf_s_d: with 60% fewer Hive rows and 40% more columns, at the same CK push volume the runtime improved by 22% ~ 33%.

3. Wide-table column count: the number of columns can be adjusted per business line according to its actual label volume. At the same row count, the column count has limited impact; considering push scheduling resources and the later opening-up of label development, we recommend keeping business wide tables within roughly 200~400 columns, neither too narrow nor too wide. Improve label quality, retire useless labels promptly so the limited resources serve the labels the business actually needs, and keep later maintenance manageable for the analysts; the exact rule can be agreed internally.

4. Number of label enum values: under the current CK push table architecture, the number of enum values per label sharply inflates the push volume, so labels should be re-checked for invalid or special enum values. 46 labels have more than 100 enum values, i.e. 46 / 3445 = 1.34% of all labels.
Breakdown: dmxupf_width_jd_usr_pin_jdmall_all_alpha_add_s_d: 33, dmxupf_width_com_jd_usr_pin_basic_s_d: 6, dmxupf_width_biz_jd_usr_pin_csf_s_d: 2, dmxupf_width_biz_jd_usr_pin_fin_s_d: 2, dmxupf_width_biz_jd_usr_pin_jdjr_app_s_d: 2, dmxupf_width_cf_biz_usr_pin_two_s_d: 1.
The two highlighted tables deserve particular attention. Basic table: 4645, 4645, 459, 364, 328, 174; Mall table: 986 (13 labels), 407, 174, 134 (18 labels).

5. Statistics on label coverage, the tables the labels live in, and enum-value coverage:
A. Label coverage: only 25 labels (0.72%) cover more than 1 billion PINs, coming from 7 business wide tables, four of which account for 84% of them. Details: (pin0000612, pin010000001, pin0001738, pin0002368, pin0001985, pin002003002, pin004002001007, pin0000276, pin0002477, pin001004001, pin0002195, pin002003001, pin0002045, pin0002046, pin001001001, pin004007002037, pin004007002001, pin004007002003, pin004007002005, pin004007002007, pin004007002023, pin004007002031, pin004007002010, pin004007002047, pin004002001005). dmxupf_width_biz_jd_usr_pin_pay_s_d accounts for 40%, dmxupf_width_com_jd_usr_pin_basic_s_d 20%, dmxupf_width_jd_usr_pin_jdmall_all_alpha_add_s_d 12%, dmxupf_width_cf_usr_risk_prod_width_s_d 12%, dmxupf_width_cf_biz_usr_pin_two_s_d 8%, dmxupf_width_biz_jd_usr_pin_insu_s_d 4%, dmxupf_width_biz_jd_usr_pin_csf_s_d 4%.
B. Enum-value coverage: among valid enums, only 6 enum values cover more than 1.5 billion PINs and only 31 cover more than 1 billion; for the invalid enum -9999, 878 values cover more than 1 billion, 0.16% of all enum values.

Conclusion: labels covering more than 1 billion PINs should either have their logic adjusted or be split into a dedicated table when the wide tables are split; for CASE WHEN logic, evaluate whether a THEN -9999 branch is still needed; labels with too many enum values need their definitions re-examined and can be grouped together with the high-coverage labels into one wide table for processing.

V. New CK Cluster: 40 Nodes (20 Primaries, 20 Replicas) + Single Instance per Node + Push Timings for the Split Business Wide Tables

Resource parameters: --driver-cores 1 --driver-memory 6G --executor-cores 1 --executor-memory 20G

Configuration parameters: parallelism = 2001; ckMultiple = 55; batchInsert = 500000000; threadTime = 0s

Conclusion: the structure of the label wide table strongly affects the stage-1 RDD build time and the shuffle size. Comparing two Hive layouts, a single table with 400 columns (200 of them retired) versus a single table with 200 columns, the latter clearly shuffles less and cuts the stage-1 time by about 50%; if the Hive storage format can also be switched from ORC to PARQUET, the overall push time is expected to improve by another 50% or more.

Current test baseline: for a label wide table of 1 billion rows x 300 columns, with the CK push volume expanding 10x, the full-table push time stays around (25 min + 5 min = 30 min).

| Standardized table name | Historical time (2020-12-06) | Rows | Columns | Hive size | CK_dist rows | Push time |
| --- | --- | --- | --- | --- | --- | --- |
| dmxupf_width_biz_v3_jd_usr_pin_sec_01_s_d | 8.13 min | 3143411674 | 26 | 41.0G | 8355452 | 2020-12-08 16:10:26 (4.6min + 24s + 9s + 10s = 5.3min), CPU 25% |
| dmxupf_width_biz_v3_jd_usr_pin_jdjr_01_s_d | 63.3 min | 1036113721 | 305 | 35.2G | 21057585966 | 2020-12-08 14:40:08 (14min + 13min + 1min + 3min = 31min), CPU 40% |
| dmxupf_width_biz_v3_jd_usr_pin_basic_01_s_d | 71.5 min | 1522254399 | 26 | 32.3G | 8565912577 | 2020-12-08 15:36:22 (6.2min + 3.9min + 51s + 3min = 14min), CPU 25% |
| dmxupf_width_biz_v3_jd_usr_pin_fin_01_s_d | - | 476631524 | 590 | 22.6G | 11953291531 | 2020-12-08 16:23:50 (17min + 5.9min + 38s + 4.3min = 28min), CPU 30% |
| dmxupf_width_biz_v3_jd_usr_pin_fin_02_s_d | 39.25 min | 391740122 | 149 | 7.3G | 2813360320 | 2020-12-08 16:33:39 (14min + 2.2min + 26s + 55s = 18min), CPU 15% |
| dmxupf_width_biz_v3_jd_usr_pin_mix_01_s_d | (should 22G be used???) | 3174724968 | 30 | 70.3G | 45786681688 | 2020-12-08 17:04:06 (9.4min + 13min + 2.7min + 16min = 41min), CPU 40% |
| dmxupf_width_biz_v3_jd_usr_pin_mix_02_s_d | - | 978157942 | 40 | 29.7G | 5725585312 | 2020-12-08 17:07:44 (7.3min + 3.2min + 1.6min + 2min = 16min), CPU 15% |
| dmxupf_width_biz_v3_jd_usr_pin_cf_01_s_d | - | 892646924 | 391 | 27.6G | 13301186218 | 2020-12-08 17:30:15 (16min + 8.1min + 41s + 4.6min = 30min), CPU 20% |
| dmxupf_width_biz_v3_jd_usr_pin_cf_02_s_d | - | 181929673 | 137 | 2.5G | 1077344533 | 2020-12-08 17:51:24 (13min + 2.6min + 20s + 40s = 17min), CPU 15% |
| dmxupf_width_biz_v3_jd_usr_pin_cf_03_s_d | - | 3051212081 | 238 | 27.5G | 10371289201 | 2020-12-08 18:06:24 (21min + 4.9min + 38s + 3.6min = 31min), CPU 25% |
| dmxupf_width_biz_v3_jd_usr_pin_cf_04_s_m | 99.25 min / 51.6 min | 3021392287 | 17 | 20.6G | - | 2020-12-08 18:31:04 (5.6min + 2.6min + 18s + 1min = 10min), CPU 15% |
| dmxupf_width_biz_v3_jd_usr_pin_jdmall_01_s_d | - | 3175643883 | 269 | 82.5G | 34435203007 | 2020-12-08 19:15:23 (7.3min + 13min + 1.4min + 9min = 32min), CPU 30% |
| dmxupf_width_biz_v3_jd_usr_pin_jdmall_02_s_d | 112.12 min | 1380243174 | 239 | 57.7G | 32467424440 | 2020-12-08 18:51:05 (7.9min + 13min + 1.5min + 10min = 32min), CPU 35% |
| dmxupf_width_biz_jd_usr_pin_pay_s_d | 72.37 min | 1936821583 | 144 | 49.1G | 43633474737 | 2020-12-03 17:51:02, 500 + 15G + 100 + 200000 + 2s (12min + 7.2min + 12min + 1.1min + 30min = 62.3min), 75%; 2020-12-03 18:51:30, 500 + 17G + 55 + 20000000 + 0s (11min + 7.2min + 11min + 2.2min + 17min = 48.4min), 75%; 2020-12-03 21:58:36, 1000 + 20G + 150 + 20000000 + 0s (8.9min + 7.4min + 13min + 2.2min + 21min = 54min), 60%; 2020-12-04 15:44:25, 500 + 17G + 55 + 50000000 + 0s (11min + 10min + 14min + 4.1min + 20min = 59.1min), 40%; 2020-12-07 11:20:08, 500 + 20G + 55 + 100000000 + 0s (41min + 14min = 55min), 30%; 2020-12-07 11:20:08, 500 + 20G + 55 + 500000000 + 0s (11min + 7.2min + 11min + 14min = 43min), CPU 40% (11 + 13 + 2 + 13 = 40min), final confirmed parameter set |
| dmxupf_width_biz_jd_usr_pin_ftc_s_d | 16 min | 246458197 | 125 | 1.8G | 793037332 | 2020-12-08 10:59:01 (10min + 2.2min + 22s + 38s = 13min), CPU 15% |
| dmxupf_width_biz_jd_usr_pin_insu_s_d | 65 min | 2984774974 / 1016161997 | 300 | 33.3G / 13.3G | 7681909161 | 2020-12-08 10:59:01 (37min + 23min + 1.5min + 8min = 69.5min), CPU 20%; the problem label pin0002368 has been retired by the business, to be re-pushed after tomorrow's base-table refresh; (32min + 5.2min + 20s + 1.5min = 39min), CPU 20%; (27min + 3.5min + 18s + 1.6min = 32min), CPU 20% |
| dmxupf_width_jd_usr_pin_crd_s_d | 27.75 min | 825078098 | 243 | 15.1G | 3593827335 | 2020-12-08 12:55:05 (19min + 3.1min + 22s + 1min = 23min), CPU 20% |
| dmxupf_width_cs_jd_usr_pin_s_d | 6.625 min | 17282332 | 51 | 127.9M | - | to be pushed after the table name is standardized; no push risk |
| dmxupf_biz_lh_impt_tm_s_d | 7.875 min | 107191918 | 25 | 3.7G | - | to be pushed after the table name is standardized; no push risk |
| dmxupf_dmt_biz_lhzt_model_s_d | 3 min | 213681367 | 6 | 3.2G | - | to be pushed after the table name is standardized; no push risk |