HBase Scale-Out and Scale-In

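Initial cluster

Initial state: Hadoop runs three DataNodes, and the HBase cluster has a single RegionServer.
Later, without stopping the service, we add a new machine, node4, and start the HRegionServer service on it.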


Scaling out HBase dynamically

Configure regionservers

Edit the regionservers file on every machine and add node4:

[root@node1 conf]# cat regionservers 
node3
node4

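This step is not strictly required to bring node4 online, but it is best to do it: the cluster scripts run from the master (start-hbase.sh and stop-hbase.sh, for example) SSH to the hosts listed in this file, so keeping it current makes the cluster easier to maintain.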
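Since the file has to be edited on every machine, a small idempotent helper can avoid duplicate entries. The following is a minimal sketch, not part of HBase: the function name add_regionserver is hypothetical, and the conf path in the example comment is an assumption based on the install directory that appears in the logs.

```shell
# add_regionserver FILE HOST
# Append HOST to FILE unless an identical line is already present.
# (Hypothetical helper, not shipped with HBase.)
add_regionserver() {
    file=$1
    host=$2
    # -x matches whole lines only, -F treats the host name as a literal string
    if ! grep -qxF "$host" "$file" 2>/dev/null; then
        printf '%s\n' "$host" >> "$file"
    fi
}

# Example (assumed path; run it on each node, or copy the file out afterwards):
# add_regionserver /data/program/hbase-2.1.5/conf/regionservers node4
```

Running the helper twice leaves a single node4 line, so it is safe to re-run from a provisioning script.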

Start the RegionServer on node4

[root@node4 bin]# ./hbase-daemon.sh start regionserver
running regionserver, logging to /data/program/hbase-2.1.5/bin/../logs/hbase-root-regionserver-node4.out
[root@node4 bin]# jps
2148 HRegionServer
1910 DataNode
2422 Jps
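The jps check above can also be scripted, for example in a provisioning step. This is a minimal sketch under the assumption that jps prints "<pid> <main class>" per line, as in the transcript; the function name has_regionserver is hypothetical.

```shell
# has_regionserver
# Read `jps` output on stdin and succeed (exit status 0) only if an
# HRegionServer process is listed. The second column is the main class name.
has_regionserver() {
    awk '$2 == "HRegionServer" { found = 1 } END { exit found ? 0 : 1 }'
}

# Example, on node4:
# jps | has_regionserver && echo "regionserver is running"
```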

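Enable the automatic load balancer

Use balancer_enabled to check whether the automatic balancer is on; if it is off, turn it on.
Here we turn the balancer on from the HBase shell on node4: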

hbase(main):017:0> balance_switch true
Previous balancer state : false
Took 0.0262 seconds
=> "false"
hbase(main):018:0> balancer_enabled
true
Took 0.0232 seconds
=> true

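Note: do not use balance_switch status. In testing, this command forces the balancer state to false regardless of whether it was true or false before, and it returns the previous state rather than the current one, so the printed value can read true on one call and false on the next. It is easy to misuse, so avoid it. The command that shows the current state is balancer_enabled.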
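To read the balancer state from a script without changing it, the output of balancer_enabled can be parsed. The sketch below assumes the output looks like the transcript above (a line that is exactly true or false, possibly with trailing spaces or a "=> " prefix); the function name balancer_state is hypothetical. The example in the comment uses the shell's non-interactive mode (hbase shell -n).

```shell
# balancer_state
# Read hbase shell output on stdin and print the last line that is exactly
# "true" or "false" (an optional "=> " prefix and trailing blanks are ignored).
balancer_state() {
    awk '{ sub(/^=> /, ""); sub(/[ \t]+$/, "") }
         $0 == "true" || $0 == "false" { state = $0 }
         END { if (state != "") print state }'
}

# Example:
# echo "balancer_enabled" | hbase shell -n | balancer_state
```

Because it only runs balancer_enabled, this check does not alter the balancer, unlike any form of balance_switch.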

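Problems with automatic balancing, and a workaround

Automatic balancing as described above is simple, but it has a serious drawback: on small and medium clusters, many regions can be offline at the same time, which noticeably hurts the availability of the whole HBase cluster.

Regions show up as RIT (regions in transition) while they are being moved:
[Screenshot: regions in transition during the migration]
To avoid this, it is best to disable automatic balancing and move regions by hand.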

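If node4's RegionServer is started while automatic balancing is off, the RegionServer joins the cluster but holds 0 regions:
[Screenshot: node4 listed with 0 regions]

We now manually move some of the regions on node3's RegionServer to node4.
First, look at the regions on node3's RegionServer:
[Screenshot: region list for node3's RegionServer]
There are many regions. The region ID is the part underlined in red in the screenshot; it is the 32-character hex string passed to move below.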

hbase(main):009:0> move '369a74e128cebe5765745aa6f1c2bd0b','node4,16020,1572921608641'
Took 1.9108 seconds
hbase(main):010:0> move '8dba2021c05515b03764f12bb2df1263','node4,16020,1572921608641'
Took 2.1796 seconds 

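After moving two randomly chosen regions to node4:
[Screenshot: node4 now holding two regions]
Because there are many regions, it is easier to script the moves.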
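One way to script the moves is to generate the move commands from a list of region IDs and feed them to the HBase shell. This is a minimal sketch: the function name gen_move_cmds and the file regions.txt are hypothetical, and the target server name must be the current one for node4 (the trailing number is the server's start code, which changes when the RegionServer restarts).

```shell
# gen_move_cmds TARGET
# Read encoded region names (one per line) on stdin and print one
# hbase shell `move` command per region, all pointing at TARGET.
gen_move_cmds() {
    target=$1
    while IFS= read -r region; do
        # skip blank lines
        [ -n "$region" ] || continue
        printf "move '%s','%s'\n" "$region" "$target"
    done
}

# Example (regions.txt is a hypothetical file with one region ID per line):
# gen_move_cmds 'node4,16020,1572921608641' < regions.txt | hbase shell -n
```

Printing the commands first, and only then piping them to the shell, makes it easy to review the batch or to split it into smaller chunks so that few regions are in transition at once.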

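Note: a region that has just been moved to a new machine has 0% data locality, and the new RegionServer's cache holds none of its data, so requests sent to that node are slow at first. They speed up once the cache has warmed.

After moving 20 regions to node4, compare the locality of node3 and node4:
[Screenshots: data locality on node3 and node4]
Running a major compaction improves locality, because the store files are rewritten through the RegionServer that now hosts the region.

For more on compaction, see the author's blog.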
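As a rough illustration, major compactions can be requested from the HBase shell with major_compact. The sketch below only generates the commands; the function name gen_compact_cmds is hypothetical, and PERSON is used as the example because that table appears in the logs of this cluster. Major compaction is I/O heavy, so this is best run during a quiet period.

```shell
# gen_compact_cmds
# Read table names (one per line) on stdin and print one hbase shell
# `major_compact` command per table.
gen_compact_cmds() {
    while IFS= read -r table; do
        # skip blank lines
        [ -n "$table" ] || continue
        printf "major_compact '%s'\n" "$table"
    done
}

# Example:
# echo PERSON | gen_compact_cmds | hbase shell -n
```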

Scaling in HBase dynamically

Here we take node3 out of the cluster.

Stop node3: graceful_stop

[root@node3 bin]# ./graceful_stop.sh node3
2019-11-04T18:33:20 Disabling load balancer
2019-11-04 18:33:35,579 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-11-04T18:33:39 Previous balancer state was true
2019-11-04T18:33:39 Unloading node3 region(s)
2019-11-04 18:33:40,608 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-11-04 18:33:41,542 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:host.name=node3
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_91
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:java.home=/data/program/jdk1.8.0_91/jre
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: data/program/hbase-2.1.5/bin/../lib/spymemcached-2.12.2.jar:/data/program/hbase-2.1.5/bin/../lib/validation-api-1.1.0.Final.jar:/data/program/hbase-2.1.5/bin/../lib/xmlenc-0.52.jar:/data/program/hbase-2.1.5/bin/../lib/xz-1.0.jar:/data/program/hbase-2.1.5/bin/../lib/zookeeper-3.4.10.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar:/data/program/hbase-2.1.5/bin/../lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:os.name=Linux
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-693.el7.x86_64
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:user.name=root
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:user.home=/root
2019-11-04 18:33:41,543 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Client environment:user.dir=/data/program/hbase-2.1.5/bin
2019-11-04 18:33:41,546 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181,node2:2181,node3:2181 sessionTimeout=3600000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$Lambda$13/451125583@368e3ff3
2019-11-04 18:33:41,622 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb-SendThread(node3:2181)] zookeeper.ClientCnxn: Opening socket connection to server node3/192.168.229.102:2181. Will not attempt to authenticate using SASL (unknown error)
2019-11-04 18:33:41,627 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb-SendThread(node3:2181)] zookeeper.ClientCnxn: Socket connection established to node3/192.168.229.102:2181, initiating session
2019-11-04 18:33:41,661 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb-SendThread(node3:2181)] zookeeper.ClientCnxn: Session establishment complete on server node3/192.168.229.102:2181, sessionid = 0x36e35baf9fb002f, negotiated timeout = 40000
2019-11-04 18:33:42,898 INFO  [pool-2-thread-1] util.RegionMover: Moving 21 regions from node3 to 1 servers using 1 threads .Ack Mode:true
2019-11-04 18:33:43,178 INFO  [pool-3-thread-1] util.RegionMover: Moving region:369a74e128cebe5765745aa6f1c2bd0b from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:33:45,015 INFO  [pool-3-thread-1] util.RegionMover: Moved Region PERSON,,1572523808553.369a74e128cebe5765745aa6f1c2bd0b. cost:2.103
2019-11-04 18:33:45,034 INFO  [pool-3-thread-1] util.RegionMover: Moving region:1588230740 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:33:47,398 INFO  [pool-3-thread-1] util.RegionMover: Moved Region hbase:meta,,1.1588230740 cost:2.382
2019-11-04 18:33:47,461 INFO  [pool-3-thread-1] util.RegionMover: Moving region:35c5536e6bb608c9e3387387216795c8 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:33:51,983 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.CATALOG,,1571994812282.35c5536e6bb608c9e3387387216795c8. cost:4.584
2019-11-04 18:33:52,024 INFO  [pool-3-thread-1] util.RegionMover: Moving region:c0fe5d3e3dc972ade56f59550797c673 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:33:53,308 INFO  [pool-3-thread-1] util.RegionMover: Moved Region hbase:namespace,,1571994739459.c0fe5d3e3dc972ade56f59550797c673. cost:1.325
2019-11-04 18:33:53,996 INFO  [pool-3-thread-1] util.RegionMover: Moving region:8dba2021c05515b03764f12bb2df1263 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:33:55,315 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x02\x00\x00\x00,1571994824933.8dba2021c05515b03764f12bb2df1263. cost:2.006
2019-11-04 18:33:55,483 INFO  [pool-3-thread-1] util.RegionMover: Moving region:da0b0f537784cb19292fc0c6d508d7c4 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:33:56,819 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x03\x00\x00\x00,1571994824933.da0b0f537784cb19292fc0c6d508d7c4. cost:1.504
2019-11-04 18:33:57,114 INFO  [pool-3-thread-1] util.RegionMover: Moving region:6c9cf401f80eed32e8e1f2e1ef201550 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:33:58,431 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x05\x00\x00\x00,1571994824933.6c9cf401f80eed32e8e1f2e1ef201550. cost:1.612
2019-11-04 18:33:58,551 INFO  [pool-3-thread-1] util.RegionMover: Moving region:e7c9f1368ca8a9174f7e1cd629a39f59 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:00,020 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x06\x00\x00\x00,1571994824933.e7c9f1368ca8a9174f7e1cd629a39f59. cost:1.589
2019-11-04 18:34:00,158 INFO  [pool-3-thread-1] util.RegionMover: Moving region:4c7548e9b198d9ec1e57cf4b517b1ebb from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:01,388 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x0A\x00\x00\x00,1571994824933.4c7548e9b198d9ec1e57cf4b517b1ebb. cost:1.368
2019-11-04 18:34:01,480 INFO  [pool-3-thread-1] util.RegionMover: Moving region:514943bcb21714497d14dc281caa9fb2 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:02,758 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x0B\x00\x00\x00,1571994824933.514943bcb21714497d14dc281caa9fb2. cost:1.369
2019-11-04 18:34:02,817 INFO  [pool-3-thread-1] util.RegionMover: Moving region:48371c940ea51ebbfd990c44add97c0f from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:04,132 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x0C\x00\x00\x00,1571994824933.48371c940ea51ebbfd990c44add97c0f. cost:1.374
2019-11-04 18:34:04,201 INFO  [pool-3-thread-1] util.RegionMover: Moving region:3bf058f5853fcdf12d58b933ba96c2b9 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:05,469 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x0E\x00\x00\x00,1571994824933.3bf058f5853fcdf12d58b933ba96c2b9. cost:1.336
2019-11-04 18:34:05,539 INFO  [pool-3-thread-1] util.RegionMover: Moving region:90608adb5077809a2897aaa38062fa75 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:07,767 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x10\x00\x00\x00,1571994824933.90608adb5077809a2897aaa38062fa75. cost:2.298
2019-11-04 18:34:07,814 INFO  [pool-3-thread-1] util.RegionMover: Moving region:c9a703a28803089342896fd6581f65b0 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:09,038 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x13\x00\x00\x00,1571994824933.c9a703a28803089342896fd6581f65b0. cost:1.271
2019-11-04 18:34:09,101 INFO  [pool-3-thread-1] util.RegionMover: Moving region:dae717d89104c02c006872e6ec32dd65 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:11,325 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x14\x00\x00\x00,1571994824933.dae717d89104c02c006872e6ec32dd65. cost:2.287
2019-11-04 18:34:11,366 INFO  [pool-3-thread-1] util.RegionMover: Moving region:b61768d26982052eb2b106cc00c1dc85 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:12,620 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x16\x00\x00\x00,1571994824933.b61768d26982052eb2b106cc00c1dc85. cost:1.295
2019-11-04 18:34:12,660 INFO  [pool-3-thread-1] util.RegionMover: Moving region:12aa3ec8ff9699de5543fa560e009630 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:14,904 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x17\x00\x00\x00,1571994824933.12aa3ec8ff9699de5543fa560e009630. cost:2.283
2019-11-04 18:34:14,929 INFO  [pool-3-thread-1] util.RegionMover: Moving region:f07dd988da903ef56c19b80506bb104b from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:16,136 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x18\x00\x00\x00,1571994824933.f07dd988da903ef56c19b80506bb104b. cost:1.231
2019-11-04 18:34:16,148 INFO  [pool-3-thread-1] util.RegionMover: Moving region:dc870483899b31d0fb45ab1f7c935f4a from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:17,371 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x1B\x00\x00\x00,1571994824933.dc870483899b31d0fb45ab1f7c935f4a. cost:1.235
2019-11-04 18:34:17,374 INFO  [pool-3-thread-1] util.RegionMover: Moving region:326d415fab55f79d54fb061fff319d21 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:18,545 INFO  [pool-3-thread-1] util.RegionMover: Moved Region SYSTEM.LOG,\x1F\x00\x00\x00,1571994824933.326d415fab55f79d54fb061fff319d21. cost:1.174
2019-11-04 18:34:18,561 INFO  [pool-3-thread-1] util.RegionMover: Moving region:e9f006fd022bb12bd653492127f24c53 from node3,16020,1572860544858 to node4,16020,1572861995536
2019-11-04 18:34:19,771 INFO  [pool-3-thread-1] util.RegionMover: Moved Region hbase:rsgroup,,1571994742444.e9f006fd022bb12bd653492127f24c53. cost:1.226
2019-11-04 18:34:19,773 INFO  [pool-2-thread-1] util.RegionMover: No Regions to move....Quitting now
2019-11-04 18:34:19,796 INFO  [main] client.ConnectionImplementation: Closing master protocol: MasterService
2019-11-04 18:34:19,844 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb] zookeeper.ZooKeeper: Session: 0x36e35baf9fb002f closed
2019-11-04 18:34:19,846 INFO  [ReadOnlyZKClient-node1:2181,node2:2181,node3:2181@0x2ddc8ecb-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x36e35baf9fb002f
2019-11-04T18:34:20 Unloaded node3 region(s)
2019-11-04T18:34:20 Stopping regionserver on node3
running regionserver, logging to /data/program/hbase-2.1.5/bin/../logs/hbase-root-regionserver-node3.out
stopping regionserver.
2019-11-04T18:34:21 Restoring balancer state to true
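The log above is long, so a short summary can help when checking a run. This is a minimal sketch that counts the "Moved Region" lines and sums their cost values; the function name summarize_moves and the file name in the example are hypothetical, and the cost is assumed to be in seconds, which is what the timestamps suggest.

```shell
# summarize_moves
# Read graceful_stop.sh / RegionMover log lines on stdin and print how many
# regions were moved and the sum of the per-region "cost:" values.
summarize_moves() {
    awk '/RegionMover: Moved Region/ {
             n++
             c = $0
             # keep only the text after the last "cost:" on the line
             sub(/.*cost:/, "", c)
             total += c
         }
         END { printf "%d regions moved, %.3f s total\n", n, total }'
}

# Example (graceful_stop.log is a hypothetical capture of the output above):
# summarize_moves < graceful_stop.log
```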

graceful_stop.sh supports the options below. It can stop or restart a RegionServer, which is very helpful for cluster maintenance:

[root@node1 bin]# ./graceful_stop.sh
Usage: graceful_stop.sh [--config <conf-dir>] [-e] [--restart [--reload]] [--thrift] [--rest]  [-nob |--nobalancer ] <hostname>
 thrift         If we should stop/start thrift before/after the hbase stop/start
 rest           If we should stop/start rest before/after the hbase stop/start
 restart        If we should restart after graceful stop
 reload         Move offloaded regions back on to the restarted server
 n|noack        Enable noAck mode in RegionMover. This is a best effort mode for moving regions
 maxthreads xx  Limit the number of threads used by the region mover. Default value is 1.
 movetimeout xx Timeout for moving regions. If regions are not moved by the timeout value,exit with error. Default value is INT_MAX.
 hostname       Hostname of server we are to stop
 e|failfast     Set -e so exit immediately if any command exits with non-zero status
 nob| nobalancer Do not manage balancer states. This is only used as optimization in rolling_restart.sh to avoid multiple calls to hbase shell
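The --restart and --reload options in the usage text can be combined into a rolling restart, one RegionServer at a time. The sketch below only prints the commands and executes nothing; the function name rolling_restart_cmds is hypothetical, and it assumes the regionservers file lists one host per line, as shown earlier.

```shell
# rolling_restart_cmds FILE
# Print one graceful_stop.sh command per host listed in FILE (a regionservers
# file). --restart restarts the server after the graceful stop, and --reload
# moves the offloaded regions back onto it. Nothing is executed here.
rolling_restart_cmds() {
    sort "$1" | while IFS= read -r host; do
        # skip blank lines
        [ -n "$host" ] || continue
        printf './graceful_stop.sh --restart --reload %s\n' "$host"
    done
}

# Example, from the HBase bin directory:
# rolling_restart_cmds ../conf/regionservers
```

Reviewing the printed commands before running them is a deliberate choice here: each one unloads every region from a server, so on a cluster with only two RegionServers the remaining node carries the full load during each step.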

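graceful_stop.sh does the following:

  • Disables the load balancer (Disabling load balancer)
  • Moves the regions off the server (Unloading node3 region(s))
  • Stops the RegionServer (Stopping regionserver on node3)
  • Restores the load balancer to its previous state (Restoring balancer state to true)

Check the balancer state again: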

hbase(main):036:0> balancer_enabled
true
Took 0.0613 seconds
=> true


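While a RegionServer is being restarted, the balancer stays off for a while, and a warning is displayed during that time:
[Screenshot: warning shown while the balancer is disabled]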
