Configuring multiple HMasters in HBase

Posted 2016-6-4 23:33:10
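To keep an HBase cluster highly available, HBase supports configuring multiple Backup Masters. When the Active Master goes down, a Backup Master automatically takes over the whole HBase cluster.
The configuration is very simple:
In the $HBASE_HOME/conf/ directory, create a file named backup-masters and list in it the hostnames of the nodes to be used as Backup Masters, as follows: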
    [hbase@master conf]$ cat backup-masters
    node1
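The file holds one hostname per line, so adding more Backup Masters only means adding more lines. A minimal sketch of writing it from a script follows; the helper name `write_backup_masters` and the scratch directory are assumptions made for illustration and are not part of HBase.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): writes one hostname per line
# into <conf_dir>/backup-masters, replacing any existing file.
write_backup_masters() {
  # Require a directory plus at least one hostname.
  [ "$#" -ge 2 ] || return 1
  conf_dir=$1
  shift
  # printf repeats the format once per remaining argument.
  printf '%s\n' "$@" > "$conf_dir/backup-masters"
}

# Demonstration against a scratch directory; on a real cluster the
# directory would be "$HBASE_HOME/conf".
demo_dir=$(mktemp -d)
write_backup_masters "$demo_dir" node1
cat "$demo_dir/backup-masters"
rm -r "$demo_dir"
```

On the cluster described here the call would be `write_backup_masters "$HBASE_HOME/conf" node1`.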

After that, start the whole cluster. An HMaster process will be running on both master and node1:
    [hbase@master conf]$ jps
    25188 NameNode
    3319 QuorumPeerMain
    31725 Jps
    25595 ResourceManager
    31077 HMaster
    25711 NodeManager
    25303 DataNode
    31617 Main
    31220 HRegionServer
    [hbase@node1 root]$ jps
    11560 DataNode
    11762 NodeManager
    20769 Jps
    415 QuorumPeerMain
    11675 SecondaryNameNode
    20394 HRegionServer
    20507 HMaster
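Rather than reading the jps listing by eye, the HMaster pid can be picked out with a small filter. This is only a sketch: `hmaster_pid` is a hypothetical helper name, and it assumes the two-column `pid name` layout that jps prints in the listings here.

```shell
#!/bin/sh
# Hypothetical helper: reads `jps` output on stdin and prints the pid of
# every process whose name column is exactly "HMaster".
hmaster_pid() {
  awk '$2 == "HMaster" { print $1 }'
}

# Demonstration on the listing captured on node1.
hmaster_pid <<'EOF'
11560 DataNode
11762 NodeManager
20769 Jps
415 QuorumPeerMain
11675 SecondaryNameNode
20394 HRegionServer
20507 HMaster
EOF
```

On a live node this would be run as `jps | hmaster_pid`. The pid it prints is also the one to pass to `kill` when testing failover by hand.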
Now look at the master log on node1, which shows the following:
    [hbase@node1 logs]$ tail -f hbase-hbase-master-node1.log
    2015-10-10 05:35:09,609 INFO  [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:60010
    2015-10-10 05:35:09,613 INFO  [main] master.HMaster: hbase.rootdir=hdfs://master:9000/hbase, hbase.cluster.distributed=true
    2015-10-10 05:35:09,631 INFO  [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/node1,60000,1444455307700
    2015-10-10 05:35:09,806 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, master,60000,1444455305852; waiting to become the next active master
    2015-10-10 05:35:09,858 INFO  [master/node1/10.0.52.145:60000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x10135dbc connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:35:09,858 INFO  [master/node1/10.0.52.145:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x10135dbc0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:35:09,859 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Opening socket connection to server node2/10.0.52.146:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:35:09,860 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Socket connection established to node2/10.0.52.146:2181, initiating session
    2015-10-10 05:35:09,885 INFO  [master/node1/10.0.52.145:60000-SendThread(node2:2181)] zookeeper.ClientCnxn: Session establishment complete on server node2/10.0.52.146:2181, sessionid = 0x350463058c10017, negotiated timeout = 40000
    2015-10-10 05:35:09,920 INFO  [master/node1/10.0.52.145:60000] regionserver.HRegionServer: ClusterId : c309a039-eb35-400c-bb13-0b6ed939cc5e
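These messages show that the HBase cluster already has an active master, namely the node master, so node1 starts waiting until the HMaster on master goes down. At that point node1 becomes the new Active Master.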
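The waiting message names the current active master, so it can be pulled out of a backup master's log with a one-line filter. The sketch below assumes the exact message wording shown in the log excerpt in this post; `active_master_from_log` is a hypothetical helper name, and other HBase versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: prints the server name (host,port,startcode) from
# the most recent "Another master is the active master" message.
# Reads the named log files, or stdin when no file is given.
active_master_from_log() {
  sed -n 's/.*Another master is the active master, \([^;]*\);.*/\1/p' "$@" |
    tail -n 1
}

# Demonstration on the line taken from the node1 master log.
active_master_from_log <<'EOF'
2015-10-10 05:35:09,806 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, master,60000,1444455305852; waiting to become the next active master
EOF
```

Against the real file this would be `active_master_from_log hbase-hbase-master-node1.log`. No output means the log contains no waiting message, which is what one would expect on a node that has been the active master since startup.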
Now kill the HMaster process on the master node directly and look at the master log on node1 again:
    2015-10-10 05:42:17,173 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/node1,60000,1444455307700 from backup master directory
    2015-10-10 05:42:17,194 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=node1,60000,1444455307700
    2015-10-10 05:42:17,758 INFO  [node1:60000.activeMasterManager] fs.HFileSystem: Added intercepting call to namenode#getBlockLocations so can do block reordering using class class org.apache.hadoop.hbase.fs.HFileSystem$ReorderWALBlocks
    2015-10-10 05:42:17,776 INFO  [node1:60000.activeMasterManager] coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes
    2015-10-10 05:42:17,880 INFO  [node1:60000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x29d405f7 connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:42:17,880 INFO  [node1:60000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x29d405f70x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:42:17,883 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Opening socket connection to server node2/10.0.52.146:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:42:17,884 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Socket connection established to node2/10.0.52.146:2181, initiating session
    2015-10-10 05:42:17,904 INFO  [node1:60000.activeMasterManager-SendThread(node2:2181)] zookeeper.ClientCnxn: Session establishment complete on server node2/10.0.52.146:2181, sessionid = 0x350463058c1001b, negotiated timeout = 40000
    2015-10-10 05:42:17,942 INFO  [node1:60000.activeMasterManager] balancer.StochasticLoadBalancer: loading config
    2015-10-10 05:42:18,061 INFO  [node1:60000.activeMasterManager] master.HMaster: Server active/primary master=node1,60000,1444455307700, sessionid=0x150463058ac001a, setting cluster-up flag (Was=true)
    2015-10-10 05:42:18,154 INFO  [node1:60000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort
    2015-10-10 05:42:18,184 INFO  [node1:60000.activeMasterManager] procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/flush-table-proc/acquired /hbase/flush-table-proc/reached /hbase/flush-table-proc/abort
    2015-10-10 05:42:18,256 INFO  [node1:60000.activeMasterManager] master.MasterCoprocessorHost: System coprocessor loading is enabled
    2015-10-10 05:42:18,286 INFO  [node1:60000.activeMasterManager] procedure2.ProcedureExecutor: Starting procedure executor threads=5
    2015-10-10 05:42:18,288 INFO  [node1:60000.activeMasterManager] wal.WALProcedureStore: Starting WAL Procedure Store lease recovery
    2015-10-10 05:42:18,296 INFO  [node1:60000.activeMasterManager] util.FSHDFSUtils: Recovering lease on dfs file hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log
    2015-10-10 05:42:18,307 INFO  [node1:60000.activeMasterManager] util.FSHDFSUtils: recoverLease=true, attempt=0 on file=hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log after 9ms
    2015-10-10 05:42:18,324 WARN  [node1:60000.activeMasterManager] wal.WALProcedureStore: Unable to read tracker for hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log - Missing trailer: size=9 startPos=9
    2015-10-10 05:42:18,373 INFO  [node1:60000.activeMasterManager] wal.WALProcedureStore: Lease acquired for flushLogId: 28
    2015-10-10 05:42:18,383 WARN  [node1:60000.activeMasterManager] wal.ProcedureWALFormatReader: nothing left to decode. exiting with missing EOF
    2015-10-10 05:42:18,383 INFO  [node1:60000.activeMasterManager] wal.ProcedureWALFormatReader: No active entry found in state log hdfs://master:9000/hbase/MasterProcWALs/state-00000000000000000027.log. removing it
    2015-10-10 05:42:18,405 INFO  [node1:60000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:42:18,405 INFO  [node1:60000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=replicationLogCleaner0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:42:18,407 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Opening socket connection to server node1/10.0.52.145:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:42:18,408 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Socket connection established to node1/10.0.52.145:2181, initiating session
    2015-10-10 05:42:18,426 INFO  [node1:60000.activeMasterManager-SendThread(node1:2181)] zookeeper.ClientCnxn: Session establishment complete on server node1/10.0.52.145:2181, sessionid = 0x250463058780018, negotiated timeout = 40000
    2015-10-10 05:42:18,464 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:19,970 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:21,475 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3011 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:22,980 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 4516 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:23,058 INFO  [PriorityRpcServer.handler=3,queue=1,port=60000] master.ServerManager: Registering server=node1,16020,1444455306545
    2015-10-10 05:42:23,059 INFO  [PriorityRpcServer.handler=5,queue=1,port=60000] master.ServerManager: Registering server=master,16020,1444455306763
    2015-10-10 05:42:23,060 INFO  [PriorityRpcServer.handler=1,queue=1,port=60000] master.ServerManager: Registering server=node2,16020,1444455305886
    2015-10-10 05:42:23,081 INFO  [node1:60000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 3, slept for 4617 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
    2015-10-10 05:42:24,586 INFO  [node1:60000.activeMasterManager] master.ServerManager: Finished waiting for region servers count to settle; checked in 3, slept for 6122 ms, expecting minimum of 1, maximum of 2147483647, master is running
    2015-10-10 05:42:24,610 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/master,16020,1444455306763 belongs to an existing region server
    2015-10-10 05:42:24,619 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/node1,16020,1444455306545 belongs to an existing region server
    2015-10-10 05:42:24,625 INFO  [node1:60000.activeMasterManager] master.MasterFileSystem: Log folder hdfs://master:9000/hbase/WALs/node2,16020,1444455305886 belongs to an existing region server
    2015-10-10 05:42:24,757 INFO  [node1:60000.activeMasterManager] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1444455744651, server=null} to {1588230740 state=OPEN, ts=1444455744756, server=node2,16020,1444455305886}
    2015-10-10 05:42:24,757 INFO  [node1:60000.activeMasterManager] master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
    2015-10-10 05:42:24,760 INFO  [node1:60000.activeMasterManager] master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=node2,16020,1444455305886
    2015-10-10 05:42:24,895 INFO  [node1:60000.activeMasterManager] hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
    2015-10-10 05:42:24,985 INFO  [node1:60000.activeMasterManager] master.AssignmentManager: Found regions out on cluster or in RIT; presuming failover
    2015-10-10 05:42:25,000 INFO  [node1:60000.activeMasterManager] master.AssignmentManager: Joined the cluster in 104ms, failover=true
    2015-10-10 05:42:25,216 INFO  [node1:60000.activeMasterManager] master.HMaster: Master has completed initialization
    2015-10-10 05:42:25,234 INFO  [node1:60000.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled
As the log shows, the Backup Master on node1 has taken over and become the Active HMaster.
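The two log lines that bracket the takeover are "Registered Active Master=" and "Master has completed initialization", so the time the new master needed to initialize can be estimated from their timestamps. The following is a rough sketch under several assumptions: the timestamp layout is the one shown in this log (date, then `HH:MM:SS,mmm`), both lines fall on the same day, and `failover_ms` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the milliseconds between the first
# "Registered Active Master=" line and the next
# "Master has completed initialization" line.
# Assumes field 2 is HH:MM:SS,mmm and that both lines share one date.
failover_ms() {
  awk '
    # Convert HH:MM:SS,mmm to milliseconds since midnight.
    function ms(t,   p) {
      split(t, p, /[:,]/)
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    /Registered Active Master=/ && !seen { start = ms($2); seen = 1 }
    /Master has completed initialization/ && seen { print ms($2) - start; exit }
  ' "$@"
}

# Demonstration on the two relevant lines from the node1 log.
failover_ms <<'EOF'
2015-10-10 05:42:17,194 INFO  [node1:60000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=node1,60000,1444455307700
2015-10-10 05:42:25,216 INFO  [node1:60000.activeMasterManager] master.HMaster: Master has completed initialization
EOF
```

For the log in this post the two timestamps are 05:42:17,194 and 05:42:25,216, a gap of 8022 ms. This covers only the new master's initialization. It does not include the time ZooKeeper needed to notice that the old master was gone.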
Restart the HMaster on the master node:
    [hbase@master bin]$ ./hbase-daemon.sh start master
    starting master, logging to /usr/local/hbase//logs/hbase-hbase-master-master.out
    [hbase@master bin]$ jps
    25188 NameNode
    32351 Jps
    3319 QuorumPeerMain
    32265 HMaster
    25595 ResourceManager
    25711 NodeManager
    25303 DataNode
    31220 HRegionServer
The log on the master node shows that it has now become a backup master:
    [hbase@master logs]$ tail -f hbase-hbase-master-master.log
    2015-10-10 05:53:15,329 INFO  [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:60010
    2015-10-10 05:53:15,333 INFO  [main] master.HMaster: hbase.rootdir=hdfs://master:9000/hbase, hbase.cluster.distributed=true
    2015-10-10 05:53:15,348 INFO  [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/master,60000,1444456393819
    2015-10-10 05:53:15,488 INFO  [master:60000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, node1,60000,1444455307700; waiting to become the next active master
    2015-10-10 05:53:15,522 INFO  [master/master/10.0.52.144:60000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x323b7deb connecting to ZooKeeper ensemble=master:2181,node1:2181,node2:2181
    2015-10-10 05:53:15,522 INFO  [master/master/10.0.52.144:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node1:2181,node2:2181 sessionTimeout=90000 watcher=hconnection-0x323b7deb0x0, quorum=master:2181,node1:2181,node2:2181, baseZNode=/hbase
    2015-10-10 05:53:15,524 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Opening socket connection to server master/10.0.52.144:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-10-10 05:53:15,525 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Socket connection established to master/10.0.52.144:2181, initiating session
    2015-10-10 05:53:15,536 INFO  [master/master/10.0.52.144:60000-SendThread(master:2181)] zookeeper.ClientCnxn: Session establishment complete on server master/10.0.52.144:2181, sessionid = 0x150463058ac001c, negotiated timeout = 40000
    2015-10-10 05:53:15,567 INFO  [master/master/10.0.52.144:60000] regionserver.HRegionServer: ClusterId : c309a039-eb35-400c-bb13-0b6ed939cc5e