[ZooKeeper] Cluster Setup and Testing

This post records how I set up a ZooKeeper cluster and ran a few simple tests against it.

Configuration


zk1

tickTime=2000
dataDir=/foo/data
dataLogDir=/foo/logs
clientPort=2181
initLimit=5
syncLimit=2
server.1=zk1:2888:3888
server.2=zk2:2889:3889
server.3=zk3:2890:3890

The entries of the form server.X list the servers that make up the ZooKeeper service. When the server starts up, it knows which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII.
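
As a minimal sketch of the myid convention described above, the following Python helper writes the server number into a myid file inside a given data directory. The function name write_myid is a hypothetical helper for illustration, not part of ZooKeeper; the 1 to 255 range check follows the ZooKeeper administrator's guide.

```python
from pathlib import Path


def write_myid(data_dir: str, server_id: int) -> Path:
    """Write the server number as ASCII text into <data_dir>/myid."""
    # The ZooKeeper docs ask for an id between 1 and 255.
    if not 1 <= server_id <= 255:
        raise ValueError("server id must be between 1 and 255")
    directory = Path(data_dir)
    # Create the data directory if it does not exist yet.
    directory.mkdir(parents=True, exist_ok=True)
    myid_path = directory / "myid"
    # Same content as `echo 1 > data/myid`: the number plus a newline.
    myid_path.write_text(f"{server_id}\n", encoding="ascii")
    return myid_path
```

With the configuration shown here, write_myid("/foo/data", 1) would play the same role as the echo command used in the startup step, provided each server actually has its own data directory.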

Finally, note the two port numbers after each server name: "2888" and "3888". Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.
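
To make the layout of a server.X entry concrete, here is a small Python sketch that splits such lines into the server id, the host, the peer port, and the election port. ServerEntry and parse_server_entries are hypothetical names chosen for this illustration, and the sketch only handles the three-field host:port:port form used in this post.

```python
from typing import Dict, NamedTuple


class ServerEntry(NamedTuple):
    host: str
    peer_port: int      # followers connect to the leader on this port
    election_port: int  # used for leader election


def parse_server_entries(config_text: str) -> Dict[int, ServerEntry]:
    """Extract server.X=host:port:port lines from zoo.cfg-style text."""
    servers = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("server."):
            continue  # skip tickTime, dataDir, and other settings
        key, _, value = line.partition("=")
        server_id = int(key.strip()[len("server."):])
        host, peer_port, election_port = value.strip().split(":")
        servers[server_id] = ServerEntry(host, int(peer_port), int(election_port))
    return servers


if __name__ == "__main__":
    sample = """
    server.1=zk1:2888:3888
    server.2=zk2:2889:3889
    server.3=zk3:2890:3890
    """
    for sid, entry in sorted(parse_server_entries(sample).items()):
        print(sid, entry.host, entry.peer_port, entry.election_port)
```

In this setup every server gets a different pair of ports, which is what you need when all three hostnames resolve to the same machine.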

zk2

tickTime=2000
dataDir=/foo/data
dataLogDir=/foo/logs
clientPort=2182
initLimit=5
syncLimit=2
server.1=zk1:2888:3888
server.2=zk2:2889:3889
server.3=zk3:2890:3890

zk3

tickTime=2000
dataDir=/foo/data
dataLogDir=/foo/logs
clientPort=2183
initLimit=5
syncLimit=2
server.1=zk1:2888:3888
server.2=zk2:2889:3889
server.3=zk3:2890:3890

Startup


zk1

echo 1 > data/myid

.../zookeeper-3.4.6_server-01$ bin/zkServer.sh start
JMX enabled by default
Using config: /home/niko/dev/tools/servers/zookeeper/zookeeper-3.4.6_server-01/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

The log is written to ./zookeeper.out.


2015-12-29 22:07:45,673 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /home/niko/dev/tools/servers/zookeeper/zookeeper-3.4.6_server-01/bin/../conf/zoo.cfg
2015-12-29 22:07:45,675 [myid:] - INFO [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2015-12-29 22:07:45,677 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2015-12-29 22:07:45,677 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2015-12-29 22:07:45,678 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2015-12-29 22:07:45,684 [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2015-12-29 22:07:45,693 [myid:1] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2015-12-29 22:07:45,706 [myid:1] - INFO [main:QuorumPeer@959] - tickTime set to 2000
2015-12-29 22:07:45,706 [myid:1] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2015-12-29 22:07:45,706 [myid:1] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2015-12-29 22:07:45,706 [myid:1] - INFO [main:QuorumPeer@1005] - initLimit set to 5
2015-12-29 22:07:45,714 [myid:1] - INFO [main:QuorumPeer@473] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2015-12-29 22:07:45,717 [myid:1] - INFO [main:QuorumPeer@488] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2015-12-29 22:07:45,721 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: zk1/127.0.1.1:3888
2015-12-29 22:07:45,726 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2015-12-29 22:07:45,727 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x0
2015-12-29 22:07:45,729 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2015-12-29 22:07:45,731 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address zk2/127.0.1.1:3889
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:745)
2015-12-29 22:07:45,732 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 3 at election address zk3/127.0.1.1:3890
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:745)
2015-12-29 22:07:45,931 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address zk2/127.0.1.1:3889
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)

......

2015-12-29 22:09:27,956 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address zk2/127.0.1.1:3889
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-12-29 22:09:27,957 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 3 at election address zk3/127.0.1.1:3890
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-12-29 22:09:27,958 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000

zk2

echo 2 > data/myid
.../zookeeper-3.4.6_server-02$ bin/zkServer.sh start

zk3

echo 3 > data/myid
.../zookeeper-3.4.6_server-03$ bin/zkServer.sh start

Checking status


zk1

.../zookeeper-3.4.6_server-01$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/niko/dev/tools/servers/zookeeper/zookeeper-3.4.6_server-01/bin/../conf/zoo.cfg
Mode: follower

zk2

.../zookeeper-3.4.6_server-02$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/niko/dev/tools/servers/zookeeper/zookeeper-3.4.6_server-02/bin/../conf/zoo.cfg
Mode: leader

zk3

.../zookeeper-3.4.6_server-03$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/niko/dev/tools/servers/zookeeper/zookeeper-3.4.6_server-03/bin/../conf/zoo.cfg
Mode: follower

As shown, zk2 was elected leader.
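
The same role check can be sketched in Python. This is an illustration under assumptions: it sends the "srvr" four-letter command to a server's client port and looks for a "Mode:" line like the one in the zkServer.sh status output above. It assumes four-letter commands are enabled on the server; parse_mode and query_mode are hypothetical helper names.

```python
import socket
from typing import Optional


def parse_mode(response: str) -> Optional[str]:
    """Return the value of the 'Mode:' line, or None if there is none."""
    for line in response.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host: str, port: int, timeout: float = 2.0) -> Optional[str]:
    """Send the 'srvr' four-letter command and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"srvr")
        chunks = []
        # The server closes the connection once it has answered.
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))
```

Calling query_mode("127.0.0.1", 2181), and likewise for ports 2182 and 2183, should give one role per server; a server that is down raises an OSError such as "Connection refused".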

Failure test


After killing zk2, checking the status of each ZooKeeper server shows that zk3 has become the leader:

zookeeper-3.4.6_server-03$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/niko/dev/tools/servers/zookeeper/zookeeper-3.4.6_server-03/bin/../conf/zoo.cfg
Mode: leader

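A client written with the Java client API connects to all three servers. Changing a znode with zkCli.sh shows that the Java client still works correctly: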

[zk: 127.0.0.1:2182(CONNECTED) 3] set /test niko2
$ cat /tmp/niko/zk_test
niko2

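Next, shutting down the leader zk3 as well makes the cluster stop serving requests. Only one of the three servers is left, and a single vote is not more than 3/2 (a majority needs at least 2 votes), so no leader can be elected: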
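
The majority rule behind this can be written down as a short Python sketch. It assumes the default majority quorum (the startup log above reports "Defaulting to majority quorums"); quorum_size and has_quorum are hypothetical helper names.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(alive: int, ensemble_size: int) -> bool:
    """True if the servers still running can elect a leader."""
    return alive >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Three-server ensemble: all up, one down, two down.
    for alive in (3, 2, 1):
        print(alive, has_quorum(alive, 3))
```

For the three-server ensemble in this post, quorum_size(3) is 2, so the cluster survives the loss of zk2 alone but not the loss of both zk2 and zk3.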

<2015-12-29 22:42:57,889> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2181) Unable to read additional data from server sessionid 0x3585e1413050000, likely server has closed socket, closing socket connection and attempting reconnect
<2015-12-29 22:42:58,324> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2182) Opening socket connection to server 127.0.0.1/127.0.0.1:2182. Will not attempt to authenticate using SASL (unknown error)
<2015-12-29 22:42:58,325> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2182) Socket connection established to 127.0.0.1/127.0.0.1:2182, initiating session
<2015-12-29 22:42:58,327> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2182) Unable to read additional data from server sessionid 0x3585e1413050000, likely server has closed socket, closing socket connection and attempting reconnect
<2015-12-29 22:42:58,699> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2183) Opening socket connection to server 127.0.0.1/127.0.0.1:2183. Will not attempt to authenticate using SASL (unknown error)
<2015-12-29 22:42:58,700> <ClientCnxn:WARN> main-SendThread(127.0.0.1:2183) Session 0x3585e1413050000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
<2015-12-29 22:43:00,543> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2181) Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
<2015-12-29 22:43:00,543> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2181) Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
<2015-12-29 22:43:00,545> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2181) Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x3585e1413050000, negotiated timeout = 4000

After restarting zk3, the cluster resumes service:

<2015-12-29 22:53:34,232> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2183) Opening socket connection to server 127.0.0.1/127.0.0.1:2183. Will not attempt to authenticate using SASL (unknown error)
<2015-12-29 22:53:34,232> <ClientCnxn:WARN> main-SendThread(127.0.0.1:2183) Session 0x3585e2c18ed0000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
<2015-12-29 22:53:34,485> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2181) Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
<2015-12-29 22:53:34,486> <ClientCnxn:WARN> main-SendThread(127.0.0.1:2181) Session 0x3585e2c18ed0000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
<2015-12-29 22:53:36,450> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2182) Opening socket connection to server 127.0.0.1/127.0.0.1:2182. Will not attempt to authenticate using SASL (unknown error)
<2015-12-29 22:53:36,451> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2182) Socket connection established to 127.0.0.1/127.0.0.1:2182, initiating session
<2015-12-29 22:53:36,451> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2182) Unable to read additional data from server sessionid 0x3585e2c18ed0000, likely server has closed socket, closing socket connection and attempting reconnect
<2015-12-29 22:53:37,115> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2183) Opening socket connection to server 127.0.0.1/127.0.0.1:2183. Will not attempt to authenticate using SASL (unknown error)
<2015-12-29 22:53:37,116> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2183) Socket connection established to 127.0.0.1/127.0.0.1:2183, initiating session
<2015-12-29 22:53:37,164> <ClientCnxn:INFO> main-SendThread(127.0.0.1:2183) Session establishment complete on server 127.0.0.1/127.0.0.1:2183, sessionid = 0x3585e2c18ed0000, negotiated timeout = 4000

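Next, I plan to look into building a ZooKeeper cluster with Docker and into ZooKeeper's leader election algorithm; those will be covered in separate posts.

References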


https://zookeeper.apache.org/doc/trunk/zookeeperStarted.html#sc_RunningReplicatedZooKeeper