ElasticSearch - Running a Cluster

In production, keeping es available requires a failover scheme. Elasticsearch supports cluster mode out of the box, so let's try it out.

Configuring the Cluster


First we need three es instances on separate machines; call them node1, node2 and node3. Edit config/elasticsearch.yml on each node as follows:

node1:

# Cluster name; must be identical on every node
cluster.name: niko-elasticsearch
# Node name; unique per node
node.name: niko-node1
# Unicast discovery host list
discovery.zen.ping.unicast.hosts: ["10.2.10.209:9300", "10.2.10.210:9300", "10.2.10.211:9300"]

node2:

# Cluster name; must be identical on every node
cluster.name: niko-elasticsearch
# Node name; unique per node
node.name: niko-node2
# Unicast discovery host list
discovery.zen.ping.unicast.hosts: ["10.2.10.209:9300", "10.2.10.210:9300", "10.2.10.211:9300"]

node3:

# Cluster name; must be identical on every node
cluster.name: niko-elasticsearch
# Node name; unique per node
node.name: niko-node3
# Unicast discovery host list
discovery.zen.ping.unicast.hosts: ["10.2.10.209:9300", "10.2.10.210:9300", "10.2.10.211:9300"]
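One related setting worth knowing about: ES 1.x enables multicast discovery by default. Since we list the hosts explicitly, multicast can be turned off so nodes rely only on the unicast list above (a suggested addition, not required for the demo to work):

```yaml
# Use only the unicast host list for discovery; disable multicast pinging.
# (ES 1.x setting; multicast discovery was removed in later major versions.)
discovery.zen.ping.multicast.enabled: false
```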

Starting the Cluster


Start node1


$ bin/elasticsearch
[2016-05-08 21:55:12,053][INFO ][node                     ] [niko-node1] version[1.7.5], pid[2632], build[00f95f4/2016-02-02T09:55:30Z]
[2016-05-08 21:55:12,054][INFO ][node                     ] [niko-node1] initializing ...
[2016-05-08 21:55:12,164][INFO ][plugins                  ] [niko-node1] loaded [], sites []
[2016-05-08 21:55:12,210][INFO ][env                      ] [niko-node1] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [86.8gb], net total_space [96.3gb], types [ext4]
[2016-05-08 21:55:16,339][INFO ][node                     ] [niko-node1] initialized
[2016-05-08 21:55:16,339][INFO ][node                     ] [niko-node1] starting ...
[2016-05-08 21:55:16,567][INFO ][transport                ] [niko-node1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.2.10.209:9300]}
[2016-05-08 21:55:16,644][INFO ][discovery                ] [niko-node1] niko-elasticsearch/iZzjAHGURxW3orWute-PtQ
[2016-05-08 21:55:20,463][INFO ][cluster.service          ] [niko-node1] new_master [niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]], reason: zen-disco-join (elected_as_master)
[2016-05-08 21:55:20,506][INFO ][http                     ] [niko-node1] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.2.10.209:9200]}
[2016-05-08 21:55:20,506][INFO ][node                     ] [niko-node1] started
[2016-05-08 21:55:20,516][INFO ][gateway                  ] [niko-node1] recovered [0] indices into cluster_state

Start node2


log of node2:

$ bin/elasticsearch
[2016-05-08 21:55:32,314][INFO ][node                     ] [niko-node2] version[1.7.5], pid[2409], build[00f95f4/2016-02-02T09:55:30Z]
[2016-05-08 21:55:32,320][INFO ][node                     ] [niko-node2] initializing ...
[2016-05-08 21:55:32,486][INFO ][plugins                  ] [niko-node2] loaded [], sites []
[2016-05-08 21:55:32,542][INFO ][env                      ] [niko-node2] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [86.8gb], net total_space [96.3gb], types [ext4]
[2016-05-08 21:55:36,886][INFO ][node                     ] [niko-node2] initialized
[2016-05-08 21:55:36,887][INFO ][node                     ] [niko-node2] starting ...
[2016-05-08 21:55:37,119][INFO ][transport                ] [niko-node2] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.2.10.210:9300]}
[2016-05-08 21:55:37,141][INFO ][discovery                ] [niko-node2] niko-elasticsearch/7Fc97mnTTZGEWiLZgm4WmA
[2016-05-08 21:55:40,239][INFO ][cluster.service          ] [niko-node2] detected_master [niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]], added {[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]],}, reason: zen-disco-receive(from master [[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]]])
[2016-05-08 21:55:40,290][INFO ][http                     ] [niko-node2] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.2.10.210:9200]}
[2016-05-08 21:55:40,293][INFO ][node                     ] [niko-node2] started

log of node1:

[2016-05-08 21:55:40,144][INFO ][cluster.service          ] [niko-node1] added {[niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]],}, reason: zen-disco-receive(join from node[[niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]]])

Start node3


log of node3:

$ bin/elasticsearch
[2016-05-08 21:57:31,672][INFO ][node                     ] [niko-node3] version[1.7.5], pid[2431], build[00f95f4/2016-02-02T09:55:30Z]
[2016-05-08 21:57:31,676][INFO ][node                     ] [niko-node3] initializing ...
[2016-05-08 21:57:31,857][INFO ][plugins                  ] [niko-node3] loaded [], sites []
[2016-05-08 21:57:31,912][INFO ][env                      ] [niko-node3] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [86.8gb], net total_space [96.3gb], types [ext4]
[2016-05-08 21:57:36,632][INFO ][node                     ] [niko-node3] initialized
[2016-05-08 21:57:36,633][INFO ][node                     ] [niko-node3] starting ...
[2016-05-08 21:57:36,843][INFO ][transport                ] [niko-node3] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.2.10.211:9300]}
[2016-05-08 21:57:36,913][INFO ][discovery                ] [niko-node3] niko-elasticsearch/UzK8fUZJRs-mrOWKJnxHBQ
[2016-05-08 21:57:40,062][INFO ][cluster.service          ] [niko-node3] detected_master [niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]], added {[niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]],[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]],}, reason: zen-disco-receive(from master [[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]]])
[2016-05-08 21:57:40,109][INFO ][http                     ] [niko-node3] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.2.10.211:9200]}
[2016-05-08 21:57:40,110][INFO ][node                     ] [niko-node3] started

log of node1:

[2016-05-08 21:57:40,004][INFO ][cluster.service          ] [niko-node1] added {[niko-node3][UzK8fUZJRs-mrOWKJnxHBQ][ubuntu][inet[/10.2.10.211:9300]],}, reason: zen-disco-receive(join from node[[niko-node3][UzK8fUZJRs-mrOWKJnxHBQ][ubuntu][inet[/10.2.10.211:9300]]])

log of node2:

[2016-05-08 21:57:40,081][INFO ][cluster.service          ] [niko-node2] added {[niko-node3][UzK8fUZJRs-mrOWKJnxHBQ][ubuntu][inet[/10.2.10.211:9300]],}, reason: zen-disco-receive(from master [[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]]])

Inspecting the Cluster


Query node info from any node in the cluster:

$ curl http://10.2.10.209:9200/_nodes/process?pretty

{
  "cluster_name" : "niko-elasticsearch",
  "nodes" : {
    "iZzjAHGURxW3orWute-PtQ" : {
      "name" : "niko-node1",
      "transport_address" : "inet[/10.2.10.209:9300]",
      "host" : "ubuntu",
      "ip" : "127.0.1.1",
      "version" : "1.7.5",
      "build" : "00f95f4",
      "http_address" : "inet[/10.2.10.209:9200]",
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 2632,
        "max_file_descriptors" : 65536,
        "mlockall" : false
      }
    },
    "7Fc97mnTTZGEWiLZgm4WmA" : {
      "name" : "niko-node2",
      "transport_address" : "inet[/10.2.10.210:9300]",
      "host" : "ubuntu",
      "ip" : "127.0.1.1",
      "version" : "1.7.5",
      "build" : "00f95f4",
      "http_address" : "inet[/10.2.10.210:9200]",
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 2409,
        "max_file_descriptors" : 65536,
        "mlockall" : false
      }
    },
    "UzK8fUZJRs-mrOWKJnxHBQ" : {
      "name" : "niko-node3",
      "transport_address" : "inet[/10.2.10.211:9300]",
      "host" : "ubuntu",
      "ip" : "127.0.1.1",
      "version" : "1.7.5",
      "build" : "00f95f4",
      "http_address" : "inet[/10.2.10.211:9200]",
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 2431,
        "max_file_descriptors" : 65536,
        "mlockall" : false
      }
    }
  }
}
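The same information can be pulled out programmatically. A minimal sketch in plain Python, using a trimmed copy of the response above as sample data (in practice you would fetch the JSON from the _nodes/process endpoint):

```python
import json

# Trimmed sample of the _nodes/process response shown above.
response = json.loads("""
{
  "cluster_name": "niko-elasticsearch",
  "nodes": {
    "iZzjAHGURxW3orWute-PtQ": {"name": "niko-node1", "http_address": "inet[/10.2.10.209:9200]"},
    "7Fc97mnTTZGEWiLZgm4WmA": {"name": "niko-node2", "http_address": "inet[/10.2.10.210:9200]"},
    "UzK8fUZJRs-mrOWKJnxHBQ": {"name": "niko-node3", "http_address": "inet[/10.2.10.211:9200]"}
  }
}
""")

# List every node in the cluster with its HTTP address.
nodes = sorted((n["name"], n["http_address"]) for n in response["nodes"].values())
for name, addr in nodes:
    print(name, addr)
```

This kind of check is handy in monitoring scripts: if the number of entries under "nodes" drops below the expected count, a node has left the cluster.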

Failover Test (node1)


From the output above we know the cluster master is niko-node1. Now shut it down and watch how the other nodes react:

log of node2:

[2016-05-08 22:04:55,748][INFO ][discovery.zen            ] [niko-node2] master_left [[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]]], reason [shut_down]
[2016-05-08 22:04:55,750][WARN ][discovery.zen            ] [niko-node2] master left (reason = shut_down), current nodes: {[niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]],[niko-node3][UzK8fUZJRs-mrOWKJnxHBQ][ubuntu][inet[/10.2.10.211:9300]],}
[2016-05-08 22:04:55,752][INFO ][cluster.service          ] [niko-node2] removed {[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]],}, reason: zen-disco-master_failed ([niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]])
[2016-05-08 22:04:58,791][INFO ][cluster.service          ] [niko-node2] new_master [niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]], reason: zen-disco-join (elected_as_master)

log of node3:

[2016-05-08 22:04:55,721][INFO ][discovery.zen            ] [niko-node3] master_left [[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]]], reason [shut_down]
[2016-05-08 22:04:55,726][WARN ][discovery.zen            ] [niko-node3] master left (reason = shut_down), current nodes: {[niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]],[niko-node3][UzK8fUZJRs-mrOWKJnxHBQ][ubuntu][inet[/10.2.10.211:9300]],}
[2016-05-08 22:04:55,727][INFO ][cluster.service          ] [niko-node3] removed {[niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]],}, reason: zen-disco-master_failed ([niko-node1][iZzjAHGURxW3orWute-PtQ][ubuntu][inet[/10.2.10.209:9300]])
[2016-05-08 22:04:58,789][INFO ][cluster.service          ] [niko-node3] detected_master [niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]], reason: zen-disco-receive(from master [[niko-node2][7Fc97mnTTZGEWiLZgm4WmA][ubuntu][inet[/10.2.10.210:9300]]])

As the logs show, node2 was elected as the new master.

The Split-Brain Problem


Continuing from above: the cluster is now down to two nodes. If the connection between them is interrupted, each node will conclude the other is down and elect itself master. The cluster then has two masters (the classic split-brain), which leads to inconsistent data.

To prevent this, one setting needs to change:

# Set to ensure a node sees N other master eligible nodes to be considered
# operational within the cluster. This should be set to a quorum/majority of
# the master-eligible nodes in the cluster.
#
#discovery.zen.minimum_master_nodes: 1

For our three-node cluster, require a majority of 2:

discovery.zen.minimum_master_nodes: 2
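The value follows the majority rule: minimum_master_nodes should be (number of master-eligible nodes / 2) + 1. A one-liner sketch of that arithmetic:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum needed to elect a master: a strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1

# For our three-node cluster a majority is 2, so a lone partitioned
# node can never elect itself master.
print(minimum_master_nodes(3))  # -> 2
```

Note that with this set to 2, a two-node remainder can still elect a master but a single isolated node cannot; this is why three master-eligible nodes is the practical minimum for a highly available cluster.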

Next


How the cluster actually works internally will be covered in a follow-up post ~~

References


https://www.digitalocean.com/community/tutorials/how-to-set-up-a-production-elasticsearch-cluster-on-ubuntu-14-04
http://www.wklken.me/posts/2016/06/29/deploy-es.html