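I recently started making scrala distributed. The approach is a Master/Slave architecture in which each Slave is a long-running Actor, with Consul helping the Master discover the Slaves.
So the pitfalls began. The first problem was how to start a Consul service, and I went with the Docker image progrium/consul. To better simulate a distributed environment, I wanted to see whether a Consul Cluster could be built across Docker Machines. My first attempt, running several Consul containers inside the Docker daemon of each Docker Machine, did not work: the virtualization is nested too deeply, and containers on different Docker Machines cannot reach each other's container-internal IPs.
I'll skip the various pitfalls along the way and just describe how I got the whole Cluster running.
First, create two Docker Machines for testing: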
docker-machine create -d virtualbox <name>
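Then, on the first Machine, I started three Consul Server containers and one Consul Client container. Only one of the Servers publishes its ports to the host; the Client also publishes 8500 and a few other ports to the host.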
$ docker run -d --name node1 -h node1 progrium/consul -server -bootstrap-expect 3
$ JOIN_IP="$(docker inspect -f '{{.NetworkSettings.IPAddress}}' node1)"
$ docker run -d --name node2 -h node2 progrium/consul -server -join $JOIN_IP
$ docker run -d -p 8300:8300 -p 8301:8301 -p 8301:8301/udp -p 8302:8302 -p 8302:8302/udp --name node3 -h node3 progrium/consul -server -join $JOIN_IP
$ docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp --name node4 -h node4 progrium/consul -join $JOIN_IP
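You can now open /ui on port 8500 of the host for a graphical view. Under services there should be one service named consul, running on the three Servers, and under nodes there should be four entries.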
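The same check can be done without the UI through Consul's HTTP API on the published port 8500. The following is a minimal sketch, assuming curl is installed and the agent is reachable; the function names consul_url and consul_check are hypothetical helpers, not part of Consul or the image.

```shell
# Hypothetical helper: build a Consul HTTP API URL for a host.
# 8500 is the HTTP port published to the host in the commands above.
consul_url() {
  printf 'http://%s:8500/v1/%s\n' "$1" "$2"
}

# Hypothetical helper: print the catalog's nodes and services as JSON.
# Needs curl and a running agent, so it is only defined here, not called.
consul_check() {
  curl -s "$(consul_url "$1" catalog/nodes)"
  echo
  curl -s "$(consul_url "$1" catalog/services)"
  echo
}
```

With the first Machine at 192.168.99.102 (the address that appears in the join log), `consul_check 192.168.99.102` should list the four nodes and the consul service.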
The first Machine worked well: after running a registrator and then redis, one more service shows up. Next, I tried running a Client on the second Machine. On the new Machine, run:
$ docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp --name node5 -h node5 progrium/consul -join <IP of the Docker Machine 1>
The log Docker gives for the container looks like this:
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
Join completed. Synced with 1 initial agents
==> Consul agent running!
Node name: 'node5'
Datacenter: 'dc1'
Server: false (bootstrap: false)
Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
Cluster Addr: 172.17.0.4 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2016/01/28 10:19:03 [INFO] serf: EventMemberJoin: node5 172.17.0.4
2016/01/28 10:19:03 [INFO] agent: (LAN) joining: [192.168.99.102]
2016/01/28 10:19:03 [INFO] serf: EventMemberJoin: node1 172.17.0.11
2016/01/28 10:19:03 [INFO] serf: EventMemberJoin: node4 172.17.0.14
2016/01/28 10:19:03 [INFO] serf: EventMemberJoin: node2 172.17.0.12
2016/01/28 10:19:03 [INFO] serf: EventMemberJoin: node3 172.17.0.15
2016/01/28 10:19:03 [INFO] agent: (LAN) joined: 1 Err: <nil>
2016/01/28 10:19:03 [ERR] agent: failed to sync remote state: No known Consul servers
2016/01/28 10:19:03 [INFO] consul: adding server node1 (Addr: 172.17.0.11:8300) (DC: dc1)
2016/01/28 10:19:03 [INFO] consul: adding server node2 (Addr: 172.17.0.12:8300) (DC: dc1)
2016/01/28 10:19:03 [INFO] consul: adding server node3 (Addr: 172.17.0.15:8300) (DC: dc1)
2016/01/28 10:19:05 [INFO] memberlist: Suspect node2 has failed, no acks received
2016/01/28 10:19:06 [INFO] memberlist: Suspect node4 has failed, no acks received
2016/01/28 10:19:06 [ERR] agent: failed to sync remote state: rpc error: failed to get conn: dial tcp 172.17.0.11:8300: no route to host
2016/01/28 10:19:07 [INFO] memberlist: Suspect node3 has failed, no acks received
2016/01/28 10:19:08 [INFO] memberlist: Suspect node1 has failed, no acks received
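Back in the UI, there is one more node, but it is marked as failed. The reason is that the IPs the new node learns for the other nodes are container-internal IPs on Docker Machine 1, and those are unreachable from outside Docker Machine 1.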
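To spot this situation quickly, the join events in the agent log can be filtered for members that advertise a bridge-internal address. This is a rough sketch, assuming the default docker0 subnet 172.17.0.0/16 seen in the log above; bridge_members is a hypothetical helper name.

```shell
# Hypothetical helper: read a Consul agent log on stdin and print
# "name address" for every member that joined with an address in
# 172.17.x.x (assumed here to be the Docker bridge subnet).
bridge_members() {
  awk '/serf: EventMemberJoin:/ {
    name = $(NF - 1)   # second-to-last field is the node name
    addr = $NF         # last field is the advertised address
    if (addr ~ /^172\.17\./) print name, addr
  }'
}
```

For example, `docker logs node5 2>&1 | bridge_members` on the second Machine lists every member whose advertised address only makes sense inside a single Docker host.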
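I eventually figured out how to solve this. The earlier attempt ran several nodes on each Machine, which makes discovery between nodes on different Machines hard to handle. The improved approach is to run one node per Machine, map every port that node needs onto the host, and have the node advertise the Machine's IP with -advertise. Since each Machine now has exactly one node, a node can no longer end up discovering only one of the nodes on another Machine.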
# the ip of the docker machine is 192.168.99.102 and the docker0 is 172.17.42.1
$ docker run -d -h node1 \
-p 192.168.99.102:8300:8300 \
-p 192.168.99.102:8301:8301 \
-p 192.168.99.102:8301:8301/udp \
-p 192.168.99.102:8302:8302 \
-p 192.168.99.102:8302:8302/udp \
-p 192.168.99.102:8400:8400 \
-p 192.168.99.102:8500:8500 \
-p 172.17.42.1:53:53/udp \
progrium/consul -server -advertise 192.168.99.102 -bootstrap-expect 2
# the ip of the docker machine is 192.168.99.100 and the docker0 is 172.17.42.1
$ docker run -d -h node2 \
-p 192.168.99.100:8300:8300 \
-p 192.168.99.100:8301:8301 \
-p 192.168.99.100:8301:8301/udp \
-p 192.168.99.100:8302:8302 \
-p 192.168.99.100:8302:8302/udp \
-p 192.168.99.100:8400:8400 \
-p 192.168.99.100:8500:8500 \
-p 172.17.42.1:53:53/udp \
progrium/consul -server -advertise 192.168.99.100 -join 192.168.99.102
This gives a Consul Cluster across two Docker Machines. It is also the approach that the image's Docker Hub page suggests for building a Consul Cluster with Docker in production.
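Since the two commands differ only in the hostname, the Machine IP, and the trailing Consul flags, they can be generated from one template. Below is a minimal sketch under that assumption; consul_run_cmd is a hypothetical helper that only prints the command, so it can be reviewed before it is run.

```shell
# Hypothetical helper: print the docker run command for one Consul server.
# Arguments: node name, Machine IP, docker0 IP, then any extra Consul flags
# (for example "-bootstrap-expect 2" or "-join <ip>").
consul_run_cmd() {
  name="$1"; host_ip="$2"; bridge_ip="$3"
  shift 3
  printf 'docker run -d -h %s' "$name"
  # Publish the server, gossip, RPC, and HTTP ports on the Machine IP.
  for spec in 8300:8300 8301:8301 8301:8301/udp 8302:8302 8302:8302/udp 8400:8400 8500:8500; do
    printf ' -p %s:%s' "$host_ip" "$spec"
  done
  # DNS is published on the docker0 address, as in the commands above.
  printf ' -p %s:53:53/udp' "$bridge_ip"
  printf ' progrium/consul -server -advertise %s %s\n' "$host_ip" "$*"
}
```

Calling `consul_run_cmd node2 192.168.99.100 172.17.42.1 -join 192.168.99.102` prints the second command above on a single line; the Machine IP itself can be looked up with `docker-machine ip <name>`.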