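Note: etcd does not support direct downgrades, so backing up your data before upgrading is strongly recommended.
To use the new features in the latest stable Kubernetes release, we need to upgrade Kubernetes from 1.6.4 to 1.9.1, and etcd has to be upgraded first. This post records the manual upgrade process.
In the general case, upgrading from etcd 3.1 to 3.2 can be a zero-downtime, rolling upgrade:
- Stop the etcd v3.1 processes one at a time and replace each with an etcd v3.2 process
- Once all members run v3.2 processes, the new features in v3.2 become available to the cluster.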
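Client upgrade checks
These are mostly API client changes and do not affect our server-side upgrade.
Server upgrade checks
Upgrade requirements
To upgrade an existing etcd deployment to 3.2, the running cluster must be 3.1 or later. If it is older than 3.1, upgrade to 3.1 first and then to 3.2.
Also, to make sure the existing cluster stays healthy throughout the rolling upgrade, check its health with the etcdctl endpoint health command before upgrading.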
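The version requirement can be checked mechanically. The following is a minimal sketch, assuming the /version endpoint returns JSON in the shape shown later in this post; the helper names (extract_cluster_version, version_at_least, cluster_version_ok) are hypothetical and not part of etcd.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a cluster meets the "3.1 or later" requirement.
# All function names here are hypothetical helpers, not etcd tooling.

# Extract the "etcdcluster" field from the JSON returned by /version.
extract_cluster_version() {
  # $1: JSON such as {"etcdserver":"3.1.7","etcdcluster":"3.1.0"}
  printf '%s\n' "$1" | sed -n 's/.*"etcdcluster":"\([0-9][0-9.]*\)".*/\1/p'
}

# Succeed when version $1 is at least major $2, minor $3.
version_at_least() {
  local major minor
  major=${1%%.*}
  minor=${1#*.}
  minor=${minor%%.*}
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Succeed when the reported cluster version is 3.1 or later.
cluster_version_ok() {
  local v
  v=$(extract_cluster_version "$1")
  [ -n "$v" ] && version_at_least "$v" 3 1
}

# Usage against a live member (endpoint taken from the example in this post):
#   cluster_version_ok "$(curl -s http://localhost:2379/version)" && echo "ok to upgrade to 3.2"
```

The check reads the cluster version rather than the server version, because the cluster version is what the members have agreed on.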
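Preparation
Before upgrading etcd, always test the services that depend on etcd in a test environment before rolling the upgrade out to production. Back up the etcd data before starting. If the upgrade goes wrong, this backup can be used to downgrade back to the existing etcd version. Note that the snapshot command only backs up v3 data. For v2 data, see the documentation on backing up the v2 datastore.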
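A minimal sketch of the v3 backup step, using the etcdctl snapshot save command. The wrapper name backup_etcd_v3, the endpoint, and the target directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: take a v3 snapshot before touching any member.
# backup_etcd_v3 is a hypothetical wrapper around "etcdctl snapshot save".
backup_etcd_v3() {
  local endpoint=$1 dir=$2 file
  # Timestamped file name so repeated backups do not overwrite each other.
  file="$dir/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"
  # Send etcdctl's own messages to stderr so stdout carries only the path.
  ETCDCTL_API=3 etcdctl --endpoints="$endpoint" snapshot save "$file" >&2 || return 1
  printf '%s\n' "$file"
}

# Usage (endpoint from the example in this post, directory is an assumption):
#   snapshot=$(backup_etcd_v3 localhost:2379 /var/backups) && echo "saved $snapshot"
```

This covers v3 data only; v2 data needs the separate v2 backup procedure mentioned above.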
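Mixed versions
During the upgrade, an etcd cluster supports members running mixed versions and operates with the protocol of the lowest common version. The cluster is considered upgraded only once all members have been upgraded to version 3.2. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.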
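To illustrate the rule, the sketch below computes the major.minor version a mixed cluster would be expected to report from a list of member versions. It only mirrors the "lowest version wins" rule described above; it is not etcd's negotiation code, and effective_cluster_version is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: the effective cluster version is the major.minor of the lowest
# member version. Illustration only, not etcd's implementation.
effective_cluster_version() {
  # Arguments: one server version per member, e.g. 3.2.0 3.1.7 3.1.7
  printf '%s\n' "$@" \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | head -n 1 \
    | cut -d. -f1,2
}

# Usage:
#   effective_cluster_version 3.2.0 3.1.7 3.1.7   # one member upgraded so far
#   effective_cluster_version 3.2.0 3.2.0 3.2.0   # all members upgraded
```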
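Limitations
Note: if the cluster holds only v3 data and no v2 data, this limitation does not apply.
If the cluster serves a data set larger than 50 MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, the safest approach is to wait two minutes after upgrading each member before moving on to the next.
For much larger data sets, 100 MB or more, this one-time process may take even longer. Administrators of clusters that large can contact the etcd team before upgrading for advice.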
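Downgrade
Once every member has been upgraded to v3.2, the whole cluster is upgraded to v3.2 and cannot be downgraded from that state. As long as any single member is still on v3.1, the cluster and its operations remain at "v3.1", and it is possible to return from this mixed state to running the v3.1 etcd binary on all members.
Back up the data directories of all etcd members so that the cluster can still be downgraded after the upgrade has completed.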
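One way to back up a member's data directory is to archive it, sketched below. The function name backup_data_dir and the paths in the usage line are hypothetical, and the sketch assumes the member is stopped while its directory is archived, which is my own precaution rather than something this post prescribes.

```shell
#!/usr/bin/env bash
# Sketch: archive one member's data directory into a compressed tarball.
# Assumption: the member is stopped, so the files are not changing.
backup_data_dir() {
  local data_dir=$1 dest=$2
  [ -d "$data_dir" ] || { echo "no such data dir: $data_dir" >&2; return 1; }
  # Archive relative to the parent so the tarball holds just the directory name.
  tar -czf "$dest" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
}

# Usage (both paths are placeholders):
#   backup_data_dir /var/lib/etcd/s1.etcd /var/backups/s1.etcd.tgz
```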
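Upgrade procedure
The following example shows how to upgrade a three-member v3.1 etcd cluster running on a local machine. For clusters spread across machines, or clusters that use SSL-authenticated communication, adapt the commands accordingly.
- Check the upgrade requirements
# Make sure the cluster is healthy and running v3.1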
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms
$ curl http://localhost:2379/version
{"etcdserver":"3.1.7","etcdcluster":"3.1.0"}
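- Stop the existing etcd process
As each etcd process stops, the other cluster members log expected errors. This is normal, because the connection to that member is (temporarily) broken: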
2017-04-27 14:13:31.491746 I | raft: c89feb932daef420 [term 3] received MsgTimeoutNow from 6d4f535bae3ab960 and starts an election to get leadership.
2017-04-27 14:13:31.491769 I | raft: c89feb932daef420 became candidate at term 4
2017-04-27 14:13:31.491788 I | raft: c89feb932daef420 received MsgVoteResp from c89feb932daef420 at term 4
2017-04-27 14:13:31.491797 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 6d4f535bae3ab960 at term 4
2017-04-27 14:13:31.491805 I | raft: c89feb932daef420 [logterm: 3, index: 9] sent MsgVote request to 9eda174c7df8a033 at term 4
2017-04-27 14:13:31.491815 I | raft: raft.node: c89feb932daef420 lost leader 6d4f535bae3ab960 at term 4
2017-04-27 14:13:31.524084 I | raft: c89feb932daef420 received MsgVoteResp from 6d4f535bae3ab960 at term 4
2017-04-27 14:13:31.524108 I | raft: c89feb932daef420 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
2017-04-27 14:13:31.524123 I | raft: c89feb932daef420 became leader at term 4
2017-04-27 14:13:31.524136 I | raft: raft.node: c89feb932daef420 elected leader c89feb932daef420 at term 4
2017-04-27 14:13:31.592650 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream MsgApp v2 reader)
2017-04-27 14:13:31.592825 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message reader)
2017-04-27 14:13:31.693275 E | rafthttp: failed to dial 6d4f535bae3ab960 on stream Message (dial tcp [::1]:2380: getsockopt: connection refused)
2017-04-27 14:13:31.693289 I | rafthttp: peer 6d4f535bae3ab960 became inactive
2017-04-27 14:13:31.936678 W | rafthttp: lost the TCP streaming connection with peer 6d4f535bae3ab960 (stream Message writer)
At this point it is a good idea to back up the etcd data, so that there is a downgrade path if anything goes wrong.
- Replace the binary with etcd v3.2 and start the new etcd process
The new v3.2 etcd publishes its information to the cluster:
2017-04-27 14:14:25.363225 I | etcdserver: published {Name:s1 ClientURLs:[http://localhost:2379]} to cluster a9ededbffcb1b1f1
Verify that each member, and then the whole cluster, is healthy:
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
localhost:32379 is healthy: successfully committed proposal: took = 7.321771ms
localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
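Until every member is upgraded, the cluster logs warnings like the following (in this sample they come from a member still running 3.1.7). This is expected, and the warnings stop once all members run v3.2: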
2017-04-27 14:15:17.071804 W | etcdserver: member c89feb932daef420 has a higher version 3.2.0
2017-04-27 14:15:21.073110 W | etcdserver: the local etcd version 3.1.7 is not up-to-date
2017-04-27 14:15:21.073142 W | etcdserver: member 6d4f535bae3ab960 has a higher version 3.2.0
2017-04-27 14:15:21.073157 W | etcdserver: the local etcd version 3.1.7 is not up-to-date
2017-04-27 14:15:21.073164 W | etcdserver: member c89feb932daef420 has a higher version 3.2.0
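- Repeat steps 2 and 3 for each remaining member, one at a time
- Finish the upgrade
Once all members are upgraded, the cluster reports that it has been upgraded to 3.2: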
2017-04-27 14:15:54.536901 N | etcdserver/membership: updated the cluster version from 3.1 to 3.2
2017-04-27 14:15:54.537035 I | etcdserver/api: enabled capabilities for version 3.2
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
localhost:32379 is healthy: successfully committed proposal: took = 2.517902ms
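The stop, replace, and verify cycle can be sketched as a loop. This is only an outline under stated assumptions: stop_member and start_member_v32 are hypothetical hooks that you would implement for your own deployment (systemd units, containers, and so on), and the loop assumes etcdctl endpoint health exits with a non-zero status when an endpoint is unhealthy.

```shell
#!/usr/bin/env bash
# Sketch of a rolling upgrade loop. stop_member and start_member_v32 are
# hypothetical, deployment-specific hooks; they are not etcd commands.
rolling_upgrade() {
  local pause=$1
  shift
  local all_endpoints ep
  # Join the remaining arguments with commas for --endpoints.
  all_endpoints=$(IFS=,; echo "$*")
  for ep in "$@"; do
    stop_member "$ep" || return 1        # step 2: stop the v3.1 process
    start_member_v32 "$ep" || return 1   # step 3: start the v3.2 binary
    sleep "$pause"                       # give the member time to catch up
    # Abort before touching the next member if the cluster is not healthy.
    ETCDCTL_API=3 etcdctl endpoint health --endpoints="$all_endpoints" || {
      echo "cluster unhealthy after upgrading $ep, stopping" >&2
      return 1
    }
  done
}

# Usage (120 s pause, endpoints from the example in this post):
#   rolling_upgrade 120 localhost:2379 localhost:22379 localhost:32379
```

Stopping at the first failed health check keeps at least one member on v3.1, so the cluster stays in the mixed state from which a return to v3.1 is still possible.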