1. Etcd Data Backup and Restore
By default, etcd stores its data under the working directory given on its command line. That data directory is split into two subdirectories:
- snap: snapshot data. etcd takes these snapshots to keep the WAL from growing without bound; they record the state of the etcd data.
- wal: the write-ahead log, which records the full history of data changes. In etcd, every modification must be written to the WAL before it is committed.
Preparation:
Create a directory to hold the backups:
mkdir -p /backup_$(date +%Y%m%d)
Back up the /etc/kubernetes directory:
cp -r /etc/kubernetes/ /backup_$(date +%Y%m%d)/
Back up the /var/lib/etcd directory:
cp -r /var/lib/etcd/ /backup_$(date +%Y%m%d)/
Back up the /var/lib/kubelet directory:
cp -r /var/lib/kubelet/ /backup_$(date +%Y%m%d)/
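The preparation steps above can be wrapped into one small helper so every copy lands in the same dated directory (a sketch; `backup_node` is a hypothetical name, and the source paths follow the kubeadm defaults used above):

```shell
#!/bin/sh
# Sketch of the preparation steps above: evaluate the timestamp once, so all
# copies for one run land in the same /backup_YYYYMMDD directory.
backup_node() {
    dest="${1:-/backup_$(date +%Y%m%d)}"   # default matches the commands above
    mkdir -p "$dest" || return 1
    for d in /etc/kubernetes /var/lib/etcd /var/lib/kubelet; do
        # -a preserves ownership and permissions; skip paths absent on this node
        [ -d "$d" ] && cp -a "$d" "$dest"/
    done
    echo "$dest"
}
# Usage: backup_node              -> /backup_YYYYMMDD
#        backup_node /mnt/backup  -> custom target directory
```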
In a cluster built with kubeadm, etcd runs as a container, so backing up and restoring the database requires copying the etcdctl command out of the container into /usr/bin/ on the node:
docker cp $(docker ps | grep -v etcd-mirror | grep -w etcd | awk '{print $1}'):/usr/local/bin/etcdctl /usr/bin/
A. Backing up and restoring single-node etcd data
For a single node, a file-based backup is sufficient. A default kubeadm installation persists etcd's data to /var/lib/etcd/ on the host. Back up the files in this directory regularly; if the etcd data later becomes corrupted, restoring the backed-up files into this directory is all it takes to recover a single-node etcd.
(Tip: the files cannot be overwritten while the etcd container is running. In that case, change the etcd version number in the etcd manifest [/etc/kubernetes/manifests/etcd.yaml], then stop the container with docker stop; it will no longer restart automatically. Once the data has been restored, change the version back and the kubelet will restart the etcd container on its own.)
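The file-level swap described in the tip can be sketched as a small helper (an illustration only; `restore_etcd_files` is a hypothetical name, and it assumes the backup directory contains the `etcd` copy made during preparation; run it only after the etcd container has been stopped):

```shell
# Sketch: swap the damaged data directory for the backed-up copy.
# restore_etcd_files is a hypothetical helper, not a standard tool.
restore_etcd_files() {    # usage: restore_etcd_files <backup_dir> <data_dir>
    bk="$1"; data="$2"
    [ -d "$bk/etcd" ] || { echo "no etcd copy under $bk" >&2; return 1; }
    [ -d "$data" ] && mv "$data" "${data}.bak"   # keep damaged data for forensics
    cp -a "$bk/etcd" "$data"
}
# e.g. restore_etcd_files /backup_20220108 /var/lib/etcd
```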
Single-master cluster installed with kubeadm
v3 API:
Back up the etcd data with ETCDCTL_API=3 into the backup directory created earlier:
ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" snapshot save /backup_$(date +%Y%m%d)/snap-$(date +%Y%m%d%H%M).db
Restore steps:
First stop the kube-apiserver and etcd containers on the master node and make sure kube-apiserver has fully stopped.
Note: /etc/kubernetes/manifests holds the static-pod manifests that the master runs automatically; moving or renaming that directory stops the corresponding containers.
cd /etc/kubernetes/
ll
Stop kube-apiserver and etcd on the master:
mv manifests manifests.bak
Check whether the etcd and apiserver containers are still up, and wait until all of them have stopped:
docker ps|grep k8s_
(Screenshots omitted: the container list before and after stopping.)
After the manifests directory is renamed, pods can no longer be listed at all.
This shows how important the manifests directory is; it is advisable to back it up regularly as well.
# kubectl get pods -A
The connection to the server 192.168.100.201:6443 was refused - did you specify the right host or port?
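Since losing or mangling /etc/kubernetes/manifests takes the control plane down like this, the directory can be folded into a daily backup, for example with a crontab entry (a sketch; the /backup path follows the naming convention above, and % signs must be escaped inside crontab):

```shell
# Crontab sketch (crontab -e): copy the static-pod manifests into the daily
# backup directory at 02:00. % must be escaped as \% inside crontab entries.
0 2 * * * mkdir -p /backup_$(date +\%Y\%m\%d) && cp -a /etc/kubernetes/manifests /backup_$(date +\%Y\%m\%d)/
```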
Move /var/lib/etcd aside:
mv /var/lib/etcd /var/lib/etcd.bak
Restore the etcd data:
ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" snapshot restore /backup_20220108/snap-202201081337.db
Notes:
1) ETCDCTL_API=3 selects the v3 version of the etcd API;
2) The endpoints can be found with the command below; there are usually two IPs, 127.0.0.1 and the node's LAN IP, e.g. 192.168.100.201:
#kubectl describe pod etcd-master -n kube-system| grep listen-client-urls
--listen-client-urls=https://127.0.0.1:2379,https://192.168.100.201:2379
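The long flag list repeated in every etcdctl call can also be set once through etcdctl's own environment variables (v3 supports ETCDCTL_ENDPOINTS, ETCDCTL_CACERT, ETCDCTL_CERT and ETCDCTL_KEY; the certificate paths below assume a kubeadm install):

```shell
# Set the connection parameters once per shell session; every etcdctl v3
# invocation afterwards picks them up automatically.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
# The backup command then shortens to:
#   etcdctl snapshot save /backup_$(date +%Y%m%d)/snap-$(date +%Y%m%d%H%M).db
```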
Restore the manifests directory:
mv manifests.bak manifests
Check whether the pods are back to normal:
kubectl get pod -n kube-system
v2 API:
Back up the etcd data with ETCDCTL_API=2 into the backup directory created earlier. (Not verified)
# etcdctl backup --data-dir /home/etcd/ --backup-dir /home/etcd_backup
Restore (note: -force-new-cluster is a flag of the etcd server itself, not of etcdctl; start etcd from the backup directory to force a new single-member cluster):
# etcd -data-dir=/home/etcd_backup/ -force-new-cluster
Backing up the etcd database of a binary-deployed cluster
First install the etcdctl command-line tool:
yum install -y etcd
v3 API:
# ETCDCTL_API=3 etcdctl snapshot save snap.20220107.db --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://192.168.119.72:2379"
{"level":"info","ts":1630499882.9289303,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"snap.db.part"}
{"level":"info","ts":"2022-01-07T20:38:02.933+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1630499882.933808,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://192.168.119.72:2379"}
{"level":"info","ts":"2022-01-07T20:38:03.040+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1630499883.0697453,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://192.168.119.72:2379","size":"13 MB","took":0.140736973}
{"level":"info","ts":1630499883.0698237,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"snap.db"}
Snapshot saved at snap.20220107.db
# ls -ltr
-rw------- 1 root root 12906528 1月 7 20:38 snap.20220107.db
Restoring the etcd database of a binary-deployed cluster
The restore steps below have not been verified on an actual binary-deployed cluster; they are theoretical steps only. Do not run them directly in production!
systemctl stop kube-apiserver
systemctl stop etcd
mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak
---> If you do not know where the binary cluster's etcd database is stored, you can check with:
# systemctl cat etcd.service
# ETCDCTL_API=3 etcdctl snapshot restore /data/backup/etcd-snapshot-previous.db --data-dir=/var/lib/etcd/default.etcd
# chown -R etcd:etcd /var/lib/etcd
# systemctl start etcd.service
# systemctl start kube-apiserver
B. Backing up and restoring etcd cluster data
Multi-master cluster installed with kubeadm
v3 API:
Back up the etcd data with ETCDCTL_API=3 into the backup directory created earlier.
The backup can be taken on any of the master nodes:
ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" snapshot save /backup_$(date +%Y%m%d)/snap-$(date +%Y%m%d%H%M).db
Restore steps:
First stop kube-apiserver and etcd on all master nodes and make sure kube-apiserver has fully stopped.
Perform the same steps on master1, master2 and master3:
cd /etc/kubernetes/
ll
Stop kube-apiserver and etcd on each master:
mv manifests manifests.bak
Check whether the etcd and apiserver containers are still up, and wait until all of them have stopped:
docker ps|grep k8s_
(Screenshots omitted: the container list before and after stopping.)
Move /var/lib/etcd aside:
mv /var/lib/etcd /var/lib/etcd.bak
Restore the etcd data:
All members of the etcd cluster are restored from the same snapshot, so copy it to the other masters first:
scp /backup_20220108/snap-202201081337.db root@192.168.100.172:/backup_20220108/
scp /backup_20220108/snap-202201081337.db root@192.168.100.173:/backup_20220108/
On master1:
ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
--endpoints=192.168.100.171:2379 \
--name=master1 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--initial-advertise-peer-urls=https://192.168.100.171:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
--data-dir=/var/lib/etcd
On master2:
ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
--endpoints=192.168.100.172:2379 \
--name=master2 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--initial-advertise-peer-urls=https://192.168.100.172:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
--data-dir=/var/lib/etcd
On master3:
ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
--endpoints=192.168.100.173:2379 \
--name=master3 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--initial-advertise-peer-urls=https://192.168.100.173:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
--data-dir=/var/lib/etcd
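The three restore invocations above differ only in --name, the endpoint and the peer IP, so they can be generated from one small function instead of being maintained three times (a sketch; `gen_restore_cmd` is a hypothetical helper that only prints the command, using the same three masters as above):

```shell
# Sketch: print (not run) the per-node restore command. gen_restore_cmd is a
# hypothetical helper; the cluster string matches the three masters above.
CLUSTER="master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380"

gen_restore_cmd() {    # usage: gen_restore_cmd <member-name> <node-ip> <snapshot-file>
    printf 'ETCDCTL_API=3 etcdctl snapshot restore %s \\\n' "$3"
    printf '  --name=%s \\\n' "$1"
    printf '  --cert=/etc/kubernetes/pki/etcd/server.crt \\\n'
    printf '  --key=/etc/kubernetes/pki/etcd/server.key \\\n'
    printf '  --cacert=/etc/kubernetes/pki/etcd/ca.crt \\\n'
    printf '  --initial-advertise-peer-urls=https://%s:2380 \\\n' "$2"
    printf '  --initial-cluster-token=etcd-cluster-0 \\\n'
    printf '  --initial-cluster=%s \\\n' "$CLUSTER"
    printf '  --data-dir=/var/lib/etcd\n'
}

gen_restore_cmd master1 192.168.100.171 /backup_20220108/snap-202201081337.db
```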
Notes:
1) ETCDCTL_API=3 selects the v3 version of the etcd API;
2) If you do not know the value for --name=, list the cluster members with the command below:
ETCDCTL_API=3 etcdctl --endpoints 192.168.100.171:2379,192.168.100.172:2379,192.168.100.173:2379 --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list --write-out=table
The output:
+------------------+---------+---------+------------------------------+------------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+---------+------------------------------+------------------------------+------------+
| 442ee8f1d97e7dcd | started | master3 | https://192.168.100.173:2380 | https://192.168.100.173:2379 | false |
| 4972579f39eb9468 | started | master1 | https://192.168.100.171:2380 | https://192.168.100.171:2379 | false |
| 4bff6a42b677cc19 | started | master2 | https://192.168.100.172:2380 | https://192.168.100.172:2379 | false |
+------------------+---------+---------+------------------------------+------------------------------+------------+
Restore the manifests directory on all three master nodes:
mv manifests.bak manifests
Check whether the pods are back to normal:
kubectl get pod -n kube-system
(Screenshots omitted: the pod list immediately after the restore and again one minute later.)
Backing up a multi-etcd-node cluster deployed from binaries:
First install the etcdctl command-line tool:
yum install -y etcd
Backup:
ETCDCTL_API=3 etcdctl \
snapshot save snap.db \
--endpoints=https://192.168.10.160:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem
Restore
First stop kube-apiserver and etcd:
systemctl stop kube-apiserver
systemctl stop etcd
mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak
Restore the snapshot on each node.
On node 1:
ETCDCTL_API=3 etcdctl snapshot restore snap.db \
--name etcd-1 \
--initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
--initial-advertise-peer-urls=https://192.168.10.160:2380 \
--data-dir=/var/lib/etcd/default.etcd
On node 2:
ETCDCTL_API=3 etcdctl snapshot restore snap.db \
--name etcd-2 \
--initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
--initial-advertise-peer-urls=https://192.168.10.161:2380 \
--data-dir=/var/lib/etcd/default.etcd
On node 3:
ETCDCTL_API=3 etcdctl snapshot restore snap.db \
--name etcd-3 \
--initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
--initial-advertise-peer-urls=https://192.168.10.162:2380 \
--data-dir=/var/lib/etcd/default.etcd
Start etcd and kube-apiserver again. (Note: do not move default.etcd.bak back into place; the restore above has already recreated /var/lib/etcd/default.etcd.)
systemctl start etcd.service
systemctl start kube-apiserver
References:
https://blog.csdn.net/cnskylee/article/details/120048464
The etcd disaster recovery documentation
Kubernetes documentation (Chinese): https://kubernetes.io/zh/docs/tasks/administer-cluster/configure-upgrade-etcd/
https://blog.csdn.net/qq_27234433/article/details/113731407
Kubernetes etcd cluster backup and restore: https://blog.csdn.net/heian_99/article/details/123398209