etcd in Kubernetes

etcd (https://etcd.io/) is the database of Kubernetes; internally it uses the Raft protocol as its consensus algorithm. etcd stores all Kubernetes-related data, such as Pod and Service objects.
etcd stores data as key/value pairs, where keys look like filesystem paths: /a/b/c/d -> value (path -> value).

Properties of etcd

Full replication: every node in the cluster holds a complete copy of the data
High availability: a multi-node cluster tolerates node failures
Consistency: the Raft algorithm provides consistency and leader election
Security: supports TLS authentication
Fast: write throughput on the order of 10,000 writes per second

Official hardware recommendations for etcd: https://etcd.io/docs/v3.5/op-guide/hardware/

  • 8 CPU / 8 GB RAM, SSD data disk: clusters with hundreds of Pods
  • 8 CPU / 16 GB RAM, SSD data disk: clusters with thousands of Pods
  • 16 CPU / 32 GB RAM, SSD data disk: clusters with tens of thousands of Pods

Let's look at etcd's systemd service file:

root@etcd-2:~# vi /etc/systemd/system/etcd.service 

[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/local/bin//etcd \
  --name=etcd-192.168.10.108 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.10.108:2380 \
  --listen-peer-urls=https://192.168.10.108:2380 \
  --listen-client-urls=https://192.168.10.108:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.10.108:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-192.168.10.107=https://192.168.10.107:2380,etcd-192.168.10.108=https://192.168.10.108:2380,etcd-192.168.10.109=https://192.168.10.109:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd \
  --wal-dir= \
  --snapshot-count=50000 \
  --auto-compaction-retention=1 \
  --auto-compaction-mode=periodic \
  --max-request-bytes=10485760 \
  --quota-backend-bytes=8589934592
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999

[Install]

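A few of the flags above are worth noting: --quota-backend-bytes=8589934592 raises the backend storage quota to 8 GiB (the default is about 2 GB), --max-request-bytes=10485760 allows client requests up to 10 MiB, --snapshot-count=50000 controls after how many committed transactions a snapshot is written, and --auto-compaction-mode=periodic together with --auto-compaction-retention=1 compacts historical revisions every hour.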

Look at etcd's data directory:

root@etcd-2:~# ll /var/lib/etcd/
total 4
drwx------  3 root root   20 Apr 19 17:44 ./
drwxr-xr-x 43 root root 4096 Apr 20 06:11 ../
drwx------  4 root root   29 Apr 19 17:44 member/
root@etcd-2:~# ll /var/lib/etcd/member/
total 0
drwx------ 4 root root  29 Apr 19 17:44 ./
drwx------ 3 root root  20 Apr 19 17:44 ../
drwx------ 2 root root 246 Apr 21 17:36 snap/
drwx------ 2 root root 199 Apr 21 10:13 wal/

snap holds the snapshot data
wal holds the write-ahead log (when data is written, the log entry is written first and the data second; if the log write fails, the data write is considered failed as well, and the log can later be used to recover the data)

List the members of the etcd cluster:

root@etcd-1:~# etcdctl member list
71745e1fe53ea3d2, started, etcd-192.168.10.107, https://192.168.10.107:2380, https://192.168.10.107:2379, false
b3497c3662525c94, started, etcd-192.168.10.108, https://192.168.10.108:2380, https://192.168.10.108:2379, false
cff05c5d2e5d7019, started, etcd-192.168.10.109, https://192.168.10.109:2380, https://192.168.10.109:2379, false

Taking one line as an example:
cff05c5d2e5d7019, started, etcd-192.168.10.109, https://192.168.10.109:2380, https://192.168.10.109:2379, false
The fields are: member ID, status, name, peer URL (cluster port 2380), client URL (client port 2379), and whether the member is a learner (i.e. still syncing data).
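
The same member information can also be printed as a table; a quick sketch reusing the TLS flags from the service file above:

ETCDCTL_API=3 /usr/local/bin/etcdctl \
  --endpoints=https://192.168.10.107:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem \
  member list --write-out=table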

etcd health check
This form only checks the member on the local machine:

root@etcd-1:~# etcdctl endpoint health
127.0.0.1:2379 is healthy: successfully committed proposal: took = 3.52306ms

To monitor the whole cluster, you can write a for loop:

root@etcd-1:~# export NODE_IPS="192.168.10.107 192.168.10.108 192.168.10.109"
root@etcd-1:~#  for ip in ${NODE_IPS}; do   ETCDCTL_API=3 /usr/local/bin/etcdctl   --endpoints=https://${ip}:2379    --cacert=/etc/kubernetes/ssl/ca.pem   --cert=/etc/kubernetes/ssl/etcd.pem   --key=/etc/kubernetes/ssl/etcd-key.pem   endpoint health; done
https://192.168.10.107:2379 is healthy: successfully committed proposal: took = 15.070727ms
https://192.168.10.108:2379 is healthy: successfully committed proposal: took = 9.874537ms
https://192.168.10.109:2379 is healthy: successfully committed proposal: took = 8.872484ms

Output in table form:

root@etcd-1:~# for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.107:2379 | 71745e1fe53ea3d2 |  3.4.13 |  2.7 MB |     false |      false |         4 |     521493 |             521493 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.108:2379 | b3497c3662525c94 |  3.4.13 |  2.7 MB |     false |      false |         4 |     521493 |             521493 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.10.109:2379 | cff05c5d2e5d7019 |  3.4.13 |  2.7 MB |      true |      false |         4 |     521493 |             521493 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
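
Since --endpoints accepts a comma-separated list, the loop can also be avoided and all three members shown in a single table; a sketch:

ETCDCTL_API=3 /usr/local/bin/etcdctl \
  --endpoints=https://192.168.10.107:2379,https://192.168.10.108:2379,https://192.168.10.109:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem \
  endpoint status --write-out=table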

Basic etcd operations

List all keys in etcd:

root@etcd-1:~# etcdctl get / --prefix --keys-only
/calico/ipam/v2/assignment/ipv4/block/10.200.205.192-26

/calico/ipam/v2/assignment/ipv4/block/10.200.247.0-26

/calico/ipam/v2/assignment/ipv4/block/10.200.39.0-26

/calico/ipam/v2/assignment/ipv4/block/10.200.84.128-26

/calico/ipam/v2/handle/ipip-tunnel-addr-master-1

/calico/ipam/v2/handle/ipip-tunnel-addr-master-2

/calico/ipam/v2/handle/ipip-tunnel-addr-node-1

/calico/ipam/v2/handle/ipip-tunnel-addr-node-2

/calico/ipam/v2/handle/k8s-pod-network.3844a5799fbfdd20ab3ee16c6b176626d04c635ed8aa57a36d9e43a11b028713

/calico/ipam/v2/handle/k8s-pod-network.52d7e2ca8546bf0739c79c425ea421c63be1653fe74811c2d4b6c9242111fb22

/calico/ipam/v2/handle/k8s-pod-network.ab0b92bfc89fef7eb4486080bff1aa4e6f28109a105a70aceafb325d1d514d23

/calico/ipam/v2/handle/k8s-pod-network.c8dc5605cd5ed0a43a6169cf74d2f1738ddde1d5e72f2b4bd0cbfffe14a1232e
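
Every Kubernetes object is stored under /registry/<resource>/<namespace>/<name>, so the key list can be aggregated to count objects per resource type; a small sketch (the layout above is the common case, though a few resources nest one level deeper):

# count keys per resource type under /registry
etcdctl get /registry --prefix --keys-only | grep -v '^$' \
  | awk -F/ '{print $3}' | sort | uniq -c | sort -rn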

Look up the key of a specific Pod
net-test1 is the Pod's name:

root@master-1:~# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
net-test1   1/1     Running   0          2d21h
net-test2   1/1     Running   0          2d21h

root@etcd-1:~# etcdctl get / --prefix --keys-only|grep net-test1
/registry/pods/default/net-test1

Inspect the value stored at the key /registry/pods/default/net-test1:

root@etcd-1:~# etcdctl get /registry/pods/default/net-test1
/registry/pods/default/net-test1
k8s
    
v1Podې
    net-test1default"*$d9e53134-3638-4e51-bb43-b944037bd5652¯䏚 
run net-test1z󿾃

kubectl-runUpdatev¯FieldsV1: 
{"f:metadata":{"f:labels":{".":{},"f:run":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"net-test1\"}":{".":{},"f:args":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}}· 
kubeletUpdatev®¯FieldsV1: 
{"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:phase":{},"f:podIP":{},"f:podIPs":{".":{},"k:{\"ip\":\"10.200.84.129\"}":{".":{},"f:ip":{}}},"f:startTime":{}}}« 
 
kube-api-access-tc2lvkЁh
"

token
(&
 
kube-root-ca.crt 
ca.crtca.crt
)'
%
    namespace 
v1metadata.namespace¤± 
    net-test1centos:7.9.2009"sleep"300000*BJL
kube-api-access-tc2lv-/var/run/secrets/kubernetes.io/serviceaccount"2j/dev/termination-logr
                                                                                           IfNotPresent¢FileAlways 2
                                                                                                                    ClusterFirstBdefaultJdefaultR192.168.10.104X`hrdefault-scheduler²6
node.kubernetes.io/not-readyExists" NoExecute(¬²8
node.kubernetes.io/unreachableExists"   NoExecute(¬Ɓ񿂺PreemptLowerPriorityȃ
Running#

InitializedTru¯䎪2 
ReadyTru®¯䎪2'
ContainersReadyTru®¯䎪2$

10.200.84.12¯䏂݁u¯䎪2"*192.168.10.1042
    net-test1 
                  
                                                                                                                                                                                                                                                                ®¯䎚 (2centos:7.9.2009:`docker-pullable://centos@sha256:9d4bcbbb213dfd745b58be38b13b996ebb5ac315fe75711bd618426a630e0987BIdocker://d261d1933b0740fb2d478d4248371b89cbb95422fc75b05b88e3b7f032e6c818HJ
BestEffortZb
10.200.84.129"


Deleting a Pod is simply deleting its corresponding key:

root@etcd-1:~# etcdctl del /registry/pods/default/net-test1
1
root@master-1:~# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
net-test2   1/1     Running   0          2d21h

The Pod net-test1 is now gone. This operation is extremely dangerous, so use it with great care.

Writing data

root@etcd-1:~# etcdctl put /qijia "0324"
OK
root@etcd-1:~# etcdctl get /qijia
/qijia
0324
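
Keys form the path-like hierarchy described at the beginning, so related keys can be read or deleted together with --prefix; a quick sketch:

etcdctl put /qijia/k1 "v1"
etcdctl put /qijia/k2 "v2"
etcdctl get /qijia --prefix        # list every key/value under /qijia
etcdctl del /qijia --prefix        # delete the whole subtree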

The etcd watch mechanism
A watch continuously monitors the data and proactively pushes a notification to the client whenever it changes.
Let's test the watch mechanism: with a watch running in one window, a key written from another window is received in real time, and subsequent updates to the key are also picked up by the watcher immediately.
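
A minimal two-terminal sketch of the same test:

# terminal 1: block and print every change to the key
etcdctl watch /qijia
# terminal 2: each put is pushed to the watcher immediately
etcdctl put /qijia "0325"
etcdctl put /qijia "0326"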



Single-node etcd backup and restore
Backup:

root@etcd-1:~# etcdctl snapshot save /data/backup/etcd-backup-`date +%F%H%M`
{"level":"info","ts":1650619578.2251773,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/data/backup/etcd-backup-2022-04-221726.part"}
{"level":"info","ts":"2022-04-22T17:26:18.225+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1650619578.2258816,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2022-04-22T17:26:18.245+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1650619578.2582552,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"2.7 MB","took":0.032995421}
{"level":"info","ts":1650619578.258354,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/data/backup/etcd-backup-2022-04-221726"}
Snapshot saved at /data/backup/etcd-backup-2022-04-221726
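
For regular backups the command can be wrapped in a small script and scheduled with cron; a minimal sketch assuming /data/backup as the backup directory and a 7-copy retention (both are assumptions, adjust as needed):

#!/bin/bash
# take a timestamped snapshot and prune old copies
BACKUP_DIR=/data/backup
mkdir -p "${BACKUP_DIR}"
ETCDCTL_API=3 etcdctl snapshot save "${BACKUP_DIR}/etcd-backup-$(date +%F%H%M)"
# keep only the 7 most recent snapshots
ls -1t "${BACKUP_DIR}"/etcd-backup-* | tail -n +8 | xargs -r rm -f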

Restore the data. /data/etcd must be an empty (or non-existent) directory; etcdctl creates it automatically, so there is no need to create it yourself:

root@etcd-1:~# etcdctl snapshot restore /data/backup/etcd-backup-2022-04-221726 --data-dir=/data/etcd
{"level":"info","ts":1650619683.6522489,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/data/backup/etcd-backup-2022-04-221726","wal-dir":"/data/etcd/member/wal","data-dir":"/data/etcd","snap-dir":"/data/etcd/member/snap"}
{"level":"info","ts":1650619683.6754777,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":421636}
{"level":"info","ts":1650619683.682951,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1650619683.6885314,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/data/backup/etcd-backup-2022-04-221726","wal-dir":"/data/etcd/member/wal","data-dir":"/data/etcd","snap-dir":"/data/etcd/member/snap"}
root@etcd-1:~# ll /data/etcd/
total 0
drwx------ 3 root root 20 Apr 22 17:28 ./
drwxr-xr-x 4 root root 32 Apr 22 17:28 ../
drwx------ 4 root root 29 Apr 22 17:28 member/
root@etcd-1:~# ll /data/etcd/member/
total 0
drwx------ 4 root root 29 Apr 22 17:28 ./
drwx------ 3 root root 20 Apr 22 17:28 ../
drwx------ 2 root root 62 Apr 22 17:28 snap/
drwx------ 2 root root 51 Apr 22 17:28 wal/

The data has now been restored into /data/etcd. All we need to do is change WorkingDirectory and --data-dir in the etcd.service file to the restored directory and restart etcd:

root@etcd-1:~# vi /etc/systemd/system/etcd.service 

[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/local/bin//etcd \
  --name=etcd-192.168.10.107 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.10.107:2380 \
  --listen-peer-urls=https://192.168.10.107:2380 \
  --listen-client-urls=https://192.168.10.107:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.10.107:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-192.168.10.107=https://192.168.10.107:2380,etcd-192.168.10.108=https://192.168.10.108:2380,etcd-192.168.10.109=https://192.168.10.109:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd \
  --wal-dir= \
  --snapshot-count=50000 \
  --auto-compaction-retention=1 \
  --auto-compaction-mode=periodic \
  --max-request-bytes=10485760 \
  --quota-backend-bytes=8589934592
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
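
The edit and restart can also be done non-interactively; a sketch assuming the restored directory is /data/etcd as above:

# point the unit file at the restored directory and restart etcd
sed -i 's#/var/lib/etcd#/data/etcd#g' /etc/systemd/system/etcd.service
systemctl daemon-reload
systemctl restart etcd
etcdctl endpoint health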

etcd cluster backup and restore
Because our Kubernetes cluster was installed with the kubeasz tool, we can use the playbooks that ship with kubeasz to back up and restore the etcd data.
The playbooks involved are:
94.backup.yml: the backup playbook
95.restore.yml: the restore playbook

root@master-1:~# ll /etc/kubeasz/playbooks/
total 92

-rw-rw-r--  1 root root 1786 Apr 26  2021 94.backup.yml
-rw-rw-r--  1 root root  999 Apr 26  2021 95.restore.yml

Backup: ezctl backup <k8s cluster name>

root@master-1:~# ezctl --help

    backup      <cluster>            to backup the cluster state (etcd snapshot)
    restore     <cluster>            to restore the cluster state from backups

Our cluster is named qijia01:

root@master-1:~# ll /etc/kubeasz/clusters/
total 0
drwxr-xr-x  3 root root  21 Feb 23 16:22 ./
drwxrwxr-x 12 root root 225 Feb 23 16:22 ../
drwxr-xr-x  5 root root 203 Apr 20 18:44 qijia01/

Start the backup:

root@master-1:~# ezctl backup qijia01
ansible-playbook -i clusters/qijia01/hosts -e @clusters/qijia01/config.yml playbooks/94.backup.yml
2022-04-22 18:02:54 INFO cluster:qijia01 backup begins in 5s, press any key to abort:


PLAY [localhost] *************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************
ok: [localhost]

TASK [set NODE_IPS of the etcd cluster] **************************************************************************************************************************************************************************************************************************
ok: [localhost]

TASK [get etcd cluster status] ***********************************************************************************************************************************************************************************************************************************
changed: [localhost]

TASK [debug] *****************************************************************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "ETCD_CLUSTER_STATUS": {
        "changed": true,
        "cmd": "for ip in 192.168.10.107 192.168.10.108 192.168.10.109 ;do ETCDCTL_API=3 /etc/kubeasz/bin/etcdctl --endpoints=https://\"$ip\":2379 --cacert=/etc/kubeasz/clusters/qijia01/ssl/ca.pem --cert=/etc/kubeasz/clusters/qijia01/ssl/etcd.pem --key=/etc/kubeasz/clusters/qijia01/ssl/etcd-key.pem endpoint health; done",
        "delta": "0:00:00.526961",
        "end": "2022-04-22 18:03:04.644297",
        "failed": false,
        "msg": "",
        "rc": 0,
        "start": "2022-04-22 18:03:04.117336",
        "stderr": "https://192.168.10.107:2379 is healthy: successfully committed proposal: took = 42.136716ms\nhttps://192.168.10.108:2379 is healthy: successfully committed proposal: took = 12.285904ms\nhttps://192.168.10.109:2379 is healthy: successfully committed proposal: took = 11.06195ms",
        "stderr_lines": [
            "https://192.168.10.107:2379 is healthy: successfully committed proposal: took = 42.136716ms",
            "https://192.168.10.108:2379 is healthy: successfully committed proposal: took = 12.285904ms",
            "https://192.168.10.109:2379 is healthy: successfully committed proposal: took = 11.06195ms"
        ],
        "stdout": "",
        "stdout_lines": []
    }
}

TASK [get a running ectd node] ***********************************************************************************************************************************************************************************************************************************
changed: [localhost]

TASK [debug] *****************************************************************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "RUNNING_NODE.stdout": "192.168.10.107"
}

TASK [get current time] ******************************************************************************************************************************************************************************************************************************************
changed: [localhost]

TASK [make a backup on the etcd node] ****************************************************************************************************************************************************************************************************************************
changed: [localhost -> 192.168.10.107]

TASK [fetch the backup data] *************************************************************************************************************************************************************************************************************************************
changed: [localhost -> 192.168.10.107]

TASK [update the latest backup] **********************************************************************************************************************************************************************************************************************************
changed: [localhost]

PLAY RECAP *******************************************************************************************************************************************************************************************************************************************************
localhost                  : ok=10   changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Check that the backup data exists.
snapshot.db is a copy of the latest backup: the restore playbook hard-codes the file name snapshot.db, while the backups themselves are named with a timestamp, so after a backup finishes the playbook runs cp snapshot_202204221803.db snapshot.db.

root@master-1:~# ll /etc/kubeasz/clusters/qijia01/backup/
total 5248
drwxr-xr-x 2 root root      57 Apr 22 18:03 ./
drwxr-xr-x 5 root root     203 Apr 20 18:44 ../
-rw------- 1 root root 2682912 Apr 22 18:03 snapshot.db
-rw------- 1 root root 2682912 Apr 22 18:03 snapshot_202204221803.db

Delete a Pod to check whether the restore brings it back:

root@master-1:~# kubectl get pod -A
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE
default                net-test1                                   1/1     Running   0          66m
default                net-test2                                   1/1     Running   0          2d22h
kube-system            calico-kube-controllers-647f956d86-zrjq9    1/1     Running   0          2d23h
kube-system            calico-node-47phc                           1/1     Running   0          2d23h
kube-system            calico-node-9ghhw                           1/1     Running   0          2d23h
kube-system            calico-node-c7stp                           1/1     Running   0          2d23h
kube-system            calico-node-lcjsx                           1/1     Running   0          2d23h
kube-system            coredns-74c56d8f8d-d2jbp                    1/1     Running   0          2d
kube-system            coredns-74c56d8f8d-vds9h                    1/1     Running   0          2d
kubernetes-dashboard   dashboard-metrics-scraper-c45b7869d-5h8t7   1/1     Running   0          47h
kubernetes-dashboard   kubernetes-dashboard-576cb95f94-mzwpz       1/1     Running   0          47h
root@master-1:~# kubectl delete pod net-test1
pod "net-test1" deleted

root@master-1:~# 
root@master-1:~# kubectl get pod -A
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE
default                net-test2                                   1/1     Running   0          2d22h
kube-system            calico-kube-controllers-647f956d86-zrjq9    1/1     Running   0          2d23h
kube-system            calico-node-47phc                           1/1     Running   0          2d23h
kube-system            calico-node-9ghhw                           1/1     Running   0          2d23h
kube-system            calico-node-c7stp                           1/1     Running   0          2d23h
kube-system            calico-node-lcjsx                           1/1     Running   0          2d23h
kube-system            coredns-74c56d8f8d-d2jbp                    1/1     Running   0          2d
kube-system            coredns-74c56d8f8d-vds9h                    1/1     Running   0          2d
kubernetes-dashboard   dashboard-metrics-scraper-c45b7869d-5h8t7   1/1     Running   0          47h
kubernetes-dashboard   kubernetes-dashboard-576cb95f94-mzwpz       1/1     Running   0          47h

Restore the data and verify whether the deleted Pod is brought back:

root@master-1:~# ezctl restore qijia01
ansible-playbook -i clusters/qijia01/hosts -e @clusters/qijia01/config.yml playbooks/95.restore.yml
2022-04-22 18:06:01 INFO cluster:qijia01 restore begins in 5s, press any key to abort:


PLAY [kube_master] ***********************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************
ok: [192.168.10.101]
ok: [192.168.10.102]

TASK [stopping kube_master services] *****************************************************************************************************************************************************************************************************************************
changed: [192.168.10.102] => (item=kube-apiserver)
changed: [192.168.10.101] => (item=kube-apiserver)
changed: [192.168.10.102] => (item=kube-controller-manager)
changed: [192.168.10.102] => (item=kube-scheduler)
changed: [192.168.10.101] => (item=kube-controller-manager)
changed: [192.168.10.101] => (item=kube-scheduler)

PLAY [kube_master,kube_node] *************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************
ok: [192.168.10.104]
ok: [192.168.10.105]

TASK [stopping kube_node services] *******************************************************************************************************************************************************************************************************************************
changed: [192.168.10.105] => (item=kubelet)
changed: [192.168.10.101] => (item=kubelet)
changed: [192.168.10.102] => (item=kubelet)
changed: [192.168.10.104] => (item=kubelet)
changed: [192.168.10.105] => (item=kube-proxy)
changed: [192.168.10.101] => (item=kube-proxy)
changed: [192.168.10.102] => (item=kube-proxy)
changed: [192.168.10.104] => (item=kube-proxy)

PLAY [etcd] ******************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************
ok: [192.168.10.107]
ok: [192.168.10.109]
ok: [192.168.10.108]

TASK [cluster-restore : 停止ectd 服务] ***************************************************************************************************************************************************************************************************************************
changed: [192.168.10.109]
changed: [192.168.10.108]
changed: [192.168.10.107]

TASK [cluster-restore : 清除etcd 数据目录] ***********************************************************************************************************************************************************************************************************************
changed: [192.168.10.108]
changed: [192.168.10.109]
changed: [192.168.10.107]

TASK [cluster-restore : 生成备份目录] ****************************************************************************************************************************************************************************************************************************
ok: [192.168.10.107]
changed: [192.168.10.109]
changed: [192.168.10.108]

TASK [cluster-restore : 准备指定的备份etcd 数据] *****************************************************************************************************************************************************************************************************************
changed: [192.168.10.109]
changed: [192.168.10.108]
changed: [192.168.10.107]

TASK [cluster-restore : 清理上次备份恢复数据] ********************************************************************************************************************************************************************************************************************
ok: [192.168.10.107]
ok: [192.168.10.108]
ok: [192.168.10.109]

TASK [cluster-restore : etcd 数据恢复] ***************************************************************************************************************************************************************************************************************************
changed: [192.168.10.107]
changed: [192.168.10.108]
changed: [192.168.10.109]

TASK [cluster-restore : 恢复数据至etcd 数据目录] *****************************************************************************************************************************************************************************************************************
changed: [192.168.10.108]
changed: [192.168.10.107]
changed: [192.168.10.109]

TASK [cluster-restore : 重启etcd 服务] ***************************************************************************************************************************************************************************************************************************
changed: [192.168.10.107]
changed: [192.168.10.109]
changed: [192.168.10.108]

TASK [cluster-restore : 以轮询的方式等待服务同步完成] ************************************************************************************************************************************************************************************************************
changed: [192.168.10.107]
changed: [192.168.10.108]
changed: [192.168.10.109]

PLAY [kube_master] ***********************************************************************************************************************************************************************************************************************************************

TASK [starting kube_master services] *****************************************************************************************************************************************************************************************************************************
changed: [192.168.10.102] => (item=kube-apiserver)
changed: [192.168.10.102] => (item=kube-controller-manager)
changed: [192.168.10.101] => (item=kube-apiserver)
changed: [192.168.10.102] => (item=kube-scheduler)
changed: [192.168.10.101] => (item=kube-controller-manager)
changed: [192.168.10.101] => (item=kube-scheduler)

PLAY [kube_master,kube_node] *************************************************************************************************************************************************************************************************************************************

TASK [starting kube_node services] *******************************************************************************************************************************************************************************************************************************
changed: [192.168.10.104] => (item=kubelet)
changed: [192.168.10.102] => (item=kubelet)
changed: [192.168.10.105] => (item=kubelet)
changed: [192.168.10.101] => (item=kubelet)
changed: [192.168.10.105] => (item=kube-proxy)
changed: [192.168.10.102] => (item=kube-proxy)
changed: [192.168.10.104] => (item=kube-proxy)
changed: [192.168.10.101] => (item=kube-proxy)

PLAY RECAP *******************************************************************************************************************************************************************************************************************************************************
192.168.10.101             : ok=5    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
192.168.10.102             : ok=5    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
192.168.10.104             : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
192.168.10.105             : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
192.168.10.107             : ok=10   changed=7    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
192.168.10.108             : ok=10   changed=8    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
192.168.10.109             : ok=10   changed=8    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

We can now see that net-test1 has been restored:
root@master-1:~# kubectl get pod -A
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE
default                net-test1                                   1/1     Running   0          73m
default                net-test2                                   1/1     Running   0          2d23h
kube-system            calico-kube-controllers-647f956d86-zrjq9    1/1     Running   0          2d23h
kube-system            calico-node-47phc                           1/1     Running   0          2d23h
kube-system            calico-node-9ghhw                           1/1     Running   0          2d23h
kube-system            calico-node-c7stp                           1/1     Running   0          2d23h
kube-system            calico-node-lcjsx                           1/1     Running   0          2d23h
kube-system            coredns-74c56d8f8d-d2jbp                    1/1     Running   0          2d
kube-system            coredns-74c56d8f8d-vds9h                    1/1     Running   0          2d1h
kubernetes-dashboard   dashboard-metrics-scraper-c45b7869d-5h8t7   1/1     Running   0          2d
kubernetes-dashboard   kubernetes-dashboard-576cb95f94-mzwpz       1/1     Running   1          2d
root@master-1:~# 

The downside is that this approach is a full backup and full restore; if I only want to restore the Pods in a particular namespace, it cannot do that.

etcd recovery procedure
When more than half of the etcd members are down, the whole cluster becomes unavailable. The procedure for recovering the data afterwards is as follows:

  • Recover the server operating systems
  • Redeploy the etcd cluster
  • Stop kube-apiserver/controller-manager/scheduler/kubelet/kube-proxy
  • Stop the etcd cluster
  • Restore the same backup data on every etcd node (see the sketch after this list)
  • Start the etcd cluster and verify member health
  • Start kube-apiserver/controller-manager/scheduler/kubelet/kube-proxy
  • Verify the Kubernetes master status and Pod data
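
A per-node sketch of the restore step, reusing the member names, IPs and paths from the examples above; this is an illustration rather than the exact kubeasz implementation, and each node must pass its own --name and --initial-advertise-peer-urls:

# run on etcd-192.168.10.107; repeat on .108/.109 with their own name and peer URL
systemctl stop etcd
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore /data/backup/snapshot.db \
  --name etcd-192.168.10.107 \
  --initial-cluster etcd-192.168.10.107=https://192.168.10.107:2380,etcd-192.168.10.108=https://192.168.10.108:2380,etcd-192.168.10.109=https://192.168.10.109:2380 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-advertise-peer-urls https://192.168.10.107:2380 \
  --data-dir /var/lib/etcd
systemctl start etcd
etcdctl endpoint health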