Background
Because the Kubernetes v1.16 release we were running was affected by serious vulnerabilities, we decided to upgrade to v1.19.3, the stable version at the time of planning. Since the upgrade from 1.16 to 1.19 crosses two fairly large versions, we ran into all sorts of problems along the way. This series of articles records the solutions to the thornier ones we hit during the upgrade.
Upgrading Kubernetes
Before v1.16.7, the main Kubernetes components api-server, scheduler, controller-manager, and kubelet were packaged together into a single binary, hyperkube, so the main components only needed to be built into one hyperkube image. In 1.19.3, however, the community split these binaries into kube-apiserver, kube-scheduler, and kube-controller-manager, three independent binaries, each built into its own image.
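In practice this means the image pipeline now deals with three separate images instead of a single hyperkube image. As a rough illustration (the registry path is the upstream default of that era; a private mirror would differ):

for img in kube-apiserver kube-scheduler kube-controller-manager; do
  docker pull k8s.gcr.io/${img}:v1.19.3   # replace with your mirror if air-gapped
done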
The single biggest headache in upgrading to v1.19.3 was that the community removed several legacy API versions in it. For example, DaemonSet and Deployment resources moved from apiVersion: extensions/v1beta1 to apiVersion: apps/v1. After the upgrade, Kubernetes 1.19.3 no longer recognizes resources under the old API versions, so the first step is to update every resource in the cluster to the latest API version, apps/v1, and that mostly means updating the helm charts. Once the charts were updated everything looked fine, but then a new problem surfaced. After the Kubernetes upgrade, etcd holds resources stored under both the old and the new API versions. Kubernetes preserves backward compatibility, so the upgraded cluster can recognize both versions without trouble. Helm, however, offers no such backward compatibility for its charts: the manifest of a chart installed in a helm release before the upgrade still uses the old API versions, so operating on those releases with the helm client produces errors like this:
helm status image-manager --tls
Error: [unable to recognize "": no matches for kind "DaemonSet" in version "extensions/v1beta1", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta1"]
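Incidentally, before upgrading you can spot charts that still render removed API versions by grepping locally rendered templates; a minimal sketch, with the chart path as a placeholder:

# Render the chart locally and flag manifests that still use removed API groups.
helm template ./mychart | grep -n -A1 -E 'apiVersion: (extensions/v1beta1|apps/v1beta[12])'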
At this point I spent a while digging through the community and found several similar issues, and it finally clicked: the metadata stored in the helm release is simply no longer recognized by the new Kubernetes cluster. Fortunately the Helm community published a dedicated fix document for exactly this API incompatibility, and according to it the manifest data inside the helm release has to be patched by hand.
The official Helm doc at https://helm.sh/docs/topics/kubernetes_apis/ discusses the problem and provides workarounds, mainly in the section https://helm.sh/docs/topics/kubernetes_apis/#updating-api-versions-of-a-release-manifest.
The doc proposes two ways to work around the issue:
- Manually update the release data in its ConfigMap to supported API versions.
- Use the helm plugin mapkubeapis to replace deprecated or removed APIs in the helm release.
Manual option 1 involves a lot of hand work: fetch the release data from the ConfigMap, decode it, update the API versions, re-encode it, and apply the change back to the ConfigMap. It is a little complicated; a sketch of the decode step follows.
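Here is a minimal sketch of that decode step, assuming a Helm 2 release stored as a ConfigMap in kube-system (the image-manager.v1 name comes from the example below; Helm 2 keeps the release as a base64-encoded, gzipped blob under the release key):

# Back up the release ConfigMap before touching anything.
kubectl -n kube-system get configmap image-manager.v1 -o yaml > image-manager.v1.bak.yaml
# Decode the payload (base64 -> gzip) and list the apiVersions used by the embedded manifest.
kubectl -n kube-system get configmap image-manager.v1 -o jsonpath='{.data.release}' \
  | base64 -d | gzip -d > release.decoded
grep -a -o 'apiVersion: [a-zA-Z0-9./-]*' release.decoded | sort | uniq -c

Re-encoding is the risky half: the decoded blob is a binary protobuf, so a naive string replacement that changes lengths can corrupt it, which is one more reason to prefer the plugin.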
Here is an example of how I used the helm plugin mapkubeapis to fix the problem we saw with image-manager.
# kubectl -n kube-system get configmap |grep image-manager
image-manager.v1 1 21h
# helm list --tls | grep image-manager
image-manager 1 Wed Nov 4 01:35:29 2020 DEPLOYED image-manager-3.3.2001 kube-system
Install the plugin
# helm plugin install https://github.com/hickeyma/helm-mapkubeapis
Downloading and installing helm-mapkubeapis v0.0.15 ...
https://github.com/hickeyma/helm-mapkubeapis/releases/download/v0.0.15/helm-mapkubeapis_0.0.15_linux_amd64.tar.gz
Installed plugin: mapkubeapis
Update the helm release image-manager to supported API versions:
# helm mapkubeapis --namespace=kube-system --v2 image-manager
2020/11/04 23:17:14 Release 'image-manager' will be checked for deprecated or removed Kubernetes APIs and will be updated if necessary to supported API versions.
2020/11/04 23:17:14 Get release 'image-manager' latest version.
2020/11/04 23:17:14 Check release 'image-manager' for deprecated or removed APIs...
2020/11/04 23:17:14 Found deprecated or removed Kubernetes API:
"apiVersion: apps/v1beta1
kind: StatefulSet"
Supported API equivalent:
"apiVersion: apps/v1
kind: StatefulSet"
2020/11/04 23:17:14 Found deprecated or removed Kubernetes API:
"apiVersion: extensions/v1beta1
kind: DaemonSet"
Supported API equivalent:
"apiVersion: apps/v1
kind: DaemonSet"
2020/11/04 23:17:14 Found deprecated or removed Kubernetes API:
"apiVersion: extensions/v1beta1
kind: Ingress"
Supported API equivalent:
"apiVersion: networking.k8s.io/v1beta1
kind: Ingress"
2020/11/04 23:17:14 Finished checking release 'image-manager' for deprecated or removed APIs.
2020/11/04 23:17:14 Deprecated or removed APIs exist, updating release: image-manager.
2020/11/04 23:17:14 Set status of release version 'image-manager.v1' to 'superseded'.
2020/11/04 23:17:14 Release version 'image-manager.v1' updated successfully.
2020/11/04 23:17:14 Add release version 'image-manager.v2' with updated supported APIs.
2020/11/04 23:17:14 Release version 'image-manager.v2' added successfully.
2020/11/04 23:17:14 Release 'image-manager' with deprecated or removed APIs updated successfully to new version.
2020/11/04 23:17:14 Map of release 'image-manager' deprecated or removed APIs to supported versions, completed successfully.
Now check the helm release image-manager again:
# kubectl -n kube-system get configmap | grep image-mana
image-manager-init-certs-config 1 21h
image-manager.v1 1 21h
image-manager.v2 1 64s
# helm list --tls | grep image-mana
image-manager 2 Wed Nov 4 23:17:14 2020 DEPLOYED image-manager-3.3.2001 kube-system
# helm status image-manager --tls
LAST DEPLOYED: Wed Nov 4 23:17:14 2020
NAMESPACE: kube-system
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
image-manager-init-certs-config 1 21h
registry-config 1 21h
==> v1/DaemonSet
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
image-manager-init-certs 3 3 3 3 3 <none> 21h
==> v1/StatefulSet
NAME DESIRED CURRENT AGE
image-manager 1 1 21h
==> v1beta1/Ingress
NAME AGE
image-manager-token 21h
image-manager 21h
==> v1alpha1/Certificate
NAME AGE
image-manager-token-cert 21h
image-manager-registry-cert 21h
image-manager-cert 21h
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
image-manager-init-certs-7npql 1/1 Running 0 21h
image-manager-init-certs-f9mcq 1/1 Running 0 21h
image-manager-init-certs-vltsv 1/1 Running 0 21h
image-manager-0 2/2 Running 0 21h
As you can see, the mapkubeapis plugin updates the helm release to supported API versions and creates a new release revision. Now we can start the ICP helm upgrade.
In effect, the helm mapkubeapis plugin copies the old API resources in the helm release, rewrites them to the new API versions, and creates a new release revision from the result, so the operation leaves you with one extra helm release revision.
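To convince yourself that the new revision really carries the updated API versions, you can decode both release ConfigMaps and compare their apiVersion lines, reusing the decode pipeline sketched earlier:

for rev in v1 v2; do
  echo "== image-manager.$rev =="
  kubectl -n kube-system get configmap image-manager.$rev -o jsonpath='{.data.release}' \
    | base64 -d | gzip -d | grep -a -o 'apiVersion: [a-zA-Z0-9./-]*' | sort | uniq -c
done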
Having survived those two big hurdles, everything looked fine again and the upgrade itself seemed solved. But when rolling back the whole cluster, trouble returned: charts written for the old Kubernetes were not required to declare a selector, while in Kubernetes 1.19.3 (the apps/v1 APIs) the selector is mandatory:
spec:
  selector:
    matchLabels:
      app: icp-management-ingress
      chart: icp-management-ingress
      component: icp-management-ingress
      heritage: Tiller
      k8s-app: icp-management-ingress
      release: icp-management-ingress
We could add the selector to the charts in the upgrade package, but then the problem shows up at rollback time, when rollback fails with:
[root@sunny-gf4 ~]# helm rollback --tls --force internal-management-ingress 2
Error: failed to create resource: DaemonSet.apps "internal-management-ingress" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app":"internal-management-ingress", "chart":"icp-management-ingress", "component":"internal-management-ingress", "heritage":"Tiller", "k8s-app":"internal-management-ingress", "release":"internal-management-ingress"}: `selector` does not match template `labels`
Upgrading really is easier than rolling back, and who knows what the next problem will be. What the error says is that the labels in the chart's selector must stay consistent with the template labels across the two release revisions, yet the resources in our old revision obviously declared no selector at all. It truly confirms that whoever upgrades suffers, all the more so when a rollback has to be supported too. So how do we solve it? A minimal illustration of the requirement follows.
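Concretely, apps/v1 requires spec.selector.matchLabels to match the labels in spec.template.metadata.labels, and the selector is immutable once the object exists. A minimal illustrative DaemonSet, trimmed to the relevant fields (names follow the icp-management-ingress example above; the image is a placeholder):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: icp-management-ingress
spec:
  selector:
    matchLabels:            # must be a subset of the template labels below
      app: icp-management-ingress
      release: icp-management-ingress
  template:
    metadata:
      labels:               # must carry every label the selector asks for
        app: icp-management-ingress
        release: icp-management-ingress
    spec:
      containers:
      - name: management-ingress
        image: icp-management-ingress:latest   # placeholder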
Our first idea was to hand-edit the metadata in the helm releases, similar to what the helm mapkubeapis plugin does, but automating that manual procedure is fairly complex and would need Python to massage the metadata; as programmers, how could we put up with such a plodding approach? After a few days of racking our brains we finally came up with a workaround: before upgrading Kubernetes, first patch up the problematic charts in a small preliminary upgrade, adding selectors wherever they are missing; then upgrade Kubernetes; then upgrade the charts again to move their API versions forward. Once again disaster was averted and the problem was solved cleanly. To summarize, the upgrade path:
- upgrade charts (add the missing selectors)
- upgrade k8s to 1.19.3
- upgrade charts (move the API versions to apps/v1)
- rollback charts
- rollback k8s to 1.16.7
How the chart release revision changes:
1----upgrade------>2------upgrade k8s---->2-----upgrade Charts----->3-----rollback chart(2)----->4-----rollback k8s----->4
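At each step you can confirm the current revision with helm history, for example (Helm 2 syntax, matching the examples above):

helm history icp-management-ingress --tls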
Upgrading etcd
etcd is the distributed database of the Kubernetes platform, you could say its heart. If an etcd upgrade goes wrong, the entire cluster goes down, and that kind of outage is a mistake customers will not tolerate, so the etcd upgrade is critically important, all the more because we also have to support rolling etcd back (tears...). The etcd community states clearly that etcd does not support skipping versions on upgrade, and that cross-version upgrades are not zero-downtime, so before upgrading etcd we had to back up the cluster data by taking a snapshot: etcdctl3 snapshot save. Fortunately both versions store data in the same etcd3 format, otherwise we would also have had to convert the data.
The steps for upgrading etcd are as follows:
After applying fix pack 3.2.2.2006, the customer may need to roll back to 3.2.1.2003 or 3.2.1.2006 if errors appear.
Back up the etcd data before upgrading to 3.2.2.2006.
- Log on to one of the master nodes as the root user.
- Run the following commands to export the required environment variables. Replace etcd_member_IP with the IP address of one of your etcd members:
export image=mycluster.icp:8500/ibmcom/etcd:3.2.24.2
export endpoint=etcd_member_IP
- Copy the etcdctl binary to /usr/local/bin/ by entering the following commands:
mkdir tmp && chown -R etcd:etcd tmp
docker run --rm -v $(pwd)/tmp:/data $image cp /usr/local/bin/etcdctl /data
mv tmp/etcdctl /usr/local/bin/etcdctl && rm -rf tmp
- Configure the etcdctl command.
alias etcdctl3="ETCDCTL_API=3 etcdctl --endpoints=https://${endpoint}:4001 --cacert=/etc/cfc/conf/etcd/ca.pem --cert=/etc/cfc/conf/etcd/client.pem --key=/etc/cfc/conf/etcd/client-key.pem"
- Validate the etcd cluster status by running the following commands:
etcdctl3 --write-out=table endpoint status
etcdctl3 endpoint health
- Take a snapshot of the etcd data by entering the following command (create the /data directory first if needed):
etcdctl3 snapshot save /data/etcd.db
The etcd backup data is now available at /data/etcd.db on the master node.
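Before moving on it is worth sanity-checking the backup file; etcdctl's snapshot status subcommand, using the same etcdctl3 alias, prints the snapshot's hash, revision, total key count, and size:

etcdctl3 snapshot status /data/etcd.db --write-out=table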
Apply fix pack 3.2.2.2006
Please follow normal procedures for applying fix pack 3.2.2.2006.
Rollback fix pack 3.2.2.2006
Before rolling back 3.2.2.2006, back up the secret icp-mongodb-metrics:
kubectl -n kube-system get secret icp-mongodb-metrics -o yaml > icp-mongodb-metrics.yaml.bak
When rolling back by applying the previous fix pack, for example 3.2.1.2003, you may hit an error like the one below:
stderr: 'Error: Could not get information about the resource: no kind "Ingress" is registered for version "networking.k8s.io/v1beta1" in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"'
If so, run the following steps to restore etcd.
- If Ansible is not available on the customer's boot node, run the following command in the installation directory to get an Ansible environment inside a container:
docker run -e LICENSE=accept --net=host --rm -it -v "$(pwd)":/installer/cluster -v /data/:/data ibmcom/icp-inception-amd64:3.2.1.2003-ee /bin/bash
Then run the following command for each master IP address:
ssh -i $CLUSTER_DIR/ssh_key root@<master ip address>
Make sure jq is installed on each master node. For example, on Ubuntu it can be installed with the following command:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m package -a "use=apt name=jq state=present"
- Stop Kubernetes on all master nodes. This stops the etcd pod and prevents Kubernetes from automatically creating new pods for the ones that you are stopping.
a. Create a directory in which to back up the pod definitions by entering the following command:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -a "mkdir -p /etc/cfc/podbackup"
b. Move the pod definition files into the backup directory:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mv /etc/cfc/pods/*.json /etc/cfc/podbackup"
c. Wait for etcd to stop on all nodes. You can check the status by entering the following command:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m wait_for -a "port=4001 state=stopped"
d. After etcd has stopped, stop the kubelet by running this command on all master and management nodes:
ansible master,management -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m service -a "name=kubelet state=stopped"
e. After the kubelet has stopped, restart the Docker service to ensure that all pods not managed by the kubelet are stopped, by entering the following command:
ansible master,management -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m service -a "name=docker state=restarted"
- Purge, copy, and restore the etcd data.
a. Purge the current etcd data on all master nodes by running the following commands:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mv /var/lib/etcd /var/lib/etcd.old"
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mv /var/lib/etcd-wal /var/lib/etcd-wal.old"
b. Copy the etcd snapshot to all master nodes. Assuming that you have the /data/etcd.db file in your environment, which contains a backup of your etcd, run the following procedure to copy the file to all master nodes:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m copy -a "src=/data/etcd.db dest=/tmp/snapshot.db"
c. Restore the snapshot on all master nodes. Assuming you have cloned the Git repository, and that your current directory is icp-backup/scripts, run the following commands to run the script that restores the snapshot to all of the master nodes:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mkdir -p /var/lib/etcd && chown -R etcd:etcd /var/lib/etcd /tmp/snapshot.db"
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m script -a "./multimaster-etcd-restore.sh"
The data is loaded into the /var/lib/etcd/restored directory on each of your master nodes, with the cluster settings configured.
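For reference, the heart of a script like multimaster-etcd-restore.sh is etcdctl snapshot restore, run on every master with that node's own member name and peer URL; a minimal sketch for a three-master cluster (NODE_IP, the member names, and the peer URLs are placeholders, the real script derives them from the cluster configuration):

# Run on each master; NODE_IP must be set to this node's own address.
ETCDCTL_API=3 etcdctl snapshot restore /tmp/snapshot.db \
  --name "etcd-${NODE_IP}" \
  --initial-cluster "etcd-10.0.0.1=https://10.0.0.1:2380,etcd-10.0.0.2=https://10.0.0.2:2380,etcd-10.0.0.3=https://10.0.0.3:2380" \
  --initial-advertise-peer-urls "https://${NODE_IP}:2380" \
  --data-dir /var/lib/etcd/restored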
d. Move the contents to the /var/lib/etcd/ and /var/lib/etcd-wal/ directories by running the following commands:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mkdir -p /var/lib/etcd-wal && chown -R etcd:etcd /var/lib/etcd-wal"
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mv /var/lib/etcd/restored/member /var/lib/etcd/"
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mv /var/lib/etcd/member/wal/ /var/lib/etcd-wal/"
e. Run the following script to purge the kubelet pods directory to ensure consistency between the cached kubelet data and the etcd data:
ansible master,management -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m script -a "./purge_kubelet_pods.sh"
f. Re-enable the kubelet pod by entering the following command:
ansible master,management -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m service -a "name=kubelet state=started"
g. Re-enable the etcd pod by entering the following command:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mv /etc/cfc/podbackup/etcd.json /etc/cfc/pods"
h. Run the following command to monitor the progress of the etcd component status as it starts:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m wait_for -a "port=4001 state=started"
- Validate the etcd cluster health.
a. Run the following commands to configure the etcdctl tool to query the etcd cluster:
export endpoint=<etcd_member1_IP>:4001,<etcd_member2_IP>:4001,<etcd_member3_IP>:4001
alias etcdctl="ETCDCTL_API=3 etcdctl --cacert=/etc/cfc/conf/etcd/ca.pem --cert=/etc/cfc/conf/etcd/client.pem --key=/etc/cfc/conf/etcd/client-key.pem"
Change the value for <etcd_memberx_IP> to the IP address of the etcd node that you are working with.
b. Query the cluster health by entering the following command:
etcdctl --endpoints=${endpoint} endpoint health
- Start the remaining IBM Cloud Private cluster pods by entering the following command:
ansible master -i $CLUSTER_DIR/hosts -e @$CLUSTER_DIR/config.yaml --private-key=$CLUSTER_DIR/ssh_key -m shell -a "mv /etc/cfc/podbackup/*.json /etc/cfc/pods"
This command enables kubelet to start the remaining core Kubernetes pods, which then start the workloads that are managed by Kubernetes.
It takes several minutes for all pods to be restarted. You can monitor the pods in the kube-system namespace by running the following command:
kubectl get pods --namespace=kube-system
Because the rollback failed earlier, it is expected that some pods fail to start.
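To narrow the view to just the pods that are not yet running, a convenience one-liner:

kubectl get pods --namespace=kube-system --field-selector=status.phase!=Running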
- Because the secret icp-mongodb-metrics was also reverted when etcd was restored, the icp-mongodb pod will fail with an error similar to:
Warning Unhealthy 3m19s kubelet, 10.11.27.35 Liveness probe failed: time="2020-09-19T04:15:22Z" level=error msg="Cannot connect to server using url mongodb://****:****@localhost:27017: server returned error on SASL authentication step: Authentication failed." source="connection.go:84"
Run the following commands to fix the problem:
kubectl -n kube-system delete secret icp-mongodb-metrics
kubectl -n kube-system create -f icp-mongodb-metrics.yaml.bak
- After etcd and the secret icp-mongodb-metrics have been restored successfully, re-run apply-fixpack for the previous fix pack. For example, roll back to 3.2.1.2003 with the following command:
docker run -e LICENSE=accept --net=host --rm -t -v "$(pwd)":/installer/cluster ibmcom/icp-inception-amd64:3.2.1.2003-ee apply-fixpack