kubernetes cloud-provider for aliyun

Overview

CloudProvider provides the integration layer between Kubernetes and a cloud vendor's infrastructure services (load balancers, cloud disks, security groups, and so on), implemented by the cloud-controller-manager component.

aliyun-cloud-provider is the integration plugin for the Alibaba Cloud (aliyun) platform. When a user creates a Kubernetes Service of type LoadBalancer, it automatically creates an Alibaba Cloud SLB, dynamically attaches and detaches SLB backends, and offers rich configuration options that let users customize the generated LoadBalancer.

Other cloud providers such as AWS offer ELB, EBS, security groups, and more. The Alibaba Cloud implementation includes only the service_controller and route_controller, so for now it only integrates SLB resources; other resources require separate plugins built on top of cloud-provider. For example, cluster-autoscaler relies on cloud-provider to complete node initialization.

In short, cloud-provider only wires a cloud vendor's infrastructure services into the Kubernetes cluster. Our main reason for using aliyun-cloud-provider is to integrate Alibaba Cloud ESS resources and use cluster-autoscaler to scale Kubernetes nodes dynamically or on a schedule.

The right way to set up aliyun-cloud-provider

Prerequisites

  • kubernetes version > 1.7.2
  • CloudNetwork: only Alibaba Cloud VPC networks are supported. Cloud route configuration is enabled by default and can be disabled manually:
    --configure-cloud-routes=false

  • kube-apiserver and kube-controller-manager must NOT specify the --cloud-provider flag. The cloud provider logic has been split out into the separately running cloud-controller-manager component; in future releases this flag will be deprecated and removed.
  • kubelet must specify --cloud-provider=external, which ensures a node is initialized by the cloud-controller-manager before it can be scheduled normally.

Cluster environment:

  • kubernetes version: v1.12.3
  • network plugin: flannel vxlan

Kubernetes cluster configuration

Add the parameters --cloud-provider=external --hostname-override=${REGION_ID}.${INSTANCE_ID} --provider-id=${REGION_ID}.${INSTANCE_ID} to the kubelet startup arguments and restart kubelet. The format is Region_ID.Instance_ID.

KUBELET_CLOUD_PROVIDER_ARGS="--cloud-provider=external --hostname-override=${REGION_ID}.${INSTANCE_ID} --provider-id=${REGION_ID}.${INSTANCE_ID}"

REGION_ID and INSTANCE_ID can be obtained with the following commands:

$ META_EP=http://100.100.100.200/latest/meta-data
$ echo `curl -s $META_EP/region-id`.`curl -s $META_EP/instance-id`

Then restart kubelet and run kubectl get node to check whether the hostname has taken effect; you can also delete the node so that it re-registers with the cluster.
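
A minimal sketch of applying and verifying the change, assuming kubelet is managed by systemd and the flags above have been wired into its unit/environment file:

$ systemctl daemon-reload && systemctl restart kubelet
$ kubectl get nodes -o wide                              # node name should now be <region-id>.<instance-id>
$ kubectl describe node <region-id>.<instance-id> | grep ProviderID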

Deploying aliyun-cloud-provider

cloud-provider needs certain permissions to access Alibaba Cloud. You can either create RAM policies for the ECS instances or use an AccessKeyID & Secret directly. Since we deploy cloud-provider in a container, we use an AK here.

1. Configure AK (AccessKey) permissions for aliyun-cloud-provider
Create a custom policy and attach it to the k8s-cloud-provider RAM user, then create an AccessKey for that user and give it to aliyun-cloud-provider, so that it can only access the resources we have authorized. For reference, see: https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/examples/master.policy

{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ecs:Describe*",
        "ecs:AttachDisk",
        "ecs:CreateDisk",
        "ecs:CreateSnapshot",
        "ecs:CreateRouteEntry",
        "ecs:DeleteDisk",
        "ecs:DeleteSnapshot",
        "ecs:DeleteRouteEntry",
        "ecs:DetachDisk",
        "ecs:ModifyAutoSnapshotPolicyEx",
        "ecs:ModifyDiskAttribute",
        "ecs:CreateNetworkInterface",
        "ecs:DescribeNetworkInterfaces",
        "ecs:AttachNetworkInterface",
        "ecs:DetachNetworkInterface",
        "ecs:DeleteNetworkInterface",
        "ecs:DescribeInstanceAttribute"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "cr:Get*",
        "cr:List*",
        "cr:PullRepository"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "slb:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "cms:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "vpc:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "log:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "nas:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
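
As a rough sketch, the policy, user, and AK could also be created with the aliyun CLI, assuming it is installed and configured with an administrator credential and that the JSON above has been saved as master.policy (the RAM console works just as well; the flags map directly to RAM API parameters):

$ aliyun ram CreatePolicy --PolicyName k8s-cloud-provider --PolicyDocument "$(cat master.policy)"
$ aliyun ram CreateUser --UserName k8s-cloud-provider
$ aliyun ram AttachPolicyToUser --PolicyType Custom --PolicyName k8s-cloud-provider --UserName k8s-cloud-provider
$ aliyun ram CreateAccessKey --UserName k8s-cloud-provider    # note the returned AccessKeyId / AccessKeySecret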

Create a cloud-config ConfigMap to provide configuration for cloud-provider; it mainly stores the AK credentials:

---
apiVersion: v1
data:
  # fill in the AK of the k8s-cloud-provider user here
  special.keyid: xxxxxx
  special.keysecret: xxxxx
kind: ConfigMap
metadata:
  name: cloud-config
  namespace: kube-system
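
Equivalently, the same ConfigMap can be created imperatively; the two values are placeholders for the k8s-cloud-provider user's AccessKey:

$ kubectl -n kube-system create configmap cloud-config \
    --from-literal=special.keyid=<ACCESS_KEY_ID> \
    --from-literal=special.keysecret=<ACCESS_KEY_SECRET>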

2. Create the ServiceAccount system:cloud-controller-manager
With RBAC enabled, cloud-provider uses the system:cloud-controller-manager service account for authorized access to the Kubernetes cluster. Therefore:

  • Certain RBAC roles and bindings must be created. See cloud-controller-manager.yml for details.

  • A kubeconfig file must be provided to cloud-provider, saved at /etc/kubernetes/cloud-controller-manager.conf.

Replace $CA_DATA with the output of cat /etc/kubernetes/pki/ca.crt | base64 -w 0 (the CA data can also be viewed with kubectl config view --flatten=true), and replace the apiserver address with your own Kubernetes apiserver address. The kubeconfig is mounted into the cloud-provider container via a ConfigMap.
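
A small sketch of the substitution, applied to the complete cloud-controller-manager.yml shown later; it assumes the placeholders appear literally as ${CA_DATA} and ${APISERVER_ADDRESS} in that file:

$ CA_DATA=$(cat /etc/kubernetes/pki/ca.crt | base64 -w 0)
$ APISERVER_ADDRESS=10.100.254.78:6443    # replace with your own apiserver address
$ sed -i -e "s|\${CA_DATA}|${CA_DATA}|g" -e "s|\${APISERVER_ADDRESS}|${APISERVER_ADDRESS}|g" cloud-controller-manager.yml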

# RBAC
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
      - services
      - secrets
      - endpoints
      - serviceaccounts
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
      - delete
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - services/status
    verbs:
      - update
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - events
      - endpoints
    verbs:
      - create
      - patch
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:shared-informers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: shared-informers
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-node-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-node-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:pvl-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: pvl-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:route-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: route-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
 
 
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-controller-manager
  namespace: kube-system
data:
  cloud-controller-manager.conf: |-
    kind: Config
    contexts:
    - context:
        cluster: kubernetes
        user: system:cloud-controller-manager
      name: system:cloud-controller-manager@kubernetes
    current-context: system:cloud-controller-manager@kubernetes
    users:
    - name: system:cloud-controller-manager
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: $CA_DATA
        server: https://10.100.254.78:6443
      name: kubernetes
3. Deploy aliyun-cloud-provider as a DaemonSet
The complete YAML is as follows:
# cloud-controller-manager.yml
---
apiVersion: v1
data:
  special.keyid: xxxxxxxx
  special.keysecret: xxxxxxxxx
kind: ConfigMap
metadata:
  name: cloud-config
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-controller-manager
  namespace: kube-system
data:
  cloud-controller-manager.conf: |-
    kind: Config
    contexts:
    - context:
        cluster: kubernetes
        user: system:cloud-controller-manager
      name: system:cloud-controller-manager@kubernetes
    current-context: system:cloud-controller-manager@kubernetes
    users:
    - name: system:cloud-controller-manager
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: ${CA_DATA}
        server: https://${APISERVER_ADDRESS}
      name: kubernetes
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
      - services
      - secrets
      - endpoints
      - serviceaccounts
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
      - delete
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - services/status
    verbs:
      - update
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - events
      - endpoints
    verbs:
      - create
      - patch
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:shared-informers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: shared-informers
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-node-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-node-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:pvl-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: pvl-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:route-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: route-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: cloud-controller-manager
    tier: control-plane
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cloud-controller-manager
      tier: control-plane
  template:
    metadata:
      labels:
        app: cloud-controller-manager
        tier: control-plane
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: cloud-controller-manager
      tolerations:
      - effect: NoSchedule
        operator: Exists
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: Exists
        key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
         node-role.kubernetes.io/master: "true"
      containers:
      - command:
        -  /cloud-controller-manager
        - --kubeconfig=/etc/kubernetes/cloud-controller-manager.conf
        - --address=127.0.0.1
        - --allow-untagged-cloud=true
        - --leader-elect=true
        - --cloud-provider=alicloud
        - --cluster-name="${cluster_name}"
        # consider enabling these when using the cloud network (CloudNetwork) route mode
        - --allocate-node-cidrs=false
        - --cluster-cidr=172.30.0.0/16
        # must be disabled unless using CloudNetwork, otherwise Alibaba Cloud route tables will be configured
        - --configure-cloud-routes=false
        - --use-service-account-credentials=true
        - --route-reconciliation-period=30s
        - --v=5
        image: registry.cn-hangzhou.aliyuncs.com/acs/cloud-controller-manager-amd64:v1.9.3.112-g93c7140-aliyun
        env:
        - name: ACCESS_KEY_ID
          valueFrom:
            configMapKeyRef:
              name: cloud-config
              key: special.keyid
        - name: ACCESS_KEY_SECRET
          valueFrom:
            configMapKeyRef:
              name: cloud-config
              key: special.keysecret
        livenessProbe:
          failureThreshold: 8
          httpGet:
            host: 127.0.0.1
            path: /healthz
            port: 10252
            scheme: HTTP
          initialDelaySeconds: 15
          timeoutSeconds: 15
        name: cloud-controller-manager
        resources:
          requests:
            cpu: 200m
        volumeMounts:
        - mountPath: /etc/kubernetes/
          name: k8s
          readOnly: true
        - mountPath: /etc/ssl/certs
          name: certs
        - mountPath: /etc/pki
          name: pki
      hostNetwork: true
      volumes:
      #- hostPath:
      #    path: /etc/kubernetes
      - configMap:
           name: cloud-controller-manager
        name: k8s
      - hostPath:
          path: /etc/ssl/certs
        name: certs
      - hostPath:
          path: /etc/pki
        name: pki
4. Start aliyun-cloud-provider

$ kubectl apply -f cloud-controller-manager.yml
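
A quick sanity check that the controller is up (the pod name is illustrative):

$ kubectl -n kube-system get ds cloud-controller-manager
$ kubectl -n kube-system get pods -l app=cloud-controller-manager -o wide
$ kubectl -n kube-system logs <cloud-controller-manager-pod-name> | tail -n 20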

Testing SLB creation with cloud-provider

Once the cloud-controller-manager is up and running, create a sample nginx Deployment:

$ cat <<EOF >nginx.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-example
spec:
  replicas: 1
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - image: nginx:latest
        name: nginx
        ports:
          - containerPort: 80
EOF

Then expose it as a Service of type LoadBalancer:

$ kubectl expose deployment nginx-example --name=nginx-example --type=LoadBalancer --port=80
$ kubectl get svc
NAME            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
nginx-example   LoadBalancer   10.254.13.75   10.100.255.76   80:11435/TCP   25h
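
The EXTERNAL-IP is the address of the SLB that was created; a quick check from a machine that can reach it:

$ curl -I http://10.100.255.76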

For other SLB-related annotations, see: https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/zh/usage.md

Notes and caveats for cloud-provider:

  • With --configure-cloud-routes=true, the controller forcibly refreshes the route table. Watch out for pod CIDR conflicts so that routes are not refreshed or deleted in a way that affects production workloads.
  • Default behavior: when creating an SLB, the availability zone is picked at random, and the range of zones that can be specified via annotations is limited.
  • The vSwitch the SLB is placed in may not match the availability zone of your nodes. For example, kubernetes-g maps to the beijing-g zone, yet the SLB may land in beijing-a with its backup zone in beijing-e, which is inconvenient to manage.
  • aliyun-cloud-provider implements only the route_controller and service_controller and therefore only integrates SLB resources; stateful storage and cluster-autoscaler each use their own separate provider/plugin.

Best practices for dynamically managing SLB through cloud-provider:

  1. Specify the SLB instance spec (type) via annotations.
  2. Specify the SLB backend servers via annotations, filtered by node labels.
  3. Specify the SLB primary and backup availability zones; it is recommended to keep them in the same zones as your nodes (test in the console beforehand which AZs support primary/backup). A hedged example follows this list.
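
A hedged YAML sketch of such a Service: the annotation keys below follow the alicloud-loadbalancer naming used by this generation of the provider, and the spec value, node label, and zone IDs are illustrative, so verify them against the usage doc linked above before relying on them:

apiVersion: v1
kind: Service
metadata:
  name: nginx-example
  annotations:
    # SLB instance spec / type
    service.beta.kubernetes.io/alicloud-loadbalancer-spec: "slb.s1.small"
    # only nodes carrying this label are attached as SLB backends
    service.beta.kubernetes.io/alicloud-loadbalancer-backend-label: "role=slb-backend"
    # primary / backup availability zones of the SLB instance
    service.beta.kubernetes.io/alicloud-loadbalancer-master-zoneid: "cn-beijing-g"
    service.beta.kubernetes.io/alicloud-loadbalancer-slave-zoneid: "cn-beijing-e"
spec:
  type: LoadBalancer
  selector:
    app: nginx-example
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP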

FAQ:

  1. Nodes get the node.kubernetes.io/network-unavailable taint
    After deploying cloud-provider by following the upstream GitHub docs, all newly created pods were stuck in Pending. Investigation showed that every node had been tainted with node.kubernetes.io/network-unavailable. The cause is that --configure-cloud-routes=true (the default) makes the controller create route table entries on Alibaba Cloud; since we use flannel vxlan, disable the flag and restart cloud-provider, then remove the condition from each node, or delete the nodes and let them re-register with the cluster (see the sketch after this FAQ).
  2. Nodes get the node.cloudprovider.kubernetes.io/uninitialized taint
    The cloud-provider component has a problem and is not running properly, so nodes cannot be initialized by cloud-provider; the taint prevents pods from being scheduled onto such nodes. See https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/
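
For the first issue, a minimal sketch of re-registering a node after fixing and restarting cloud-provider (the node name is illustrative):

$ kubectl delete node cn-beijing.i-xxxxxxxxxxxxxxxx
# then, on that node:
$ systemctl restart kubelet    # the node re-registers without the stale condition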

