1. Preface
If you repost this article, please credit the original source and respect the author's work!
Source code: https://github.com/nicktming/kubernetes/tree/tming-v1.13/pkg/kubelet/cm/devicemanager
Branch: tming-v1.13 (based on v1.13)
device-plugin
Branch: tming-v1.11 (based on v1.11)
Articles on device manager and device plugin:
1. [k8s source analysis][kubelet] devicemanager: pod_devices and checkpoint
2. [k8s source analysis][kubelet] devicemanager: using a device-plugin (simulating GPUs)
3. [k8s source analysis][kubelet] devicemanager: how a device-plugin registers with the kubelet
4. [k8s source analysis][kubelet] devicemanager: how the kubelet requests resources
5. [k8s source analysis][kubelet] devicemanager: restarting the kubelet and the device-plugin
This post looks at how a device plugin is used; with that in place, later posts analyze how the device plugin and the kubelet cooperate.
The example uses a gpu-device-plugin. Since the machine has no real GPUs, a few GPUs are faked, but functionally everything behaves the same way.
2. Example
2.1 Current cluster state
[root@master kubectl]# ./kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.0.12 NotReady <none> 15d v0.0.0-master+$Format:%h$
172.21.0.16 Ready <none> 15d v0.0.0-master+$Format:%h$
[root@master kubectl]#
[root@master kubectl]# ./kubectl describe node 172.21.0.12
Name: 172.21.0.12
...
Capacity:
cpu: 2
ephemeral-storage: 51473888Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3880944Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 47438335103
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3778544Ki
pods: 110
...
[root@master kubectl]# ./kubectl describe node 172.21.0.16
Name: 172.21.0.16
...
Capacity:
cpu: 2
ephemeral-storage: 51473888Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8009720Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 47438335103
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7907320Ki
pods: 110
...
Here we only care about the resource fields (Capacity and Allocatable), so the irrelevant parts of the output are omitted.
Capacity: the total amount of each resource on the node.
Allocatable: the amount of each resource that can be handed out to pods.
If this is not entirely clear yet, don't worry; it will become much clearer when we analyze the device manager.
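For reference, both fields live on the Node object's status as per-resource maps (v1.ResourceList), and extended resources registered by device plugins appear there next to cpu and memory. Below is a minimal client-go sketch of reading them; it is not part of the original setup and assumes a client-go release contemporary with v1.13 (where Get takes a name and options, no context) and a kubeconfig at /root/.kube/config.

// sketch: print a node's Capacity and Allocatable (illustrative, not from the post)
package main

import (
    "fmt"

    "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // assumed kubeconfig location
    config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(config)
    // client-go versions matching v1.13 take (name, options) here, no context argument
    node, err := client.CoreV1().Nodes().Get("172.21.0.16", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }
    // Capacity and Allocatable are both v1.ResourceList, i.e. map[ResourceName]resource.Quantity
    for _, name := range []v1.ResourceName{"cpu", "memory", "nicktming.com/gpu"} {
        c := node.Status.Capacity[name]
        a := node.Status.Allocatable[name]
        fmt.Printf("%v: capacity=%v allocatable=%v\n", name, c.String(), a.String())
    }
}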
From the output above you can see that neither node in the cluster currently exposes any extended resources. Also pay attention to the directory /var/lib/kubelet/device-plugins; it matters a lot:
kubelet_internal_checkpoint: stores the device manager's state; when the device manager restarts, it reloads its data from this file.
kubelet.sock: the device manager's server endpoint; device-plugins send their registration requests to it.
[root@master device-plugins]# pwd
/var/lib/kubelet/device-plugins
[root@master device-plugins]# ls
DEPRECATION kubelet_internal_checkpoint kubelet.sock
[root@master device-plugins]# cat kubelet_internal_checkpoint
{"Data":{"PodDeviceEntries":null,"RegisteredDevices":{}},"Checksum":3467439661}
[root@master device-plugins]#
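The checkpoint is plain JSON, so its shape can be mirrored with a couple of local structs. The sketch below is my own illustration rather than the real devicemanager types (those live in pkg/kubelet/cm/devicemanager/checkpoint and are covered in the source posts); the field names are taken from the output above.

// sketch: decode kubelet_internal_checkpoint (illustrative only)
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
)

// field names mirror the JSON shown above
type podDevicesEntry struct {
    PodUID        string
    ContainerName string
    ResourceName  string
    DeviceIDs     []string
    AllocResp     []byte // base64 in JSON; the serialized allocate response returned by the plugin
}

type checkpoint struct {
    Data struct {
        PodDeviceEntries  []podDevicesEntry
        RegisteredDevices map[string][]string
    }
    Checksum uint64
}

func main() {
    raw, err := ioutil.ReadFile("/var/lib/kubelet/device-plugins/kubelet_internal_checkpoint")
    if err != nil {
        panic(err)
    }
    var cp checkpoint
    if err := json.Unmarshal(raw, &cp); err != nil {
        panic(err)
    }
    for resource, ids := range cp.Data.RegisteredDevices {
        fmt.Printf("registered %s: %d devices\n", resource, len(ids))
    }
    for _, e := range cp.Data.PodDeviceEntries {
        fmt.Printf("pod %s container %s holds %v\n", e.PodUID, e.ContainerName, e.DeviceIDs)
    }
}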
2.2 Running the device-plugin
Since there are no real GPUs, the NVIDIA code that discovers and monitors GPUs has been modified. That code essentially just collects the UUIDs of all GPUs on the machine and registers them with the device manager, so this post simply fabricates a few GPU UUIDs instead. (The effect is exactly the same.)
// k8s-device-plugin/nvidia.go
// Instead of querying NVML for real GPU UUIDs, fabricate 10 healthy devices
// whose IDs are derived from the resource name.
func getDevices() []*pluginapi.Device {
    n := uint(10)
    var devs []*pluginapi.Device
    for i := uint(0); i < n; i++ {
        devs = append(devs, &pluginapi.Device{
            ID:     fmt.Sprintf("%v-%v", resourceName, i),
            Health: pluginapi.Healthy,
        })
    }
    return devs
}
// k8s-device-plugin/main.go
// The resource name can be overridden via the resourcename environment variable;
// the plugin's socket name is derived from it.
newResourceName := os.Getenv("resourcename")
if newResourceName != "" {
    resourceName = newResourceName
}
serverSock = fmt.Sprintf("%v%v.sock", pluginapi.DevicePluginPath, resourceName)
// k8s-device-plugin/server.go
// For each container request, expose the assigned device IDs through an
// environment variable named NVIDIA_VISIBLE_DEVICES/<resourceName>.
func (m *NvidiaDevicePlugin) Allocate(ctx context.Context, reqs *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
    devs := m.devs
    name := fmt.Sprintf("NVIDIA_VISIBLE_DEVICES/%v", resourceName)
    ...
    for _, req := range reqs.ContainerRequests {
        response := pluginapi.ContainerAllocateResponse{
            Envs: map[string]string{
                name: strings.Join(req.DevicesIDs, ","),
            },
        }
        ...
    }
Run it:
[root@master NVIDIA]# pwd
/root/go/src/github.com/NVIDIA
[root@master NVIDIA]# git clone https://github.com/nicktming/k8s-device-plugin.git
[root@master k8s-device-plugin]# go build .
[root@master k8s-device-plugin]# export resourcename=nicktming.com/gpu
[root@master k8s-device-plugin]# ./k8s-device-plugin
2019/10/31 16:33:43 Loading NVML
2019/10/31 16:33:43 Fetching devices.
2019/10/31 16:33:43 Starting FS watcher.
2019/10/31 16:33:43 Starting OS watcher.
2019/10/31 16:33:43 Starting to serve on /var/lib/kubelet/device-plugins/gpu.sock
2019/10/31 16:33:43 Registered device plugin with Kubelet
2.3 Checking the node status
First look at this node's resource information in the cluster:
[root@master kubectl]# ./kubectl describe node 172.21.0.16
Name: 172.21.0.16
...
Capacity:
cpu: 2
ephemeral-storage: 51473888Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8009720Ki
nicktming.com/gpu: 10
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 47438335103
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7907320Ki
nicktming.com/gpu: 10
pods: 110
...
You can see that the node where the device-plugin was just started (172.21.0.16) has registered the resource nicktming.com/gpu with the device manager inside the kubelet, and that 10 of them are allocatable.
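The log line "Registered device plugin with Kubelet" corresponds to a single Register RPC sent to kubelet.sock. Roughly, the plugin does something like the following; this is a simplified sketch against the v1beta1 device plugin API, with error handling and the plugin's own gRPC server (which must already be listening on the Endpoint socket) left out.

// sketch: how a device plugin registers itself with the kubelet (simplified)
package main

import (
    "context"
    "net"
    "time"

    "google.golang.org/grpc"
    pluginapi "k8s.io/kubernetes/pkg/kubelet/apis/deviceplugin/v1beta1"
)

func register(resourceName, endpoint string) error {
    // dial the device manager's server: /var/lib/kubelet/device-plugins/kubelet.sock
    conn, err := grpc.Dial(pluginapi.KubeletSocket, grpc.WithInsecure(), grpc.WithBlock(),
        grpc.WithTimeout(10*time.Second),
        grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
            return net.DialTimeout("unix", addr, timeout)
        }))
    if err != nil {
        return err
    }
    defer conn.Close()

    client := pluginapi.NewRegistrationClient(conn)
    // Endpoint is the plugin's own socket name under /var/lib/kubelet/device-plugins;
    // ResourceName is what later shows up in the node's Capacity/Allocatable.
    _, err = client.Register(context.Background(), &pluginapi.RegisterRequest{
        Version:      pluginapi.Version,
        Endpoint:     endpoint,     // e.g. "gpu.sock"
        ResourceName: resourceName, // e.g. "nicktming.com/gpu"
    })
    return err
}

func main() {
    if err := register("nicktming.com/gpu", "gpu.sock"); err != nil {
        panic(err)
    }
}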
2.4 Requesting the resource
First request 8 GPUs.
[root@master kubectl]# ./kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.21.0.12 Ready <none> 15d v0.0.0-master+$Format:%h$
172.21.0.16 Ready <none> 15d v0.0.0-master+$Format:%h$
[root@master kubectl]# ./kubectl get pods --all-namespaces
No resources found.
[root@master kubectl]# cat deviceplugin/pod-gpu-8.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-gpu-8
spec:
  containers:
  - name: podtest-8
    image: nginx
    resources:
      limits:
        nicktming.com/gpu: 8
      requests:
        nicktming.com/gpu: 8
    ports:
    - containerPort: 80
[root@master kubectl]# ./kubectl apply -f deviceplugin/pod-gpu-8.yaml
pod/test-gpu-8 created
Check the status: the pod has successfully been allocated 8 GPUs. Unsurprisingly it can only run on node 172.21.0.16, because that is currently the only node with the resource nicktming.com/gpu.
In a real deployment, once docker (nvidia-docker) sees the environment variable NVIDIA_VISIBLE_DEVICES=<actual GPU UUIDs>, it maps the corresponding GPUs into the container.
[root@master kubectl]# ./kubectl get pods
NAME READY STATUS RESTARTS AGE
test-gpu-8 1/1 Running 0 50s
[root@master kubectl]# ./kubectl exec -it test-gpu-8 env | grep NVIDIA_VISIBLE_DEVICES
NVIDIA_VISIBLE_DEVICES/nicktming.com/gpu=nicktming.com/gpu-2,nicktming.com/gpu-3,nicktming.com/gpu-7,nicktming.com/gpu-6,nicktming.com/gpu-1,nicktming.com/gpu-5,nicktming.com/gpu-8,nicktming.com/gpu-9
[root@master kubectl]#
[root@master kubectl]# ./kubectl describe pods test-gpu-8 | grep -i node
Node: 172.21.0.16/172.21.0.16
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
Look at the contents of /var/lib/kubelet/device-plugins again: there is now an extra gpu.sock, which is the endpoint the devicemanager uses to send requests to the corresponding device-plugin. (The source-code posts cover this in detail; a rough sketch follows the checkpoint dump below.)
[root@master device-plugins]# pwd
/var/lib/kubelet/device-plugins
[root@master device-plugins]# ls
DEPRECATION gpu.sock kubelet_internal_checkpoint kubelet.sock
[root@master device-plugins]# cat kubelet_internal_checkpoint | jq .
{
"Data": {
"PodDeviceEntries": [
{
"PodUID": "94c13838-fbba-11e9-ba9e-525400d54f7e",
"ContainerName": "podtest-8",
"ResourceName": "nicktming.com/gpu",
"DeviceIDs": [
"nicktming.com/gpu-9",
"nicktming.com/gpu-2",
"nicktming.com/gpu-3",
"nicktming.com/gpu-7",
"nicktming.com/gpu-6",
"nicktming.com/gpu-1",
"nicktming.com/gpu-5",
"nicktming.com/gpu-8"
],
"AllocResp": "CroBChZOVklESUFfVklTSUJMRV9ERVZJQ0VTEp8Bbmlja3RtaW5nLmNvbS9ncHUtMixuaWNrdG1pbmcuY29tL2dwdS0zLG5pY2t0bWluZy5jb20vZ3B1LTcsbmlja3RtaW5nLmNvbS9ncHUtNixuaWNrdG1pbmcuY29tL2dwdS0xLG5pY2t0bWluZy5jb20vZ3B1LTUsbmlja3RtaW5nLmNvbS9ncHUtOCxuaWNrdG1pbmcuY29tL2dwdS05"
}
],
"RegisteredDevices": {
"nicktming.com/gpu": [
"nicktming.com/gpu-6",
"nicktming.com/gpu-7",
"nicktming.com/gpu-8",
"nicktming.com/gpu-0",
"nicktming.com/gpu-1",
"nicktming.com/gpu-2",
"nicktming.com/gpu-3",
"nicktming.com/gpu-4",
"nicktming.com/gpu-9",
"nicktming.com/gpu-5"
]
}
},
"Checksum": 3602853121
}
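The AllocResp value above is the allocate response the plugin returned over gpu.sock, serialized and then base64-encoded by the JSON encoder. When the devicemanager needs devices for a container it dials the plugin's socket and calls Allocate, roughly as in the sketch below; this is a standalone approximation using the same v1beta1 API as the plugin, not the actual kubelet code, which will be analyzed in the source posts.

// sketch: calling the plugin's Allocate over gpu.sock, the way the devicemanager does (simplified)
package main

import (
    "context"
    "fmt"
    "net"
    "time"

    "google.golang.org/grpc"
    pluginapi "k8s.io/kubernetes/pkg/kubelet/apis/deviceplugin/v1beta1"
)

func main() {
    // dial the plugin's socket, e.g. /var/lib/kubelet/device-plugins/gpu.sock
    conn, err := grpc.Dial(pluginapi.DevicePluginPath+"gpu.sock",
        grpc.WithInsecure(), grpc.WithBlock(), grpc.WithTimeout(10*time.Second),
        grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
            return net.DialTimeout("unix", addr, timeout)
        }))
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    client := pluginapi.NewDevicePluginClient(conn)
    // one ContainerAllocateRequest per container, carrying the device IDs
    // the devicemanager has picked for it
    resp, err := client.Allocate(context.Background(), &pluginapi.AllocateRequest{
        ContainerRequests: []*pluginapi.ContainerAllocateRequest{
            {DevicesIDs: []string{"nicktming.com/gpu-0", "nicktming.com/gpu-4"}},
        },
    })
    if err != nil {
        panic(err)
    }
    // the per-container response (env vars, mounts, devices, annotations) is what
    // ends up serialized into the AllocResp field of the checkpoint
    for _, c := range resp.ContainerResponses {
        fmt.Println(c.Envs)
    }
}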
Next create another pod that requests 3 GPUs. As expected, this pod cannot be scheduled, because only 2 GPUs are left: nicktming.com/gpu-4 and nicktming.com/gpu-0.
[root@master kubectl]# cat deviceplugin/pod-gpu-3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-gpu-3
spec:
  containers:
  - name: podtest-3
    image: nginx
    resources:
      limits:
        nicktming.com/gpu: 3
      requests:
        nicktming.com/gpu: 3
    ports:
    - containerPort: 80
[root@master kubectl]# ./kubectl apply -f deviceplugin/pod-gpu-3.yaml
pod/test-gpu-3 created
[root@master kubectl]# ./kubectl get pods
NAME READY STATUS RESTARTS AGE
test-gpu-3 0/1 Pending 0 6s
test-gpu-8 1/1 Running 0 8m20s
[root@master kubectl]# ./kubectl describe pod test-gpu-3
Name: test-gpu-3
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 24s (x2 over 24s) default-scheduler 0/2 nodes are available: 2 Insufficient nicktming.com/gpu.
The pod stays in Pending and cannot be scheduled, because neither node in the cluster can satisfy its request.
2.5 Adding the resource to the other node
Since the resource is exhausted, add it on the other node (172.21.0.12) by running the same device-plugin there.
[root@worker device-plugin]# ifconfig
...
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.21.0.12 netmask 255.255.240.0 broadcast 172.21.15.255
...
[root@worker device-plugin]# pwd
/root/worker/device-plugin
[root@worker device-plugin]# export resourcename=nicktming.com/gpu
[root@worker device-plugin]# ls
k8s-device-plugin
[root@worker device-plugin]# ./k8s-device-plugin
2019/10/31 17:00:42 Loading NVML
2019/10/31 17:00:42 Fetching devices.
2019/10/31 17:00:42 Starting FS watcher.
2019/10/31 17:00:42 Starting OS watcher.
2019/10/31 17:00:42 Starting to serve on /var/lib/kubelet/device-plugins/gpu.sock
2019/10/31 17:00:42 Registered device plugin with Kubelet
Check the status of node 172.21.0.12; it now also has the resource (nicktming.com/gpu):
[root@master kubectl]# ./kubectl describe node 172.21.0.12
Name: 172.21.0.12
...
Capacity:
cpu: 2
ephemeral-storage: 51473888Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3880944Ki
nicktming.com/gpu: 10
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 47438335103
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3778544Ki
nicktming.com/gpu: 10
pods: 110
...
Check the pods again: test-gpu-3 is now running on 172.21.0.12. For the scheduling side see kube-scheduler; a pending pod is periodically retried by the scheduler, and as soon as it finds enough free resources the pod gets bound to a node.
[root@master kubectl]# ./kubectl get pods
NAME READY STATUS RESTARTS AGE
test-gpu-3 1/1 Running 0 10m
test-gpu-8 1/1 Running 0 18m
[root@master kubectl]# ./kubectl exec -it test-gpu-3 env | grep NVIDIA_VISIBLE_DEVICES
NVIDIA_VISIBLE_DEVICES/nicktming.com/gpu=nicktming.com/gpu-0,nicktming.com/gpu-2,nicktming.com/gpu-3
[root@master kubectl]# ./kubectl describe pod test-gpu-3 | grep -i node
Node: 172.21.0.12/172.21.0.12
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Warning FailedScheduling 3m57s (x16 over 11m) default-scheduler 0/2 nodes are available: 2 Insufficient nicktming.com/gpu.
[root@master kubectl]#
2.6 Creating another resource type: rdma
[root@master k8s-device-plugin]# export resourcename=nicktming.com/rdma
[root@master k8s-device-plugin]# ./k8s-device-plugin
2019/10/31 18:02:34 Loading NVML
2019/10/31 18:02:34 Fetching devices.
2019/10/31 18:02:34 Starting FS watcher.
2019/10/31 18:02:34 Starting OS watcher.
2019/10/31 18:02:34 Starting to serve on /var/lib/kubelet/device-plugins/rdma.sock
2019/10/31 18:02:34 Registered device plugin with Kubelet
Check the node's status:
[root@master kubectl]# ./kubectl describe node 172.21.0.16
Name: 172.21.0.16
...
Capacity:
cpu: 2
ephemeral-storage: 51473888Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8009720Ki
nicktming.com/gpu: 10
nicktming.com/rdma: 10
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 47438335103
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7907320Ki
nicktming.com/gpu: 10
nicktming.com/rdma: 10
pods: 110
...
Now request 2 GPUs and 10 rdma devices in one pod.
[root@master kubectl]# ./kubectl get pods
NAME READY STATUS RESTARTS AGE
test-gpu-3 1/1 Running 0 82m
test-gpu-8 1/1 Running 0 90m
[root@master kubectl]# cat deviceplugin/pod-gpu2-rdma10.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-gpu2-rdma10
spec:
  containers:
  - name: testpod-gpu2-rdma10
    image: nginx
    resources:
      limits:
        nicktming.com/gpu: 2
        nicktming.com/rdma: 10
      requests:
        nicktming.com/gpu: 2
        nicktming.com/rdma: 10
    ports:
    - containerPort: 80
[root@master kubectl]# ./kubectl apply -f deviceplugin/pod-gpu2-rdma10.yaml
pod/test-gpu2-rdma10 created
[root@master kubectl]# ./kubectl get pods
NAME READY STATUS RESTARTS AGE
test-gpu-3 1/1 Running 0 82m
test-gpu-8 1/1 Running 0 91m
test-gpu2-rdma10 1/1 Running 0 6s
[root@master kubectl]# ./kubectl exec -it test-gpu2-rdma10 env | grep NVIDIA_VISIBLE_DEVICES
NVIDIA_VISIBLE_DEVICES/nicktming.com/gpu=nicktming.com/gpu-0,nicktming.com/gpu-4
NVIDIA_VISIBLE_DEVICES/nicktming.com/rdma=nicktming.com/rdma-4,nicktming.com/rdma-2,nicktming.com/rdma-7,nicktming.com/rdma-3,nicktming.com/rdma-9,nicktming.com/rdma-5,nicktming.com/rdma-0,nicktming.com/rdma-1,nicktming.com/rdma-6,nicktming.com/rdma-8
[root@master kubectl]#
Look at kubelet_internal_checkpoint:
[root@master device-plugins]# pwd
/var/lib/kubelet/device-plugins
[root@master device-plugins]# ls
DEPRECATION gpu.sock kubelet_internal_checkpoint kubelet.sock rdma.sock
[root@master device-plugins]#
[root@master device-plugins]# cat kubelet_internal_checkpoint | jq .
{
"Data": {
"PodDeviceEntries": [
{
"PodUID": "94c13838-fbba-11e9-ba9e-525400d54f7e",
"ContainerName": "podtest-8",
"ResourceName": "nicktming.com/gpu",
"DeviceIDs": [
"nicktming.com/gpu-8",
"nicktming.com/gpu-9",
"nicktming.com/gpu-2",
"nicktming.com/gpu-3",
"nicktming.com/gpu-7",
"nicktming.com/gpu-6",
"nicktming.com/gpu-1",
"nicktming.com/gpu-5"
],
"AllocResp": "CroBChZOVklESUFfVklTSUJMRV9ERVZJQ0VTEp8Bbmlja3RtaW5nLmNvbS9ncHUtMixuaWNrdG1pbmcuY29tL2dwdS0zLG5pY2t0bWluZy5jb20vZ3B1LTcsbmlja3RtaW5nLmNvbS9ncHUtNixuaWNrdG1pbmcuY29tL2dwdS0xLG5pY2t0bWluZy5jb20vZ3B1LTUsbmlja3RtaW5nLmNvbS9ncHUtOCxuaWNrdG1pbmcuY29tL2dwdS05"
},
{
"PodUID": "4d589c87-fbc7-11e9-ba9e-525400d54f7e",
"ContainerName": "testpod-gpu2-rdma10",
"ResourceName": "nicktming.com/rdma",
"DeviceIDs": [
"nicktming.com/rdma-9",
"nicktming.com/rdma-5",
"nicktming.com/rdma-0",
"nicktming.com/rdma-1",
"nicktming.com/rdma-6",
"nicktming.com/rdma-8",
"nicktming.com/rdma-3",
"nicktming.com/rdma-2",
"nicktming.com/rdma-7",
"nicktming.com/rdma-4"
],
"AllocResp": "Cv8BCilOVklESUFfVklTSUJMRV9ERVZJQ0VTL25pY2t0bWluZy5jb20vcmRtYRLRAW5pY2t0bWluZy5jb20vcmRtYS00LG5pY2t0bWluZy5jb20vcmRtYS0yLG5pY2t0bWluZy5jb20vcmRtYS03LG5pY2t0bWluZy5jb20vcmRtYS0zLG5pY2t0bWluZy5jb20vcmRtYS05LG5pY2t0bWluZy5jb20vcmRtYS01LG5pY2t0bWluZy5jb20vcmRtYS0wLG5pY2t0bWluZy5jb20vcmRtYS0xLG5pY2t0bWluZy5jb20vcmRtYS02LG5pY2t0bWluZy5jb20vcmRtYS04"
},
{
"PodUID": "4d589c87-fbc7-11e9-ba9e-525400d54f7e",
"ContainerName": "testpod-gpu2-rdma10",
"ResourceName": "nicktming.com/gpu",
"DeviceIDs": [
"nicktming.com/gpu-0",
"nicktming.com/gpu-4"
],
"AllocResp": "ClMKKE5WSURJQV9WSVNJQkxFX0RFVklDRVMvbmlja3RtaW5nLmNvbS9ncHUSJ25pY2t0bWluZy5jb20vZ3B1LTAsbmlja3RtaW5nLmNvbS9ncHUtNA=="
}
],
"RegisteredDevices": {
"nicktming.com/gpu": [
"nicktming.com/gpu-0",
"nicktming.com/gpu-4",
"nicktming.com/gpu-9",
"nicktming.com/gpu-7",
"nicktming.com/gpu-8",
"nicktming.com/gpu-1",
"nicktming.com/gpu-2",
"nicktming.com/gpu-3",
"nicktming.com/gpu-5",
"nicktming.com/gpu-6"
],
"nicktming.com/rdma": [
"nicktming.com/rdma-0",
"nicktming.com/rdma-1",
"nicktming.com/rdma-2",
"nicktming.com/rdma-7",
"nicktming.com/rdma-8",
"nicktming.com/rdma-9",
"nicktming.com/rdma-3",
"nicktming.com/rdma-4",
"nicktming.com/rdma-5",
"nicktming.com/rdma-6"
]
}
},
"Checksum": 3285376913
}
3. Summary
By now it should be fairly clear how a device-plugin is used. What exactly happens under the hood will be analyzed in the follow-up source-code posts; the examples here lay the groundwork for that analysis. The next posts analyze how the device-plugin and the device manager cooperate from two angles:
1. How a device-plugin registers its resources with the device manager.
2. How a pod's resource request is handled.