I. Install Docker
Reference: NVidia-Docker2安装与常用命令 (NVidia-Docker2 installation and common commands) - jimchen1218 - 博客园 (cnblogs.com)
1. Back up sources.list
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
2. Edit sources.list
sudo gedit /etc/apt/sources.list
3. Switch to the Aliyun mirror
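If the system is Ubuntu 18.04 (codename bionic), use the entries below. For Ubuntu 20.04, replace bionic with focal in every line: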
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
If the system is Ubuntu 22.04 (codename jammy):
deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
# Optional focal entry, added in order to install g++-7
deb [arch=amd64] http://archive.ubuntu.com/ubuntu focal main universe
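The mirror lists above differ only in the release codename, so they can be generated instead of copied by hand. The following is a minimal sketch under that assumption; gen_sources is a hypothetical helper name, not part of apt.

```shell
#!/usr/bin/env bash
# Print the Aliyun mirror entries for one Ubuntu codename.
# gen_sources is a hypothetical helper, introduced only for illustration.
gen_sources() {
    local codename="$1"
    local mirror="http://mirrors.aliyun.com/ubuntu/"
    local components="main restricted universe multiverse"
    local suite
    # The empty suite is the base release; the others are its pockets.
    for suite in "" -security -updates -proposed -backports; do
        echo "deb ${mirror} ${codename}${suite} ${components}"
        echo "deb-src ${mirror} ${codename}${suite} ${components}"
    done
}

# Example: entries for Ubuntu 22.04 (jammy)
gen_sources jammy
```

On a real host the codename could come from lsb_release -cs, and the output could be reviewed before it is written to /etc/apt/sources.list.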
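4. Update the package index
sudo apt update
5. Remove any previously installed Docker
sudo apt-get remove docker docker-engine docker.io
6. Update the package index again
sudo apt update
7. Install dependencies
# If software-properties-common cannot be installed, it can be skipped
sudo apt install apt-transport-https ca-certificates curl software-properties-common
8. Add Docker's official GPG key to the system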
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
If this command fails with: curl: (35) gnutls_handshake() failed: Error in the push function. gpg: no valid OpenPGP data found.
Fix:
sudo apt-get install build-essential fakeroot dpkg-dev libcurl4-openssl-dev
9. Add the Docker repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
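On newer Ubuntu releases apt-key prints a deprecation warning. A possible alternative is to store the key in a keyring file and reference it with signed-by in the repository line. The sketch below only builds that line; docker_repo_line is a hypothetical helper, and the keyring path /etc/apt/keyrings/docker.gpg is an assumption that must match wherever the key is actually saved.

```shell
#!/usr/bin/env bash
# Build the apt source line for the Docker repository.
# docker_repo_line is a hypothetical helper; the default keyring path
# is an assumption, not something the steps above create.
docker_repo_line() {
    local arch="$1" codename="$2"
    local keyring="${3:-/etc/apt/keyrings/docker.gpg}"
    echo "deb [arch=${arch} signed-by=${keyring}] https://download.docker.com/linux/ubuntu ${codename} stable"
}

# Example for an amd64 host running Ubuntu 22.04
docker_repo_line amd64 jammy
```

With this approach the key would be saved with curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg, and the generated line written to a file under /etc/apt/sources.list.d/ in place of the add-apt-repository call.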
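10. Update the package index
sudo apt update
11. List the Docker versions available for installation
apt-cache policy docker-ce
If a list of versions is shown, installation can proceed.
12. Install Docker
sudo apt install docker-ce
13. Test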
docker --version
sudo docker run hello-world
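If the output contains "Hello from Docker!", the installation succeeded. The message Unable to find image 'hello-world:latest' locally is expected on the first run: it only means the image is not cached yet and is about to be pulled.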
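The test step can also be checked automatically by looking for the greeting that hello-world prints. This is a minimal sketch; hello_world_ok is a hypothetical helper, and the sample lines below stand in for real docker output.

```shell
#!/usr/bin/env bash
# Succeed if the text on stdin contains hello-world's greeting.
# hello_world_ok is a hypothetical helper name.
hello_world_ok() {
    grep -q "Hello from Docker!"
}

# Sample output for illustration; on a real host pipe in
#   sudo docker run hello-world 2>&1
printf '%s\n' \
    "Unable to find image 'hello-world:latest' locally" \
    "latest: Pulling from library/hello-world" \
    "Hello from Docker!" | hello_world_ok && echo "docker works"
```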
II. Install nvidia-container-runtime
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime
sudo apt install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
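The distribution variable above is what selects the repository list in the URL. It is built by sourcing /etc/os-release and joining ID with VERSION_ID. The sketch below shows that step in isolation on a temporary file; distribution_of is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Compute the "$ID$VERSION_ID" string from an os-release style file.
# distribution_of is a hypothetical helper; it sources the file in a
# subshell so ID and VERSION_ID do not leak into the calling shell.
distribution_of() {
    ( . "$1" && echo "${ID}${VERSION_ID}" )
}

# Demo with a temporary file that mimics /etc/os-release on Ubuntu 22.04
tmp=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="22.04"\n' > "$tmp"
distribution_of "$tmp"    # prints ubuntu22.04
rm -f "$tmp"
```

If the printed value does not correspond to a directory that exists under https://nvidia.github.io/nvidia-container-runtime/, the curl command that downloads the .list file will not produce a usable source list.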
III. Install nvidia-docker2
3.1 Install nvidia-docker2
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
If zlib is missing or too old, run the following commands
# Install
sudo apt-get install zlib1g-dev
# Upgrade only this package
sudo apt-get install --only-upgrade zlib1g-dev
3.2 Add the nvidia runtime
Register nvidia as a Docker runtime. Once this is done, applications inside containers can use the GPU:
sudo nvidia-ctk runtime configure --runtime=docker
3.3 Restart
sudo systemctl daemon-reload
sudo systemctl restart docker
After the service has restarted, list Docker's runtimes; nvidia should now appear:
docker info | grep Runtimes
Runtimes: nvidia runc io.containerd.runc.v2
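The runtime check above can be scripted so that a missing nvidia entry is caught early. This is a minimal sketch that only parses text; has_nvidia_runtime is a hypothetical helper, and the sample line is the output shown above.

```shell
#!/usr/bin/env bash
# Succeed if the "Runtimes:" line of `docker info` output lists nvidia.
# has_nvidia_runtime is a hypothetical helper name.
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

# Sample line for illustration; on a real host pipe in: docker info
echo " Runtimes: nvidia runc io.containerd.runc.v2" | has_nvidia_runtime \
    && echo "nvidia runtime registered"
```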
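3.4 Verify nvidia-docker
nvidia-docker -v
Output:
Docker version 24.0.6, build ed223bc
nvidia-docker is a wrapper around docker, so it reports the Docker version; this output indicates that nvidia-docker is installed.
IV. Pull the image and run a container
4.1 Pull the image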
sudo docker pull nvidia/cudagl:11.4.0-runtime-ubuntu20.04
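If a proxy error is reported:
Error response from daemon: Get "https://registry-1.docker.io/v2/": proxyconnect tcp: dial tcp 192.168.8.12:7890: connect: no route to host
Remove the proxy settings by editing the file below and deleting or commenting out the proxy lines: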
sudo vim /etc/systemd/system/docker.service.d/http-proxy.conf
sudo systemctl daemon-reload
sudo systemctl restart docker
# Check that the proxy has been removed
sudo docker info | grep -i proxy
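The manual edit of http-proxy.conf can be previewed with sed before the file is changed. The sketch below assumes the drop-in uses the usual Environment="HTTP_PROXY=..." form; comment_out_proxy is a hypothetical helper that prints the result and leaves the input untouched.

```shell
#!/usr/bin/env bash
# Comment out proxy settings in a Docker systemd drop-in and print the result.
# comment_out_proxy is a hypothetical helper; it writes to stdout only.
comment_out_proxy() {
    sed -E 's/^(Environment=.*(HTTP_PROXY|HTTPS_PROXY|http_proxy|https_proxy)=)/#\1/' "$1"
}

# Demo on a temporary file that mimics the drop-in
tmp=$(mktemp)
printf '[Service]\nEnvironment="HTTP_PROXY=http://192.168.8.12:7890"\n' > "$tmp"
comment_out_proxy "$tmp"
rm -f "$tmp"
```

Once the preview looks right, the same change can be made in the real file, followed by the daemon-reload and restart commands above.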
4.2 Run a container
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cudagl:11.4.0-runtime-ubuntu20.04 nvidia-smi
Or:
sudo nvidia-docker run --rm --gpus all nvidia/cudagl:11.4.0-runtime-ubuntu20.04 nvidia-smi
Output like the following indicates success:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 30%   37C    P5    32W / 320W |   7452MiB / 16376MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
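For scripted checks, the driver version can be pulled out of the header line of this table. This is a minimal sketch that parses text only; driver_version is a hypothetical helper, and the sample line is taken from the output above.

```shell
#!/usr/bin/env bash
# Extract the driver version from nvidia-smi's header line on stdin.
# driver_version is a hypothetical helper name.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

# Sample header for illustration; on a real host pipe in the output of
# the docker run ... nvidia-smi command
echo "| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |" | driver_version
```

nvidia-smi can also report this value directly with --query-gpu=driver_version --format=csv,noheader, which avoids parsing the table.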