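Ubuntu 18.04 + Docker + nvidia-docker Environment Setup
1. Install the NVIDIA driver
Remove the old driver and CUDA packages
sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt-get --purge remove "*nvidia*"
Install the recommended driver
sudo ubuntu-drivers autoinstall
sudo reboot  # reboot so the new driver is loaded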
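After the reboot, it helps to confirm that the driver actually loaded before moving on. The following is a minimal sketch, assuming the driver package puts the nvidia-smi tool on PATH; the helper name have_cmd is made up for illustration.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command on PATH (hypothetical helper).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd nvidia-smi; then
    # Prints the driver version and the GPUs visible to it.
    nvidia-smi
else
    echo "nvidia-smi not found: the driver is probably not installed or not loaded" >&2
fi
```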
2. Install Docker
Official guide: https://docs.docker.com/engine/install/ubuntu/
Remove old Docker versions
sudo apt-get remove docker docker-engine docker.io containerd runc
# Uninstall Docker CE
sudo apt-get purge docker-ce
# Uninstall Docker EE
sudo apt-get purge docker-ee
# Delete Docker images, containers, volumes, and other data (irreversible)
sudo rm -rf /var/lib/docker
1. Update the package index
sudo apt-get update
2. Install the dependencies
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
3. Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
The command prints OK on success.
4. Add the repository (run only one of the two commands: the USTC mirror, or the official repository)
sudo add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
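The repository line is only valid once the release codename is substituted in, which is the job of $(lsb_release -cs) (it prints bionic on Ubuntu 18.04). The sketch below builds the line from its parts. The function name docker_repo_line and its arguments are illustrative and not part of Docker's tooling.

```shell
#!/bin/sh
# docker_repo_line CODENAME [BASE_URL]: print the apt source line for Docker CE.
# BASE_URL defaults to the official repository; pass a mirror URL to override it.
docker_repo_line() {
    codename="$1"
    base_url="${2:-https://download.docker.com/linux/ubuntu}"
    printf 'deb [arch=amd64] %s %s stable\n' "$base_url" "$codename"
}

# Example: the line for Ubuntu 18.04 (codename "bionic").
docker_repo_line bionic
```

On the target machine the codename would come from the system itself, for example: sudo add-apt-repository "$(docker_repo_line "$(lsb_release -cs)")"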
5. Update the package index again
sudo apt-get update
6. Install docker-ce
# sudo apt-get install docker-ce
sudo apt-get install docker-ce docker-ce-cli containerd.io
7. Enable Docker at boot and start it
sudo systemctl enable docker
sudo systemctl start docker
8. Test the installation
sudo docker run hello-world
9. Check the version
sudo docker version
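The reported version matters for GPU work: the --gpus flag of docker run, for instance, was introduced in Docker 19.03. The following is a hedged sketch for comparing version strings, assuming GNU sort with the -V option is available (it is on Ubuntu); version_ge is a hypothetical helper name.

```shell
#!/bin/sh
# version_ge HAVE WANT: succeed if version HAVE is greater than or equal to WANT.
# Relies on GNU sort -V (version sort): WANT must sort first, or the two must be equal.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query Docker when the client is actually installed.
if command -v docker >/dev/null 2>&1; then
    client_version="$(docker version --format '{{.Client.Version}}' 2>/dev/null)"
    if version_ge "$client_version" "19.03"; then
        echo "Docker $client_version supports --gpus"
    else
        echo "Docker $client_version is older than 19.03"
    fi
fi
```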
3. Install nvidia-docker
Official guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
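The repository URL above depends on the distribution variable, which concatenates ID and VERSION_ID from /etc/os-release (ubuntu18.04 on this system). The sketch below shows that derivation in isolation. distribution_id is a made-up helper; it takes the file path as a parameter so it can be tried on a sample file.

```shell
#!/bin/sh
# distribution_id [FILE]: print ID followed by VERSION_ID from an os-release style file.
# Sourcing happens in a subshell so the file's variables do not leak into the caller.
distribution_id() {
    (
        . "${1:-/etc/os-release}"
        printf '%s%s\n' "$ID" "$VERSION_ID"
    )
}

if [ -r /etc/os-release ]; then
    distribution_id /etc/os-release
fi
```

Once Docker has been restarted, a commonly used check is to run nvidia-smi inside a container, for example: sudo docker run --rm --gpus all ubuntu nvidia-smi. If the setup works, it should list the same GPUs as nvidia-smi on the host.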
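4. Permissions
Create a group named docker. If the group already exists the command reports an error, which can be ignored:
sudo groupadd docker
Add the current user to the docker group:
sudo gpasswd -a ${USER} docker
Restart the Docker service (use with caution in production):
sudo systemctl restart docker
Grant read and write access to the Docker socket (note that a+rw opens the socket to every local user; membership in the docker group is normally sufficient once you log in again):
sudo chmod a+rw /var/run/docker.sock
Reboot so the group change takes effect:
sudo reboot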
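After logging back in, the group change can be verified without sudo. The following is a minimal sketch, assuming id -nG lists the current user's groups separated by spaces; has_group is a hypothetical helper.

```shell
#!/bin/sh
# has_group "GROUP LIST" GROUP: succeed if GROUP appears as a whole word in the list.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_group "$(id -nG)" docker; then
    echo "current user is in the docker group"
else
    echo "current user is not in the docker group yet (log out and back in, or reboot)"
fi
```

If the check passes, docker run hello-world should work without sudo.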