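1. Check the system's graphics hardware
This walkthrough was tested on CentOS 7.9.
1. GPU acceleration requires an NVIDIA card, so first check which graphics devices the machine has by running commands a and b:
a. lspci | grep -i nvidia
b. lspci | grep -i vga
If the output contains "NVIDIA" along with a model name or device ID, the machine has an NVIDIA card and may be able to use GPU acceleration. ("May" because not every NVIDIA card can be used for GPU compute; TensorFlow, for example, requires a minimum compute capability.) If the NVIDIA command prints nothing and the VGA command shows only something like the following line: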
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
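This means the machine has no NVIDIA card, only a generic display adapter. Installing CUDA will not help, because there is no GPU to accelerate computation.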
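The check described above can be wrapped in a small helper. This is only a minimal sketch: the function name has_nvidia_gpu is made up for illustration, and it does nothing more than look for the string "nvidia" in whatever lspci prints. Note that a data-center card may be listed as a "3D controller" rather than a VGA device, so the vga filter alone can miss it.

```shell
# Hypothetical helper: read lspci output on stdin and
# return 0 if any line mentions an NVIDIA device, 1 otherwise.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Usage on a real machine:
#   if lspci | has_nvidia_gpu; then
#       echo "NVIDIA device found"
#   else
#       echo "no NVIDIA device found"
#   fi
```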
[root@localhost ~]# lspci | grep -i vga
00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
[root@localhost ~]# lspci | grep -i nvidia
00:0c.0 3D controller: NVIDIA Corporation Device 20b5 (rev a1)
[root@localhost ~]# nvidia-smi
bash: nvidia-smi: command not found...
Problem:
- The card is present (NVIDIA Corporation Device 20b5 (rev a1)), but no driver is installed.
Solution:
- 00:0c.0 3D controller: NVIDIA Corporation Device 20b5 (rev a1)
The device ID to look up is: 20b5
Then go to the website: PCI Devices
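If you would rather not copy the ID by hand, it can be pulled out of the lspci line with sed. This is a rough sketch under one assumption: the local PCI ID database does not know the card, so the line contains the literal text "Device" followed by a four-digit hex ID, as in the output above. The function name nvidia_device_id is hypothetical. Running lspci -nn instead prints the numeric vendor and device IDs in brackets, whether or not the name is known.

```shell
# Hypothetical helper: read lspci output on stdin and print the
# 4-digit hex ID that follows "Device " on each NVIDIA line.
# Prints nothing if lspci already shows a model name instead of an ID.
nvidia_device_id() {
    grep -i 'nvidia' | sed -n 's/.*Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

# Usage on a real machine:
#   lspci | nvidia_device_id
```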
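1.1 Identify the card model
Enter the device ID and search to obtain the card model.
1.2 Download the driver
Download the driver from NVIDIA's official website.
Download command: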
wget https://cn.download.nvidia.cn/tesla/470.141.03/NVIDIA-Linux-x86_64-470.141.03.run
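2. Install the NVIDIA GPU driver
Install gcc and the other dependencies
If dependencies are missing, the driver installer stops with an error, so install them beforehand. The packages needed here are gcc, g++ (gcc-c++), make, and kernel-devel: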
[root@localhost ~]# yum -y install gcc gcc-c++ kernel-devel make
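One thing that may be worth checking after this step: the driver installer builds a kernel module, so the installed kernel-devel package generally needs to match the running kernel. The sketch below compares the two. The function name kernel_devel_matches is invented for this example, and it assumes rpm -q kernel-devel prints names of the form kernel-devel-<release>, where <release> is what uname -r reports.

```shell
# Hypothetical helper.
# $1: running kernel release, e.g. the output of `uname -r`
# $2: installed kernel-devel package names, one per line,
#     e.g. the output of `rpm -q kernel-devel`
# Returns 0 if a kernel-devel package matches the running kernel.
kernel_devel_matches() {
    running="$1"
    installed="$2"
    printf '%s\n' "$installed" | grep -qxF "kernel-devel-$running"
}

# Usage on a real machine:
#   kernel_devel_matches "$(uname -r)" "$(rpm -q kernel-devel)" \
#       || echo "kernel-devel does not match the running kernel"
```

If the two differ, updating the kernel and rebooting, or installing the kernel-devel version that matches uname -r, is the usual way out.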
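Install the downloaded GPU driver, NVIDIA-Linux-x86_64-470.141.03.run (driver version 470.141.03). Running the file as shown below installs it.
Notes:
If you get a permission error, grant execute permission with chmod 777 (chmod +x is also enough).
Appending the option that skips the X server check (-no-x-check) lets the installation succeed. The cause is that lightdm, installed for remote desktop access, starts the X server.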
[root@localhost yu]# chmod 777 NVIDIA-Linux-x86_64-470.141.03.run
[root@localhost yu]# ./NVIDIA-Linux-x86_64-470.141.03.run -no-x-check
Finally, run nvidia-smi to verify that the installation succeeded:
[root@localhost yu]# nvidia-smi
The driver is installed successfully.
3. Install CUDA 11.1
Download CUDA 11.1.1 from the NVIDIA website and install it.
The archive is at https://developer.nvidia.com/cuda-toolkit-archive. The runfile (local) installer is recommended.
# Installation Instructions:
wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda_11.1.1_455.32.00_linux.run
sudo sh cuda_11.1.1_455.32.00_linux.run
As the summary below shows, leave the Driver option unchecked, because the GPU driver is already installed.
[root@localhost yu]# sudo sh cuda_11.1.1_455.32.00_linux.run
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.1/
Samples: Installed in /root/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 455.00 is required for CUDA 11.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
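The warning in the summary says CUDA 11.1 needs a driver of at least version 455.00. The driver installed earlier is 470.141.03, so it qualifies, but the comparison can also be scripted. This is a minimal sketch that relies on the -V (version sort) option of GNU sort; the function name version_ge is made up for the example.

```shell
# Hypothetical helper: return 0 if version $1 >= version $2.
# Sorts the two versions with GNU sort -V and checks that the
# smaller one is $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a real machine:
#   drv="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   version_ge "$drv" 455.00 && echo "driver is new enough for CUDA 11.1"
```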
Set the environment variables by editing the ~/.bashrc file:
[root@localhost ~]# vim ~/.bashrc
Append the following lines to the end of the file:
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
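If you prefer not to edit the file by hand, or expect to run the setup more than once, the lines can be appended from a script without creating duplicates. This is a rough sketch; append_once is a hypothetical helper name, and it only skips a line that already appears in the file exactly as given.

```shell
# Hypothetical helper: append line $2 to file $1 unless that exact
# line is already present.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on a real machine (single quotes keep $PATH unexpanded):
#   append_once ~/.bashrc 'export CUDA_HOME=/usr/local/cuda'
#   append_once ~/.bashrc 'export PATH=$PATH:$CUDA_HOME/bin'
```

After editing, run source ~/.bashrc or open a new shell so the variables take effect.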
Create a symbolic link to the nvcc command in /usr/bin:
[root@localhost ~]# sudo ln -s /usr/local/cuda/bin/nvcc /usr/bin/nvcc
Check the CUDA version with the nvcc command:
[root@localhost ~]# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
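For scripted checks, the release number can be extracted from this output. A minimal sketch, assuming the output keeps the "release X.Y," wording shown above; the function name nvcc_release is invented for the example.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and
# print the release number (major.minor) from the "release X.Y," line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage on a real machine:
#   [ "$(nvcc --version | nvcc_release)" = "11.1" ] && echo "CUDA 11.1 toolkit found"
```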
4. Install cuDNN
Download cuDNN 8.0 from the NVIDIA website:
https://developer.nvidia.com/rdp/cudnn-archive
Install:
tar -xzvf cudnn-11.1-linux-x64-v8.0.4.30.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda-11.1/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-11.1/lib64
sudo chmod a+r /usr/local/cuda-11.1/include/cudnn*.h /usr/local/cuda-11.1/lib64/libcudnn*
Verify the installation:
cat /usr/local/cuda-11.1/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
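The grep above prints the raw #define lines. To get a single version string, the three values can be joined with awk. This is a sketch that assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL on separate #define lines, as cuDNN 8 headers do; cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: print major.minor.patch from the
# cudnn_version.h file given as $1.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Usage on a real machine:
#   cudnn_version /usr/local/cuda-11.1/include/cudnn_version.h
```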