- Install Anaconda. Anaconda3 4.2.0, which ships with Python 3.5 by default, is recommended. Download it from the Tsinghua mirror, which is fast:
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-4.2.0-Linux-x86_64.sh
- Install Anaconda3 by running the installer:
bash Anaconda3-4.2.0-Linux-x86_64.sh
- When prompted, press Enter, then type yes to accept the license agreement.
- Choose the install location: press Enter to accept the default path, or type a custom path such as
/work/anaconda3
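and then press Enter.
- When prompted again, type yes (this lets the installer add Anaconda to PATH in .bashrc). The installer then finishes.
- Anaconda is not active in the current shell yet: running python in the terminal still starts the Python that ships with CentOS, because the updated .bashrc has not been loaded. Run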
source ~/.bashrc
to load it.
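Whether the Anaconda interpreter is the one found on PATH can also be checked from a script. The following is a minimal sketch, assuming the example install path /work/anaconda3 used above; the helper name `is_under_prefix` is hypothetical.

```python
import os
import shutil


def is_under_prefix(executable, prefix):
    """Return True if `executable` lives inside the directory `prefix`."""
    exe = os.path.realpath(executable)
    root = os.path.realpath(prefix)
    return os.path.commonpath([exe, root]) == root


if __name__ == "__main__":
    # /work/anaconda3 is the example install path from above; adjust as needed.
    python_path = shutil.which("python")
    if python_path is None:
        print("no python found on PATH")
    elif is_under_prefix(python_path, "/work/anaconda3"):
        print("Anaconda python is active:", python_path)
    else:
        print("still using another python:", python_path)
```

If the script reports another interpreter, the .bashrc change has most likely not been loaded in the current shell.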
- Verify the Python version by running:
python
The banner shows the Python and Anaconda versions:
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
- Verify that Python runs code correctly:
>>> print("hello world!")
hello world!
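The version check can be automated by parsing the interpreter banner. This is a small sketch that assumes the banner has the same shape as the one shown above; `parse_banner` is a hypothetical helper, not part of Anaconda.

```python
import re

# Matches the start of a banner such as
# "Python 3.5.2 |Anaconda 4.2.0 (64-bit)| ..."
BANNER_RE = re.compile(r"Python (\d+\.\d+\.\d+) \|Anaconda (\d+\.\d+\.\d+)")


def parse_banner(banner):
    """Return (python_version, anaconda_version), or None if no match."""
    match = BANNER_RE.search(banner)
    return match.groups() if match else None
```

Inside the interpreter, passing `"Python " + sys.version` should give a comparable string on this Anaconda release; a result of None suggests the system Python is still the one being run.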
- Install the GPU driver
- Check whether the machine has an NVIDIA GPU:
$ /usr/sbin/lspci | grep -i nvidia
Output:
3b:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
d8:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
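The same filter can be expressed in Python when the result needs to feed a script. A minimal sketch, assuming lspci output in the format shown above; `find_nvidia_devices` is a hypothetical helper.

```python
def find_nvidia_devices(lspci_output):
    """Return (pci_address, description) pairs for lines mentioning NVIDIA."""
    devices = []
    for line in lspci_output.splitlines():
        if "nvidia" not in line.lower():
            continue
        # The PCI address is everything before the first space.
        address, _, description = line.partition(" ")
        devices.append((address, description.strip()))
    return devices
```

The input text could come from `subprocess.check_output(["/usr/sbin/lspci"], universal_newlines=True)`; an empty result means no NVIDIA device was listed.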
- Disable the nouveau driver that ships with the system. Open /lib/modprobe.d/dist-blacklist.conf and comment out the nvidiafb entry:
#blacklist nvidiafb
Then add the following lines:
blacklist nouveau
options nouveau modeset=0
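Before rebuilding the initramfs, it can help to confirm that the edited file really contains the two active directives. The sketch below only inspects text in the format shown above; both function names are hypothetical.

```python
def active_directives(conf_text):
    """Return the non-comment, non-blank lines with whitespace normalized."""
    directives = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line:
            directives.append(" ".join(line.split()))
    return directives


def nouveau_disabled(conf_text):
    """True if both lines added in the step above are present and active."""
    directives = active_directives(conf_text)
    return ("blacklist nouveau" in directives
            and "options nouveau modeset=0" in directives)
```

Reading /lib/modprobe.d/dist-blacklist.conf and passing its contents to `nouveau_disabled` should return True once the edit has been saved.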
- Rebuild the initramfs image:
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
Change the default run level to text mode:
systemctl set-default multi-user.target
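Reboot so that the changes take effect, then check that nouveau is disabled:
lsmod | grep nouveau
If nothing is printed, nouveau has been disabled. The default run level can then be switched back to graphical mode: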
systemctl set-default graphical.target
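The nouveau check can also be done by reading /proc/modules, which is where lsmod gets its data. This sketch matches only the module-name column, so a module that merely lists nouveau as a dependent is not counted; `module_loaded` is a hypothetical helper.

```python
def module_loaded(proc_modules_text, name):
    """True if `name` is the first field of any line in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("nouveau loaded:", module_loaded(text, "nouveau"))
```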
- Install the nvidia-detect command from the ELRepo repository. Add the repository for your CentOS release:
- CentOS 7
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
- CentOS 6
rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
- CentOS 5
rpm -Uvh http://www.elrepo.org/elrepo-release-5-5.el5.elrepo.noarch.rpm
Then install the package:
yum install nvidia-detect
- Check which driver the GPU needs:
nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1b38] NVIDIA Corporation GP102GL [Tesla P40]
This device requires the current 390.48 NVIDIA driver kmod-nvidia
[10de:1b38] NVIDIA Corporation GP102GL [Tesla P40]
This device requires the current 390.48 NVIDIA driver kmod-nvidia
[102b:0536] Matrox Electronics Systems Ltd. Device 0536
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version
390.48 is the driver version to install. The driver can also be downloaded from the NVIDIA website. I could not find this version in the yum repository, so I downloaded the driver that matches CUDA directly from NVIDIA.
Download page
TensorFlow 1.7 supports CUDA 9.0, so download the driver version that matches CUDA 9.0.
Driver link
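The recommended driver version can be pulled out of the nvidia-detect report with a regular expression. A minimal sketch, assuming the wording of the "requires the current ... NVIDIA driver" line shown above; `recommended_drivers` is a hypothetical helper.

```python
import re

# Captures the version and package name from lines such as
# "This device requires the current 390.48 NVIDIA driver kmod-nvidia"
DRIVER_RE = re.compile(r"requires the current ([\d.]+) NVIDIA driver (\S+)")


def recommended_drivers(detect_output):
    """Return the set of (version, package) pairs named in the output."""
    return set(DRIVER_RE.findall(detect_output))
```

Using a set collapses the duplicate lines that appear when several identical GPUs are installed.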
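- Install the kernel-devel package that matches the running kernel:
yum install -y "kernel-devel-uname-r == $(uname -r)"
- Install the gcc and g++ compilers:
yum install gcc gcc-c++
- Install the driver: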
sh NVIDIA-Linux-x86_64-384.125.run
- After the driver installs successfully, run
nvidia-smi
to view the GPU information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   29C    P0    50W / 250W |      0MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   32C    P0    50W / 250W |      0MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
- Reboot and log in as root:
reboot
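After the reboot, the driver check can be scripted with nvidia-smi's CSV query mode instead of reading the table by eye. This is a sketch under the assumption that the driver is installed and nvidia-smi is on PATH; the chosen query fields and the helper names are illustrative.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"]


def parse_gpu_rows(csv_text):
    """Turn 'name, driver, memory' CSV rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        rows.append({"name": name, "driver": driver, "memory": memory})
    return rows


def query_gpus():
    """Run nvidia-smi and parse its output; needs a working driver."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_rows(output)
```

On the machine described here, `query_gpus()` would be expected to return one entry per Tesla P40; an exception from `check_output` usually means the driver is not loaded.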
- Install CUDA. TensorFlow 1.7 supports CUDA 9.0, so download that version.
- Download page
- Installation:
rpm -i cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64-rpm
yum clean all
yum install cuda
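To confirm which CUDA release ended up installed, the toolkit's version file can be parsed. The sketch assumes the toolkit writes a file such as /usr/local/cuda/version.txt containing a line like "CUDA Version 9.0.176", which is how CUDA 9.x toolkits are usually laid out; the path and both helper names are assumptions.

```python
import re


def parse_cuda_version(version_text):
    """Extract '9.0.176' from a line like 'CUDA Version 9.0.176'."""
    match = re.search(r"CUDA Version (\d+(?:\.\d+)+)", version_text)
    return match.group(1) if match else None


def matches_release(version, release="9.0"):
    """True if `version` is exactly `release` or a patch level of it."""
    return version is not None and (version + ".").startswith(release + ".")
```

For example, `matches_release(parse_cuda_version(open("/usr/local/cuda/version.txt").read()))` should be True for the CUDA 9.0 setup this guide targets.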
- Install cuDNN. Register with NVIDIA and download the cuDNN version that matches your CUDA version.
- Extract the cuDNN archive and copy its files into the CUDA directory:
tar -zxvf cudnn-9.0-linux-x64-v7.1.solitairetheme8
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
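Once cudnn.h has been copied, its version macros give a quick way to confirm that the expected cuDNN release is in place. A minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as cuDNN 7.x headers do; `parse_cudnn_version` is a hypothetical helper.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+%s\s+(\d+)" % macro
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Passing the contents of /usr/local/cuda/include/cudnn.h should yield a tuple whose first element is 7 for the archive used above.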
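- Install tensorflow-gpu
- Anaconda3 is already installed, so TensorFlow can be installed directly with pip:
pip install tensorflow-gpu  # installs the latest tensorflow-gpu by default; use tensorflow-gpu==1.7.0 to match the CUDA 9.0 setup above
- Verify that tensorflow-gpu was installed correctly: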
# python
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, Tensorflow')
>>> sess = tf.Session()
2018-04-09 09:58:07.326972: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2018-04-09 09:58:09.165200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:3b:00.0
totalMemory: 22.38GiB freeMemory: 22.21GiB
2018-04-09 09:58:09.383838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 1 with properties:
name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:d8:00.0
totalMemory: 22.38GiB freeMemory: 22.21GiB
2018-04-09 09:58:09.387220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0, 1
2018-04-09 09:58:10.093199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-09 09:58:10.093276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 1
2018-04-09 09:58:10.093289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N Y
2018-04-09 09:58:10.093300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 1: Y N
2018-04-09 09:58:10.094592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21559 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:3b:00.0, compute capability: 6.1)
2018-04-09 09:58:10.352224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 21559 MB memory) -> physical GPU (device: 1, name: Tesla P40, pci bus id: 0000:d8:00.0, compute capability: 6.1)
>>> print(sess.run(hello))
b'Hello, Tensorflow'
- tensorflow-gpu is installed successfully. Done at last.
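The interactive session above can be condensed into a script that lists the GPUs TensorFlow sees. This sketch assumes TensorFlow 1.x, where `device_lib.list_local_devices()` is available from `tensorflow.python.client`; the two wrapper functions are hypothetical.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_local_gpus():
    """List GPUs visible to TensorFlow 1.x (imports TensorFlow lazily)."""
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

On the two-GPU machine from this guide, `list_local_gpus()` would be expected to return two device names; an empty list suggests that TensorFlow fell back to the CPU.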
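- Troubleshooting
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
- The installed CUDA version is wrong. The prebuilt tensorflow-gpu 1.7 package is linked against CUDA 9.0, as the libcublas.so.9.0 name in the error shows. Reinstall CUDA 9.0.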
libcudnn.so.7: cannot open shared object file: No such file or directory
- The CUDA library path is probably not configured correctly. Register it with the dynamic linker:
sudo ldconfig /usr/local/cuda/lib64
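Both library errors above come from the dynamic linker, so they can be reproduced without importing TensorFlow. A minimal sketch using ctypes; the library names are taken from the error messages, and `can_load` is a hypothetical helper.

```python
import ctypes


def can_load(library_name):
    """True if the dynamic linker can find and load `library_name`."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Library names taken from the two error messages above.
    for name in ("libcublas.so.9.0", "libcudnn.so.7"):
        print(name, "OK" if can_load(name) else "NOT FOUND")
```

Running it before and after the ldconfig call shows whether the path change fixed the lookup.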
Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for
example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line
option.
- The installed kernel-devel version does not match the running kernel. Install the matching version with:
yum install -y "kernel-devel-uname-r == $(uname -r)"
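Whether matching kernel headers are present can be checked before launching the driver installer. This sketch assumes the CentOS layout in which kernel-devel installs under /usr/src/kernels/<release>; both function names are hypothetical.

```python
import os
import platform


def kernel_devel_dir(release, root="/usr/src/kernels"):
    """Path where CentOS's kernel-devel package puts headers for `release`."""
    return os.path.join(root, release)


def has_matching_kernel_devel(release=None, root="/usr/src/kernels"):
    """True if headers for the running (or given) kernel are installed."""
    release = release or platform.release()  # same value as `uname -r`
    return os.path.isdir(kernel_devel_dir(release, root))
```

If `has_matching_kernel_devel()` returns False, the installer is likely to stop with the "Unable to find the kernel source tree" message quoted above.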
References:
https://blog.csdn.net/Oh_My_Fish/article/details/78861867
https://www.cnblogs.com/kluan/p/4823152.html