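- Install conda
Download link: https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
Once the download finishes, run: bash Miniconda2-latest-Linux-x86_64.sh
Install location: xlz/Miniconda
Reload the shell configuration: source /home/xlz/.bashrc  # note: done with admin privileges here
Create the experiment environment: conda create -n stackGan python=2.7  # creates a virtual Python 2.7 environment
Switch to that Python environment: source activate stackGan  # activates the stackGan environment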
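To avoid disturbing other users (this is a shared server), I removed the conda python settings from .bashrc and instead call the programs in our environment through the absolute path xlz/Miniconda/bin. That way the system's own Python is left untouched.
Run conda: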
xlz/Miniconda/bin/conda
Activate the environment: run the source activate command from inside the xlz/Miniconda/bin folder.
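If you are not sure the right Python environment is active, type the following at the Python prompt:
import sys
print sys.executable
# shows which Python interpreter is currently running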
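The sys.executable check can be turned into a small helper that reports which conda environment the running interpreter belongs to. This is a minimal sketch assuming the standard conda layout <prefix>/envs/<name>/bin/python; conda_env_of is a hypothetical helper, not part of conda.

```python
import os
import sys


def conda_env_of(executable=sys.executable):
    """Return the conda env name that an interpreter path belongs to, or None.

    Assumes the usual layout <prefix>/envs/<name>/bin/python. The base
    Miniconda interpreter and the system Python both return None.
    """
    parts = os.path.normpath(executable).split(os.sep)
    if "envs" in parts:
        i = parts.index("envs")
        if i + 1 < len(parts):
            return parts[i + 1]
    return None
```

Inside the activated environment, print(conda_env_of()) should give the environment name (stackGan here); for the system Python it gives None.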
The experiment requires TensorFlow 0.12, so install TensorFlow 0.12:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.0rc1-cp27-none-linux_x86_64.whl
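The wheel's file name encodes which interpreter it was built for (PEP 427: name-version-python tag-ABI tag-platform), and the cp27 tag is why the environment has to be Python 2.7. A minimal sketch that parses those fields; parse_wheel_name and matches_cpython are hypothetical helpers, not pip APIs.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its PEP 427 fields.

    Format: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel name: %s" % filename)
    return {"name": name, "version": version, "build": build,
            "python": py, "abi": abi, "platform": plat}


def matches_cpython(python_tag, major, minor):
    """True if a python tag such as 'cp27' or 'py2.py3' covers CPython major.minor."""
    wanted = {"cp%d%d" % (major, minor), "py%d%d" % (major, minor), "py%d" % major}
    # Compressed tag sets are dot-separated, e.g. 'py2.py3'.
    return any(tag in wanted for tag in python_tag.split("."))
```

For example, matches_cpython(parse_wheel_name(name)["python"], *sys.version_info[:2]) tells you up front whether pip can install that wheel into the current interpreter.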
Importing tensorflow inside the Python environment fails:
python
import tensorflow as tf
The error is:
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
Error importing tensorflow. Unless you are using bazel,
you should not try to import tensorflow from its source directory;
please exit the tensorflow source tree, and relaunch your python interpreter
from there
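Listing /usr/local shows that the system has two CUDA versions installed. (If you have admin rights, are logged into your own admin account and are inside your own virtual environment (stackGan), the output looks like this.) In the prompt, the first xlz is the user name you log into the server with and the second xlz is your current directory.
(stackGan) xlz@servername:xlz$ ls /usr/local/
MATLAB bin cuda cuda-7.5 cuda-8.0 cuda_7.5.18_linux.run etc freesurfer games include lib lib64 libexec man sbin share src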
The required library file is also there:
(stackGan) xlz@servername:xlz$ ls /usr/local/cuda-8.0/lib64 | grep libcudart.so.8.0
libcudart.so.8.0
libcudart.so.8.0.44
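The same check can be done from Python across every installed toolkit at once. A minimal sketch, assuming toolkits live in <root>/cuda*/lib64 as they do on this server; find_cuda_libs is a hypothetical helper. Note that a generic cuda symlink pointing at one of the versioned folders will show up as a second hit.

```python
import glob
import os


def find_cuda_libs(libname="libcudart.so.8.0", root="/usr/local"):
    """Return the sorted <root>/cuda*/lib64 directories that contain libname."""
    hits = []
    # glob only returns paths that exist, so installer files such as
    # cuda_7.5.18_linux.run (which have no lib64 child) are skipped.
    for lib_dir in sorted(glob.glob(os.path.join(root, "cuda*", "lib64"))):
        if os.path.exists(os.path.join(lib_dir, libname)):
            hits.append(lib_dir)
    return hits
```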
Add the link (register the CUDA 8.0 library directory with the dynamic linker):
sudo ldconfig /usr/local/cuda-8.0/lib64
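To confirm the fix without starting TensorFlow, you can ask the dynamic linker to load the library by its bare name, which is roughly the lookup that failed in the ImportError above. A minimal sketch using ctypes; can_dlopen is a hypothetical helper. On a shared server without sudo, exporting LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64 before starting Python is the usual alternative to ldconfig.

```python
import ctypes


def can_dlopen(libname):
    """Return True if the dynamic linker can load libname by its bare name.

    ctypes.CDLL goes through the normal loader search (LD_LIBRARY_PATH,
    the ld.so cache, default directories) and raises OSError on failure.
    """
    try:
        ctypes.CDLL(libname)
    except OSError:
        return False
    return True
```

After the ldconfig step, can_dlopen("libcudart.so.8.0") is expected to return True on this machine.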
After that, TensorFlow 0.12 imports normally:
(stackGan) xlz@xlz$ python
Python 2.7.14 |Anaconda, Inc.| (default, Mar 27 2018, 17:29:31)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
>>> tf.__version__
'0.12.0-rc1'
Install the dependencies:
pip install prettytensor progressbar python-dateutil easydict pandas torchfile
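Several pip package names differ from the names you import (python-dateutil imports as dateutil, pyyaml as yaml, Pillow as PIL), and a module can also be installed yet fail on import, as prettytensor does below. A minimal sketch that tries every import and records the error; the PACKAGES table lists the packages installed in this walkthrough, and check_imports is a hypothetical helper.

```python
import importlib

# pip package name -> module name used in `import`
PACKAGES = {
    "prettytensor": "prettytensor",
    "progressbar": "progressbar",
    "python-dateutil": "dateutil",
    "easydict": "easydict",
    "pandas": "pandas",
    "torchfile": "torchfile",
    "scipy": "scipy",
    "pyyaml": "yaml",
    "Pillow": "PIL",
}


def check_imports(packages=PACKAGES):
    """Map each pip name to None (imports fine) or a short description of the error."""
    results = {}
    for pip_name, module in packages.items():
        try:
            importlib.import_module(module)
            results[pip_name] = None
        except Exception as exc:  # not only ImportError: a broken install can raise e.g. TypeError
            results[pip_name] = "%s: %s" % (type(exc).__name__, exc)
    return results
```

Printing the entries whose value is not None lists everything that still needs attention in one run, instead of discovering missing modules one error at a time.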
Run it: another error! Unbelievable. (Change into the project directory first.)
(stackGan) xlz@servername:xlz/StackGAN-master$ python stageI/run_exp.py --cfg stageI/cfg/birds.yml --gpu 0
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "stageI/run_exp.py", line 11, in <module>
from stageI.model import CondGAN
File "/bigData2/wn/StackGAN-master/stageI/model.py", line 4, in <module>
import prettytensor as pt
File "/bigData2/wn/Miniconda/envs/stackGan/lib/python2.7/site-packages/prettytensor/__init__.py", line 25, in <module>
from prettytensor import funcs
File "/bigData2/wn/Miniconda/envs/stackGan/lib/python2.7/site-packages/prettytensor/funcs.py", line 25, in <module>
from prettytensor.pretty_tensor_image_methods import *
File "/bigData2/wn/Miniconda/envs/stackGan/lib/python2.7/site-packages/prettytensor/pretty_tensor_image_methods.py", line 135, in <module>
class conv2d(prettytensor.VarStoreMethod):
File "/bigData2/wn/Miniconda/envs/stackGan/lib/python2.7/site-packages/prettytensor/pretty_tensor_image_methods.py", line 145, in conv2d
bias=tf.zeros_initializer(),
TypeError: zeros_initializer() takes at least 1 argument (0 given)
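After some searching, this turns out to be a TensorFlow version problem: the prettytensor installed by pip calls tf.zeros_initializer() with no arguments, which TensorFlow 0.12 does not accept (so how did the original author ever get this working??)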
Reference link
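The version dependency can be made explicit by comparing tf.__version__ numerically. A minimal sketch; the helper names are hypothetical, and the 1.0 cutoff is an assumption based on the traceback above (0.12 fails) and the upgrade to 1.0.1 that gets past this import.

```python
import re


def tf_version_tuple(version):
    """Turn a TensorFlow version string such as '0.12.0-rc1' into (0, 12, 0)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError("unrecognised version: %r" % version)
    return tuple(int(x) for x in m.groups())


def zeros_initializer_takes_no_args(version):
    """Assumption: tf.zeros_initializer() is callable without a shape from TF 1.0 on."""
    return tf_version_tuple(version) >= (1, 0, 0)
```

Comparing integer tuples rather than strings matters here: as strings, "0.12" sorts before "0.2", while as versions 0.12 is the newer one.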
Upgrade TensorFlow to 1.0.1 (I guessed the URL by analogy with the 0.12 one):
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp27-none-linux_x86_64.whl
Then use the TensorFlow upgrade script to convert the .py files written for 0.12 into code that runs under TF 1.0.
Download link
(stackGan) xlz@servername:xlz/StackGAN-master$ python tf_upgrade.py --intree misc --outtree misc_1
TensorFlow 1.0 Upgrade Script
-----------------------------
Converted 8 files
Detected 0 errors that require attention
--------------------------------------------------------------------------------
Make sure to read the detailed log 'report.txt'
(stackGan) xlz@servername:xlz/StackGAN-master$ python tf_upgrade.py --intree stageI --outtree stageI_1
TensorFlow 1.0 Upgrade Script
-----------------------------
Converted 4 files
Detected 0 errors that require attention
--------------------------------------------------------------------------------
Make sure to read the detailed log 'report.txt'
(stackGan) xlz@servername:xlz/StackGAN-master$ python tf_upgrade.py --intree stageII --outtree stageII_1
TensorFlow 1.0 Upgrade Script
-----------------------------
Converted 4 files
Detected 0 errors that require attention
--------------------------------------------------------------------------------
Make sure to read the detailed log 'report.txt'
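Running the three conversions from one loop keeps every output folder name on the same suffix, so it always lines up with the mv commands that follow. A minimal sketch; tf_upgrade.py with --intree/--outtree is used exactly as above, while FOLDERS, upgrade_commands and run_all are hypothetical names.

```python
import subprocess
import sys

# Project folders converted above; each output tree gets the same suffix.
FOLDERS = ["misc", "stageI", "stageII"]


def upgrade_commands(folders=FOLDERS, suffix="_1", script="tf_upgrade.py"):
    """Build one tf_upgrade.py invocation per folder, using the current interpreter."""
    return [[sys.executable, script, "--intree", f, "--outtree", f + suffix]
            for f in folders]


def run_all(commands):
    """Run each command in turn; check_call raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.check_call(cmd)
```

Calling run_all(upgrade_commands()) from the StackGAN-master directory would repeat the three conversions above.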
Then delete the original folders and rename the new ones:
(stackGan) xlz@servername:xlz/StackGAN-master$ rm -rf misc
(stackGan) xlz@servername:xlz/StackGAN-master$ rm -rf stageI
(stackGan) xlz@servername:xlz/StackGAN-master$ rm -rf stageII
(stackGan) xlz@servername:xlz/StackGAN-master$ mv misc_1 misc
(stackGan) xlz@servername:xlz/StackGAN-master$ mv stageI_1 stageI
(stackGan) xlz@servername:xlz/StackGAN-master$ mv stageII_1 stageII
Running it fails again; I'm actually starting to get used to this~~
ImportError: No module named scipy.misc
Install scipy:
pip install scipy
Running it fails once more:
ImportError: No module named yaml
Install yaml:
pip install pyyaml
Running it still fails:
python stageI/run_exp.py --cfg stageI/cfg/birds.yml --gpu 0  # run the code
IOError: [Errno 2] No such file or directory: 'stageI/cfg/birds.yml'  # error message
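On inspection, the cfg folder was gone. The conversion tool (the TF 1.0 upgrade script downloaded in the earlier step) apparently writes only the converted .py files into the output tree, so the .yml config files disappeared when the original folders were deleted. Copy the cfg files back.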
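To avoid losing non-Python files, they can be copied from the original tree into the converted one before the rm -rf step (or afterwards, from a fresh copy of the repository). A minimal sketch; copy_non_python_files is a hypothetical helper, and it never overwrites a file that already exists in the destination, so the converted .py files stay intact.

```python
import os
import shutil


def copy_non_python_files(src_root, dst_root):
    """Copy every non-.py/.pyc file from src_root to the same relative path under dst_root.

    Existing destination files are left alone. Returns the list of copied paths.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        for name in filenames:
            if name.endswith((".py", ".pyc")):
                continue
            target = os.path.join(target_dir, name)
            if os.path.exists(target):
                continue
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(target)
    return copied
```

For example, copy_non_python_files("stageI", "stageI_1") run before deleting stageI would have carried stageI/cfg/birds.yml over.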
Running it again, still an error:
Traceback (most recent call last):
File "stageI/run_exp.py", line 68,
File "xlz/StackGAN-master/stageI/trainer.py", line 306, in train
counter = self.build_model(sess)
File "xlz/StackGAN-master/stageI/trainer.py", line 280, in build_model
self.init_opt()
File "xlz/StackGAN-master/stageI/trainer.py", line 101, in init_opt
fake_images = self.model.get_generator(tf.concat(axis=[c, z], values=1))
File "xlz/Miniconda/envs/stackGan/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1029, in concat
dtype=dtypes.int32).get_shape(
File "xlz/Miniconda/envs/stackGan/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 637, in convert_to_tensor
as_ref=False)
File "xlz/Miniconda/envs/stackGan/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 702, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "xlz/Miniconda/envs/stackGan/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 110, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "xlz/Miniconda/envs/stackGan/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 99, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "xlz/Miniconda/envs/stackGan/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "xlz/Miniconda/envs/stackGan/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got <prettytensor.pretty_tensor_class.Layer object at 0x7f77081fea10> of type 'Layer' instead.
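Edit the file the traceback points to (stageI/trainer.py, line 101, in init_opt):
vim stageI/trainer.py
and change the call on line 101 to tf.concat([c, z], 1). The upgrade script swapped the arguments; TF 1.0 expects the list of tensors first and the axis second.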
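Because the same mis-conversion can appear in more than one place, the edit can also be done mechanically. A minimal sketch that only handles the exact swapped form seen in this traceback, tf.concat(axis=[...], values=N) with a list literal as axis; fix_swapped_concat is a hypothetical helper, and correctly converted calls such as tf.concat(axis=1, values=[c, z]) are left alone.

```python
import re

# Matches e.g.  tf.concat(axis=[c, z], values=1)
_SWAPPED_CONCAT = re.compile(
    r"tf\.concat\(\s*axis=(\[[^\]]*\])\s*,\s*values=([^()]+?)\s*\)")


def fix_swapped_concat(source):
    """Rewrite tf.concat(axis=[...], values=N) into the TF 1.x order tf.concat([...], N)."""
    return _SWAPPED_CONCAT.sub(r"tf.concat(\1, \2)", source)
```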
Run it again; still failing:
Traceback (most recent call last):
File "stageI/run_exp.py", line 68, in <module>
algo.train()
File "xlz/StackGAN-master/stageI/trainer.py", line 379, in train
img_sum = self.epoch_sum_images(sess, cfg.TRAIN.NUM_COPY)
File "xlz/StackGAN-master/stageI/trainer.py", line 263, in epoch_sum_images
scipy.misc.imsave('%s/train.jpg' % (self.log_dir), gen_samples[0])
AttributeError: 'module' object has no attribute 'imsave'
This is because Pillow is not installed; scipy.misc.imsave is only available when Pillow (PIL) is installed:
pip install Pillow
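A quick way to confirm that imsave is now present is to import the module and look the function up. A minimal sketch; has_function is a hypothetical helper. As far as I recall, scipy.misc.imsave was deprecated in SciPy 1.0 and removed in a later release (the deprecation warning named 1.2.0), so on a newer SciPy this check can fail even with Pillow installed.

```python
import importlib


def has_function(module_name, attr):
    """True if module_name imports cleanly and exposes a callable attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, attr, None))
```

Here the call of interest is has_function("scipy.misc", "imsave").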
Ran it again; it works, for now.
Screenshot to celebrate!
Then running python stageII/run_exp.py --cfg stageII/cfg/birds.yml --gpu 0
fails again!
InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./ckt_logs/birds/stageI/model_82000.ckpt: Not found: ./ckt_logs/birds/stageI
[[Node: save/RestoreV2_58 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_58/tensor_names, save/RestoreV2_58/shape_and_slices)]]
[[Node: save/RestoreV2_8/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_20_save/RestoreV2_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
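Fix it again: the Stage-II run tries to restore a Stage-I checkpoint from ./ckt_logs/birds/stageI/model_82000.ckpt, which does not exist here, so the checkpoint path it uses has to point at a checkpoint that is actually on disk.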
Training stageII took me three full days, and then my job went down because someone else needed the server. The last checkpoint saved was model_140000.ckpt.
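Before pointing a run at a checkpoint, it helps to check what was actually saved. A minimal sketch, assuming the V2 checkpoint layout seen here, where one checkpoint is a set of files sharing a prefix (.index, .meta, .data-00000-of-00001); latest_checkpoint_prefix is a hypothetical helper. TensorFlow's own tf.train.latest_checkpoint reads the "checkpoint" state file instead, whereas this sketch globs the directory directly.

```python
import glob
import os
import re

_STEP = re.compile(r"model_(\d+)\.ckpt\.index$")


def latest_checkpoint_prefix(log_dir):
    """Return the model_<step>.ckpt prefix with the highest step in log_dir, or None."""
    best = None
    for path in glob.glob(os.path.join(log_dir, "model_*.ckpt.index")):
        m = _STEP.search(os.path.basename(path))
        if m:
            step = int(m.group(1))  # compare as integers: 140000 > 82000
            if best is None or step > best[0]:
                best = (step, path[:-len(".index")])
    return None if best is None else best[1]
```

Sorting the names as strings would pick model_82000 over model_140000, which is why the step is parsed as an integer.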
Following the instructions, put this pretrained model_140000.ckpt into the project's models folder:
cp model_140000.ckpt.data-00000-of-00001 /xlz/StackGAN-master/models
cp model_140000.ckpt.meta /xlz/StackGAN-master/models
cp model_140000.ckpt.index /xlz/StackGAN-master/models
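Since a checkpoint is several files sharing one prefix, copying by prefix avoids forgetting one of them. A minimal sketch; copy_checkpoint is a hypothetical helper, and it assumes the prefix contains no glob wildcard characters.

```python
import glob
import os
import shutil


def copy_checkpoint(prefix, dest_dir):
    """Copy all files of one checkpoint (prefix.index, prefix.meta, prefix.data-*) into dest_dir."""
    files = sorted(glob.glob(prefix + ".*"))
    if not files:
        raise IOError("no files found for checkpoint prefix %s" % prefix)
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    for f in files:
        shutil.copy2(f, dest_dir)
    return [os.path.join(dest_dir, os.path.basename(f)) for f in files]
```

The three cp commands then become copy_checkpoint("model_140000.ckpt", "/xlz/StackGAN-master/models").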
sh demo/flowers_demo.sh
Running the demo to get the results, the same kind of error still appears:
TypeError: Expected int32, got <prettytensor.pretty_tensor_class.Layer object at 0x7f4ecc411ed0> of type 'Layer' instead.
Fix it the same way as before: swap the positions of 1 and [c, z] in the tf.concat call.
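To find every remaining occurrence at once rather than one traceback at a time, the project tree can be scanned for the swapped form. A minimal sketch; find_swapped_concat is a hypothetical helper that only looks for tf.concat(axis=[ in .py files, the pattern seen in both tracebacks.

```python
import os
import re

_SWAPPED = re.compile(r"tf\.concat\(\s*axis=\[")


def find_swapped_concat(root):
    """Yield (path, line_number, line) for each tf.concat(axis=[...] call in .py files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as fh:
                for lineno, line in enumerate(fh, 1):
                    if _SWAPPED.search(line):
                        yield path, lineno, line.rstrip()
```

Running for hit in find_swapped_concat("."): print("%s:%d: %s" % hit) from StackGAN-master lists each line that still needs the swap.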