PyTorch Pitfalls Log (continuously updated)

1. Wrong data type for an nn.Conv2d() argument

Error: TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
Full traceback:

  File "G:/python/project/model/A2net.py", line 36, in <module>
    model = A2Block(64)
  File "G:/python/project/model/A2net.py", line 15, in __init__
    self.dimension_reduction = nn.Conv2d(in_channels=inplanes, out_channels=inplanes/2, kernel_size=1, stride=1)
  File "C:\Users\MSY\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 297, in __init__
    False, _pair(0), groups, bias)
  File "C:\Users\MSY\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 33, in __init__
    out_channels, in_channels // groups, *kernel_size))
TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
 * (object data, torch.device device)

Locating the problem: the offending line is:

self.dimension_reduction = nn.Conv2d(in_channels=inplanes, out_channels=inplanes/2, kernel_size=1, stride=1)

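Analysis: According to the error message, this line passes a float where an int is expected. The only candidate is inplanes/2: in Python 3, n/2 is true division and always returns a float (64/2 is 32.0), whereas n//2 is floor division and returns an int. (An embarrassing bug caused by a moment of carelessness.)
Fix: change the output channel count from inplanes/2 to inplanes//2.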
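
The difference between the two operators can be checked without PyTorch at all. The sketch below is only an illustration; half_channels is a hypothetical helper that is not part of the original model code, and the value 64 is taken from the A2Block(64) call in the traceback.

```python
inplanes = 64

# True division always yields a float in Python 3, even when the result is whole.
float_channels = inplanes / 2
# Floor division of two ints yields an int, which is what a channel count must be.
int_channels = inplanes // 2

print(type(float_channels).__name__, float_channels)  # float 32.0
print(type(int_channels).__name__, int_channels)      # int 32


def half_channels(inplanes: int) -> int:
    """Hypothetical helper: halve a channel count and reject odd inputs."""
    if inplanes % 2 != 0:
        raise ValueError(f"inplanes must be even, got {inplanes}")
    return inplanes // 2
```

Passing half_channels(inplanes) as out_channels would also surface an odd channel count early instead of silently rounding it down.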

2. Problem compiling NMS with make.sh

Error: OSError: The CUDA lib64 path could not be located in /usr/lib64
Full traceback:

Traceback (most recent call last):
  File "build.py", line 59, in <module>
    CUDA = locate_cuda()
  File "build.py", line 54, in locate_cuda
    raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
OSError: The CUDA lib64 path could not be located in /usr/lib64

Locating the problem: open build.py (setup.py in some projects) and find:

cudaconfig = {'home': home, 'nvcc': nvcc,
              'include': pjoin(home, 'include'),
              'lib64': pjoin(home, 'lib64')}

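Analysis: The library path is wrong. The traceback shows that build.py looked for a lib64 directory under home (here /usr/lib64), which could not be found on this machine.
Fix: in pjoin(home, 'lib64'), change lib64 to lib.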
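
Rather than hard-coding either directory name, the lookup can try both. This is a minimal sketch, not code from the project: find_cuda_lib_dir is a hypothetical helper, and it assumes the CUDA libraries live in either lib64 or lib directly under home.

```python
import os
from os.path import join as pjoin


def find_cuda_lib_dir(home: str) -> str:
    """Hypothetical helper: return the first existing library directory under `home`."""
    for name in ("lib64", "lib"):
        candidate = pjoin(home, name)
        if os.path.isdir(candidate):
            return candidate
    raise EnvironmentError(
        "No CUDA library directory (lib64 or lib) found under %s" % home
    )
```

In build.py the 'lib64' entry of cudaconfig could then be written as find_cuda_lib_dir(home), so the same script works on machines with either layout.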

3. one of the variables needed for gradient computation has been modified by an inplace operation

Error: one of the variables needed for gradient computation has been modified by an inplace operation
Full traceback:

Traceback (most recent call last):
  File "train_test.py", line 454, in <module>
    train()
  File "train_test.py", line 327, in train
    loss.backward()
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

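Locating the problem: the traceback does not point to any obvious location in my own code, which is the most painful part of this bug.
Analysis: This problem appeared while test-running a PyTorch implementation of SSD found online. According to the explanations I found, it is probably caused by inconsistencies between PyTorch 0.4 and 0.3. The commonly suggested fixes are:
1: If you are using PyTorch 0.4.0, roll back to PyTorch 0.3.0.
2: If a call takes an inplace parameter, set it to False.
3: The explanation usually given is that tensors no longer support in-place operations from PyTorch 0.4.0 onward, so all in-place operations should be removed. More precisely, in-place operations still exist, but autograd raises this error when a tensor it needs for the backward pass has been modified in place, so it is those operations that must go.
Later, the blog post "modified by an inplace operation" seemed to give a suitable answer. In short: change x += 1 to x = x + 1. Reason: x += 1 updates the value directly in the original tensor (the inplace=True case), whereas x = x + 1 first computes x + 1 and then binds the result to x (the inplace=False case).
Because my code base is large, it was hard at first to pin down the exact location. I eventually used Beyond Compare (an excellent tool, highly recommended) to diff my code against a working version found online, and that revealed the error.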

x /= norm  # (the original faulty code)

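For a detailed explanation of in-place operations, see "pytorch 学习笔记（二十二）：关于 inplace operation" (PyTorch Study Notes 22: On inplace operation).
Fix: change x /= norm to x = x / norm.
Postscript: I later discovered that PyCharm has a global search. As noted above, the problem comes from the /= operator, but with so much code it was hard to find where /= was used. In PyCharm, press Ctrl + Shift + F, or choose Edit -> Find -> Find in Path from the menu, to open the global search dialog, then enter /= to find the candidate locations. VS Code has an equivalent feature. (I wasted a lot of time searching by hand; there is clearly more of PyCharm for me to explore.)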

[Figure: PyCharm global search]
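
The same search can be scripted when no IDE is at hand. The sketch below is an assumption-laden illustration rather than a complete detector: find_inplace_candidates and INPLACE_PATTERN are hypothetical names, the regular expression only looks for augmented assignments and for method calls whose name ends in an underscore (PyTorch's naming convention for in-place methods), and it will also match text inside comments and strings.

```python
import re
from pathlib import Path

# Augmented assignments such as `x /= norm`, and calls to methods whose name
# ends in an underscore such as `x.unsqueeze_(1)`. Dunder methods like
# `.__init__(` are skipped because the name must start with a letter.
INPLACE_PATTERN = re.compile(r"[-+*/@]=|\.[A-Za-z]\w*_\(")


def find_inplace_candidates(root):
    """Yield (path, line_number, stripped_line) for lines that look in-place."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if INPLACE_PATTERN.search(line):
                yield path, lineno, line.strip()
```

Calling find_inplace_candidates(".") from the project root and printing each result gives a list of lines to inspect by hand; not every hit is a bug, since in-place operations are only a problem when autograd still needs the overwritten value.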

Addendum: later, while running a different piece of code, I hit the same error. This time the cause was:

Change x.unsqueeze_(1) to x = x.unsqueeze(1)
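
Both fixes in this section follow the same pattern, which can be reproduced in a few lines. The sketch below assumes a recent PyTorch release and uses an L2 normalization similar in spirit to the x /= norm line above; the function names, tensor shape, and eps value are illustrative assumptions, not the original SSD code.

```python
import torch


def l2_normalize_inplace(x, eps=1e-10):
    norm = x.pow(2).sum(dim=1, keepdim=True).sqrt() + eps
    x /= norm  # in place: overwrites the x that pow() saved for its backward pass
    return x


def l2_normalize(x, eps=1e-10):
    norm = x.pow(2).sum(dim=1, keepdim=True).sqrt() + eps
    x = x / norm  # out of place: a new tensor is created, the saved x is untouched
    return x


inp = torch.randn(2, 4, requires_grad=True)

# `inp * 1.0` produces a non-leaf tensor, like an activation inside a network.
inplace_failed = False
try:
    l2_normalize_inplace(inp * 1.0).sum().backward()
except RuntimeError as err:
    inplace_failed = True
    print("in-place version failed:", err)

l2_normalize(inp * 1.0).sum().backward()
print("out-of-place version produced a gradient of shape", tuple(inp.grad.shape))
```

In recent PyTorch releases, torch.autograd.set_detect_anomaly(True) can additionally make the error report the forward operation whose saved tensor was overwritten, which helps when the traceback only shows loss.backward().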

4. Fan in and fan out can not be computed for tensor with less than 2 dimensions

Error: Fan in and fan out can not be computed for tensor with less than 2 dimensions
Full traceback:

  File "train_test_RFB.py", line 143, in <module>
    net.extras.apply(weights_init)
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 240, in apply
    module.apply(fn)
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 240, in apply
    module.apply(fn)
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 241, in apply
    fn(self)
  File "train_test_RFB.py", line 134, in weights_init
    init.kaiming_normal_(m.state_dict()[key], mode='fan_out')
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/nn/init.py", line 323, in kaiming_normal_
    fan = _calculate_correct_fan(tensor, mode)
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/nn/init.py", line 257, in _calculate_correct_fan
    fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
  File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/nn/init.py", line 181, in _calculate_fan_in_and_fan_out
    raise ValueError("Fan in and fan out can not be computed for tensor with less than 2 dimensions")
ValueError: Fan in and fan out can not be computed for tensor with less than 2 dimensions

Locating the problem:

init.kaiming_normal_(m.state_dict()[key], mode='fan_out')

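Analysis: The error message says that init.kaiming_normal_() can only initialize tensors with at least 2 dimensions. The failure occurs during initialization with this commonly used pattern: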

if 'conv' in key:
    init.kaiming_normal_(m.state_dict()[key], mode='fan_out')

The problem lies in how the conv layer is defined. Looking back at my network, conv is defined as:

self.conv = nn.Sequential(nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False),
                          nn.BatchNorm2d(inter_channels),
                          nn.ReLU())

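This block contains a BatchNorm layer. Because the nn.Sequential is stored as self.conv, the state_dict keys of the BatchNorm tensors (for example conv.1.weight) also contain 'conv' and pass the check, but those tensors have only 1 dimension, fewer than the 2 that kaiming_normal_ requires. Hence 'Fan in and fan out can not be computed for tensor with less than 2 dimensions'.
Fix: write the composite conv out as separate layers, or rewrite the initialization.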
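
One possible way to rewrite the initialization is to dispatch on the layer type instead of matching 'conv' in the state_dict keys. This is a sketch under assumptions, not the fix used in the original project: the channel sizes 16 and 8 are made up for illustration, and the constant values chosen for BatchNorm simply mirror its usual defaults.

```python
import torch.nn as nn
from torch.nn import init


def weights_init(m):
    """Initialize by layer type so that 1-D BatchNorm tensors never reach Kaiming init."""
    if isinstance(m, nn.Conv2d):
        init.kaiming_normal_(m.weight, mode="fan_out")
        if m.bias is not None:
            init.constant_(m.bias, 0.0)
    elif isinstance(m, nn.BatchNorm2d):
        # BatchNorm weight and bias are 1-D, so use constants instead.
        init.constant_(m.weight, 1.0)
        init.constant_(m.bias, 0.0)


# Same composite block as above, with assumed channel sizes for illustration.
conv = nn.Sequential(
    nn.Conv2d(16, 8, 3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)
conv.apply(weights_init)  # apply() visits every submodule, then the container itself
```

With this version the composite conv block can stay as it is, because the name under which a layer is stored no longer decides how it is initialized.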

5. libpng error: Read Error

Error: OpenCV reports libpng error: Read Error
Full traceback:

libpng error: Read Error
Traceback (most recent call last):
  File "main.py", line 100, in <module>
    main(config)
  File "main.py", line 43, in main
    train.train()
  File "/home/msy/project/PoolNet-master/solver.py", line 84, in train
    for i, data_batch in enumerate(self.train_loader):
  File "/home/msy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/msy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
  File "/home/msy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/msy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/msy/project/PoolNet-master/dataset/dataset.py", line 27, in __getitem__
    sal_image = load_image(os.path.join(self.sal_root, im_name))
  File "/home/msy/project/PoolNet-master/dataset/dataset.py", line 77, in load_image
    if len(im.shape) != 3 or im.shape[2] != 3:
AttributeError: 'NoneType' object has no attribute 'shape'

Locating the problem:

im = cv2.imread(name)
if len(im.shape) != 3 or im.shape[2] != 3:

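Analysis: This looks like an image format problem. For example, a file that is really a JPEG may have been given a .png extension, or something similar, and that may trigger this error (I still do not fully understand it; corrections and additions are welcome). What is certain is that cv2.imread returns None when it cannot read a file, which is why the traceback ends with AttributeError: 'NoneType' object has no attribute 'shape'.
Fix: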

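import cv2
import numpy as np
from PIL import Image
from PIL import ImageFile
import imghdr  # note: removed from the standard library in Python 3.13

# Let PIL load files whose data is truncated instead of raising an error.
ImageFile.LOAD_TRUNCATED_IMAGES = True
# name is the image path passed to load_image.
# If the content really is PNG, re-save it in place as a clean RGB image.
if imghdr.what(name) == "png":
    Image.open(name).convert("RGB").save(name)
# Read with PIL instead of cv2.imread; PIL yields RGB channel order, cv2 yields BGR.
img = np.array(Image.open(name))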
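
To test the extension-mismatch hypothesis from the analysis, the first bytes of each file can be compared with its extension. This is a minimal standard-library sketch: sniff_format and extension_matches_content are hypothetical helpers, and only the PNG and JPEG signatures are checked.

```python
import os

# Leading "magic" bytes of the two formats discussed above.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}
EXTENSIONS = {".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg"}


def sniff_format(path):
    """Return 'png', 'jpeg', or None based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    for fmt, magic in SIGNATURES.items():
        if head.startswith(magic):
            return fmt
    return None


def extension_matches_content(path):
    """Hypothetical check: does the file extension agree with the actual content?"""
    expected = EXTENSIONS.get(os.path.splitext(path)[1].lower())
    return expected is not None and expected == sniff_format(path)
```

Running extension_matches_content over the dataset directory before training would list suspicious files up front. A file that passes this check can still be truncated or corrupted, so checking the result of cv2.imread for None remains worthwhile.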

References:
https://blog.csdn.net/andylei777/article/details/78095411
http://www.itdaan.com/blog/2016/11/22/d480f443ca62e56ddc47a7bed7cc85fd.html

6. TypeError: cannot assign 'torch.cuda.FloatTensor' as parameter 'edges' (torch.nn.Parameter or None expected)

Error:

TypeError: cannot assign 'torch.cuda.FloatTensor' as parameter 'edges' (torch.nn.Parameter or None expected)