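Because of a business requirement, I needed to convert a network model into a format that C++ can call. I ran into many pitfalls along the way and am recording them here. The model is a basic end-to-end network with no custom-defined operations; it uses almost entirely standard PyTorch operations.
1. Exporting the network
I looked at several mainstream export approaches: (1) torch.jit.trace; (2) torch.jit.ScriptModule; (3) PyTorch to ONNX.
Because the model is fairly complex, method 2 required too many changes and the conversion did not succeed. Method 3 needs more environment setup and may involve converting to a Caffe version, so I have not tried it yet. I settled on method 1. Its precondition is that nothing in the network may change depending on the input, so that the network can be recorded as a single graph. Note that "input" here means the input to the network, not the hyperparameters used to build it.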
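To make the precondition concrete, here is a minimal sketch of what goes wrong when it is violated. The function `branchy` and its inputs are made up for illustration and are not part of my model; the only assumption is a working PyTorch install. Tracing records the operations executed for the example input, so a branch that depends on input values gets frozen into the graph.

```python
import warnings

import torch


def branchy(x):
    # Data-dependent control flow: the branch taken depends on the input values.
    if x.sum() > 0:
        return x * 2
    return x - 10


positive = torch.ones(3)
negative = -torch.ones(3)

with warnings.catch_warnings():
    # Tracing warns that a tensor was converted to a Python bool.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(branchy, positive)

eager_out = branchy(negative)   # eager mode takes the "x - 10" branch
traced_out = traced(negative)   # the trace replays the recorded "x * 2" branch
print(eager_out, traced_out)
```

The two results differ for the negative input, which is exactly the situation the precondition rules out. Branches that depend only on constructor arguments are fine, because they are already resolved by the time the model is traced.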
For detailed usage, see:
https://pytorch.org/tutorials/advanced/cpp_export.html
Example code:
import torch
import torchvision
# An instance of your model.
model = torchvision.models.resnet18()
# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)
# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
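The traced module still has to be written to disk before C++ can load it. The sketch below shows the save and reload round trip under a few assumptions: a small stand-in `nn.Sequential` replaces the real network so the snippet runs quickly, and the file name `traced_model.pt` is arbitrary.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model (hypothetical); substitute your own network here.
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU()).eval()
example = torch.rand(1, 3, 32, 32)

traced = torch.jit.trace(model, example)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traced_model.pt")
    traced.save(path)                 # this file is what torch::jit::load reads in C++
    reloaded = torch.jit.load(path)   # reload in Python as a quick sanity check

# The reloaded module should reproduce the original model's output.
same = torch.allclose(reloaded(example), model(example))
print("outputs match:", same)
```

Comparing the reloaded module against the original in Python is a cheap check to run before moving to the C++ side.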
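The problem I hit:
When I exported my model directly with this command, it raised an error:
torch.jit.trace assert(isinstance(orig, torch.nn.Module)). For a long time I assumed the model could not be found, yet the model's own output was completely fine. So I located the failing code in the jit.trace source and printed the model parameters step by step. The source converts every submodule of the network into a ScriptModule, following the order of the data flow through the network, and at the very end it received a variable that was None. I then checked the network code and found the following: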
def _make_layer(self, block, planes, blocks, stride=1):
    downsample = None
    if stride != 1 or self.inplanes != planes * block.expansion:
        downsample = nn.Sequential(
            nn.Conv2d(
                self.inplanes, planes * block.expansion,
                kernel_size=1, stride=stride, bias=False
            ),
            nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
        )
    layers = []
    layers.append(block(self.inplanes, planes, stride, downsample))
    self.inplanes = planes * block.expansion
    for i in range(1, blocks):
        layers.append(block(self.inplanes, planes))
    return nn.Sequential(*layers)
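Here downsample defaults to None, but for certain parameters it becomes an nn.Module. I therefore suspected that the None value was what broke the export. The fix is to replace every None of this kind in the network with torch.nn.Identity(), which performs no operation. With that change, plus updating any logic in the network that tests for None, the network can be exported in a form that C++ can read.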
def _make_layer(self, block, planes, blocks, stride=1):
    downsample = torch.nn.Identity()
    if stride != 1 or self.inplanes != planes * block.expansion:
        downsample = nn.Sequential(
            nn.Conv2d(
                self.inplanes, planes * block.expansion,
                kernel_size=1, stride=stride, bias=False
            ),
            nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
        )
    layers = []
    layers.append(block(self.inplanes, planes, stride, downsample))
    self.inplanes = planes * block.expansion
    for i in range(1, blocks):
        layers.append(block(self.inplanes, planes))
    return nn.Sequential(*layers)
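The block class itself also needs its None check removed. The sketch below shows roughly what that looks like. `Block` is a simplified, hypothetical residual block rather than the one from my network, and it assumes the original forward() contained a check such as "if self.downsample is not None".

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Simplified residual block (illustrative only)."""

    def __init__(self, channels, downsample=None):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # Store an Identity module instead of None so every attribute is an nn.Module.
        self.downsample = downsample if downsample is not None else nn.Identity()

    def forward(self, x):
        # No "is not None" test needed: Identity returns its input unchanged.
        residual = self.downsample(x)
        out = self.bn(self.conv(x))
        return self.relu(out + residual)


block = Block(8).eval()
example = torch.rand(1, 8, 16, 16)
traced = torch.jit.trace(block, example)

eager_out = block(example)
traced_out = traced(example)
print(torch.allclose(eager_out, traced_out))
```

Because nn.Identity passes its input through unchanged, the numerical result is the same as the original "skip the downsample" path.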
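2. Build problems
Next, I needed to read an image with OpenCV in C++, load the model parameters from C++ code, and obtain the final result. I first set up a Release build of OpenCV 4, following the method at https://www.jianshu.com/p/f54b0fc13811. I will not go into the configuration details here. However, the build has to download the ippicv file, which kept failing without a proxy, and setting up a proxy on the server is a hassle. The file can actually be downloaded by hand, without spending coins on CSDN, using the following method.
Locate the CMake file for ippicv (3rdparty/ippicv/ippicv.cmake in the OpenCV source tree), where we find: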
set(THE_ROOT "${OpenCV_BINARY_DIR}/3rdparty/ippicv")
ocv_download(FILENAME ${OPENCV_ICV_NAME}
HASH ${OPENCV_ICV_HASH}
URL
"${OPENCV_IPPICV_URL}"
"$ENV{OPENCV_IPPICV_URL}"
"https://raw.githubusercontent.com/opencv/opencv_3rdparty/${IPPICV_COMMIT}/ippicv/"
DESTINATION_DIR "${THE_ROOT}"
ID IPPICV
STATUS res
UNPACK RELATIVE_URL)
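This pins down the download path. Combined with the earlier part of the same file, the download URL is "https://raw.githubusercontent.com/opencv/opencv_3rdparty/32e315a5b106a7b89dbed51c28f8120a48b368b4/ippicv/ippicv_2019_lnx_intel64_general_20180723.tgz". Download it through a proxy (I used Xunlei), then change the path in the CMake file to point to the downloaded copy.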
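As a small sketch of the URL construction described above, the helper below joins the commit hash and file name the same way the CMake snippet does, and computes a file's MD5 so a manual download can be compared against OPENCV_ICV_HASH. The function names are my own, and the assumption that HASH is an MD5 digest should be checked against your OpenCV version.

```python
import hashlib

RAW_BASE = "https://raw.githubusercontent.com/opencv/opencv_3rdparty"


def ippicv_url(commit, filename):
    """Build the download URL from IPPICV_COMMIT and OPENCV_ICV_NAME."""
    return f"{RAW_BASE}/{commit}/ippicv/{filename}"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use small."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


url = ippicv_url(
    "32e315a5b106a7b89dbed51c28f8120a48b368b4",
    "ippicv_2019_lnx_intel64_general_20180723.tgz",
)
print(url)
```

After downloading, calling md5_of_file on the .tgz and comparing the result with the OPENCV_ICV_HASH value in the same CMake file is a quick way to catch a truncated download.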
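The C++ code for loading the PyTorch model likewise comes from https://pytorch.org/tutorials/advanced/cpp_export.html. After confirming that the OpenCV 4 sample code and the libtorch sample code each ran on their own, I tried combining the two. Combining them directly produced the following error:
undefined reference to `cv::imread(std::string const&, int)'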
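One thing that may be worth checking for an error like this: an undefined reference whose signature mentions plain std::string is often a sign that the two libraries were built with different libstdc++ ABIs (the _GLIBCXX_USE_CXX11_ABI setting). This is a general observation, not something verified in this setup. Since the build later uses the libtorch that ships inside the Python torch package, the sketch below simply asks that install which ABI it was compiled with.

```python
import torch

# True  -> this torch build uses the new libstdc++ ABI (_GLIBCXX_USE_CXX11_ABI=1)
# False -> it uses the old ABI, which can clash with a default OpenCV build
uses_cxx11_abi = torch.compiled_with_cxx11_abi()
print("torch built with C++11 ABI:", uses_cxx11_abi)
```

If this prints False while OpenCV was built with a recent GCC using default settings, an ABI mismatch is a plausible explanation for the link error.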
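I tried various fixes, including rebuilding OpenCV 4 and editing the CMakeLists file in different ways, but none of them compiled. The problem has been reported in the OpenCV repository at https://github.com/opencv/opencv/issues/13000, but there is still no good solution there. Based on https://www.jianshu.com/p/6fe9214431c6, I suspected the OpenCV 4 version itself. In my tests, OpenCV 3.4 currently works together with libtorch while OpenCV 4 does not, so I downgraded OpenCV. The following commands remove OpenCV 4: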
# Run from the OpenCV 4 build directory
sudo make uninstall
cd ..
sudo rm -r build
sudo rm -r /usr/local/include/opencv2 /usr/local/include/opencv /usr/include/opencv /usr/include/opencv2 /usr/local/share/opencv /usr/local/share/OpenCV /usr/share/opencv /usr/share/OpenCV /usr/local/bin/opencv* /usr/local/lib/libopencv*
sudo apt-get --purge remove opencv-doc opencv-data python-opencv
cd /etc/ld.so.conf.d/
sudo rm opencv.conf
Then download the OpenCV 3.4 source and its matching ippicv file in the same way, and build it following the method at https://www.jianshu.com/p/f646448da265. For the cmake step, use:
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
make
sudo make install
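Then add the library paths in the same way. This completes the OpenCV 3.4 setup.
I have shared opencv4.3 and opencv3.4, along with their corresponding ippv_2019_intel64_general_20190723.tgz and ippv_2020_intel64_20191018_general.tgz files, on Baidu Netdisk for easy download. (Note that these are the Linux versions.)
Link: https://pan.baidu.com/s/1N035W8eBBITi1TWCtljBxw Extraction code: ym69
3. Combined build
First, set up the CMakeLists.txt file: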
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(Human_Pose)
# Enable C++11
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
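# Set library paths. CMAKE_PREFIX_PATH is a list, so append both locations;
# a second set() would overwrite the first.
# OpenCV_DIR, if needed, must point to the directory containing
# OpenCVConfig.cmake, not to the torch directory.
list(APPEND CMAKE_PREFIX_PATH /usr/local/lib)
list(APPEND CMAKE_PREFIX_PATH /usr/local/lib/python3.6/dist-packages/torch)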
find_package(Torch REQUIRED NO_CMAKE_FIND_ROOT_PATH)
find_package (OpenCV REQUIRED NO_CMAKE_FIND_ROOT_PATH)
# If the package has been found, several variables will
# be set, you can find the full list with descriptions
# in the OpenCVConfig.cmake file.
# Print some message showing some of them
message(STATUS "OpenCV library status:")
message(STATUS " version: ${OpenCV_VERSION}")
message(STATUS " libraries: ${OpenCV_LIBS}")
message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")
message(STATUS "Torch library status:")
message(STATUS " version: ${TORCH_VERSION}")
message(STATUS " libraries: ${TORCH_LIBRARIES}")
message(STATUS " include path: ${TORCH_INCLUDE_DIRS}")
include_directories( ${OpenCV_INCLUDE_DIRS} ${TORCH_INCLUDE_DIRS})
add_executable(Human_Pose Human_Pose.cpp)
target_link_libraries(Human_Pose ${OpenCV_LIBS})
target_link_libraries(Human_Pose ${TORCH_LIBRARIES})
The project is named Human_Pose and contains a single file, Human_Pose.cpp. The C++ test code is:
#include <cstdio>
#include <iostream>
#include <memory>
#include <vector>
#include <torch/script.h> // One-stop header.
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;
int main(int argc, char** argv)
{
    if (argc != 3)
    {
        printf("usage: Human_Pose <Image_Path> <path-to-exported-script-module>\n");
        return -1;
    }
    // Read the image as 3-channel color (flag 1 == cv::IMREAD_COLOR).
    cv::Mat image = cv::imread(argv[1], 1);
    if (image.empty())
    {
        printf("No image data \n");
        return -1;
    }
    // Write the image back out to confirm that OpenCV I/O works.
    cv::imwrite("test.png", image);
    torch::jit::script::Module module;
    try {
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load(argv[2]);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    std::cout << "module input ok\n";
    // Create a vector of inputs (a dummy all-ones tensor, not the image).
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 3, 256, 192}));
    // Execute the model and turn its output into a tensor.
    at::Tensor output = module.forward(inputs).toTensor();
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
    return 0;
}
Create a build folder, then run:
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release
At this point, the whole test pipeline runs end to end.