Environment: CUDA 9.0 & cuDNN 7.0.5
Installing MXNet with the cpp package on a TX2 is not that easy.
I am recording my experience here.
Steps:
Follow the official guide, Install MXNet on a Jetson, with the few differences noted below.
-
First, clone mxnet from GitHub and cd into mxnet:
git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet
git submodule init
git submodule update
-
Configure CUDA:
nvcc --version
On my TX2 this reports CUDA 9.0, so point /usr/local/cuda at that version:
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-9.0 /usr/local/cuda
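The rm-then-ln pair above briefly leaves no /usr/local/cuda at all, and it will happily create a dangling link if the target is mistyped. A slightly more defensive variant might look like the sketch below. The function name point_cuda_link is hypothetical (not part of CUDA or MXNet), and the sketch assumes a Linux system where ln supports -n.

```shell
# point_cuda_link is a hypothetical helper, not part of CUDA or MXNet.
# Usage: point_cuda_link <toolkit_dir> <link_path>
point_cuda_link() (
    target=$1
    link=$2
    # Stop if the toolkit directory is missing, so the existing
    # link is never replaced by a dangling one.
    if [ ! -d "$target" ]; then
        echo "error: $target is not a directory" >&2
        exit 1
    fi
    # Only a symlink (or nothing) may be replaced, never a real directory.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "error: $link exists and is not a symlink" >&2
        exit 1
    fi
    # -s: symbolic, -f: replace an existing link, -n: do not follow the old link
    ln -sfn "$target" "$link"
)
```

The function body runs in a subshell, so it has to be called from a root shell rather than through sudo. The one-line equivalent on the TX2 would be: sudo ln -sfn /usr/local/cuda-9.0 /usr/local/cuda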
-
Copy config.mk
cp make/crosscompile.jetson.mk config.mk
-
Edit config.mk and modify these settings:
USE_CUDA_PATH = /usr/local/cuda
USE_OPENCV = 1
USE_JEMALLOC = 0 (differs from the official guide, but VERY IMPORTANT)
USE_GPERFTOOLS = 0 (differs from the official guide, but VERY IMPORTANT)
USE_CPP_PACKAGE = 1 (for the cpp package)
Update the NVCC settings: NVCCFLAGS := -m64
Settings 3 and 4 (USE_JEMALLOC and USE_GPERFTOOLS) are important. Without them, once the build finishes and you use the mxnet API, you might get an error like:
src/tcmalloc.cc:284] Attempt to free invalid pointer
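If you rebuild often, the USE_* settings listed above can be applied to config.mk with a small script instead of by hand. This is only a sketch: set_mk_var is a hypothetical helper name, it rewrites assignments as "KEY = VALUE" (so it is not meant for NVCCFLAGS, which uses :=), and it assumes values without "|" or "&" characters.

```shell
# set_mk_var is a hypothetical helper for make-style config files.
# Usage: set_mk_var <file> <KEY> <VALUE>
set_mk_var() (
    file=$1
    key=$2
    value=$3
    # Match uncommented assignments of KEY using =, := or ?=
    pattern="^[[:space:]]*${key}[[:space:]]*[:?]\{0,1\}="
    if grep -q "$pattern" "$file"; then
        # Rewrite every matching assignment; a backup is kept as <file>.bak
        sed -i.bak "s|${pattern}.*|${key} = ${value}|" "$file"
    else
        # KEY is not set anywhere yet, so append it
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
)
```

Run from the mxnet directory, the settings from this step would then be:
set_mk_var config.mk USE_CUDA_PATH /usr/local/cuda
set_mk_var config.mk USE_OPENCV 1
set_mk_var config.mk USE_JEMALLOC 0
set_mk_var config.mk USE_GPERFTOOLS 0
set_mk_var config.mk USE_CPP_PACKAGE 1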
-
In 3rdparty/mshadow/make/mshadow.mk, change this setting as follows:
MSHADOW_CFLAGS += -DMSHADOW_USE_PASCAL=1
-
Something else:
In the Makefile, limit the arch for the TX2, which is important:
KNOWN_CUDA_ARCHS := 62 # limit arch for tx2 here

ifeq ($(USE_CUDA), 1)
	ifeq ($(CUDA_ARCH),)
		# KNOWN_CUDA_ARCHS := 30 35 50 52 60 61 70 75
		KNOWN_CUDA_ARCHS := 62 # limit arch for tx2 here
		# Run nvcc on a zero-length file to check architecture-level support.
		# Create args to include SASS in the fat binary for supported levels.
		CUDA_ARCH := $(foreach arch,$(KNOWN_CUDA_ARCHS), \
				$(shell $(NVCC) -arch=sm_$(arch) -E --x cu /dev/null >/dev/null 2>&1 && \
						echo -gencode arch=compute_$(arch),code=sm_$(arch)))
		# Convert a trailing "code=sm_NN" to "code=[sm_NN,compute_NN]" to also
		# include the PTX of the most recent arch in the fat-binaries for
		# forward compatibility with newer GPUs.
		CUDA_ARCH := $(shell echo $(CUDA_ARCH) | sed 's/sm_\([0-9]*\)$$/[sm_\1,compute_\1]/')
		# Add fat binary compression if supported by nvcc.
		COMPRESS := --fatbin-options -compress-all
		CUDA_ARCH += $(shell $(NVCC) -cuda $(COMPRESS) --x cu /dev/null -o /dev/null >/dev/null 2>&1 && \
				echo $(COMPRESS))
	endif
$(info Running CUDA_ARCH: $(CUDA_ARCH))
endif
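To see what this Makefile block produces for a given arch list, the flag construction can be reproduced in plain shell. The sketch below is an approximation under stated assumptions: gencode_flags is a hypothetical name, and unlike the Makefile it does not run nvcc to check whether each arch is supported or whether fatbin compression is available.

```shell
# gencode_flags is a hypothetical helper mirroring the Makefile logic above.
# Usage: gencode_flags <arch> [<arch> ...]   e.g. gencode_flags 62
gencode_flags() (
    flags=""
    for arch in "$@"; do
        # One -gencode entry (SASS only) per listed arch
        flags="$flags -gencode arch=compute_$arch,code=sm_$arch"
    done
    # Same sed rewrite as the Makefile: the last arch also embeds PTX
    printf '%s\n' "${flags# }" | sed 's/sm_\([0-9]*\)$/[sm_\1,compute_\1]/'
)
```

With the TX2 limit in place (compute capability 6.2), gencode_flags 62 prints a single entry, -gencode arch=compute_62,code=[sm_62,compute_62], so each .cu file is compiled for one architecture instead of seven.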
Without this limit, you might get an error like this:
INFO: nvcc was not found on your path
INFO: Using /usr/local/cuda-9.0/bin/nvcc as nvcc path
Running CUDA_ARCH: -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=[sm_70,compute_70] --fatbin-options -compress-all
...
DMXNET_USE_LIBJPEG_TURBO=0" src/operator/tensor/broadcast_reduce_op_value.cu
Killed
Makefile:471: recipe for target 'build/src/operator/tensor/ordering_op_gpu.o' failed
make: *** [build/src/operator/tensor/ordering_op_gpu.o] Error 137
make: *** Waiting for unfinished jobs....
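Exit status 137 is 128 + 9, meaning the compiler process received SIGKILL, which on Linux usually points to the kernel's out-of-memory killer. Limiting the arch is the fix described here. As an additional precaution that is not part of the original steps, the number of parallel make jobs can be capped by available memory. The sketch below assumes a hypothetical helper name, pick_jobs, and the per-job memory figure is a guess you would need to tune.

```shell
# pick_jobs is a hypothetical helper; the per-job memory budget is an assumption.
# Usage: pick_jobs <available_mb> <mb_per_job> <cpu_count>
pick_jobs() (
    avail_mb=$1
    per_job_mb=$2
    cpus=$3
    # How many jobs fit into the available memory (integer division)
    jobs=$((avail_mb / per_job_mb))
    # Always run at least one job
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    # Never run more jobs than there are CPU cores
    if [ "$jobs" -gt "$cpus" ]; then
        jobs=$cpus
    fi
    echo "$jobs"
)
```

For example, with roughly 6000 MB free and an assumed 2048 MB per nvcc job: make -j"$(pick_jobs 6000 2048 "$(nproc)")"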
-
Once the build has finished and you use its cpp API, you may hit an error like this:
terminate called after throwing an instance of 'dmlc::Error'
what(): [01:20:54] /usr/include/mxnet-cpp/ndarray.hpp:236: Check failed: MXNDArrayWaitToRead(blob_ptr_->handle_) == 0 (-1 vs. 0)
terminate called after throwing an instance of 'dmlc::Error'
what(): [01:20:54] /usr/include/mxnet-cpp/ndarray.hpp:236: Check failed: MXNDArrayWaitToRead(blob_ptr_->handle_) == 0 (-1 vs. 0)
This is a problem in GPU mode, caused by the TX2 running **out of memory**. Switching to a smaller input solves it.
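On the TX2 the GPU shares physical RAM with the CPU, so the system's available memory gives a rough idea of how much headroom a GPU-mode run has. A minimal check before launching your program might look like this. The name mem_available_mb is hypothetical, and the sketch assumes a Linux kernel that reports MemAvailable in /proc/meminfo.

```shell
# mem_available_mb is a hypothetical helper, not part of MXNet.
# Usage: mem_available_mb [<meminfo_file>]   (defaults to /proc/meminfo)
mem_available_mb() (
    meminfo=${1:-/proc/meminfo}
    # The MemAvailable line is in kB; convert to whole MB
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
)
```

Running mem_available_mb just before starting inference shows whether other processes are already using most of the memory. If the number is small, reduce the input size as described above.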