Caffe源码中math_functions文件分析

Caffe源码中有一些重要文件，这里介绍下math_functions文件。

include文件：

<glog/logging.h>：GLog库，它是google的一个开源的日志库，其使用可以参考：http://blog.csdn.net/fengbingchun/article/details/48768039 ；
<caffe/common.hpp>、<caffe/util/device_alternate.hpp>：这两个文件的介绍可以参考： http://blog.csdn.net/fengbingchun/article/details/54955236 ；

caffe/util/mkl_alternate.hpp文件：

这个文件里包含两种库，一个是Intel MKL，一个是OpenBLAS，这里用的是OpenBLAS。如果商用Intel MKL是需要付费的，下面仅对Intel MKL进行简单介绍。

Intel MKL(Math Kernel Library)即Intel数学核心函数库，它是一套高度优化和广泛线程安全的数学例程，专为需要极致性能的科学、工程及金融等领域的应用而设计。核心数学函数包括BLAS、LAPACK、ScaLAPACK、Sparse Solver、快速傅里叶变换、矢量数学及其它函数。它可以为当前及下一代英特尔处理器提供性能优化，包括更出色地与Microsoft Visual Studio、Eclipse和XCode相集成。英特尔MKL支持完全集成英特尔兼容性OpenMP运行时库，以实现更出色的Windows/Linux跨平台兼容性。

关于OpenBLAS的介绍可以参考： http://blog.csdn.net/fengbingchun/article/details/55509764；

在github/fengbingchun/Caffe_Test中<mkl_alternate.hpp>中走的是OpenBLAS分支。

math_functions文件

math_functions文件内函数：封装了一些基础的数学运算函数

(1)、caffe_cpu_gemm：C=alphaAB+beta*C；

(2)、caffe_cpu_gemv：y=alphaAx+beta*y；

(3)、caffe_axpy：Y=alpha*X+Y；

(4)、caffe_cpu_axpby：Y=alphaX+betaY；

(5)、caffe_copy：从X中拷贝前N个元素到Y中；

(6)、caffe_set：将X中的前N个元素置为alpha；

(7)、caffe_add_scalar：给Y中的前N个元素分别加上常数alpha；

(8)、caffe_scal：X = alpha*X；

(9)、caffe_sqr/ caffe_exp/caffe_log/caffe_abs：会调用mkl_alternate.hpp中的vsSqr、vsExp、vsLn、vsAbs、vdSqr、vdExp、vdLn、vdAbs函数；

(10)、caffe_add/caffe_sub/caffe_mul/caffe_div：会调用mkl_alternate.hpp中的vsAdd、vsSub、vsMul、vsDiv、vdAdd、vdSub、vdMul、vdDiv函数；

(11)、caffe_powx：会调用mkl_alternate.hpp中的vsPowx和vdPowx函数；

(12)、caffe_rng_rand：返回一个unsignedint类型的随机数；

(13)、caffe_nextafter：在最大方向上，返回b可以表示的最接近的数值；

(14)、caffe_rng_uniform：产生指定范围内的均匀分布随机数；

(15)、caffe_rng_gaussian：产生高斯分布随机数；

(16)、caffe_rng_bernoulli：产生伯努利分布随机数；

(17)、caffe_cpu_dot：计算步长为1的内积；

(18)、caffe_cpu_strided_dot：计算指定步长的内积；

(19)、caffe_cpu_hamming_distance：计算x、y之间的海明距离；

(20)、caffe_cpu_asum：计算向量x中前n个元素的绝对值之和；

(21)、caffe_sign：类似于正负号函数，仅返回-1或1；

(22)、caffe_cpu_scale：Y=alpha*X 。

注：本文的内容参考博客：http://blog.csdn.net/fengbingchun/article/details/56280708

math_functions.hpp文件源码注释

#ifndef CAFFE_UTIL_MATH_FUNCTIONS_H_
#define CAFFE_UTIL_MATH_FUNCTIONS_H_

#include <stdint.h>
#include <cmath>  // for std::fabs and std::signbit

#include "glog/logging.h"

#include "caffe/common.hpp"
#include "caffe/util/device_alternate.hpp"
#include "caffe/util/mkl_alternate.hpp"

namespace caffe {

// Caffe gemm provides a simpler interface to the gemm functions, with the
// limitation that the data has to be contiguous in memory.
// C=alpha*A*B+beta*C
// A,B,C 是输入矩阵（一维数组格式）
// CblasRowMajor :数据是行主序的（二维数据也是用一维数组储存的）
// TransA, TransB：是否要对A和B做转置操作（CblasTrans CblasNoTrans）
// M： A、C 的行数
// N： B、C 的列数
// K： A 的列数， B 的行数
// lda ： A的列数（不做转置）行数（做转置）
// ldb： B的列数（不做转置）行数（做转置）
template <typename Dtype>
void caffe_cpu_gemm(const CBLAS_TRANSPOSE TransA,
    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
    const Dtype alpha, const Dtype* A, const Dtype* B, const Dtype beta,
    Dtype* C);

// y=alpha*A*x+beta*y
// 其中X和Y是向量，A 是矩阵
// M：A 的行数
// N：A 的列数
// cblas_sgemv 中的 参数1 表示对X和Y的每个元素都进行操作
template <typename Dtype>
void caffe_cpu_gemv(const CBLAS_TRANSPOSE TransA, const int M, const int N,
    const Dtype alpha, const Dtype* A, const Dtype* x, const Dtype beta,
    Dtype* y);

// Y=alpha*X+Y
// N：为X和Y中element的个数
template <typename Dtype>
void caffe_axpy(const int N, const Dtype alpha, const Dtype* X,
    Dtype* Y);

// Y=alpha*X+beta*Y
template <typename Dtype>
void caffe_cpu_axpby(const int N, const Dtype alpha, const Dtype* X,
    const Dtype beta, Dtype* Y);

// 从X中拷贝前N个元素到Y中
template <typename Dtype>
void caffe_copy(const int N, const Dtype *X, Dtype *Y);

// 将X中的前N个元素置为alpha
template <typename Dtype>
void caffe_set(const int N, const Dtype alpha, Dtype *X);

//  一般为新申请的内存做初始化，功能是将buffer所指向内存中的每个字节的内容全部设置为c指定的ASCII值, count为块的大小
inline void caffe_memset(const size_t N, const int alpha, void* X) {
  memset(X, alpha, N);  // NOLINT(caffe/alt_fn)
}

// 给X中的前N个元素分别加上常数alpha
template <typename Dtype>
void caffe_add_scalar(const int N, const Dtype alpha, Dtype *X);

// X = alpha*X
// N： X中element的个数
template <typename Dtype>
void caffe_scal(const int N, const Dtype alpha, Dtype *X);

template <typename Dtype>
void caffe_sqr(const int N, const Dtype* a, Dtype* y);

template <typename Dtype>
void caffe_add(const int N, const Dtype* a, const Dtype* b, Dtype* y);

template <typename Dtype>
void caffe_sub(const int N, const Dtype* a, const Dtype* b, Dtype* y);

template <typename Dtype>
void caffe_mul(const int N, const Dtype* a, const Dtype* b, Dtype* y);

// 会调用mkl_alternate.hpp中的vsAdd、vsSub、vsMul、vsDiv、vdAdd、vdSub、vdMul、vdDiv函数
// caffe_add、 caffe_sub、 caffe_mul、 caffe_div 函数
// 这四个函数分别实现element-wise的加减乘除（y[i] = a[i] + - * \ b[i]）
template <typename Dtype>
void caffe_div(const int N, const Dtype* a, const Dtype* b, Dtype* y);

// 会调用mkl_alternate.hpp中的vsPowx和vdPowx函数
// caffe_powx、 caffe_sqr、 caffe_exp、 caffe_abs 函数
// 同样是element-wise操作，分别是y[i] = a[i] ^ b， y[i] = a[i]^2，y[i] = exp(a[i] )，y[i] = |a[i]|
template <typename Dtype>
void caffe_powx(const int n, const Dtype* a, const Dtype b, Dtype* y);

template <typename Dtype>
void caffe_bound(const int n, const Dtype* a, const Dtype min,
    const Dtype max, Dtype* y);

// 返回一个unsignedint类型的随机数
unsigned int caffe_rng_rand();

// 在最大方向上，返回b可以表示的最接近的数值
template <typename Dtype>
Dtype caffe_nextafter(const Dtype b);

// 产生指定范围内的均匀分布随机数
template <typename Dtype>
void caffe_rng_uniform(const int n, const Dtype a, const Dtype b, Dtype* r);

// 产生高斯分布随机数
template <typename Dtype>
void caffe_rng_gaussian(const int n, const Dtype mu, const Dtype sigma,
                        Dtype* r);

// 产生伯努利分布随机数
template <typename Dtype>
void caffe_rng_bernoulli(const int n, const Dtype p, int* r);

template <typename Dtype>
void caffe_rng_bernoulli(const int n, const Dtype p, unsigned int* r);

template <typename Dtype>
void caffe_exp(const int n, const Dtype* a, Dtype* y);

template <typename Dtype>
void caffe_log(const int n, const Dtype* a, Dtype* y);

// 会调用mkl_alternate.hpp中的vsSqr、vsExp、vsLn、vsAbs、vdSqr、vdExp、vdLn、vdAbs函数
template <typename Dtype>
void caffe_abs(const int n, const Dtype* a, Dtype* y);

// 计算步长为1的内积
template <typename Dtype>
Dtype caffe_cpu_dot(const int n, const Dtype* x, const Dtype* y);

// 计算指定步长的内积
// 功能： 返回 vector X 和 vector Y 的内积。
// incx， incy ： 步长，即每隔incx 或 incy 个element 进行操作。
template <typename Dtype>
Dtype caffe_cpu_strided_dot(const int n, const Dtype* x, const int incx,
    const Dtype* y, const int incy);

// Returns the sum of the absolute values of the elements of vector x
// 计算向量x中前n个元素的绝对值之和
template <typename Dtype>
Dtype caffe_cpu_asum(const int n, const Dtype* x);

// the branchless, type-safe version from
// http://stackoverflow.com/questions/1903954/is-there-a-standard-sign-function-signum-sgn-in-c-c
// 类似于正负号函数，仅返回-1或1
template<typename Dtype>
inline int8_t caffe_sign(Dtype val) {
  return (Dtype(0) < val) - (val < Dtype(0));
}

// The following two macros are modifications of DEFINE_VSL_UNARY_FUNC
//   in include/caffe/util/mkl_alternate.hpp authored by @Rowland Depp.
// Please refer to commit 7e8ef25c7 of the boost-eigen branch.
// Git cherry picking that commit caused a conflict hard to resolve and
//   copying that file in convenient for code reviewing.
// So they have to be pasted here temporarily.
// 一元函数，类似于mkl_alternate.hpp中的宏DEFINE_VSL_UNARY_FUNC，包括：
//  (1)、caffe_cpu_sign：正负号函数，输出-1、0、1；
//  (2)、caffe_cpu_sgnbit：作用类似于std::signbit，static_cast<bool>((std::signbit)(x))；x为负数输出为1，其它输出为0；
//  (3)、caffe_cpu_fabs：取绝对值，作用类似于std::fabs。
#define DEFINE_CAFFE_CPU_UNARY_FUNC(name, operation) \
  template<typename Dtype> \
  void caffe_cpu_##name(const int n, const Dtype* x, Dtype* y) { \
    CHECK_GT(n, 0); CHECK(x); CHECK(y); \
    for (int i = 0; i < n; ++i) { \
      operation; \
    } \
  }

// output is 1 for the positives, 0 for zero, and -1 for the negatives
DEFINE_CAFFE_CPU_UNARY_FUNC(sign, y[i] = caffe_sign<Dtype>(x[i]));

// This returns a nonzero value if the input has its sign bit set.
// The name sngbit is meant to avoid conflicts with std::signbit in the macro.
// The extra parens are needed because CUDA < 6.5 defines signbit as a macro,
// and we don't want that to expand here when CUDA headers are also included.
DEFINE_CAFFE_CPU_UNARY_FUNC(sgnbit, \
    y[i] = static_cast<bool>((std::signbit)(x[i])));

DEFINE_CAFFE_CPU_UNARY_FUNC(fabs, y[i] = std::fabs(x[i]));

// Y=alpha*X
template <typename Dtype>
void caffe_cpu_scale(const int n, const Dtype alpha, const Dtype *x, Dtype* y);

#ifndef CPU_ONLY  // GPU

// Decaf gpu gemm provides an interface that is almost the same as the cpu
// gemm function - following the c convention and calling the fortran-order
// gpu code under the hood.
template <typename Dtype>
void caffe_gpu_gemm(const CBLAS_TRANSPOSE TransA,
    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
    const Dtype alpha, const Dtype* A, const Dtype* B, const Dtype beta,
    Dtype* C);

template <typename Dtype>
void caffe_gpu_gemv(const CBLAS_TRANSPOSE TransA, const int M, const int N,
    const Dtype alpha, const Dtype* A, const Dtype* x, const Dtype beta,
    Dtype* y);

template <typename Dtype>
void caffe_gpu_axpy(const int N, const Dtype alpha, const Dtype* X,
    Dtype* Y);

template <typename Dtype>
void caffe_gpu_axpby(const int N, const Dtype alpha, const Dtype* X,
    const Dtype beta, Dtype* Y);

void caffe_gpu_memcpy(const size_t N, const void *X, void *Y);

template <typename Dtype>
void caffe_gpu_set(const int N, const Dtype alpha, Dtype *X);

inline void caffe_gpu_memset(const size_t N, const int alpha, void* X) {
#ifndef CPU_ONLY
  CUDA_CHECK(cudaMemset(X, alpha, N));  // NOLINT(caffe/alt_fn)
#else
  NO_GPU;
#endif
}

template <typename Dtype>
void caffe_gpu_add_scalar(const int N, const Dtype alpha, Dtype *X);

template <typename Dtype>
void caffe_gpu_scal(const int N, const Dtype alpha, Dtype *X);

template <typename Dtype>
void caffe_gpu_add(const int N, const Dtype* a, const Dtype* b, Dtype* y);

template <typename Dtype>
void caffe_gpu_sub(const int N, const Dtype* a, const Dtype* b, Dtype* y);

template <typename Dtype>
void caffe_gpu_mul(const int N, const Dtype* a, const Dtype* b, Dtype* y);

template <typename Dtype>
void caffe_gpu_div(const int N, const Dtype* a, const Dtype* b, Dtype* y);

template <typename Dtype>
void caffe_gpu_abs(const int n, const Dtype* a, Dtype* y);

template <typename Dtype>
void caffe_gpu_exp(const int n, const Dtype* a, Dtype* y);

template <typename Dtype>
void caffe_gpu_log(const int n, const Dtype* a, Dtype* y);

template <typename Dtype>
void caffe_gpu_powx(const int n, const Dtype* a, const Dtype b, Dtype* y);

// caffe_gpu_rng_uniform with two arguments generates integers in the range
// [0, UINT_MAX].
void caffe_gpu_rng_uniform(const int n, unsigned int* r);

// caffe_gpu_rng_uniform with four arguments generates floats in the range
// (a, b] (strictly greater than a, less than or equal to b) due to the
// specification of curandGenerateUniform.  With a = 0, b = 1, just calls
// curandGenerateUniform; with other limits will shift and scale the outputs
// appropriately after calling curandGenerateUniform.
template <typename Dtype>
void caffe_gpu_rng_uniform(const int n, const Dtype a, const Dtype b, Dtype* r);

template <typename Dtype>
void caffe_gpu_rng_gaussian(const int n, const Dtype mu, const Dtype sigma,
                            Dtype* r);

template <typename Dtype>
void caffe_gpu_rng_bernoulli(const int n, const Dtype p, int* r);

template <typename Dtype>
void caffe_gpu_dot(const int n, const Dtype* x, const Dtype* y, Dtype* out);

template <typename Dtype>
void caffe_gpu_asum(const int n, const Dtype* x, Dtype* y);

template<typename Dtype>
void caffe_gpu_sign(const int n, const Dtype* x, Dtype* y);

template<typename Dtype>
void caffe_gpu_sgnbit(const int n, const Dtype* x, Dtype* y);

template <typename Dtype>
void caffe_gpu_fabs(const int n, const Dtype* x, Dtype* y);

template <typename Dtype>
void caffe_gpu_scale(const int n, const Dtype alpha, const Dtype *x, Dtype* y);

#define DEFINE_AND_INSTANTIATE_GPU_UNARY_FUNC(name, operation) \
template<typename Dtype> \
__global__ void name##_kernel(const int n, const Dtype* x, Dtype* y) { \
  CUDA_KERNEL_LOOP(index, n) { \
    operation; \
  } \
} \
template <> \
void caffe_gpu_##name<float>(const int n, const float* x, float* y) { \
  /* NOLINT_NEXT_LINE(whitespace/operators) */ \
  name##_kernel<float><<<CAFFE_GET_BLOCKS(n), CAFFE_CUDA_NUM_THREADS>>>( \
      n, x, y); \
} \
template <> \
void caffe_gpu_##name<double>(const int n, const double* x, double* y) { \
  /* NOLINT_NEXT_LINE(whitespace/operators) */ \
  name##_kernel<double><<<CAFFE_GET_BLOCKS(n), CAFFE_CUDA_NUM_THREADS>>>( \
      n, x, y); \
}

#endif  // !CPU_ONLY

}  // namespace caffe

#endif  // CAFFE_UTIL_MATH_FUNCTIONS_H_

最后编辑于：2018.01.03 21:15:25

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 204,445评论 6赞 478
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,889评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 151,047评论 0赞 337
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,760评论 1赞 276
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,745评论 5赞 367
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,638评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 38,011评论 3赞 398
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,669评论 0赞 258
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,923评论 1赞 299
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,655评论 2赞 321
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,740评论 1赞 330
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,406评论 4赞 320
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,995评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,961评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,197评论 1赞 260
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 45,023评论 2赞 350
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,483评论 2赞 342

Caffe源码中math_functions文件分析

include文件：

caffe/util/mkl_alternate.hpp文件：

math_functions文件

math_functions.hpp文件源码注释

推荐阅读更多精彩内容