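Abstract
The paper models both the temporal and the spatial side of skeleton data. Temporal models: a stacked RNN and a hierarchical RNN. Two methods are proposed for converting the spatial graph into a sequence of joints. 3D data augmentation is used to prevent overfitting.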
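Introduction
Earlier RNN-based recognition looked only at how skeleton joints relate across time, yet different actions also correspond to different spatial configurations of the joints. There are two temporal models, the stacked RNN and the hierarchical RNN; the hierarchical RNN has fewer parameters. The temporal RNN learns the dynamics of the joints across time steps. The spatial RNN learns the dependencies between joints.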
Two-Stream RNN
The spatial structure represents the joints as a graph; the temporal structure represents the motion.
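Temporal
The stacked RNN processes all skeleton joints together at each time step. It has two layers, each an LSTM. The hierarchical RNN divides the human body into five parts: the four limbs plus the torso.
The overall motion is composed of these five separate parts: kicking uses the legs, running uses both arms and both legs. The hierarchical RNN also has two layers. In the first layer each body part has its own RNN; in the second layer a single RNN models the motion of the whole body.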
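The five-part split can be sketched as follows. This is only an illustration under assumptions: the joint indices in `BODY_PARTS` are hypothetical placeholders for a 25-joint skeleton (not any dataset's official numbering), and `split_into_parts` is a name introduced here.

```python
# Hypothetical grouping of 25 joint indices into five body parts
# (assumed for illustration; not an official dataset layout).
BODY_PARTS = {
    "torso": [0, 1, 2, 3, 4],
    "left_arm": [5, 6, 7, 8, 9],
    "right_arm": [10, 11, 12, 13, 14],
    "left_leg": [15, 16, 17, 18, 19],
    "right_leg": [20, 21, 22, 23, 24],
}


def split_into_parts(frame, parts=BODY_PARTS):
    """Split one frame (a list of (x, y, z) joints) into per-part flat vectors."""
    out = {}
    for name, indices in parts.items():
        vec = []
        for j in indices:
            vec.extend(frame[j])  # append x, y, z of joint j
        out[name] = vec
    return out


# Dummy frame: joint j sits at (j, 0, 0).
frame = [(float(j), 0.0, 0.0) for j in range(25)]
part_inputs = split_into_parts(frame)
```

Each per-part vector would be the input of that part's first-layer RNN at one time step; the five first-layer outputs would then be concatenated and passed to the single second-layer RNN.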
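Spatial
To model the dependencies between joints, the graph structure has to be converted into a sequence. The RNN input at each step is the coordinate information of one joint, and a joint has three values (x, y, z).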
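As a joint has only three coordinates, we select a temporal window centered at the time step and concatenate the coordinates inside this window to represent this joint.
(My reading of this, which I am not sure about: the temporal and spatial input matrices are transposes of each other. The temporal input feeds all 24 joints at once, so it is a sequence over time of 24-joint frames. The spatial input is, for each joint, its position information at the different time steps.)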
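The window concatenation described in the quote can be sketched as below. `spatial_input` is a name introduced here, and clamping the window at the sequence boundaries is an assumption, since the notes do not say how the edges are handled.

```python
def spatial_input(seq, t, window):
    """Build the spatial-RNN input for time step t.

    seq: list of T frames; each frame is a list of J joints; each joint is (x, y, z).
    Returns J vectors of length 3 * window, one per joint, in joint order.
    """
    last = len(seq) - 1
    half = window // 2
    # Time steps inside the window centered at t; indices outside the
    # sequence are clamped to the first/last frame (assumed edge handling).
    times = [min(max(t + k, 0), last) for k in range(-half, window - half)]
    num_joints = len(seq[0])
    return [
        [coord for s in times for coord in seq[s][j]]
        for j in range(num_joints)
    ]


# Toy sequence: 5 frames, 2 joints; joint j at time s is (s, j, 0).
toy = [[(s, j, 0) for j in range(2)] for s in range(5)]
windowed = spatial_input(toy, t=2, window=3)
single = spatial_input(toy, t=2, window=1)
```

With `window=1` each joint is represented by its three coordinates in a single frame, which is the "glimpse of one frame" case.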
There are two ways to build the sequence: the chain sequence and the traversal sequence.
The traversal sequence guarantees the spatial relationships in a graph by accessing most joints twice in both forward and reverse directions.
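One way to get such an ordering is a depth-first tour of the skeleton tree that records a joint again each time the walk returns to it. The sketch below uses a toy skeleton with hypothetical joint names; it illustrates the idea and does not claim to reproduce the paper's exact visiting order.

```python
# Toy skeleton as a tree: parent -> children (hypothetical joints).
SKELETON = {
    "torso": ["head", "left_arm", "right_arm"],
    "head": [],
    "left_arm": ["left_hand"],
    "left_hand": [],
    "right_arm": ["right_hand"],
    "right_hand": [],
}


def traversal_sequence(tree, root):
    """Depth-first tour that walks every edge forward and then back."""
    order = [root]
    for child in tree[root]:
        order.extend(traversal_sequence(tree, child))
        order.append(root)  # come back to the parent after each subtree
    return order


seq = traversal_sequence(SKELETON, "torso")
```

Consecutive entries of `seq` are always neighbours in the skeleton, so the order the RNN reads the joints in keeps the adjacency of the graph. The price is a longer sequence: one entry more than twice the number of edges.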
Different from the temporal RNN, spatial RNN could recognize actions by a glimpse of one frame (when the size of temporal window equals 1). Here, we do not use a hierarchical structure based on body parts, as the number of joints is limited (e.g., 25 for the NTU RGB+D dataset)
Each layer of the stacked RNN has 512 neurons. In the hierarchical RNN, each body-part RNN has 128 neurons and the whole-body RNN has 512.
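As a rough check of the claim that the hierarchical RNN has fewer parameters, the recurrent layers can be counted from these sizes. The sketch makes several assumptions: a standard LSTM (four gates, no peepholes, one bias vector per gate), a 25-joint skeleton with 75 input values per frame, five joints per body part, and no classification layer in the count.

```python
def lstm_params(input_size, hidden_size):
    """Parameter count of one standard LSTM layer (4 gates, one bias each)."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


INPUT_DIM = 25 * 3  # assumed: 25 joints with (x, y, z) each

# Stacked RNN: two LSTM layers with 512 units each.
stacked = lstm_params(INPUT_DIM, 512) + lstm_params(512, 512)

# Hierarchical RNN: five part-level LSTMs with 128 units each, then one
# 512-unit LSTM fed with the five concatenated part outputs.
part_dims = [5 * 3] * 5  # assumed: 5 joints per body part
hierarchical = (
    sum(lstm_params(d, 128) for d in part_dims)
    + lstm_params(5 * 128, 512)
)
```

Under these assumptions `stacked` is 3,303,424 and `hierarchical` is 2,729,984. The part-level total depends only on the sum of the part input sizes, so a different split of the 25 joints over the five parts gives the same number.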
To demonstrate the effectiveness of the two-stream RNN, we simply adopt stacked RNN for the temporal channel and chain sequence for the spatial channel. The weight of predicted scores of the temporal RNN is 0.9, and the temporal window size of the spatial RNN is one fourth of the fixed length T, both are determined by cross-validation. The networks are trained using stochastic gradient descent. The learning rate, initiated with 0.02, is reduced by multiplying it by 0.7 every 60 epochs during training.
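The two numeric settings in this passage can be written out as a small sketch. The function names are introduced here, and using `1 - 0.9 = 0.1` as the weight of the spatial RNN is an assumption: the quote only gives the temporal weight.

```python
def learning_rate(epoch, base_lr=0.02, decay=0.7, step=60):
    """Step decay: multiply the rate by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)


def fuse_scores(temporal, spatial, temporal_weight=0.9):
    """Weighted sum of per-class scores from the two streams.

    The spatial weight is taken to be 1 - temporal_weight (assumed).
    """
    w = temporal_weight
    return [w * t + (1.0 - w) * s for t, s in zip(temporal, spatial)]


# Made-up scores for a two-class toy case.
fused = fuse_scores([0.2, 0.8], [0.9, 0.1])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

With these settings the rate stays at 0.02 for epochs 0 to 59, drops to 0.014 at epoch 60, and to 0.0098 at epoch 120.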