Contents
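Graph embedding is a method for generating unsupervised node features from a graph; the resulting features can be used in a wide range of machine learning tasks. Modern graphs, especially in industrial applications, often contain billions of nodes and trillions of edges, which is beyond the capacity of existing embedding systems. Facebook open-sourced an embedding system, PyTorch-BigGraph (PBG), which makes several modifications to traditional multi-relation embedding systems so that it can scale to graphs with billions of nodes and trillions of edges.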
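This series is a translation of the official PyTorch BigGraph manual, intended to help readers get started quickly with GNNs and how to use them. The series has fifteen parts; if you spot any errors, please get in touch.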
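(1) Facebook's open-source graph neural network: PyTorch-BigGraph
(2) Facebook: BigGraph Chinese documentation - Data model (PyTorch)
(3) Facebook: BigGraph Chinese documentation - From entity embeddings to edge scores (PyTorch)
(4) Facebook: BigGraph Chinese documentation - I/O format (PyTorch)
(5) Facebook: BigGraph Chinese documentation - Batch preparation
(6) Facebook: BigGraph Chinese documentation - Distributed mode (PyTorch)
(7) Facebook: BigGraph Chinese documentation - Loss calculation (PyTorch)
(8) Facebook: BigGraph Chinese documentation - Evaluation (PyTorch)
(9) Facebook: BigGraph Chinese documentation - Dynamic relations (PyTorch)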
Dynamic relations
Caution
This is an advanced topic!
Enabling the dynamic_relations flag in the configuration activates an alternative mode intended for graphs with a large number of relations (more than roughly 100). In dynamic relation mode, PBG runs with several modifications to its “standard” operation in order to support that many relations.
The differences are:
The number of relations isn’t provided in the config but is instead read from the input data, namely from a dynamic_rel_count.txt file inside the entity path. The settings of the relations, however, are still provided in the config file. This is done by providing a single relation config that acts as a “template” for all the others and is duplicated the appropriate number of times. One can think of the one relation in the config as being “broadcast” to the size of the relation list given by the dynamic_rel_count.txt file.
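The “broadcast” idea above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not PBG's own loading code: the helper names read_relation_count and broadcast_relation_template are hypothetical, the config is an ordinary dict with illustrative values, and the sketch assumes that dynamic_rel_count.txt holds a single integer.

```python
import copy
import tempfile
from pathlib import Path


def read_relation_count(entity_path):
    """Read the relation count (assumed here to be one integer in the file)."""
    text = (Path(entity_path) / "dynamic_rel_count.txt").read_text()
    return int(text.strip())


def broadcast_relation_template(config, num_relations):
    """Return a copy of config whose single relation is duplicated num_relations times."""
    if len(config["relations"]) != 1:
        raise ValueError("expected exactly one relation acting as the template")
    template = config["relations"][0]
    expanded = copy.deepcopy(config)
    expanded["relations"] = [copy.deepcopy(template) for _ in range(num_relations)]
    return expanded


# Illustrative config: the single relation entry is the template for all relations.
config = {
    "dynamic_relations": True,
    "relations": [
        {"name": "template_rel", "lhs": "node", "rhs": "node", "operator": "translation"}
    ],
}

with tempfile.TemporaryDirectory() as entity_path:
    # Stand-in for the count file that normally sits next to the entity data.
    (Path(entity_path) / "dynamic_rel_count.txt").write_text("250\n")
    num_relations = read_relation_count(entity_path)

expanded = broadcast_relation_template(config, num_relations)
print(len(expanded["relations"]))  # 250
```

Every relation ends up with the same operator type and settings; only the learned parameters can differ between relations.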
The batches of positive edges that are passed from the training loop into the model contain edges of multiple relation types at the same time (instead of each batch coming entirely from a single relation type). This introduces some performance challenges in how the operators are applied to the embeddings: instead of a single operator with a single set of parameters applied to all edges, there might be a different one for each edge. The previous point (one relation template shared by all relations) ensures that all the operators are of the same type, so only their parameters can differ from one row to another. To account for this, the operators for dynamic relations are implemented differently, with a single operator object containing the parameters for all relation types. This implementation detail should be transparent with respect to how the operators are applied to the embeddings, but it might come up when retrieving the parameters at the end of training.
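To make the “single operator object containing the parameters for all relation types” idea concrete, here is a minimal pure-Python sketch. It is not PBG's implementation: the class DynamicTranslationOperator is a hypothetical name, a translation operator is chosen only because it is simple, and plain lists stand in for tensors.

```python
class DynamicTranslationOperator:
    """One operator object holding a translation vector for every relation type."""

    def __init__(self, num_relations, dim):
        # One parameter row per relation type, all initialised to zero.
        self.translations = [[0.0] * dim for _ in range(num_relations)]

    def apply(self, embeddings, rel_indices):
        """Translate each embedding row by the vector of that row's relation."""
        if len(embeddings) != len(rel_indices):
            raise ValueError("need exactly one relation index per embedding row")
        return [
            [e + t for e, t in zip(row, self.translations[rel])]
            for row, rel in zip(embeddings, rel_indices)
        ]


op = DynamicTranslationOperator(num_relations=3, dim=2)
op.translations[0] = [1.0, 0.0]
op.translations[2] = [0.0, -1.0]

# A batch mixing relation types: rows 0 and 2 use relation 0, row 1 uses relation 2.
batch_embeddings = [[0.5, 0.5], [0.5, 0.5], [2.0, 2.0]]
batch_relations = [0, 2, 0]
transformed = op.apply(batch_embeddings, batch_relations)
print(transformed)  # [[1.5, 0.5], [0.5, -0.5], [3.0, 2.0]]
```

The operator type is the same for every row, and the relation index only selects which row of parameters to use. This is also why, after training, the parameters come back as one table indexed by relation rather than as one object per relation.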
With non-dynamic relations, the operator is applied to the embedding of the right-hand side entity of the edge, whereas the embedding of the left-hand side entity is left unchanged. In a given batch, denote the $i$-th positive edge by $(x_i, r, y_i)$ ($x_i$ and $y_i$ being the left- and right-hand side entities, $r$ being the relation type). For each of the positive edges, denote its $j$-th negative sample by $(x_i, r, y'_{i,j})$. Due to same-batch negative sampling, it may occur that the same right-hand side entity is used as a negative for several positives, that is, that $y'_{i_1,j_1} = y'_{i_2,j_2}$ for $i_1 \neq i_2$. However, since the relation type $r$ is the same for all negatives, all the right-hand side entities will be transformed in the same way (i.e., passed through $r$’s operator) no matter which positive edge they are a negative for. We need to apply the operator of $r$ to all of them, hence the total number of operator evaluations is equal to the number of positives and negatives.
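The counting argument can be checked with a small sketch. It is an illustration under stated assumptions rather than PBG code: the function score_batch is hypothetical, the operator is a simple translation, the score is a dot product, and the negatives are modelled as one pool of right-hand side entities shared by every positive in the batch.

```python
def translate(embedding, translation):
    """Apply the (single) relation operator to one right-hand side embedding."""
    return [e + t for e, t in zip(embedding, translation)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_batch(lhs, rhs_pos, rhs_neg_pool, translation):
    """Score positives and shared negatives, counting operator evaluations."""
    evaluations = 0
    transformed_pos = []
    for y in rhs_pos:
        transformed_pos.append(translate(y, translation))
        evaluations += 1
    transformed_neg = []
    for y in rhs_neg_pool:
        # Transformed once, then reused for every positive it is a negative for.
        transformed_neg.append(translate(y, translation))
        evaluations += 1
    pos_scores = [dot(x, y) for x, y in zip(lhs, transformed_pos)]
    neg_scores = [[dot(x, y) for y in transformed_neg] for x in lhs]
    return pos_scores, neg_scores, evaluations


lhs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rhs_pos = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
rhs_neg_pool = [[0.0, 0.0], [1.0, 1.0]]
translation = [1.0, -1.0]

pos_scores, neg_scores, evaluations = score_batch(lhs, rhs_pos, rhs_neg_pool, translation)
print(evaluations)  # 5
```

With 3 positives and 2 shared negatives, 9 pairs are scored (3 positive and 6 negative), yet the operator runs only 5 times, because each right-hand side embedding is transformed the same way regardless of which positive it is paired with.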