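This test report is provided by the PandaDB development team.
Date: March 31, 2021
1. Test Overview
PandaDB is a property-graph-based system for the unified management of large-scale heterogeneous data. To guide subsequent development, we took Neo4j as the reference, since it is currently the most mature and widely used graph database and the performance benchmark for single-machine graph queries, and measured the performance difference between PandaDB and Neo4j on single-machine graph queries.
For this test we used the datasets and part of the workload of LDBC, an internationally recognized benchmark for graph databases.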
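2. Test Environment
Table 1: Test environment
Environment | Configuration |
---|---|
Hardware | Single physical test machine: dual-socket Intel Xeon Scalable Gold 6230R CPU; 384 GB DDR4 memory; 220 TB RAID 5 HDD |
Software | Operating system: CentOS 7.8 (64-bit); JDK: 1.8 |
Software versions under test | PandaDB: v0.3.0.210331; Neo4j: v3.5.6 Community |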
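3. Test Workload
The test data and workload are based on the LDBC benchmark. The test data contains 17 billion edges and 2.5 billion nodes.
First, run git clone https://github.com/ldbc/ldbc_snb_datagen, then generate the test data and import it into the databases under test.
(1) Generating the test data
Edit the params.ini file in the root directory of ldbc_snb_datagen and set generator.scaleFactor to 1000. Then run the following command: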
tools/run.py --cores 24 --memory 100g ./target/ldbc_snb_datagen-0.4.0-SNAPSHOT-jar-with-dependencies.jar params.ini
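The generated data is about 1.3 TB in size.
(2) Importing the test data
The test data was imported into Neo4j and PandaDB separately; the import commands are given in the Appendix.
Neo4j import time: 1d 5h 40m 49s 176ms.
PandaDB import time: 21h 19m 13s 107ms.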
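To compare the two import times directly, they can be converted to a common unit. The following is a minimal sketch, not part of the original test tooling; the helper name `duration_to_ms` is hypothetical and only the two duration strings come from the report.

```python
import re

# Multipliers to milliseconds for each unit used in the report's durations.
UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def duration_to_ms(text: str) -> int:
    """Convert a string such as '1d 5h 40m 49s 176ms' to milliseconds."""
    total = 0
    # 'ms' is listed first so that '176ms' is not read as minutes.
    for value, unit in re.findall(r"(\d+)\s*(ms|d|h|m|s)\b", text):
        total += int(value) * UNIT_MS[unit]
    return total


neo4j_ms = duration_to_ms("1d 5h 40m 49s 176ms")
pandadb_ms = duration_to_ms("21h 19m 13s 107ms")
ratio = neo4j_ms / pandadb_ms

print(neo4j_ms, pandadb_ms, round(ratio, 2))  # 106849176 76753107 1.39
```

Under this calculation the Neo4j import took roughly 1.39 times as long as the PandaDB import.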
(3) Data indexes
Indexes were created on the following label and property pairs:
:person("id")
:post("id")
:comment("id")
:person("firstName")
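The report does not give the statements used to create these indexes. As an illustration only, the sketch below renders the four pairs in the `CREATE INDEX ON :Label(property)` form accepted by Neo4j 3.5; whether the test used exactly this form, and what the equivalent PandaDB syntax is, are assumptions not confirmed by the source. The names `INDEXES` and `create_index_statement` are hypothetical.

```python
# (label, property) pairs taken from the index list above.
INDEXES = [
    ("person", "id"),
    ("post", "id"),
    ("comment", "id"),
    ("person", "firstName"),
]


def create_index_statement(label: str, prop: str) -> str:
    """Render a Neo4j 3.5-style schema index statement (assumed form)."""
    return f"CREATE INDEX ON :{label}({prop})"


statements = [create_index_statement(label, prop) for label, prop in INDEXES]
for statement in statements:
    print(statement)
```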
(4) Data volume
PandaDB occupies 2.4 TB on disk; Neo4j occupies 1.8 TB.
4. Test Queries
Table 2: Test workload used in this report (Cypher queries)
No. | Query | Corresponding LDBC query | Test semantics |
---|---|---|---|
C1 | MATCH (n:person{firstName:"%s"}) RETURN n | | Filter nodes by a non-unique property |
C2 | MATCH (m:comment {id: "%s"}) RETURN m.creationDate AS messageCreationDate, m.content as content | interactive-short4 | Filter nodes by a unique property |
C3 | MATCH (n:person {id:"%s"})-[r:knows]-(friend:person{lastName:"Sharma"}) RETURN id(friend) | interactive-short3 | One-hop relationship, returns id |
C4 | MATCH (n:person{id:"%s"})-[r:knows]-(friend) RETURN friend.id AS personId, friend.firstName AS firstName, friend.lastName AS lastName, r.creationDate AS friendshipCreationDate | interactive-short3 | One-hop relationship, returns node data |
C5 | MATCH (n:person {id:"%s"})-[:isLocatedIn]->(p:place) RETURN n.firstName AS firstName, n.lastName AS lastName, n.birthday AS birthday, n.locationIP AS locationIP, n.browserUsed AS browserUsed, p.id AS cityId, n.gender AS gender, n.creationDate AS creationDate | interactive-short1 | One-hop relationship, returns node data |
C6 | MATCH (m:comment{id:"%s"})-[:hasCreator]->(p:person) RETURN p.id AS personId, p.firstName AS firstName, p.lastName AS lastName | interactive-short5 | One-hop relationship, returns node data |
C7 | MATCH (n:person {id:"%s"})-[:knows]-> () -[:knows]->(m:person{gender:"male"}) RETURN id(m) | | Two-hop relationship with property filters on the first and last nodes |
C8 | MATCH (n:person {id:"%s"})-[:knows]-> () -[:knows]->(m:person) RETURN m.firstName AS firstName, m.lastName AS lastName, m.birthday AS birthday, m.locationIP AS locationIP, m.browserUsed AS browserUsed | | Two-hop relationship, returns properties |
C9 | MATCH (:person {id:"%s"})<-[:hasCreator]-(m)-[:replyOf]->(p:post)-[:hasCreator]->(c) RETURN m.id AS messageId, m.creationDate AS messageCreationDate, p.id AS originalPostId, c.id AS originalPostAuthorId, c.firstName AS originalPostAuthorFirstName, c.lastName AS originalPostAuthorLastName | interactive-short2 | Three-hop relationship |
C10 | MATCH (m:comment{id:"%s"})-[:replyOf]->(p:post)<-[:containerOf]-(f:forum)-[:hasModerator]->(mod:person) RETURN f.id AS forumId, f.title AS forumTitle, mod.id AS moderatorId, mod.firstName AS moderatorFirstName, mod.lastName AS moderatorLastName | interactive-short6 | Three-hop relationship |
C11 | MATCH (m:post{id:"%s"})<-[:replyOf]-(c:comment)-[:hasCreator]->(p:person) RETURN c.id AS commentId, c.content AS commentContent, c.creationDate AS commentCreationDate, p.id AS replyAuthorId, p.firstName AS replyAuthorFirstName, p.lastName AS replyAuthorLastName | interactive-short7 (first half) | Two-hop relationship |
C12 | MATCH (m:post{id:"%s"})-[:hasCreator]->(a:person)-[r:knows]-(p) RETURN m.id AS postId, m.language as postLanguage, p.id AS replyAuthorId, p.firstName AS replyAuthorFirstName, p.lastName AS replyAuthorLastName | interactive-short7 (second half) | Two-hop relationship |
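Each query above is a template whose `%s` placeholder is replaced by a concrete property value before execution. The report does not describe how parameter values were sampled or how timings were aggregated, so the following is only a sketch of one plausible harness: the values `"1001"` and `"1002"`, the `run_query` callable, and averaging over runs are all assumptions.

```python
import time
from typing import Callable, Iterable

# Query C2 from Table 2, with its single %s placeholder.
C2_TEMPLATE = (
    'MATCH (m:comment {id: "%s"}) '
    "RETURN m.creationDate AS messageCreationDate, m.content as content"
)


def fill(template: str, value: str) -> str:
    """Substitute one sampled value for the single %s placeholder."""
    return template % value


def mean_latency_ms(
    run_query: Callable[[str], object], template: str, values: Iterable[str]
) -> float:
    """Average wall-clock time per query in milliseconds.

    run_query is a hypothetical callable that sends one Cypher string
    to the database under test and blocks until the result is consumed.
    """
    elapsed = []
    for value in values:
        start = time.perf_counter()
        run_query(fill(template, value))
        elapsed.append((time.perf_counter() - start) * 1000.0)
    return sum(elapsed) / len(elapsed)


# Stand-in executor so the sketch runs without a database connection.
executed = []
avg = mean_latency_ms(executed.append, C2_TEMPLATE, ["1001", "1002"])
print(executed[0])
```

In a real run, `run_query` would wrap a database client session for Neo4j or PandaDB rather than a list append.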
5. Test Results
Table 3: Test results (ms)
Query | Neo4j query time | PandaDB query time | Speedup [1] (PandaDB relative to Neo4j) |
---|---|---|---|
C1 | 998 | 1,125 | 0.89 |
C2 | 154 | 54 | 2.85 |
C3 | 7,381 | 1,197 | 6.17 |
C4 | 1,261 | 473 | 2.67 |
C5 | 68 | 109 | 0.62 |
C6 | 139 | 126 | 1.10 |
C7 | 2,218 | 486 | 4.56 |
C8 | 3,275 | 2,447 | 1.34 |
C9 | 37,793 | 27,743 | 1.36 |
C10 | 164 | 169 | 0.97 |
C11 | 117 | 107 | 1.09 |
C12 | 2,232 | 212 | 10.53 |
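The speedup column can be recomputed from the two timing columns as Neo4j query time divided by PandaDB query time. The sketch below does this for the values in Table 3; the names `TIMINGS` and `speedup` are illustrative only.

```python
# (Neo4j ms, PandaDB ms) per query, copied from Table 3.
TIMINGS = {
    "C1": (998, 1125),
    "C2": (154, 54),
    "C3": (7381, 1197),
    "C4": (1261, 473),
    "C5": (68, 109),
    "C6": (139, 126),
    "C7": (2218, 486),
    "C8": (3275, 2447),
    "C9": (37793, 27743),
    "C10": (164, 169),
    "C11": (117, 107),
    "C12": (2232, 212),
}


def speedup(neo4j_ms: float, pandadb_ms: float) -> float:
    """Speedup of PandaDB relative to Neo4j; values above 1 favor PandaDB."""
    return neo4j_ms / pandadb_ms


speedups = {q: f"{speedup(n, p):.2f}" for q, (n, p) in TIMINGS.items()}
for query, value in speedups.items():
    print(query, value)
```

Rounded to two decimals, the computed values agree with the speedup column of the table.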
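Appendix: Test data import commands
The import commands are shown below. Replace <data-dir> with the actual path where the data is stored.
(1) Neo4j data import command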
nohup neo4j-community-3.5.6/bin/neo4j-admin import --database graph1000.db --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tag-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/comment-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tagclass-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/person-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/forum-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/post-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/organisation-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/organisation_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_knows_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasCreator_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tagclass_isSubclassOf_tagclass-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_studyAt_organisation-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasTag_tag-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_comment-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_comment-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasMember_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_workAt_organisation-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasCreator_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_post-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/place_isPartOf_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasTag_tag-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasTag_tag-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tag_hasType_tagclass-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasModerator_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_post-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_containerOf_post-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_hasInterest_tag-output.csv --delimiter "|" --array-delimiter ";" > neo4j-import-0303.log 2>&1 &
(2) PandaDB data import command
nohup java -jar pandadb-importer-v0.3.jar --db-path=<data-dir>/panda-server/ldbc-1000.0302.db --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tag-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/comment-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tagclass-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/person-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/forum-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/post-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/organisation-output.csv --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/organisation_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_knows_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasCreator_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tagclass_isSubclassOf_tagclass-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_studyAt_organisation-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasTag_tag-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_comment-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_comment-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasMember_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_workAt_organisation-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasCreator_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_post-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/place_isPartOf_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasTag_tag-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasTag_tag-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tag_hasType_tagclass-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasModerator_person-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_post-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_isLocatedIn_place-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_containerOf_post-output.csv --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_hasInterest_tag-output.csv --delimeter="|" --array-delimeter=";" > 1000-0302.log 2>&1 &
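Both import commands pass the same 8 node files and 23 relationship files, which all follow one path pattern. As a reading aid, the sketch below rebuilds that shared argument list from the file name stems; it is not the tooling used in the test, and the helper name `import_args` is hypothetical. It only constructs strings and does not invoke either importer.

```python
from typing import List

# File name stems in the order they appear in the commands above.
NODE_TYPES = [
    "tag", "comment", "tagclass", "person",
    "forum", "post", "organisation", "place",
]
RELATION_TYPES = [
    "organisation_isLocatedIn_place", "person_knows_person",
    "post_hasCreator_person", "tagclass_isSubclassOf_tagclass",
    "person_studyAt_organisation", "forum_hasTag_tag",
    "comment_replyOf_comment", "person_likes_comment",
    "forum_hasMember_person", "person_workAt_organisation",
    "comment_hasCreator_person", "person_likes_post",
    "place_isPartOf_place", "post_hasTag_tag",
    "comment_isLocatedIn_place", "comment_hasTag_tag",
    "tag_hasType_tagclass", "forum_hasModerator_person",
    "comment_replyOf_post", "person_isLocatedIn_place",
    "post_isLocatedIn_place", "forum_containerOf_post",
    "person_hasInterest_tag",
]


def import_args(data_dir: str) -> List[str]:
    """Build the --nodes/--relationships arguments shared by both commands."""
    base = f"{data_dir}/ldbc/ldbc-out/ldbc-1000"
    args = [f"--nodes={base}/nodes/{name}-output.csv" for name in NODE_TYPES]
    args += [
        f"--relationships={base}/relations/{name}-output.csv"
        for name in RELATION_TYPES
    ]
    return args


args = import_args("<data-dir>")
print(len(args))  # 31
print(args[0])
```

The tool-specific options (database path, delimiters, log redirection) differ between the two commands and are deliberately left out of the sketch.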
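[1] Note: speedup = Neo4j query time / PandaDB query time. A larger value indicates a greater performance advantage for PandaDB; a value of 1 indicates equal query performance.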