Solr Full and Delta Import

Solr Full Import

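A full import reads all of the data to be indexed from the database on every run and submits it to the Solr server, discarding all existing index data in the target core and rebuilding the index from scratch. A full import is typically run when data is imported for the first time or when restoring from a backup.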

The following is a full-import example that queries multiple tables:

ER diagram:

[Figure: ER diagram of the item, item_category, category and feature tables]

Based on the ER diagram above, run the following SQL in the database to create the tables and insert some test data.

use solr;

create table feature(item_id bigint,descrip varchar(80));
create table item(id bigint, item_name varchar(20), manu varchar(20), weight float,price float, popularity int, includes varchar(20));
create table item_category(item_id bigint, category_id bigint);
create table category(id bigint, descrip varchar(80));

alter table item add primary key(id);
alter table item_category add primary key(item_id, category_id);
alter table category add primary key(id);

insert into item values(1,"item1", "menu1", 12.0, 33.1, 10, "includes1");
insert into item_category values(1,1);
insert into category values(1,"this is the description of category 1");
insert into feature values(1,"this is the feature 1");

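Requirement: we want to import all columns of the item table, together with each item's category description and its feature description, into a given Solr core. The core's schema.xml therefore needs the following fields defined in advance (in addition to id): name, manu, weight, price, popularity, includes, cat, features.
Solution:

The most obvious approach is to return all of the required fields (data) with a single SQL query, aliasing each column to its Solr field name: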

use solr;
select i.id, i.item_name as name, i.manu, i.weight, i.price, i.popularity, i.includes,
       c.descrip as cat, f.descrip as features
from item i, item_category ic, category c, feature f
where i.id = ic.item_id and ic.category_id = c.id and i.id = f.item_id;
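To check what the flat join returns before wiring it into Solr, here is a small sketch that rebuilds the four tables and the test rows in an in-memory SQLite database. SQLite stands in for the article's MySQL "solr" database purely so the example runs anywhere; the SQL is otherwise the same as above.

```python
import sqlite3

# In-memory SQLite used as a stand-in for the MySQL "solr" database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table feature(item_id bigint, descrip varchar(80));
create table item(id bigint primary key, item_name varchar(20), manu varchar(20),
                  weight float, price float, popularity int, includes varchar(20));
create table item_category(item_id bigint, category_id bigint,
                           primary key(item_id, category_id));
create table category(id bigint primary key, descrip varchar(80));
insert into item values(1, 'item1', 'menu1', 12.0, 33.1, 10, 'includes1');
insert into item_category values(1, 1);
insert into category values(1, 'this is the description of category 1');
insert into feature values(1, 'this is the feature 1');
""")

# The flat join, with each column aliased to the Solr field it should fill.
rows = conn.execute("""
select i.id, i.item_name as name, i.manu, i.weight, i.price, i.popularity,
       i.includes, c.descrip as cat, f.descrip as features
from item i
join item_category ic on i.id = ic.item_id
join category c on ic.category_id = c.id
join feature f on i.id = f.item_id
""").fetchall()

for row in rows:
    print(row)
```

One thing to keep in mind with the flat join: if an item has several features or categories, the query returns one row per combination, all with the same id. Since Solr documents with the same uniqueKey replace each other, only one combination would survive in the index, which is one reason to prefer the nested-entity approach.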

Alternatively, the same result can be achieved with nested entities in data-config.xml:

<dataConfig>
  <dataSource name="jdbcDataSource" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/solr?useUnicode=true&amp;characterEncoding=utf-8" user="root" password="mysql"/>
  <document>
     <entity dataSource="jdbcDataSource" name="item" query="select * from item">
        <field column="id" name="id"/>
        <field column="item_name" name="name"/>
        <field column="manu" name="manu"/>
        <field column="weight" name="weight"/>
        <field column="price" name="price"/>
        <field column="popularity" name="popularity"/>
        <field column="includes" name="includes"/>
        <entity name="feature" query="select descrip from feature where item_id='${item.id}'">
            <field column="descrip" name="features"/>
        </entity>
        <entity name="item_category" query="select category_id from item_category where item_id='${item.id}'">
            <entity name="category" query="select descrip from category where id='${item_category.category_id}'">
                <field column="descrip" name="cat"/>
            </entity>
        </entity>
     </entity>
  </document>
</dataConfig>
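The nested entities above work roughly like a loop: for each row of the parent "item" entity, the child queries run once, with ${item.id} (and, one level down, ${item_category.category_id}) filled in from the current parent row, and every child value is appended to the target field. The sketch below imitates that loop with SQLite; build_documents is a hypothetical helper, not part of Solr, and the second feature row is made-up test data added only to show a multi-valued field. For that to index correctly, features and cat would need to be declared multiValued in schema.xml.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table item(id integer primary key, item_name text, manu text,
                  weight real, price real, popularity integer, includes text);
create table feature(item_id integer, descrip text);
create table item_category(item_id integer, category_id integer);
create table category(id integer primary key, descrip text);
insert into item values(1, 'item1', 'menu1', 12.0, 33.1, 10, 'includes1');
insert into item_category values(1, 1);
insert into category values(1, 'this is the description of category 1');
insert into feature values(1, 'this is the feature 1');
-- hypothetical second feature, only to show a multi-valued field
insert into feature values(1, 'this is the feature 2');
""")

def build_documents(conn):
    """Hypothetical helper imitating the nested entities: child queries run
    once per parent row, with the parent's key substituted in."""
    docs = []
    items = conn.execute(
        "select id, item_name, manu, weight, price, popularity, includes from item"
    ).fetchall()
    for item_id, name, manu, weight, price, popularity, includes in items:
        doc = {"id": item_id, "name": name, "manu": manu, "weight": weight,
               "price": price, "popularity": popularity, "includes": includes,
               "features": [], "cat": []}
        # <entity name="feature">: runs with ${item.id} = item_id
        for (descrip,) in conn.execute(
                "select descrip from feature where item_id = ?", (item_id,)):
            doc["features"].append(descrip)
        # <entity name="item_category"> wrapping <entity name="category">
        category_ids = conn.execute(
            "select category_id from item_category where item_id = ?", (item_id,)
        ).fetchall()
        for (category_id,) in category_ids:
            for (descrip,) in conn.execute(
                    "select descrip from category where id = ?", (category_id,)):
                doc["cat"].append(descrip)
        docs.append(doc)
    return docs

for doc in build_documents(conn):
    print(doc)
```

Unlike the flat join, each item yields exactly one document, and multiple features or categories end up as multiple values of one field instead of duplicate documents.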

With that in place, reload the core and trigger a full import through the URL: http://localhost:8080/solr/mysql_fullimport/dataimport?command=full-import
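If you trigger imports from a script rather than a browser, it helps to build the command URL in one place. The sketch below assumes the base URL and core name from the article's example. The extra parameters shown are standard DataImportHandler request parameters: clean (true by default for full-import) is what deletes the existing index before rebuilding, and commit controls whether a commit is issued when the import finishes. dih_url itself is a hypothetical helper.

```python
from urllib.parse import urlencode

def dih_url(base, core, command, **params):
    """Hypothetical helper: build a DataImportHandler command URL."""
    query = {"command": command}
    for key, value in params.items():
        # Send Python booleans the way Solr expects them: "true"/"false".
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{base}/{core}/dataimport?{urlencode(query)}"

print(dih_url("http://localhost:8080/solr", "mysql_fullimport", "full-import"))
# Keep the existing index (no clean) but still commit at the end:
print(dih_url("http://localhost:8080/solr", "mysql_fullimport", "full-import",
              clean=False, commit=True))
```

To actually send the request, pass the resulting URL to urllib.request.urlopen. The same helper covers the delta-import and status commands described later.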

Solr Delta Import

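When the index is large, running a full import every time is clearly impractical, which makes delta import all the more important.
A delta import runs in a separate thread, and while it is running the core's dataimport status is status="busy". How long it takes depends on the size of the data set that needs to be imported. You can check its progress at any time at http://localhost:8080/solr/<core_name>/dataimport.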
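Because the import runs in the background, a script that starts one usually has to poll the status until it is no longer busy. This is a sketch under two assumptions: that requesting command=status with wt=json returns a JSON object with a top-level "status" field, and that this field reads "busy" during an import (matching the status="busy" mentioned above). The fetch function is injectable so the polling logic can be exercised without a running Solr.

```python
import json
import time
from urllib.request import urlopen

def fetch_status(url):
    """Fetch the DIH status of a handler URL such as
    http://localhost:8080/solr/<core_name>/dataimport as parsed JSON."""
    with urlopen(url + "?command=status&wt=json") as resp:
        return json.load(resp)

def wait_until_idle(url, fetch=fetch_status, interval=2.0, timeout=600.0):
    """Poll until the status is no longer "busy"; return the last response."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(url)
        if status.get("status") != "busy":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("import still busy after %.0f s" % timeout)
        time.sleep(interval)
```

In a real script you would call wait_until_idle with the handler URL right after triggering a full-import or delta-import request.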
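When a delta import runs, it reads conf/dataimport.properties and uses the last import time recorded there to run the delta queries. Once the import finishes, it updates that timestamp in the file (full imports update it as well). If the file does not exist the first time an import runs, it is created automatically. A typical file looks like this: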

#Sun Mar 03 19:59:43 IRKT 2019
item.last_index_time=2019-03-03 19\:59\:43
last_index_time=2019-03-03 19\:59\:43
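The file uses Java properties syntax, which is why the colons are escaped as "\:". It holds a per-entity timestamp (item.last_index_time) and a global one (last_index_time). If you want to inspect it from a script, a minimal reader could look like the sketch below; parse_properties is a hypothetical helper that only handles the simple key=value lines shown above, not the full java.util.Properties format.

```python
import re
from datetime import datetime

def _unescape(value):
    # Drop the backslash in front of an escaped character: "19\:59" -> "19:59".
    return re.sub(r"\\(.)", r"\1", value)

def parse_properties(text):
    """Minimal reader for the key=value lines in dataimport.properties.
    Not a full java.util.Properties parser (no line continuations, no \\uXXXX)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment such as the "#Sun Mar 03 ..." header
        key, _, value = line.partition("=")
        props[_unescape(key.strip())] = _unescape(value.strip())
    return props

sample = r"""#Sun Mar 03 19:59:43 IRKT 2019
item.last_index_time=2019-03-03 19\:59\:43
last_index_time=2019-03-03 19\:59\:43
"""

props = parse_properties(sample)
last = datetime.strptime(props["item.last_index_time"], "%Y-%m-%d %H:%M:%S")
print(props["last_index_time"], last.year)
```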

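To use delta import, your table must have two columns: a logical-delete flag, e.g. isdeleted, and a timestamp column, e.g. create_date. The names do not have to be isdeleted and create_date, but columns with these two meanings must exist. For updated rows to be picked up as well, the timestamp should record the last modification time rather than only the creation time (the configuration below uses last_modified). Comparing this timestamp with the time of the last import lets an SQL query find the rows that need to be imported, and the isdeleted flag lets a query find the rows marked as deleted; their primary-key IDs are passed to Solr so that it can delete the corresponding documents from the index, keeping the index in sync. If rows in your tables are physically deleted and there is no logical-delete flag, finding the deleted rows is difficult, which is why the logical-delete flag is needed.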
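The reasoning above can be shown with plain Python: rows changed after the last import time are split into "re-index" and "delete" by the logical-delete flag, while a physically deleted row simply vanishes and leaves nothing for any query to find. The rows and the classify helper are hypothetical, and the sketch assumes that setting isdeleted=1 also bumps last_modified.

```python
from datetime import datetime

def classify(rows, last_index_time):
    """Split rows changed since the last import into ids to (re)index and ids to
    delete. Assumes setting isdeleted=1 also updates last_modified (hypothetical)."""
    to_index, to_delete = [], []
    for row in rows:
        if row["last_modified"] <= last_index_time:
            continue  # untouched since the last import
        (to_delete if row["isdeleted"] else to_index).append(row["id"])
    return to_index, to_delete

last_index_time = datetime(2019, 3, 3, 19, 59, 43)
rows = [
    {"id": 1, "isdeleted": 0, "last_modified": datetime(2019, 3, 3, 20, 0, 0)},  # updated
    {"id": 2, "isdeleted": 1, "last_modified": datetime(2019, 3, 3, 20, 5, 0)},  # logically deleted
    {"id": 3, "isdeleted": 0, "last_modified": datetime(2019, 3, 1, 8, 0, 0)},   # unchanged
]
print(classify(rows, last_index_time))  # ([1], [2])

# Physically deleting row 3 instead: the result is identical to before, so
# nothing would tell Solr to remove document 3 from the index.
print(classify([r for r in rows if r["id"] != 3], last_index_time))  # ([1], [2])
```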

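The example below again uses the tables from the previous section; for the delta queries to work, each table needs a last_modified timestamp column, which the CREATE TABLE statements above do not include. For delta updates of records with a composite primary key, Solr throws "deltaQuery has no column to resolve to declared primary key pk='key1, key2'", and I have not found a good solution for this yet. If you know one, please leave a comment. Thanks.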

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/solr?useUnicode=true&amp;characterEncoding=utf-8" user="root" password="mysql"/>
  <document>
    <entity name="item" pk="id" query="select * from item" deltaImportQuery="select * from item where id='${dih.delta.id}'" deltaQuery="select id from item where last_modified &gt; '${dih.last_index_time}'">
      <field column="item_name" name="name"/>
      <entity name="feature" pk="item_id" query="select descrip as features from feature where item_id='${item.id}'" deltaQuery="select item_id from feature where last_modified &gt; '${dih.last_index_time}'" parentDeltaQuery="select id from item where id='${feature.item_id}'"/>
      <entity name="item_category" pk="item_id" query="select category_id from item_category where item_id='${item.id}'" deltaQuery="select item_id, category_id from item_category where last_modified &gt; '${dih.last_index_time}'" parentDeltaQuery="select id from item where id='${item_category.item_id}'">
        <entity name="category" pk="id" query="select descrip as cat from category where id='${item_category.category_id}'" deltaQuery="select id from category where last_modified &gt; '${dih.last_index_time}'" parentDeltaQuery="select item_id, category_id from item_category where category_id='${category.id}'"/>
      </entity>
    </entity>
  </document>
</dataConfig>
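The parentDeltaQuery attributes are what make a change in a child table reach the root document: a changed category is mapped to its item_category rows, and those are mapped to the item rows that must be re-indexed. The sketch below walks that chain by hand in SQLite, with hypothetical data and ISO-formatted text timestamps (which compare correctly as strings); changed_items_via_category is an illustrative helper, not Solr code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table item(id integer primary key, item_name text, last_modified text);
create table item_category(item_id integer, category_id integer, last_modified text);
create table category(id integer primary key, descrip text, last_modified text);
insert into item values (1, 'item1', '2019-03-01 10:00:00'), (2, 'item2', '2019-03-01 10:00:00');
insert into item_category values (1, 1, '2019-03-01 10:00:00'), (2, 2, '2019-03-01 10:00:00');
-- hypothetical edit: category 1 changed after the last import
insert into category values (1, 'category 1 (edited)', '2019-03-03 20:30:00'),
                            (2, 'category 2', '2019-03-01 10:00:00');
""")

def changed_items_via_category(conn, last_index_time):
    changed = set()
    # category.deltaQuery: which categories changed?
    for (cat_id,) in conn.execute(
            "select id from category where last_modified > ?", (last_index_time,)).fetchall():
        # category.parentDeltaQuery: which item_category rows point at them?
        for item_id, _category_id in conn.execute(
                "select item_id, category_id from item_category where category_id = ?",
                (cat_id,)).fetchall():
            # item_category.parentDeltaQuery: which root items are affected?
            for (parent_id,) in conn.execute(
                    "select id from item where id = ?", (item_id,)).fetchall():
                changed.add(parent_id)
    return sorted(changed)

print(changed_items_via_category(conn, "2019-03-03 19:59:43"))  # [1]
```

Item 1 is re-indexed although its own row did not change, because one of its categories did; item 2 is left alone.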

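pk: the name of the current entity's primary-key column. This means the primary key of the database table, not Solr's uniqueKey field. If your SQL statement gives the primary-key column an alias with AS, the pk attribute must be changed to that alias accordingly. Don't forget this;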
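query: the SQL statement used for full import, e.g. select * from xxx where isdeleted=0, which returns all valid rows that have not been deleted. For the root entity it is used only by full import, while a delta import uses deltaImportQuery instead; nested entities still run their query during a delta import to fetch the child rows of each changed parent;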
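deltaQuery: the SQL statement that returns the primary-key ids of the records to be delta-imported, i.e. rows inserted or updated since the last import (deleted rows are handled by deletedPkQuery), e.g. deltaQuery="select id from xxx where my_date > '${dataimporter.last_index_time}'". ${dataimporter.last_index_time} is the older name of ${dih.last_index_time} used in the configuration above. This parameter is used only by delta import;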
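deletedPkQuery: the SQL statement that returns the rows that have been logically deleted, so you need a logical-delete flag column such as isdeleted. Solr deletes the corresponding documents from the index based on the result set of this query. Example: select id from myinfo where isdeleted=1. This parameter is used only by delta import;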
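deltaImportQuery: e.g. deltaImportQuery="select * from myinfo where id='${dataimporter.delta.id}'". Solr iterates over every primary-key id returned by deltaQuery and runs this SQL statement for each one to fetch the data to be imported. The variable ${dataimporter.delta.id} (equivalently ${dih.delta.id}) resolves to the id currently being processed.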
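Put together, one delta pass over a root entity runs deltaQuery and deletedPkQuery with the last import time filled in, then runs deltaImportQuery once per changed id. The sketch below imitates that sequence against a hypothetical myinfo table in SQLite. The resolve helper mimics DIH's textual ${...} substitution; it is fine for illustration, but building SQL from strings like this is not something to copy into application code. The isdeleted = 0 filter in deltaQuery is my own choice, to keep deleted rows out of the re-index list.

```python
import re
import sqlite3

def resolve(template, variables):
    """Textually replace ${name} placeholders, imitating DIH variable resolution."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(variables[m.group(1)]), template)

def delta_import(conn, entity, last_index_time):
    """Return (documents to add or update, ids to delete) for one root entity."""
    ctx = {"dih.last_index_time": last_index_time}
    changed = [r[0] for r in conn.execute(resolve(entity["deltaQuery"], ctx)).fetchall()]
    deleted = [r[0] for r in conn.execute(resolve(entity["deletedPkQuery"], ctx)).fetchall()]
    docs = []
    for pk in changed:
        ctx["dih.delta.id"] = pk  # what ${dih.delta.id} resolves to for this pass
        cursor = conn.execute(resolve(entity["deltaImportQuery"], ctx))
        columns = [d[0] for d in cursor.description]
        docs.extend(dict(zip(columns, row)) for row in cursor.fetchall())
    return docs, deleted

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table myinfo(id integer primary key, title text, isdeleted integer, last_modified text);
insert into myinfo values (1, 'updated row', 0, '2019-03-03 20:10:00'),
                          (2, 'old row', 0, '2019-03-01 08:00:00'),
                          (3, 'removed row', 1, '2019-03-03 20:15:00');
""")
entity = {  # hypothetical entity attributes in the style described above
    "deltaQuery": "select id from myinfo where isdeleted = 0 and last_modified > '${dih.last_index_time}'",
    "deletedPkQuery": "select id from myinfo where isdeleted = 1",
    "deltaImportQuery": "select * from myinfo where id = '${dih.delta.id}'",
}
print(delta_import(conn, entity, "2019-03-03 19:59:43"))
```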

Delta import URL: http://localhost:8080/solr/<core_name>/dataimport?command=delta-import

Further reading: solr dataimport scheduler: http://code.google.com/p/solr-data-import-scheduler/

Last edited: 2019.03.03 22:04:47
