The scroll API is very useful for retrieving large amounts of data, and it avoids the cost of deep pagination.
Deep pagination with from and size (e.g. ?size=10&from=10000) is very inefficient because, in this example, 100,000 sorted results have to be retrieved from each shard and re-sorted in order to return just 10 results. This process has to be repeated for every page requested.
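For contrast, a deep-pagination request of the kind described above would look something like this (the twitter/tweet index and type simply mirror the scroll example below):

curl -XGET 'localhost:9200/twitter/tweet/_search?size=10&from=10000' -d '{ "query": { "match" : { "title" : "elasticsearch" } }}'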
Initial query:
curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '{ "query": { "match" : { "title" : "elasticsearch" } }}'
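The response to this request looks roughly like the sketch below (other fields omitted; the only parts that matter here are the top-level _scroll_id and the first batch of hits):

{
  "_scroll_id" : "<id to pass to the scroll API>",
  "hits" : {
    "hits" : [ <first batch of results> ]
  }
}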
The result from the above request includes a _scroll_id, which should be passed to the scroll API in order to retrieve the next batch of results.
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1'
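Putting it together, a complete export typically repeats the scroll call until a batch comes back empty. The loop below is a minimal sketch, assuming bash and the jq tool are available for JSON parsing; the index, type, and query simply mirror the example above:

# Start the scroll and capture the first scroll id (illustrative query, as above)
SCROLL_ID=$(curl -s -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m&size=100' -d '{ "query": { "match" : { "title" : "elasticsearch" } }}' | jq -r '._scroll_id')
while true; do
  RESPONSE=$(curl -s -XGET 'localhost:9200/_search/scroll?scroll=1m' -d "$SCROLL_ID")
  HITS=$(echo "$RESPONSE" | jq '.hits.hits | length')
  [ "$HITS" -eq 0 ] && break                            # stop once a batch comes back empty
  echo "$RESPONSE" | jq -c '.hits.hits[]'               # process the current batch here
  SCROLL_ID=$(echo "$RESPONSE" | jq -r '._scroll_id')   # each response returns a scroll id for the next call
done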