Quickstart: single-machine test
http://druid.io/docs/0.10.1/tutorials/quickstart.html
(1)Getting started
Download and unpack Druid:
curl -O http://static.druid.io/artifacts/releases/druid-0.10.1-bin.tar.gz
tar -xzf druid-0.10.1-bin.tar.gz
cd druid-0.10.1
Main directories in the package:
- LICENSE - the license files.
- bin/ - scripts useful for this quickstart.
- conf/* - template configurations for a clustered setup.
- conf-quickstart/* - configurations for this quickstart.
- extensions/* - all Druid extensions.
- hadoop-dependencies/* - Druid Hadoop dependencies.
- lib/* - all included software packages for core Druid.
- quickstart/* - files useful for this quickstart.
(2)Start up Zookeeper
Download and start ZooKeeper:
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz
tar -xzf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
(3)Start up Druid services
Once ZooKeeper is running, return to the druid-0.10.1 directory and run:
bin/init
This creates directories such as log and var. Next, start each Druid process in its own terminal window:
java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical
java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
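To stop a service, press CTRL-C in its terminal (not needed at this point).
For a clean restart, stop the services, delete the var directory, and run bin/init again.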
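The five start commands differ only in the service name, so they can be generated rather than typed. Below is a minimal Python sketch that assumes the quickstart layout used above (conf-quickstart/druid/<service>/jvm.config). The helper name druid_command and the SERVICES list are illustrative and not part of Druid.

```python
import shlex
from pathlib import Path

# Service names taken from the five commands in this section.
SERVICES = ["historical", "broker", "coordinator", "overlord", "middleManager"]


def druid_command(service, conf_root="conf-quickstart/druid"):
    """Build the argv list that starts one Druid service."""
    conf_dir = Path(conf_root)
    # jvm.config lists JVM flags separated by whitespace/newlines;
    # shlex.split flattens them much like `cat ... | xargs` does.
    jvm_flags = shlex.split((conf_dir / service / "jvm.config").read_text())
    classpath = f"{conf_root}/_common:{conf_root}/{service}:lib/*"
    return ["java", *jvm_flags, "-cp", classpath,
            "io.druid.cli.Main", "server", service]


if __name__ == "__main__":
    # Print one shell-ready command per service; run each in its own terminal.
    for name in SERVICES:
        if Path("conf-quickstart/druid", name, "jvm.config").is_file():
            print(shlex.join(druid_command(name)))
```

Run it from the druid-0.10.1 directory. The script only prints the commands, so each service can still be started and stopped by hand.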
Ingest data
From the druid-0.10.1 directory, submit the sample ingestion task:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:8090/druid/indexer/v1/task
Response:
{"task":"index_hadoop_wikiticker_2017-11-26T12:57:40.055Z"}
Track the task in the ingestion task console: http://localhost:8090/console.html
Monitor data loading in the coordinator console: http://localhost:8081/#/
(4)Query data
Run:
curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/?pretty
Response:
[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "edits" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "edits" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "edits" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "edits" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "edits" : 20,
    "page" : "Flavia Pennetta"
  }, {
    "edits" : 18,
    "page" : "Total Drama Presents: The Ridonculous Race"
  }, {
    "edits" : 18,
    "page" : "User talk:Dudeperson176123"
  }, {
    "edits" : 18,
    "page" : "Wikipédia:Le Bistro/12 septembre 2015"
  }, {
    "edits" : 17,
    "page" : "Wikipedia:In the news/Candidates"
  }, {
    "edits" : 17,
    "page" : "Wikipedia:Requests for page protection"
  }, {
    "edits" : 16,
    "page" : "Utente:Giulio Mainardi/Sandbox"
  }, {
    "edits" : 16,
    "page" : "Wikipedia:Administrator intervention against vandalism"
  }, {
    "edits" : 15,
    "page" : "Anthony Martial"
  }, {
    "edits" : 13,
    "page" : "Template talk:Connected contributor"
  }, {
    "edits" : 12,
    "page" : "Chronologie de la Lorraine"
  }, {
    "edits" : 12,
    "page" : "Wikipedia:Files for deletion/2015 September 12"
  }, {
    "edits" : 12,
    "page" : "Гомосексуальный образ жизни"
  }, {
    "edits" : 11,
    "page" : "Constructive vote of no confidence"
  }, {
    "edits" : 11,
    "page" : "Homo naledi"
  }, {
    "edits" : 11,
    "page" : "Kim Davis (county clerk)"
  }, {
    "edits" : 11,
    "page" : "Vorlage:Revert-Statistik"
  }, {
    "edits" : 11,
    "page" : "Конституция Японской империи"
  }, {
    "edits" : 10,
    "page" : "The Naked Brothers Band (TV series)"
  }, {
    "edits" : 10,
    "page" : "User talk:Buster40004"
  }, {
    "edits" : 10,
    "page" : "User:Valmir144/sandbox"
  } ]
} ]
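To pick values out of a response like this in a script, a small parser is enough. The sketch below assumes only the shape visible in the output above: a list of time buckets, each holding a result list of page/edits pairs. The names top_pages and SAMPLE are illustrative, and SAMPLE is just the first three rows copied from that output.

```python
import json


def top_pages(response_text):
    """Flatten a topN response into (page, edits) pairs, in the order returned."""
    rows = []
    for bucket in json.loads(response_text):
        for entry in bucket["result"]:
            rows.append((entry["page"], entry["edits"]))
    return rows


# First three rows of the response shown above, used as sample input.
SAMPLE = """[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [
    { "edits" : 33, "page" : "Wikipedia:Vandalismusmeldung" },
    { "edits" : 28, "page" : "User:Cyde/List of candidates for speedy deletion/Subpage" },
    { "edits" : 27, "page" : "Jeremy Corbyn" }
  ]
} ]"""

if __name__ == "__main__":
    for page, edits in top_pages(SAMPLE):
        print(f"{edits:>3}  {page}")
```

In practice the text passed to top_pages would be the body returned by the curl query rather than a literal string.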
================================
Loading Data
http://druid.io/docs/0.10.1/tutorials/ingestion.html
Druid supports two forms of ingestion: streaming (real-time) and file-based (batch).
【1】Batch: files on HDFS
http://druid.io/docs/0.10.1/ingestion/batch-ingestion.html
【2】Streaming: Kafka, Storm, Spark Streaming
Use the Tranquility client: http://druid.io/docs/0.10.1/ingestion/stream-ingestion.html#stream-push
Introductory loading tutorials
【1】Files-based, loading files from local disk: http://druid.io/docs/0.10.1/tutorials/tutorial-batch.html
【2】Streams-based, pushing data over HTTP: http://druid.io/docs/0.10.1/tutorials/tutorial-streams.html
【3】Kafka-based tutorial: http://druid.io/docs/0.10.1/tutorials/tutorial-kafka.html
Example 1: Loading files from local disk
Loading from Files: Load your own batch data
【1】Download and start Druid as described in the single-machine quickstart
http://druid.io/docs/0.10.1/tutorials/quickstart.html
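【2】Write the ingestion spec
Use quickstart/wikiticker-index.json from the download package as a template.
Key points:
(1)Name the dataset: the dataSource field of dataSchema.
(2)Locate the dataset: paths in inputSpec; separate multiple files with commas.
(3)Identify the timestamp field: column in timestampSpec.
(4)List the dimensions: dimensions in dimensionsSpec.
(5)List the metrics: metricsSpec.
(6)Set the time ranges to load: intervals in granularitySpec.
If the data has no natural timestamp, tag every row with a fixed timestamp such as "2000-01-01T00:00:00.000Z".
Supported file formats are TSV, CSV, and JSON; nested JSON is not supported out of the box.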
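For data without a time column, the fixed-timestamp approach can be applied before ingestion. Here is a minimal sketch, assuming newline-delimited JSON input and a timestamp column called time (both are assumptions, and tag_rows is an illustrative helper, not a Druid tool):

```python
import json

# The fixed timestamp suggested in the notes above.
FIXED_TIMESTAMP = "2000-01-01T00:00:00.000Z"


def tag_rows(lines, column="time", timestamp=FIXED_TIMESTAMP):
    """Return JSON lines in which every record carries a timestamp column.

    Records that already have the column are left unchanged.
    """
    tagged = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        record.setdefault(column, timestamp)
        tagged.append(json.dumps(record))
    return tagged


if __name__ == "__main__":
    for row in tag_rows(['{"url": "/", "user": "bob"}']):
        print(row)
```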
JSON input looks like this, for example as the contents of a file named pageviews.json:
{"time": "2015-09-01T00:00:00Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
{"time": "2015-09-01T01:00:00Z", "url": "/", "user": "bob", "latencyMs": 11}
{"time": "2015-09-01T01:30:00Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
Make sure that no record contains a newline character: each line must hold exactly one complete JSON object.
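A quick pre-flight check can catch records that were split across lines before the task is submitted. The sketch below is one possible check, not something Druid ships. check_records is a hypothetical helper, and the rules it applies (one JSON object per line, a time field present, no nested objects) are taken from the notes above.

```python
import json


def check_records(lines, time_field="time"):
    """Return (line_number, problem) pairs for lines that break the notes' rules."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            # A record broken across lines fails here on each fragment.
            problems.append((number, f"not one complete JSON value: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        if time_field not in record:
            problems.append((number, f"missing '{time_field}' field"))
        nested = sorted(key for key, value in record.items() if isinstance(value, dict))
        if nested:
            problems.append((number, f"nested objects under {nested}"))
    return problems


if __name__ == "__main__":
    import os
    if os.path.isfile("pageviews.json"):
        with open("pageviews.json", encoding="utf-8") as handle:
            for number, problem in check_records(handle):
                print(f"line {number}: {problem}")
```

An empty result means every line parsed as a flat JSON object with a time field.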
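In the spec file, for example my-index-task.json, set the following fields (these are fragments of the spec, not a complete file):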
"dataSource": "pageviews"
"inputSpec": {
  "type": "static",
  "paths": "pageviews.json"
}
"timestampSpec": {
  "format": "auto",
  "column": "time"
}
"dimensionsSpec": {
  "dimensions": ["url", "user"]
}
"metricsSpec": [
  {"name": "views", "type": "count"},
  {"name": "latencyMs", "type": "doubleSum", "fieldName": "latencyMs"}
]
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "day",
  "queryGranularity": "none",
  "intervals": ["2015-09-01/2015-09-02"]
}
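Because these are fragments, one way to get a complete spec is to load quickstart/wikiticker-index.json and overwrite only the fields listed here. The sketch below does that. The nesting it relies on (spec.ioConfig.inputSpec, spec.dataSchema.parser.parseSpec, and so on) is an assumption about the template's layout and should be checked against your copy of the file; customize is an illustrative name.

```python
import copy
import json


def customize(template):
    """Return a copy of the template task with the pageviews settings applied.

    Assumes the template nests its fields as
    spec -> ioConfig / dataSchema -> parser -> parseSpec.
    """
    task = copy.deepcopy(template)
    spec = task["spec"]
    spec["ioConfig"]["inputSpec"] = {"type": "static", "paths": "pageviews.json"}

    schema = spec["dataSchema"]
    schema["dataSource"] = "pageviews"
    parse_spec = schema["parser"]["parseSpec"]
    parse_spec["timestampSpec"] = {"format": "auto", "column": "time"}
    parse_spec["dimensionsSpec"] = {"dimensions": ["url", "user"]}
    schema["metricsSpec"] = [
        {"name": "views", "type": "count"},
        {"name": "latencyMs", "type": "doubleSum", "fieldName": "latencyMs"},
    ]
    schema["granularitySpec"] = {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": ["2015-09-01/2015-09-02"],
    }
    return task


if __name__ == "__main__":
    import os
    source = "quickstart/wikiticker-index.json"
    if os.path.isfile(source):
        with open(source, encoding="utf-8") as handle:
            task = customize(json.load(handle))
        with open("my-index-task.json", "w", encoding="utf-8") as handle:
            json.dump(task, handle, indent=2)
```

Starting from the shipped template keeps every setting not mentioned in these notes, such as the tuning configuration, exactly as the quickstart provides it.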
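【3】Make sure the indexing task can read pageviews.json
(1)When running locally (no Hadoop cluster configured), place pageviews.json in the root of the Druid distribution.
(2)When Druid is connected to a Hadoop cluster, change paths in inputSpec to the file's location there.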
【4】Submit the task
curl -X 'POST' -H 'Content-Type:application/json' -d @my-index-task.json OVERLORD_IP:8090/druid/indexer/v1/task
When running locally, use:
curl -X 'POST' -H 'Content-Type:application/json' -d @my-index-task.json localhost:8090/druid/indexer/v1/task
Check indexing progress in the overlord console at http://OVERLORD_IP:8090/console.html
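Submission and status checks can also be scripted. The sketch below posts the spec to the same endpoint as the curl command and then polls the overlord's per-task status endpoint. The helper names are illustrative. The response shapes are assumptions: the "task" key appears in the sample response earlier in these notes, while the nested "status" object is assumed from the overlord's task status API and should be verified against your Druid version.

```python
import json
import time
import urllib.request


def task_url(overlord):
    """URL that accepts new indexing tasks, e.g. for overlord='localhost:8090'."""
    return f"http://{overlord}/druid/indexer/v1/task"


def status_url(overlord, task_id):
    """URL that reports the status of one task."""
    return f"{task_url(overlord)}/{task_id}/status"


def submit_task(overlord, spec_path):
    """POST the spec file and return the task id from the response."""
    with open(spec_path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        task_url(overlord), data=body,
        headers={"Content-Type": "application/json"})  # data present => POST
    with urllib.request.urlopen(request) as response:
        return json.load(response)["task"]


def wait_for_task(overlord, task_id, poll_seconds=5):
    """Poll until the task leaves the RUNNING state; return the final state."""
    while True:
        with urllib.request.urlopen(status_url(overlord, task_id)) as response:
            # Assumed shape: {"task": ..., "status": {"status": "RUNNING", ...}}
            state = json.load(response)["status"]["status"]
        if state != "RUNNING":
            return state
        time.sleep(poll_seconds)
```

A typical call sequence would be task_id = submit_task("localhost:8090", "my-index-task.json") followed by wait_for_task("localhost:8090", task_id).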
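【5】Check that the data has loaded
The data becomes queryable one to two minutes after the task finishes. Monitor loading in the Coordinator console: http://localhost:8081/#/
【6】Query the data
http://druid.io/docs/0.10.1/querying/querying.html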
Example 2: Consuming data from Kafka
Tutorial: Load from Kafka
【1】Download and start Kafka
curl -O http://www.us.apache.org/dist/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz
tar -xzf kafka_2.11-0.9.0.0.tgz
cd kafka_2.11-0.9.0.0
Start the Kafka broker:
./bin/kafka-server-start.sh config/server.properties
Create a Kafka topic named metrics:
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic metrics
【2】Send sample data
From the Druid directory, generate test data with bin/generate-example-metrics.
Start the Kafka console producer:
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic metrics
Paste the generated data into the producer's terminal.
【3】Query data