Environment:
[root@test spark]# uname -a
Linux test 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@test spark]# cat /etc/issue
CentOS release 6.5 (Final)
[root@test ~]# ls
jdk-7u79-linux-x64.tar.gz spark-1.6.0-bin-hadoop2.6.tgz
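This post assumes you have already installed and configured an environment that can run Spark. It only records a Python version of the Spark Streaming WordCount program from the official tutorial.
Change into the Spark installation directory, which in my case is
cd /usr/local/spark
Under examples/src/main/python/streaming/
there are examples for the various ways of ingesting data. I use network_wordcount.py here
(because it looks the easiest to use).
The official example also documents how to run it: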
"""
Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
Usage: network_wordcount.py <hostname> <port>
<hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and then run the example
$ bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999
"""
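To make the description above concrete, here is a minimal plain-Python sketch of what the example computes for each one-second batch of lines. The function name count_words is my own and is not part of Spark. The real script expresses the same steps on a DStream with flatMap, map and reduceByKey, so treat this as an illustration of the logic rather than the script itself.

```python
from collections import Counter


def count_words(lines):
    """Count words in one batch of text lines, splitting on single spaces."""
    counts = Counter()
    for line in lines:
        # Corresponds to flatMap(lambda line: line.split(" ")) in the example.
        for word in line.split(" "):
            # Corresponds to map(word -> (word, 1)) followed by reduceByKey(a + b).
            counts[word] += 1
    return dict(counts)


# One batch containing a single line, as it would arrive from the socket.
batch = ["hello nihao my name is xzp hello world!"]
for word, n in count_words(batch).items():
    print((word, n))
```

Because the split is on spaces only, punctuation stays attached to the word, which is why a token such as world! is counted as its own word.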
In other words, we first need to install nc (netcat).
- Download the netcat package
wget http://sourceforge.net/projects/netcat/files/netcat/0.7.1/netcat-0.7.1-1.i386.rpm
- Run the installation: rpm -ihv netcat-0.7.1-1.i386.rpm
This fails with the following error:
rpm -ihv netcat-0.7.1-1.i386.rpm
warning: netcat-0.7.1-1.i386.rpm: Header V3 DSA/SHA1 Signature, key ID b2d79fc1: NOKEY
error: Failed dependencies:
libc.so.6 is needed by netcat-0.7.1-1.i386
libc.so.6(GLIBC_2.0) is needed by netcat-0.7.1-1.i386
libc.so.6(GLIBC_2.1) is needed by netcat-0.7.1-1.i386
libc.so.6(GLIBC_2.3) is needed by netcat-0.7.1-1.i386
- Resolve the missing dependency
[root@test streaming]# yum list glibc*
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* epel: ftp.cuhk.edu.hk
* extras: mirrors.aliyun.com
* rpmforge: ftp.neowiz.com
* updates: mirrors.aliyun.com
Installed Packages
glibc.i686 2.12-1.192.el6 @base
glibc.x86_64 2.12-1.192.el6 @base
glibc-common.x86_64 2.12-1.192.el6 @base
glibc-devel.x86_64 2.12-1.192.el6 @base
glibc-headers.x86_64 2.12-1.192.el6 @base
glibc-static.x86_64 2.12-1.192.el6 @base
glibc-utils.x86_64 2.12-1.192.el6 @base
Available Packages
glibc-devel.i686 2.12-1.192.el6 base
glibc-static.i686 2.12-1.192.el6 base
- Install the dependency (the i386 package needs the 32-bit glibc):
yum install glibc.i686
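The dependency error comes from installing a 32-bit (i386) package on a 64-bit (x86_64) host, which is why the 32-bit glibc is required. As a rough sketch, the check below reads the architecture out of an RPM file name and compares it with the host. The helper names rpm_arch and needs_32bit_libs are hypothetical, and the file-name convention name-version-release.arch.rpm is assumed.

```python
import platform
import re


def rpm_arch(filename):
    """Return the arch field of an RPM file name like name-ver-rel.arch.rpm."""
    match = re.search(r"\.([^.]+)\.rpm$", filename)
    if match is None:
        raise ValueError("not an RPM file name: %r" % filename)
    return match.group(1)


def needs_32bit_libs(filename, host_machine):
    """True when a 32-bit x86 package is installed on a 64-bit x86 host."""
    is_32bit_pkg = rpm_arch(filename) in ("i386", "i486", "i586", "i686")
    return is_32bit_pkg and host_machine == "x86_64"


# platform.machine() reports the host architecture, e.g. x86_64 on this box.
print(needs_32bit_libs("netcat-0.7.1-1.i386.rpm", platform.machine()))
```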
- Run the installation again:
rpm -ihv netcat-0.7.1-1.i386.rpm
warning: netcat-0.7.1-1.i386.rpm: Header V3 DSA/SHA1 Signature, key ID b2d79fc1: NOKEY
Preparing... ########################################### [100%]
1:netcat ########################################### [100%]
The installation succeeded.
- Run the command
nc -lk 9999
It prints:
nc: invalid option -- 'k'
Try `nc --help' for more information.
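I searched online and found this answer: http://unix.stackexchange.com/questions/193579/nc-commands-k-option
S O L V E D The consultant installed netcat so I uninstalled netcat and then nc was not working. So I also removed and reinstalled nc again. Now -k option is working now Thanks for your helps – Murat Apr 1 '15 at 10:03
In short: uninstall it and install nc again. The likely cause is that the RPM installed above is GNU netcat 0.7.1, whose nc command does not accept -k, whereas the nc package from the CentOS repositories does.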
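If the nc on a machine does not accept -k and reinstalling is not an option, a small Python 3 stand-in can play the same role for a smoke test. This is only a sketch: start_line_server is a hypothetical helper, and unlike nc it sends a fixed list of lines to every client instead of forwarding what you type. What it shares with -k is that it goes back to listening after a client disconnects.

```python
import socket
import threading


def start_line_server(lines, host="127.0.0.1", port=9999, max_clients=None):
    """Serve `lines` to each client in turn; return (bound_port, thread)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    # With port=0 the OS picks a free port; report the one actually bound.
    bound_port = server.getsockname()[1]

    def accept_loop():
        served = 0
        # Like -k: return to accept() after each client disconnects.
        while max_clients is None or served < max_clients:
            conn, _ = server.accept()
            with conn:
                for line in lines:
                    # The example expects UTF-8 text delimited by '\n'.
                    conn.sendall((line + "\n").encode("utf-8"))
            served += 1
        server.close()

    thread = threading.Thread(target=accept_loop, daemon=True)
    thread.start()
    return bound_port, thread
```

Calling start_line_server(["hello world"]) and then join() on the returned thread keeps the server up on port 9999. Each connecting client receives the same lines and is then disconnected, so a reconnecting client will see the lines replayed.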
- Fix the netcat problem
[root@test ~]# yum remove netcat
Loaded plugins: fastestmirror
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package netcat.i386 0:0.7.1-1 will be erased
--> Finished Dependency Resolution
Reinstall (note that the package name to use here is nc):
[root@test ~]# yum install nc
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* epel: mirror.premi.st
* extras: mirrors.aliyun.com
* rpmforge: ftp.neowiz.com
* updates: mirrors.aliyun.com
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package nc.x86_64 0:1.84-24.el6 will be installed
--> Finished Dependency Resolution
- Run the program
Open a new terminal window and run:
[root@test spark]# nc -lk 9999
In the original window, run the following (still from the Spark home directory):
[root@test spark]# bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999
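The order of the two commands matters: the listener should be up before the job is submitted, otherwise the job has nothing to connect to. A minimal Python 3 sketch for checking that something is listening on the port is shown below. The name port_is_open is my own. Note that the probe opens and immediately closes a connection, which is harmless for a listener started with -k but may make a listener started without -k exit.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(port_is_open("localhost", 9999))
```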
- Test the output
In the nc window, type:
hello nihao my name is xzp hello world!
The Spark program prints:
-------------------------------------------
Time: 2016-07-20 11:56:41
-------------------------------------------
(u'my', 1)
(u'is', 1)
(u'nihao', 1)
(u'world!', 1)
(u'xzp', 1)
(u'name', 1)
(u'hello', 2)
-------------------------------------------
Time: 2016-07-20 11:56:42
-------------------------------------------
-------------------------------------------
Time: 2016-07-20 11:56:43
-------------------------------------------
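That is the whole process. The next step is to adapt the official example to your own business logic. Because my company obtains its data by calling a REST API, the data input will next be changed to a version that calls the REST API.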
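As a starting point for that change, here is a rough Python 3 sketch of polling a REST endpoint and turning each response into text lines that could then be counted. Everything specific in it is an assumption: the helper names fetch_json and poll_lines are hypothetical, and so is the response shape {"lines": [...]}, which must be adapted to whatever the real API returns. It does not show how to wire the lines into Spark Streaming.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=5):
    """GET `url` and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_lines(url, polls, interval=1.0, fetch=fetch_json, sleep=time.sleep):
    """Yield text lines from `polls` successive requests to `url`.

    Assumes (hypothetically) that each response looks like {"lines": [...]}.
    `fetch` and `sleep` are injectable so the loop can be tested offline.
    """
    for i in range(polls):
        payload = fetch(url)
        for line in payload.get("lines", []):
            yield line
        if i < polls - 1:
            # Wait between polls, mirroring the one-second batch interval.
            sleep(interval)
```

Passing a fake fetch function makes it possible to exercise the loop without a network connection before pointing it at the real service.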