Installing the Scrapy Crawler Framework on CentOS 7

1. Install dependencies

[root@iZ2zegaforshlunfo6xw8qZ ~]# yum -y groupinstall "Development tools"

[root@hadron ~]# yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel --skip-broken 
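
Because --skip-broken lets yum quietly skip packages it cannot resolve, it can be worth checking afterwards which of the requested packages actually ended up installed. The following is a minimal sketch, not part of the original steps; missing_packages is a hypothetical helper name, and it assumes rpm -q is available, as it normally is on CentOS:

```shell
# Hypothetical helper: print every package name that "rpm -q" does not report
# as installed. If rpm itself is unavailable, every name is printed.
missing_packages() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Package list copied from the yum command above.
missing_packages zlib-devel bzip2-devel openssl-devel ncurses-devel \
    sqlite-devel readline-devel tk-devel gdbm-devel db4-devel \
    libpcap-devel xz-devel
```

An empty result means everything is in place; any name that is printed was skipped and may need a different package name or repository on your system.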


2. Install Python (omitted here; tutorials are easy to find online)

3. Install the Scrapy crawler framework

[root@iZ2zegaforshlunfo6xw8qZ ~]# pip3 install scrapy

Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting scrapy
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/9a/d3/5af102af577f57f706fcb302ea47d40e09355778488de904b3594d4e48d2/Scrapy-2.1.0-py2.py3-none-any.whl (239 kB)
     |████████████████████████████████| 239 kB 3.8 MB/s
Collecting service-identity>=16.0.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/e9/7c/2195b890023e098f9618d43ebc337d83c8b38d414326685339eb024db2f6/service_identity-18.1.0-py2.py3-none-any.whl (11 kB)
Collecting parsel>=1.5.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/23/1e/9b39d64cbab79d4362cdd7be7f5e9623d45c4a53b3f7522cd8210df52d8e/parsel-1.6.0-py2.py3-none-any.whl (13 kB)
Collecting w3lib>=1.17.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a3/59/b6b14521090e7f42669cafdb84b0ab89301a42f1f1a82fcf5856661ea3a7/w3lib-1.22.0-py2.py3-none-any.whl (20 kB)
Requirement already satisfied: lxml>=3.5.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (4.5.0)
Collecting PyDispatcher>=2.0.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/cd/37/39aca520918ce1935bea9c356bcbb7ed7e52ad4e31bff9b943dfc8e7115b/PyDispatcher-2.0.5.tar.gz (34 kB)
Collecting cssselect>=0.9.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/3b/d4/3b5c17f00cce85b9a1e6f91096e1cc8e8ede2e1be8e96b87ce1ed09e92c5/cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting zope.interface>=4.1.3
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/45/87/0d0c79724621056e39ac0385d0171fba3e92645b7947b143347aecf3069f/zope.interface-5.1.0-cp38-cp38-manylinux2010_x86_64.whl (243 kB)
     |████████████████████████████████| 243 kB 90.2 MB/s
Requirement already satisfied: cryptography>=2.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (2.9.2)
Collecting protego>=0.1.15
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/db/6e/bf6d5e4d7cf233b785719aaec2c38f027b9c2ed980a0015ec1a1cced4893/Protego-0.1.16.tar.gz (3.2 MB)
     |████████████████████████████████| 3.2 MB 34.6 MB/s
Collecting Twisted>=17.9.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/4a/b4/4973c7ccb5be2ec0abc779b7d5f9d5f24b17b0349e23240cfc9dc3bd83cc/Twisted-20.3.0.tar.bz2 (3.1 MB)
     |████████████████████████████████| 3.1 MB 3.8 MB/s
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-pzjcmemj/Twisted/setup.py'"'"'; __file__='"'"'/tmp/pip-install-pzjcmemj/Twisted/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-1wa67ju3
         cwd: /tmp/pip-install-pzjcmemj/Twisted/
    Complete output (33 lines):
    WARNING: The repository located at mirrors.cloud.aliyuncs.com is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host mirrors.cloud.aliyuncs.com'.
    ERROR: Could not find a version that satisfies the requirement incremental>=16.10.1 (from versions: none)
    ERROR: No matching distribution found for incremental>=16.10.1
    Traceback (most recent call last):
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/installer.py", line 128, in fetch_build_egg
        subprocess.check_call(cmd)
      File "/usr/local/python3/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpn_8n87uq', '--quiet', '--index-url', 'http://mirrors.cloud.aliyuncs.com/pypi/simple/', 'incremental>=16.10.1']' returned non-zero exit status 1.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-pzjcmemj/Twisted/setup.py", line 20, in <module>
        setuptools.setup(**_setup["getSetupArgs"]())
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/__init__.py", line 143, in setup
        _install_setup_requires(attrs)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/__init__.py", line 138, in _install_setup_requires
        dist.fetch_build_eggs(dist.setup_requires)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/dist.py", line 695, in fetch_build_eggs
        resolved_dists = pkg_resources.working_set.resolve(
      File "/usr/local/python3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 781, in resolve
        dist = best[req.key] = env.best_match(
      File "/usr/local/python3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1066, in best_match
        return self.obtain(req, installer)
      File "/usr/local/python3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1078, in obtain
        return installer(requirement)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/dist.py", line 754, in fetch_build_egg
        return fetch_build_egg(self, req)
      File "/usr/local/python3/lib/python3.8/site-packages/setuptools/installer.py", line 130, in fetch_build_egg
        raise DistutilsError(str(e))
    distutils.errors.DistutilsError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpn_8n87uq', '--quiet', '--index-url', 'http://mirrors.cloud.aliyuncs.com/pypi/simple/', 'incremental>=16.10.1']' returned non-zero exit status 1.
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.


The key error message in the output above is:

DistutilsError: Command '['/usr/bin/python3', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpn_8n87uq', '--quiet', '--index-url', 'http://mirrors.cloud.aliyuncs.com/pypi/simple/', 'incremental>=16.10.1']' returned non-zero exit status 1.

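In short, building Twisted requires incremental>=16.10.1 at setup time, and the nested pip call that was supposed to fetch it failed. As the WARNING in the log shows, the mirror at mirrors.cloud.aliyuncs.com is served over plain HTTP and was ignored as an untrusted host, so no matching version was found and the command returned exit status 1. The fix is to install the latest incremental yourself first: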

[root@iZ2zegaforshlunfo6xw8qZ ~]# pip3 install incremental
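
An alternative that may be worth trying, since the WARNING in the log itself suggests --trusted-host, is to mark the mirror as trusted for every pip invocation in the current shell. The sketch below uses the PIP_TRUSTED_HOST environment variable, which pip reads as the equivalent of --trusted-host. Whether this alone resolves the Twisted build on a given machine is an assumption; the original steps did not test it.

```shell
# Host name taken from the WARNING line in the log above.
export PIP_TRUSTED_HOST="mirrors.cloud.aliyuncs.com"

# Child processes inherit the variable, so a pip started by another
# program (for example during a package build) would see it as well.
sh -c 'echo "pip will trust: $PIP_TRUSTED_HOST"'
```

After that, pip3 install scrapy can be run as usual in the same shell.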

Once that succeeds, run pip3 install scrapy again:

[root@iZ2zegaforshlunfo6xw8qZ ~]# pip3 install scrapy

Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting scrapy
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/9a/d3/5af102af577f57f706fcb302ea47d40e09355778488de904b3594d4e48d2/Scrapy-2.1.0-py2.py3-none-any.whl (239 kB)
     |████████████████████████████████| 239 kB 4.3 MB/s
Collecting queuelib>=1.4.2
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/4c/85/ae64e9145f39dd6d14f8af3fa809a270ef3729f3b90b3c0cf5aa242ab0d4/queuelib-1.5.0-py2.py3-none-any.whl (13 kB)
Collecting service-identity>=16.0.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/e9/7c/2195b890023e098f9618d43ebc337d83c8b38d414326685339eb024db2f6/service_identity-18.1.0-py2.py3-none-any.whl (11 kB)
Collecting w3lib>=1.17.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a3/59/b6b14521090e7f42669cafdb84b0ab89301a42f1f1a82fcf5856661ea3a7/w3lib-1.22.0-py2.py3-none-any.whl (20 kB)
Collecting PyDispatcher>=2.0.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/cd/37/39aca520918ce1935bea9c356bcbb7ed7e52ad4e31bff9b943dfc8e7115b/PyDispatcher-2.0.5.tar.gz (34 kB)
Collecting zope.interface>=4.1.3
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/45/87/0d0c79724621056e39ac0385d0171fba3e92645b7947b143347aecf3069f/zope.interface-5.1.0-cp38-cp38-manylinux2010_x86_64.whl (243 kB)
     |████████████████████████████████| 243 kB 8.8 MB/s
Collecting protego>=0.1.15
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/db/6e/bf6d5e4d7cf233b785719aaec2c38f027b9c2ed980a0015ec1a1cced4893/Protego-0.1.16.tar.gz (3.2 MB)
     |████████████████████████████████| 3.2 MB 36.1 MB/s
Requirement already satisfied: cryptography>=2.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (2.9.2)
Requirement already satisfied: pyOpenSSL>=16.2.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (19.1.0)
Collecting Twisted>=17.9.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/4a/b4/4973c7ccb5be2ec0abc779b7d5f9d5f24b17b0349e23240cfc9dc3bd83cc/Twisted-20.3.0.tar.bz2 (3.1 MB)
     |████████████████████████████████| 3.1 MB 4.4 MB/s
Requirement already satisfied: lxml>=3.5.0 in /usr/local/python3/lib/python3.8/site-packages (from scrapy) (4.5.0)
Collecting cssselect>=0.9.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/3b/d4/3b5c17f00cce85b9a1e6f91096e1cc8e8ede2e1be8e96b87ce1ed09e92c5/cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting parsel>=1.5.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/23/1e/9b39d64cbab79d4362cdd7be7f5e9623d45c4a53b3f7522cd8210df52d8e/parsel-1.6.0-py2.py3-none-any.whl (13 kB)
Collecting pyasn1-modules
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/95/de/214830a981892a3e286c3794f41ae67a4495df1108c3da8a9f62159b9a9d/pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
     |████████████████████████████████| 155 kB 8.7 MB/s
Collecting attrs>=16.0.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a2/db/4313ab3be961f7a763066401fb77f7748373b6094076ae2bda2806988af6/attrs-19.3.0-py2.py3-none-any.whl (39 kB)
Collecting pyasn1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/62/1e/a94a8d635fa3ce4cfc7f506003548d0a2447ae76fd5ca53932970fe3053f/pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
     |████████████████████████████████| 77 kB 78.2 MB/s
Requirement already satisfied: six>=1.4.1 in /usr/local/python3/lib/python3.8/site-packages (from w3lib>=1.17.0->scrapy) (1.14.0)
Requirement already satisfied: setuptools in /usr/local/python3/lib/python3.8/site-packages (from zope.interface>=4.1.3->scrapy) (47.1.1)
Requirement already satisfied: cffi!=1.11.3,>=1.8 in /usr/local/python3/lib/python3.8/site-packages (from cryptography>=2.0->scrapy) (1.14.0)
Collecting constantly>=15.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/b9/65/48c1909d0c0aeae6c10213340ce682db01b48ea900a7d9fce7a7910ff318/constantly-15.1.0-py2.py3-none-any.whl (7.9 kB)
Requirement already satisfied: incremental>=16.10.1 in /usr/local/python3/lib/python3.8/site-packages (from Twisted>=17.9.0->scrapy) (17.5.0)
Collecting Automat>=0.3.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/dd/83/5f6f3c1a562674d65efc320257bdc0873ec53147835aeef7762fe7585273/Automat-20.2.0-py2.py3-none-any.whl (31 kB)
Collecting hyperlink>=17.1.1
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/7f/91/e916ca10a2de1cb7101a9b24da546fb90ee14629e23160086cf3361c4fb8/hyperlink-19.0.0-py2.py3-none-any.whl (38 kB)
Collecting PyHamcrest!=1.10.0,>=1.9.0
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/40/16/e54cc65891f01cb62893540f44ffd3e8dab0a22443e1b438f1a9f5574bee/PyHamcrest-2.0.2-py3-none-any.whl (52 kB)
     |████████████████████████████████| 52 kB 34.0 MB/s
Requirement already satisfied: pycparser in /usr/local/python3/lib/python3.8/site-packages (from cffi!=1.11.3,>=1.8->cryptography>=2.0->scrapy) (2.20)
Requirement already satisfied: idna>=2.5 in /usr/local/python3/lib/python3.8/site-packages (from hyperlink>=17.1.1->Twisted>=17.9.0->scrapy) (2.9)
Building wheels for collected packages: PyDispatcher, protego, Twisted
  Building wheel for PyDispatcher (setup.py) ... done
  Created wheel for PyDispatcher: filename=PyDispatcher-2.0.5-py3-none-any.whl size=11515 sha256=8e02fc1fe7a7c370afdb7a9ca1444165cd92e91eea545f280a31a3a094a1dcde
  Stored in directory: /root/.cache/pip/wheels/f4/8a/2f/3888c02609d5e31c3ce52d11e865ced1e67e2b7ad196145414
  Building wheel for protego (setup.py) ... done
  Created wheel for protego: filename=Protego-0.1.16-py3-none-any.whl size=7765 sha256=ea5d8dd4472010aeb8f67012c31223234721c77b90db8168bfb48c5017f4aecb
  Stored in directory: /root/.cache/pip/wheels/7b/e4/16/a1b04c3547b913e8898894d84c92efe23ed1b52db62cfaf2e9
  Building wheel for Twisted (setup.py) ... done
  Created wheel for Twisted: filename=Twisted-20.3.0-cp38-cp38-linux_x86_64.whl size=3076155 sha256=7e2cd1aae813872858c2dd344bb0a92e41289a90e0b03f98315d2cec879c9f42
  Stored in directory: /root/.cache/pip/wheels/03/e1/89/0c492632a418a54778123a939e3cac6719e7a93795661175a1
Successfully built PyDispatcher protego Twisted
Installing collected packages: queuelib, pyasn1, pyasn1-modules, attrs, service-identity, w3lib, PyDispatcher, zope.interface, protego, constantly, Automat, hyperlink, PyHamcrest, Twisted, cssselect, parsel, scrapy
Successfully installed Automat-20.2.0 PyDispatcher-2.0.5 PyHamcrest-2.0.2 Twisted-20.3.0 attrs-19.3.0 constantly-15.1.0 cssselect-1.1.0 hyperlink-19.0.0 parsel-1.6.0 protego-0.1.16 pyasn1-0.4.8 pyasn1-modules-0.2.8 queuelib-1.5.0 scrapy-2.1.0 service-identity-18.1.0 w3lib-1.22.0 zope.interface-5.1.0
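
Beyond reading pip's "Successfully installed" line, a quick way to confirm the package is usable is to try importing it. The following is a minimal sketch under the assumption that python3 is the interpreter pip3 installed into; has_module and the PYTHON override variable are hypothetical names introduced here for illustration:

```shell
# Hypothetical helper: succeed only if the given Python module can be imported.
# PYTHON may be set to pick a specific interpreter; it defaults to python3.
has_module() {
    "${PYTHON:-python3}" -c "import $1" >/dev/null 2>&1
}

if has_module scrapy; then
    echo "scrapy is importable"
else
    echo "scrapy is NOT importable"
fi
```

If the check fails even though pip reported success, pip3 and python3 may belong to different Python installations.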

4. Find where Scrapy is installed

[root@iZ2zegaforshlunfo6xw8qZ ~]# whereis scrapy

scrapy: /usr/local/python3/bin/scrapy
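
whereis reports where the file lives, but that is a separate question from whether the shell can resolve the bare scrapy command. A small sketch for checking the latter with the standard command -v builtin; on_path is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the shell can find the named command
# through the current PATH (or as a builtin or function).
on_path() {
    command -v "$1" >/dev/null 2>&1
}

if on_path scrapy; then
    echo "scrapy can be run by name"
else
    echo "scrapy is not on PATH; use the full path for now"
fi
```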

5. Verify the Scrapy version

[root@iZ2zegaforshlunfo6xw8qZ ~]# /usr/local/python3/bin/scrapy -v

Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

As things stand, Scrapy has to be invoked by its full path, /usr/local/python3/bin/scrapy, every time. There are two ways to make the bare scrapy command work:

(1) Add Scrapy's directory to the PATH environment variable

vi /etc/profile

Append the following lines to the end of the file:

export SCRAPY_HOME=/usr/local/python3/

export PATH=$PATH:$SCRAPY_HOME/bin

Or simply:

export PATH=$PATH:/usr/local/python3/bin

Then run source /etc/profile so that the change takes effect.
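
Because /etc/profile can be sourced more than once in the same session, a plain export PATH=$PATH:... line appends the same directory again each time. A minimal sketch of a guard against that; path_append is a hypothetical helper name, not something the original steps use:

```shell
# Hypothetical helper: append a directory to PATH only if it is not
# already one of its entries.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/local/python3/bin
export PATH
```

Sourcing a file that contains this any number of times leaves a single /usr/local/python3/bin entry in PATH.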

(2) Create a symbolic link

[root@iZ2zegaforshlunfo6xw8qZ ~]# ln -s /usr/local/python3/bin/scrapy /usr/bin/scrapy
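
ln -s reports an error if the link path already exists, for instance when this step is repeated. A small sketch of a guarded version; link_if_missing is a hypothetical helper, and the call on the real paths is left commented out so that nothing changes until you enable it:

```shell
# Hypothetical helper: create a symbolic link only when nothing exists
# at the link path yet.
#   $1 = existing file to point at
#   $2 = path of the link to create
link_if_missing() {
    if [ -e "$2" ] || [ -L "$2" ]; then
        echo "$2 already exists, leaving it untouched"
    else
        ln -s "$1" "$2"
    fi
}

# Uncomment to apply it to the paths used above (requires root):
# link_if_missing /usr/local/python3/bin/scrapy /usr/bin/scrapy
```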

Now run scrapy -v in the console:

[root@iZ2zegaforshlunfo6xw8qZ ~]# scrapy -v

Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

If you see the output above, Scrapy has been installed successfully. Time to start your crawling journey!
