Environment: Windows 10 (64-bit) + Python 3.5 + Scrapy 1.2
a
from scrapy.linkextractors.sgml import SgmlLinkExtractor as sle
needs to be changed to
from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor as sle
because sgmllib, which SgmlLinkExtractor depends on, was removed in Python 3
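As a rough sketch of how the replacement extractor might be used, the function below builds a crawl rule with LxmlLinkExtractor. The function name build_rules, the URL pattern, and the callback name parse_item are all hypothetical and not taken from the original project. Scrapy is imported inside the function so that the snippet loads even where Scrapy is not installed.

```python
def build_rules():
    """Return crawl rules built with the lxml-based link extractor.

    Scrapy is imported lazily; calling this function requires Scrapy.
    """
    from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor as sle
    from scrapy.spiders import Rule

    # Hypothetical URL pattern and callback name, for illustration only.
    return (
        Rule(sle(allow=(r"/page/\d+",)), callback="parse_item", follow=True),
    )
```

In a CrawlSpider, a tuple like this would be assigned to the rules attribute. scrapy.linkextractors.LinkExtractor is an alias for the same class, so importing that name works as well.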
b
import urlparse
needs to be changed to
import urllib.parse as urlparse
because the module was reorganized in Python 3 (urlparse became urllib.parse)
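A minimal sketch of the aliased import in use, assuming the old code called urlparse.urlparse and urlparse.urljoin. The URL is a made-up example.

```python
import urllib.parse as urlparse  # Python 3 home of the old urlparse module

# Hypothetical URL, used only for illustration.
url = "http://example.com/list/page.html?id=3"

parts = urlparse.urlparse(url)
print(parts.netloc)  # host part of the URL
print(parts.path)    # path part of the URL

# urljoin resolves a relative link against the page it was found on.
next_page = urlparse.urljoin(url, "page2.html")
print(next_page)
```

With the alias in place, existing calls such as urlparse.urljoin(...) do not need to be touched.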
c
urllib.urlencode needs to be changed to urllib.parse.urlencode
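For illustration, a short sketch of the Python 3 call. The form fields are invented for the example.

```python
import urllib.parse

# Hypothetical form fields, used only for illustration.
params = [("q", "scrapy python3"), ("page", 2)]

# urlencode builds a query string; spaces are encoded as "+".
query = urllib.parse.urlencode(params)
print(query)
```

The same move applies to urllib.quote and urllib.unquote, which are urllib.parse.quote and urllib.parse.unquote in Python 3.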
d
import httplib
needs to be changed to
import http.client as httplib
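If the same code still has to run on Python 2, one possible approach (an assumption, not something the original project does) is a guarded import that keeps the old name:

```python
try:
    import http.client as httplib  # Python 3
except ImportError:
    import httplib  # Python 2 fallback

# Status constants and the reason-phrase table stay available under the old name.
print(httplib.OK)
print(httplib.responses[httplib.NOT_FOUND])
```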
e
If running the spider with scrapy crawl xxx fails with one of these errors:
ImportError: cannot import name '_win32stdio'
ImportError: No module named 'win32api'
it is because Twisted depends on the pywin32 module on Windows, which can be installed with:
pip install pypiwin32
After that, everything runs without problems.
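To check in advance whether the pywin32 dependency is present, a small diagnostic along these lines may help. The helper name missing_modules is hypothetical; it only looks modules up without importing them.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.platform == "win32":
    # win32api is provided by pywin32 (installed above via pypiwin32).
    missing = missing_modules(["win32api"])
    if missing:
        print("Missing:", ", ".join(missing), "- try: pip install pypiwin32")
    else:
        print("pywin32 is available")
```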