A Small Python Crawler
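Python, the "glue language", is now mainstream and is used almost everywhere. Writing a crawler in it is far more convenient than in Java or Objective-C: a few lines of code are enough. In day-to-day development it is also handy for tasks such as code obfuscation.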
Crawl target: Doutula (斗图网)
The approach takes just three steps:
- Determine the URL
- Send the request
- Fetch the images and save them
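import os
import re

import requests

# 1. Determine the URL
url = 'https://www.doutula.com/photo/list/'
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

# 2. Send the request and read the page source as text
response = requests.get(url, headers=header).text

# 3. Fetch the images and save them
# Image URLs are taken from the data-original attributes in the page source
reg = r'data-original="(.*?)"'
image_urls = re.findall(reg, response)

# Create the output directory if it does not exist yet
os.makedirs('./images', exist_ok=True)

for image_url in image_urls:
    # Last path segment with its final 4 characters (e.g. ".jpg") removed
    image_name = image_url.split('/')[-1][:-4]
    print(image_name)
    image = requests.get(image_url, headers=header).content
    with open("./images/%s.jpg" % image_name, "wb") as file:
        file.write(image)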
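
One thing to watch in step 3: slicing off the last four characters and always writing `.jpg` mislabels any image that is not a JPEG, and it also breaks if the URL carries a query string. Below is a minimal sketch of one way to keep the original file name, assuming the image URLs look like ordinary paths. The helper names `extract_image_urls` and `filename_from_url` and the sample markup are made up for illustration; they are not part of the script above.

```python
import posixpath
import re
from urllib.parse import urlparse

# Same attribute the script above matches on.
IMAGE_PATTERN = re.compile(r'data-original="(.*?)"')


def extract_image_urls(html):
    """Return every data-original attribute value found in the page source."""
    return IMAGE_PATTERN.findall(html)


def filename_from_url(image_url):
    """Return the last path segment of the URL, extension included.

    urlparse() separates the path from any query string, so a URL ending
    in '?v=3' still yields a clean file name.
    """
    return posixpath.basename(urlparse(image_url).path)


# Hypothetical sample markup (assumption); a real page contains far more.
sample_html = (
    '<img class="lazy" data-original="https://img.example.com/a/cat01.gif">'
    '<img class="lazy" data-original="https://img.example.com/b/dog02.jpg?v=3">'
)

for image_url in extract_image_urls(sample_html):
    print(filename_from_url(image_url))
```

With helpers like these, the save step could open `"./images/" + filename_from_url(image_url)` instead of hard-coding the extension.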