Beautiful Soup is a powerful library that we can use to scrape information from websites. As an example, here is a small crawler that downloads pictures from 尤果网 (ugirls.com):
```python
from bs4 import BeautifulSoup
import requests
import shutil

# the index has 42 pages of albums; crawl them starting from page 2
for page in range(2, 42):
    c = requests.get('https://www.ugirls.com/Content/Page-{}.html'.format(page))
    c_soup = BeautifulSoup(c.text, 'html.parser')
    # each album on the index page is an <a class="magazine_item_wrap"> link
    for link in c_soup.find_all("a", {"class": "magazine_item_wrap"}):
        girl_url = link['href']
        cont_page = requests.get(girl_url)
        cont_soup = BeautifulSoup(cont_page.text, 'html.parser')
        # the album's pictures live inside <div class="yang auto">
        div = cont_soup.find_all("div", {"class": "yang auto"})
        img = div[0].find_all('img')
        # only keep the first three pictures of each album
        for pic in img[0:3]:
            name = pic['alt']
            url = pic['src']
            response = requests.get(url, stream=True)
            with open('/home/ws/PycharmProjects/untitled/作业/code/picture/{}.jpg'.format(name), 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
```
- First import the three modules used below: BeautifulSoup, requests and shutil.
- The site's index has 42 pages, so a small loop walks through them starting from page 2 (note that `range(2, 42)` stops before 42, so it actually covers pages 2 through 41; use `range(2, 43)` if the last page should be included).
```python
with open('/home/ws/PycharmProjects/untitled/作业/code/picture/{}.jpg'.format(name), 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
```
These two lines choose the storage path and stream the downloaded image to disk. The target directory has to exist already, ideally next to the script you are running; otherwise `open()` will raise an error.
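If you would rather not depend on a hard-coded absolute path, one option is to build the output folder relative to the script and create it before writing. The sketch below only illustrates that idea; the `picture` folder name and the use of `os.makedirs` are my assumptions, not part of the original code.

```python
import os

# build an output folder next to this script and create it if it is missing,
# so the open() call never fails because the directory does not exist
save_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'picture')
os.makedirs(save_dir, exist_ok=True)

# inside the crawler's innermost loop, the last two lines would then become:
# with open(os.path.join(save_dir, '{}.jpg'.format(name)), 'wb') as out_file:
#     shutil.copyfileobj(response.raw, out_file)
```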
That is all there is to the basic crawler; it is fairly simple.
For more on how to use Beautiful Soup, take a look at the documentation below. After reading it you should know how to scrape the pictures you want.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
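As a quick taste of the calls used above, here is a minimal, self-contained sketch of `find_all` on a made-up HTML snippet (the tags and URLs are invented purely for illustration):

```python
from bs4 import BeautifulSoup

# a tiny made-up page, mirroring the structure the crawler above relies on
html = '''
<div class="yang auto">
  <a class="magazine_item_wrap" href="https://example.com/album/1">
    <img src="https://example.com/1.jpg" alt="first">
  </a>
  <a class="magazine_item_wrap" href="https://example.com/album/2">
    <img src="https://example.com/2.jpg" alt="second">
  </a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# find_all returns every tag matching the name/attribute filter
for link in soup.find_all("a", {"class": "magazine_item_wrap"}):
    print(link['href'])        # attribute access works like a dict
    img = link.find('img')     # find returns the first matching child tag
    print(img['alt'], img['src'])
```

Running it prints each album link followed by the alt text and image URL, which is exactly the pattern the crawler above depends on.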