Test results:
{'review_number': '65 reviews', 'star': 5, 'title': 'EarPod', 'image': 'img/pic_0000_073a9256d9624c92a05dc680fc28865f.jpg', 'price': '$24.99'}
{'review_number': '12 reviews', 'star': 4, 'title': 'New Pocket', 'image': 'img/pic_0005_828148335519990171_c234285520ff.jpg', 'price': '$64.99'}
{'review_number': '31 reviews', 'star': 4, 'title': 'New sunglasses', 'image': 'img/pic_0006_949802399717918904_339a16e02268.jpg', 'price': '$74.99'}
{'review_number': '6 reviews', 'star': 3, 'title': 'Art Cup', 'image': 'img/pic_0008_975641865984412951_ade7a767cfc8.jpg', 'price': '$84.99'}
{'review_number': '18 reviews', 'star': 4, 'title': 'iphone gamepad', 'image': 'img/pic_0001_160243060888837960_1c3bcd26f5fe.jpg', 'price': '$94.99'}
{'review_number': '18 reviews', 'star': 4, 'title': 'Best Bed', 'image': 'img/pic_0002_556261037783915561_bf22b24b9e4e.jpg', 'price': '$214.5'}
{'review_number': '35 reviews', 'star': 4, 'title': 'iWatch', 'image': 'img/pic_0011_1032030741401174813_4e43d182fce7.jpg', 'price': '$500'}
{'review_number': '8 reviews', 'star': 4, 'title': 'Park tickets', 'image': 'img/pic_0010_1027323963916688311_09cc2d7648d9.jpg', 'price': '$15.5'}
Code used:
from bs4 import BeautifulSoup

data = []
path = '/Users/lihai/Desktop/Plan-for-combating-master/week1/1_2/1_2answer_of_homework/1_2_homework_required/index.html'

# Parse the local HTML file with the lxml parser
with open(path, 'r') as f:
    Soup = BeautifulSoup(f.read(), 'lxml')

# One selector per field; each call returns a list of tags in document order
pics = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')
price = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
review = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
# The stars sit in the second <p> inside div.ratings
stars = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')

# zip pairs the n-th entry of every list, i.e. the fields of one product
for pic, pri, title, rev, star in zip(pics, price, titles, review, stars):
    info = {
        'price': pri.get_text(),
        'image': pic.get('src'),
        'title': title.get_text(),
        'review_number': rev.get_text(),
        # Count the filled-star spans to get the rating
        'star': len(star.find_all("span", class_='glyphicon glyphicon-star'))
    }
    data.append(info)

for d in data:
    print(d)
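
The scraped price and review_number values are still strings ('$24.99', '65 reviews'). If numeric values were wanted, a small post-processing step could look like the sketch below. The helper names parse_price and parse_review_count are hypothetical and not part of the homework code, and the sketch assumes prices always start with a dollar sign and review counts always start with an integer.

```python
import re


def parse_price(text):
    """Turn a string such as '$24.99' into a float (assumes a leading '$')."""
    return float(text.strip().lstrip('$'))


def parse_review_count(text):
    """Pull the first integer out of a string such as '65 reviews'."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else 0


# One record copied from the test results above
record = {'review_number': '65 reviews', 'star': 5, 'title': 'EarPod', 'price': '$24.99'}

# Build a cleaned copy and leave the scraped record untouched
cleaned = dict(
    record,
    price=parse_price(record['price']),
    review_number=parse_review_count(record['review_number']),
)
print(cleaned)
```

Keeping the cleaning separate from the scraping means the selectors stay simple and the raw strings remain available for checking.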
Reflections:
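This exercise gave me a deeper understanding of HTML selectors and was also a chance to review the DOM tree. My approach was to think the problem through and try writing the code myself first, and to consult the teacher's code only when I had been stuck for a long time. For example, p:nth-of-type(2) came from the teacher's code. I then looked it up on W3C and learned that it selects the second element of its type among its siblings. Comparing that with the page's HTML structure, the stars are in the second p tag, and at that point it suddenly made sense.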
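
To make the p:nth-of-type(2) point concrete, here is a minimal sketch. The HTML fragment is a hypothetical stand-in modeled on the ratings block described above, not a copy of the real page, and it uses the built-in html.parser instead of lxml so that it runs without extra parser dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the first <p> holds the review count,
# the second <p> holds the star icons.
html = """
<div class="ratings">
  <p class="pull-right">65 reviews</p>
  <p>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
    <span class="glyphicon glyphicon-star-empty"></span>
  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# nth-of-type counts only siblings with the same tag name (here: <p>)
first_p = soup.select_one("div.ratings > p:nth-of-type(1)")
second_p = soup.select_one("div.ratings > p:nth-of-type(2)")

review_text = first_p.get_text(strip=True)
# span.glyphicon-star matches filled stars only; glyphicon-star-empty is a different class
star_count = len(second_p.select("span.glyphicon-star"))

print(review_text, star_count)
```

Because the selector picks the p tag by position and not by class, it still finds the stars even though that tag has no class of its own.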