Install the matching versions
selenium==2.48.0
beautifulsoup4==4.7.1
Install with pip
pip3 install selenium==2.48.0
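pip3 install beautifulsoup4==4.7.1  # sqlite3 is part of Python's standard library and is not installed with pip
pip3 install lxml  # or: pip3 install html5lib (the HTML parser passed to BeautifulSoup)
pip3 install PyExecJS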
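To confirm that the environment matches the pins above, a small stdlib-only check script can help. This is a sketch that assumes Python 3.8+ (for importlib.metadata); check_pins and pick_parser are hypothetical helper names, not part of any library. It also reports which parser BeautifulSoup could use, falling back to the built-in "html.parser" when neither lxml nor html5lib is installed.

```python
import importlib.util
import sqlite3  # standard library: nothing to pip install
from importlib.metadata import PackageNotFoundError, version

# Pins from the top of this note.
PINNED = {"selenium": "2.48.0", "beautifulsoup4": "4.7.1"}


def check_pins(pins):
    """Return {package: problem} for each package that is missing or at a different version."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


def pick_parser():
    """Name of the first installed parser to pass to BeautifulSoup(html, <parser>)."""
    for module in ("lxml", "html5lib"):
        if importlib.util.find_spec(module) is not None:
            return module
    return "html.parser"  # built into Python, always available


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("BeautifulSoup parser:", pick_parser())
    for pkg, problem in check_pins(PINNED).items():
        print(f"{pkg}: {problem}")
```

An empty report means both pinned packages are installed at exactly the pinned versions.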
Get chromedriver
Method 1: install on macOS with Homebrew
brew install chromedriver
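Method 2: manual download
Download page: https://npm.taobao.org/mirrors/chromedriver/
Pick the chromedriver build that matches your installed Chrome version, download and unzip it, then move the chromedriver binary into /usr/local/bin.
Make it executable: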
sudo chmod u+x,o+x /usr/local/bin/chromedriver
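To check that the chmod step worked and that the driver fits your Chrome, a sketch like the following can help. It assumes Python 3.7+ and assumes `chromedriver --version` prints a line such as `ChromeDriver 114.0.5735.90 (...)`; on macOS, running `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome --version` prints `Google Chrome 114...` for comparison. For chromedriver 73 and later, whose version numbers follow Chrome's, the two major numbers should agree. find_executable, version_output and major_version are hypothetical helper names.

```python
import re
import shutil
import subprocess


def find_executable(name, search_path=None):
    """Full path of `name` if it is on the search path AND executable, else None.

    shutil.which() skips files without the execute bit, so a chromedriver copied
    into /usr/local/bin but never chmod'ed is reported as missing here.
    """
    return shutil.which(name, path=search_path)


def version_output(path):
    """Run `<path> --version` and return its stdout, stripped."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    return result.stdout.strip()


def major_version(text):
    """Leading number of the first dotted version in `text` (assumed output format), or None."""
    match = re.search(r"(\d+)\.\d+", text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    driver = find_executable("chromedriver")
    if driver is None:
        print("chromedriver is not on PATH or is not executable")
    else:
        out = version_output(driver)
        print(driver, "->", out, "| major version:", major_version(out))
```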
Install PhantomJS (an alternative to chromedriver; you only need one of the two)
Method 1
Download from the official site: http://phantomjs.org/download.html
Method 2
sudo npm install -g phantomjs-prebuilt
Method 3
brew update && brew install phantomjs
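Since only one of the two drivers is needed, a script can start whichever one is installed. A sketch, assuming the pinned selenium 2.48.0, where webdriver.Chrome and webdriver.PhantomJS both accept an executable_path argument (later Selenium releases removed PhantomJS support and the executable_path argument). pick_driver, open_browser and DRIVER_CLASSES are hypothetical names.

```python
import shutil

# Driver executable -> selenium 2.x webdriver class that drives it
DRIVER_CLASSES = {"chromedriver": "Chrome", "phantomjs": "PhantomJS"}


def pick_driver(preference=("chromedriver", "phantomjs"), search_path=None):
    """Return (webdriver class name, executable path) for the first installed driver, or None."""
    for exe in preference:
        path = shutil.which(exe, path=search_path)
        if path:
            return DRIVER_CLASSES[exe], path
    return None


def open_browser(search_path=None):
    """Start a browser with whichever driver pick_driver() finds (requires selenium)."""
    choice = pick_driver(search_path=search_path)
    if choice is None:
        raise RuntimeError("neither chromedriver nor phantomjs is installed and executable")
    from selenium import webdriver  # imported here so pick_driver() works without selenium

    class_name, path = choice
    return getattr(webdriver, class_name)(executable_path=path)
```

Putting "chromedriver" first in the preference tuple is an arbitrary choice; swap the order to prefer PhantomJS.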
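Using selenium
Eight singular forms (each returns the first matching element and raises NoSuchElementException if nothing matches)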
1. id locator: find_element_by_id(self, id_)
2. name locator: find_element_by_name(self, name)
3. class locator: find_element_by_class_name(self, name)
4. tag locator: find_element_by_tag_name(self, name)
5. link text locator: find_element_by_link_text(self, link_text)
6. partial link text locator: find_element_by_partial_link_text(self, link_text)
7. xpath locator: find_element_by_xpath(self, xpath)
8. css locator: find_element_by_css_selector(self, css_selector)
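Eight plural forms (each returns a list of all matching elements, or an empty list if nothing matches)
9. id locator (plural): find_elements_by_id(self, id_)
10. name locator (plural): find_elements_by_name(self, name)
11. class locator (plural): find_elements_by_class_name(self, name)
12. tag locator (plural): find_elements_by_tag_name(self, name)
13. link text locator (plural): find_elements_by_link_text(self, text)
14. partial link text locator (plural): find_elements_by_partial_link_text(self, link_text)
15. xpath locator (plural): find_elements_by_xpath(self, xpath)
16. css locator (plural): find_elements_by_css_selector(self, css_selector)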
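Each of the sixteen methods above is shorthand for find_element(by, value) or find_elements(by, value) with a constant from selenium.webdriver.common.by.By. That generic form also works in 2.48.0, and it is the only form left in Selenium 4.3 and later, which removed the find_element(s)_by_* helpers. The sketch below (modern_call and BY_CONSTANT are hypothetical names) just spells out that mapping as text, which may be handy when porting old scripts; it does not call selenium itself.

```python
import re

# Legacy method suffix -> name of the matching constant on selenium.webdriver.common.by.By
BY_CONSTANT = {
    "id": "ID",
    "name": "NAME",
    "class_name": "CLASS_NAME",
    "tag_name": "TAG_NAME",
    "link_text": "LINK_TEXT",
    "partial_link_text": "PARTIAL_LINK_TEXT",
    "xpath": "XPATH",
    "css_selector": "CSS_SELECTOR",
}

LEGACY = re.compile(r"^(find_elements?)_by_(\w+)$")


def modern_call(legacy_name, value):
    """Rewrite e.g. ('find_elements_by_xpath', '//a') as "find_elements(By.XPATH, '//a')"."""
    m = LEGACY.match(legacy_name)
    if not m or m.group(2) not in BY_CONSTANT:
        raise ValueError(f"not a legacy locator method: {legacy_name}")
    method, strategy = m.groups()
    return f"{method}({'By.' + BY_CONSTANT[strategy]}, {value!r})"


# With a live driver (selenium required), these two calls are equivalent:
#   from selenium.webdriver.common.by import By
#   browser.find_element_by_css_selector("div.example")
#   browser.find_element(By.CSS_SELECTOR, "div.example")
```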
Combined example
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
link = 'http://www.baidu.com'
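browser = webdriver.PhantomJS()  # use PhantomJS
# browser = webdriver.Chrome()  # or use chromedriver instead
browser.get(link)
# page_source is already a decoded Python str; WebDriver has no `encoding` attribute to set
html_doc = browser.page_source
soup = BeautifulSoup(html_doc, 'lxml')
soupArr = soup.select('[style="text-decoration:none;"]')
# find_all("div", class_="contson") matches <div class="contson">; check for an empty result before indexing
contson_divs = soup.find_all("div", class_="contson")
yuanwen_list = contson_divs[0] if contson_divs else None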
......
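The combined example never shuts the browser down, so the PhantomJS or chromedriver process keeps running after the script ends. A minimal sketch using only the standard library: a context manager (driver_session is a hypothetical name) that always calls quit(), which ends the WebDriver session and stops the driver process; close() would only close the current window.

```python
from contextlib import contextmanager


@contextmanager
def driver_session(make_driver):
    """Yield the driver returned by make_driver() and always call its quit() afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs even if the body of the with-block raised an exception.
        driver.quit()


# Usage with the combined example above (selenium and bs4 required):
# from bs4 import BeautifulSoup
# from selenium import webdriver
#
# with driver_session(webdriver.PhantomJS) as browser:
#     browser.get('http://www.baidu.com')
#     soup = BeautifulSoup(browser.page_source, 'lxml')
```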