The urllib.request module defines classes and functions that help us open URLs.
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False)
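This function opens a URL. url can be either a valid URL string or a Request object; data, when supplied, must be bytes rather than str. For details, see the urllib.request page of the official Python 3 documentation.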
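To make the role of data concrete, here is a minimal sketch that builds two Request objects and inspects them without sending anything. The endpoint http://example.com/form is a hypothetical placeholder, not part of the original examples. It only shows that a Request without data defaults to GET, while one carrying bytes data defaults to POST.

```python
from urllib import parse, request

# Hypothetical endpoint; nothing is sent, the Request objects are only inspected.
url = 'http://example.com/form'

# No data: the request defaults to GET.
get_req = request.Request(url)

# urlencode returns a str, so it is encoded to bytes before being used as data.
body = parse.urlencode({'name': 'abc'})
post_req = request.Request(url, data=body.encode('utf-8'))

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # b'name=abc'
```

Either object could then be handed to urlopen in place of a plain URL string.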
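The differences between the two ways of submitting data, GET and POST, are not repeated here. Examples of both follow:
GET:
- Simulate a browser sending a GET request without parameters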
from urllib import request
req = request.Request('http://www.douban.com/')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
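As a side note on the header step, the sketch below checks what a Request actually stores before it is sent. It passes the header through the headers argument of the constructor instead of add_header, and the User-Agent string is a shortened, made-up value used only for illustration. One detail worth knowing: Request normalizes header names with str.capitalize(), so lookups need the form 'User-agent'.

```python
from urllib import request

# Hypothetical, shortened User-Agent string used only for this illustration.
ua = 'Mozilla/5.0 (compatible; example-client)'

# Passing headers= to the constructor has the same effect as calling add_header.
req = request.Request('http://www.douban.com/', headers={'User-Agent': ua})

# Header names are stored via str.capitalize(): 'User-Agent' becomes 'User-agent'.
print(req.has_header('User-agent'))   # True
print(req.has_header('User-Agent'))   # False
print(req.get_header('User-agent'))
print(req.header_items())
```

Nothing is sent over the network here; the object is only constructed and inspected.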
- Simulate a browser sending a GET request with parameters
import urllib.parse
import urllib.request
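# urlencode converts key-value pairs into a query string of the form a=1&b=2
# A Baidu search request looks like 'http://www.baidu.com/s?wd=', where wd is the search term
# urlencode percent-encodes Chinese (non-ASCII) characters automatically
# With a single parameter, 'http://www.baidu.com/s?wd=' + keywd also works,
# but if keywd contains Chinese it must first be encoded with urllib.parse.quote(keywd)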
data = urllib.parse.urlencode({'wd': '听城', 'password': '123'})
print(data)
response = urllib.request.urlopen('http://www.baidu.com/s?%s' % data)
html = response.read()
# print(html.decode('utf-8'))
with open('D:/1.html', 'wb') as file:
    file.write(html)
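The encoding behaviour described in the comments above can be checked offline. The following is a small sketch that compares urlencode and quote from urllib.parse on the same search term; the extra key q and the value 'a b' are made up to show how the two functions treat spaces differently.

```python
from urllib import parse

# Key-value pairs become a query string.
print(parse.urlencode({'a': 1, 'b': 2}))   # a=1&b=2

keywd = '听城'

# Several parameters: urlencode percent-encodes the non-ASCII value.
query = parse.urlencode({'wd': keywd, 'password': '123'})
print(query)

# A single parameter appended by hand: quote does the encoding.
single = 'http://www.baidu.com/s?wd=' + parse.quote(keywd)
print(single)

# The two functions treat spaces differently (hypothetical key 'q').
print(parse.urlencode({'q': 'a b'}))   # q=a+b
print(parse.quote('a b'))              # a%20b

# Decoding recovers the original values.
print(parse.parse_qs(query))
print(parse.unquote(single))
```

Round-tripping through parse_qs and unquote is a quick way to confirm that nothing was lost in the encoding.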
POST:
import urllib.parse
import urllib.request
url = 'http://127.0.0.1:8080/test/index.jsp'
values = {
    'name': 'abc',
    'password': '123'
}
data = urllib.parse.urlencode(values)
# The string returned by urlencode must be encoded to bytes before it is passed to urlopen as data
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
response = urllib.request.urlopen(req)
html = response.read()
print(html.decode('utf-8'))
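The POST example above assumes that something is listening on 127.0.0.1:8080. For trying it out without such a service, here is a rough sketch that starts a throwaway echo server with the standard http.server module and posts the same form to it. EchoHandler and post_form are hypothetical names invented for this sketch; the handler simply returns the request body and is not meant to imitate what index.jsp does. The empty ProxyHandler is an assumption made so that proxy environment variables do not interfere with the local call.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real page: echoes the POST body back."""

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def post_form(url, values):
    """URL-encode values, POST them as bytes, return (status, decoded body)."""
    data = urllib.parse.urlencode(values).encode('utf-8')
    req = urllib.request.Request(url, data=data)
    # An empty ProxyHandler ignores proxy environment variables for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return resp.status, resp.read().decode('utf-8')


# Port 0 lets the operating system pick a free port.
server = HTTPServer(('127.0.0.1', 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = 'http://127.0.0.1:%d/test/index.jsp' % server.server_address[1]
    status, text = post_form(url, {'name': 'abc', 'password': '123'})
    print(status, text)
finally:
    server.shutdown()
    server.server_close()
```

Because the server echoes the body, the printed text shows exactly what urlencode produced and what was sent as data.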