6-1 如何读写csv数据
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">实际案例</mark>:我们可以通过雅虎网站获取中国股市(深市(数据集,它以csv的格式存储
http://table.finance.yahoo.com/table.csv?s=000001.sz
数据存储格式:
Date, open, Hige, Low, Close, Volume, Adj, Close
2016-06-30, 8.69, 8.74, 8.66, 8.70, 36220400, 8.70
请将平安银行这只股票,在2016年中成交量超过50000000的记录存储到另外一个csv文件中。
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">解决方案</mark>
使用标准库中的csv模块,可以使用其中的reader 和writer 完成csv文件的读写。
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n11" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import urllib import urlretrieve
urlretrieve('http://table.finance.yahoo.com/table.csv?s=000001.sz','pingan.csv') #下载文件,第一个参数为url,第二个参数为保存的文件名
cat pingan.csv | less #查看文件
import csv
rf = open('pingan.csv','rb') #打开文件,’b'表示使用二进制形式打开
reader=csv.reader(rf) # 读入文件,为迭代器
reader.next() #迭代器只能用next进行迭代,不能用for迭代
wf = open('pingan_copy.csv', 'wb')
writer=csv.writer(wf) # 创建一个writer的写
writer.writerow(['Date', 'open', 'Hige', 'Low', 'Close', 'Volume', 'Adj', 'Close']) # writerow可以一行一行写入
writer.writerow(reader.next()) #直接将迭代的内容写入
wf.flush() #目的使得能看到里面的内容,即缓冲完成
cat pingan_copy.csv | less
</pre>
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n12" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import csv
with open('pingan.csv','rb') as rf:
reader = csv.reader(rf) #创建reader
with open('pingan.csv', 'wb') as wf:
wtiter= csv.writer(wf) #创建writer
headers=reader.next() #将表头读取
writer.writerow(headers) #将表头信息写到文件中
for row in reader: 对reader进行迭代
if row[0] < '2016-01-01':
break
if int(row[5]) >= 50000000:
writer.writerow(row)
print('end')
</pre>
6-2 如何读写json数据?
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">实际案例</mark>
在web应用中肠用JSON(javascript object Notation)格式传输数据,例如我们利用百度语音识别服务器做语音识别,将本地音频数据post到百度的语音识别服务器,服务器响应结果为json字符串。
{‘corpus_no':'737419413412412414','err_msg': 'success', 'err_no': '0', 'result':['你好’,‘’],'sn' : '3275935789476025'}
下python中如何读写json数据,如何转换成python对象
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">解决方案</mark> 使用标准库中的json模块,其中loads, dumps函数可以完成json数据的读写
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n20" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import json
json下有一个方法,叫做dumps,可以将python对象转换成json字符串
l =[1,2,'abe', {'name':'Bob', 'age': 13}]
json.dumps(l) #转换成一个json对象
d= {'b': None, 'a': 'S', 'c': 'abc'}
json.dumps(d) #在json字符串中,None变成了null字符串
dumps参数的调整
json.dumps(l, separators=[',', ':']) #默认的分隔符是‘ ,’就是逗号前面还有一个空格,我们设置为只有逗号。
对输出的字段中的键进行排序:
json.dumps(d, sort_keys=True)
json 对象转换成python对象:
l2=json.loads([1,2,'abe', {'name':'Bob', 'age': 13}])
l2[0] #可以进行索引
d2= json.loads('{'b': Nono, 'a': S, 'c': 'abc'}')
d2['a'] #想字典一样进行索引
load 和loads, dump 和dumps的功能是抑制的,但是没有加s的接口是文件。
with open('demo,json','wb') as f:
json.dump(l,f) #运行结果就是将l以json的形式写入到文件中。
with open('demo.py', 'wb') as f:
json.load('{'b': Nono, 'a': S, 'c': 'abc'}', f) #将这个json对象写入到f文件中。
</pre>
6-3 如何解析简单的xml文档
xml是一种非常常用的标记性语言, 可以提供统一的方法来描述应用程序的结构化数据:
<?xml version="1.0"?>
<data>
<country name = "Liechtenstein">
<rank updated = "yes" >2</rank>
<year> 2008 </year>
<gdppc> 141100 </gdppc>
<neighbor name = "Austria" direction ="E"/>
<neighbor name = "Switzerland" direction = "W"/>
</country>
</data>
python中如何解析xml文档
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">解决方案</mark> :使用标准库中的xml.etree.ElementTree, 其中的parse 函数可以解析xml文档。
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n37" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from xml.etree.ElementTree import parse
f = open ('demo.xml')
et = parse(f)
root = et.getroot()
root.tag #查看标签
root.attrib #查看属性、
root.text #
root.text.strip()
root.getchildren()
for child in root:
print child.get('name')
root.find('country') #根据标签来寻找对象
root.findall('country') #找到全部标签为country的对象
也可以获得可迭代对象:
root.iterfind('country') #得到生成器对象
for e in root.iterfind('country'):print e.get('name')
只能找当前的直接子元素。
root.iter() #列出当前节点下全部的子元素
list(root.iter())
list(root.iter('rank'))
list(root.iter('neighbor'))
root.findall('country/*') #找它的孙子节点,匹配country下的所有节点
root.findall('.//rank') #无论rank在哪个层次下,都要找到
root.findall('.//rank/..') #在任意层次下找到rank,然后在找回它的父亲。
root.findall('country[@name]') #只有存在的属性才会被输出,没有这个属性的话就不会有值输出
root.findall('country[@name="Liechtenstein"]')
root.findall('country[rank]') #找到一个元素,其中必须包含某个tag
root.findall('country[rank="2"]') #找到指定的元素,其中rank中包含指定的文本。
root.findall('country[1]') #如果同样的子元素有多个的话,可用序号来进行引索
root.findall('country[last()]') #找到的是最后一个
root.findall('country[last()-1]') #找倒数第二个
</pre>
6-4如何构建xml文档
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">实际案例</mark>某些时候,我们需要将其他格式数据转换成xml,例如,我们需要把平安股票csv文件,转换成相应的xml:
pingan.csv
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" contenteditable="true" cid="n42" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Date, Open, Hige, Low, Close, Volume, AdjClose
2016-06-30, 8.69, 8.74, 8.66, 8.70, 36220400, 8.70</pre>
pingan.xml
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="xml" contenteditable="true" cid="n44" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"><Data>
<Row>
<Data> 2016-06-30</Data>
<Open> 8.69 </Open>
<High> 8.74</High>
<Low> 8.66 </Low>
<Close> 8.70 </Close>
<Volume> 36220400 </Volume>
<AdjClose> 8.70</AdjClose>
</Row>
</Data></pre>
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">解决方案</mark> 使用标准库中的xml.etree.ElementTree,构建ElementTree,使用write方法写入文件
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n46" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from xml.etree.ElementTree import Element,ElementTree
e=Element('Data') #标签
e.tag #查看标签
e.set('name','abc') #设置属性
e.get('name') #查看属性
from xml.etree.ElementTree import tostring
tostring(e) #检查这个元素及其属性
e.text='123' #为这个属性赋值
e.append() #添加子元素
e2=Element('Row')
e3=Element('Open')
e3.text='8.69' #为这个字符串赋值
e2.append(e3) #将e3作为e2的子元素
tostring(e2) #检查e2
e.text=None #将刚才的data的子元素去掉
e.append(e2)
tostring(e)
et=ElemenTree(e) #将e作为一个根节点
et.write('demo.xml')</pre>
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n47" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import csv
from xml.etree.ElementTree import Element,ElementTree
下面的函数可以用来调整格式
def pretty (e, level=0):
if len(e) > 0:
e.text = '\n' + '\t' * (level +1)
for child in e:
pretty(child, level +1)
child.tail = child.tail[:-1]
e.tail = "\n" + "\t" * level
def csvToxml(fname):
with open(fname,'rb') as f:
reader=csv.reader(f)
header = reader.next()
root = Element('Data')
for row in reader:
eRow = Element('Row')
root.append(eRow)
for tag, text in zip(headers, row):
e = Element(tag)
e.text=text
eRow.append(e)
pretty(root)
return ElementTree(root)
et = csvToxml('pingan.csv')
et.write('pingan.xml') #格式不美观
</pre>
6-5 如何读写excel文件
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">实际案例</mark>
Microsoft Excel 是日常办公中使用得最频繁的软件,其数据格式为xls,xlsx,一种非常常用的电子表格。
某小学某班的成绩,记录在excel文件中:
姓名 0,0 | 语文0,1 | 数学0,2 | 外语 0,3 |
---|---|---|---|
李雷1.0 | 95 1,1 | 99 1,2 | 96 1,3 |
韩梅2.0 | 98 2,1 | 100 2,2 | 93 2,3 |
张峰3.0 | 94 3,1 | 95 3,2 | 95 3,3 |
利用python读取这个列表,并添加列为总分,并计算每个人的总分
<mark style="box-sizing: border-box; background: rgb(255, 255, 0); color: rgb(0, 0, 0);">解决方案</mark> 使用pip安装=》$pip install xlrd xlwt
使用第三方库xlrd 和 xlwt, 这两个库分别用于excel 的读和写
使用xlrd来读excel文件
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n77" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import xlrd
book=xlrd.open_workbook('demo.xlsx')
通过book 访问sheet
book.sheets() #返回一个列表,里面是所有的表
sheet = book.sheet_by_index(0) #这样的seet对应的是当前的表
sheet.nrows #访问所获取的列表的行数
sheet.ncols #访问所获取的列表的列数
cell=sheet.cell(0,0) #传入对应单元格的内容
cell 的属性:cell.ctype, cell.dump, cell.value, cell.xf_index
cell.ctype #返回cell的类型,所得到的类型可以使用xrld.XL 来查看参数,这里返回
xrld.XL_CELL_NUMBER #1
xrld.XL_CELL_TEXT #2
cell.value #返回的是Unicode
print cell.value #能够得到返回的值
cell=sheet.cell(1,1)
cell.ctype
cell.value
一次获取一列或者一行
sheet.row(1)
sheet.row_values(1) #直接返回第一行的值
sheet.row_values(rowx, start_colx=0, end_clox:None) 第一个参数是返回的行数,第二个参数是从那一列开始,第三个参数是终止的列
sheet.row_values(1,1) # 打印一行从第一个参数开始
sheet.col(1)
sheet.col_values() #方法和上面的sheet.row_values一样
sheet.put_cell(row, colx, ctype, value, xf_index)</pre>
使用 xlwt来写excel文件
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n79" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import xlwt
wbook=xlwt.Workbook() #创建一个工作簿
wsheet=wbook.add_sheet('sheet1') #添加一个工作表
wsheet.write(r,c, label='') # 一个单元格一个单元格写入内容
wbook.save('output.xlsx') #保存表格
</pre>
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="python" contenteditable="true" cid="n81" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.9em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(248, 248, 248); position: relative !important; border: 1px solid rgb(231, 234, 237); border-radius: 3px; padding: 8px 1em 6px; margin-bottom: 15px; margin-top: 15px; width: inherit; color: rgb(51, 51, 51); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">#coding:utf8
import xlrd, xlwt
rbook = xlrd.open_workbook('demo.xlsx')
rsheet = rbook.sheet_by_index(0)
nc=rsheet.ncols
rsheet.put_cell(0,nc,xlrd.XL_CELL_TEXT,u'总分',None)
for row in xrange(1,rsheet.nrows):
t=sum(rsheet.row_values(row,1))
rsheet.put_cell(row,nc,xlrd.XL_CELL_NUMBER, t, None)
wbook= xlwt.Workbook()
wsheet = wbook.add_sheet(rsheet.name)
style = xlwt.easyxf('align: vertical center, horizontal center')
for r in xrange(rsheet.nrows):
for c in xrange(rsheet.ncols):
wsheet.write(r,c,rsheet.cell_value(r,c),style)
wbook.save('output.xlsx')</pre>