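Python Pandas: working with data using [ ]
This article covers common operations with "[ ]" in Pandas, such as selecting and modifying data.
"[ ]" is the most basic way to select data. It accepts the following kinds of input:
- a single column label;
- a list of column labels;
- a slice;
- a boolean index.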
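As a quick orientation, the four kinds of input can be sketched on a tiny frame. The frame `demo` and its columns `x` and `y` are made up for illustration and are not part of the article's data.

```python
import pandas as pd

# Hypothetical toy frame, used only for this illustration.
demo = pd.DataFrame({"x": [1, -2, 3], "y": [10, 20, 30]})

single = demo["x"]               # one column label -> Series
several = demo[["y", "x"]]       # list of labels -> DataFrame, in list order
rows = demo[:2]                  # slice -> rows, selected by position
positive = demo[demo["x"] > 0]   # boolean Series -> rows where it is True
```

Note that a label or a list of labels selects columns, while a slice or a boolean index selects rows.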
Creating the data
import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))
df
out:
A B C D
2020-01-01 0.336131 -0.086456 0.096903 -1.230599
2020-01-02 -0.106293 0.111821 1.165342 -1.378462
2020-01-03 -0.933779 0.898738 0.013194 -0.593243
2020-01-04 0.190229 -1.108908 0.597650 2.759475
2020-01-05 -0.647080 1.573537 1.357191 -0.536916
2020-01-06 -0.455373 1.342904 -0.316548 0.145119
2020-01-07 -1.350214 -0.044642 0.501508 1.969973
2020-01-08 -0.474602 -0.384916 1.829222 0.853519
Passing a list
Passing a list of column labels returns those columns, in the order given, as a DataFrame.
df[['C','D']]
C D
2020-01-01 0.096903 -1.230599
2020-01-02 1.165342 -1.378462
2020-01-03 0.013194 -0.593243
2020-01-04 0.597650 2.759475
2020-01-05 1.357191 -0.536916
2020-01-06 -0.316548 0.145119
2020-01-07 0.501508 1.969973
2020-01-08 1.829222 0.853519
Passing a single column
Passing a single column label returns a Series. Passing a list returns a DataFrame, even when the list has only one element.
df['C']
out:
2020-01-01 0.096903
2020-01-02 1.165342
2020-01-03 0.013194
2020-01-04 0.597650
2020-01-05 1.357191
2020-01-06 -0.316548
2020-01-07 0.501508
2020-01-08 1.829222
Freq: D, Name: C, dtype: float64
df[['C']]
out:
C
2020-01-01 0.096903
2020-01-02 1.165342
2020-01-03 0.013194
2020-01-04 0.597650
2020-01-05 1.357191
2020-01-06 -0.316548
2020-01-07 0.501508
2020-01-08 1.829222
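The difference between the two return types can be checked directly. This is a minimal sketch on a hypothetical two-row frame (`demo` is not the article's `df`):

```python
import pandas as pd

# Hypothetical data, for illustration only.
demo = pd.DataFrame({"C": [0.1, 0.2], "D": [1.0, 2.0]})

as_series = demo["C"]    # scalar label
as_frame = demo[["C"]]   # one-element list

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```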
List selection can also be used to swap the values of two columns.
df[['A','B']] = df[['B','A']]
df
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
The following looks like another way to swap a subset of columns, this time through .loc.
df.loc[:, ['A', 'B']] = df[['B', 'A']]
df
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
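The assignment above does not swap the columns: .loc aligns the right-hand DataFrame on its column labels before assigning, so each column is written back to itself. To swap, assign the raw values, which carry no labels.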
df.loc[:, ['A', 'B']] = df[['B', 'A']].values
df
out:
A B C D
2020-01-01 0.336131 -0.086456 0.096903 -1.230599
2020-01-02 -0.106293 0.111821 1.165342 -1.378462
2020-01-03 -0.933779 0.898738 0.013194 -0.593243
2020-01-04 0.190229 -1.108908 0.597650 2.759475
2020-01-05 -0.647080 1.573537 1.357191 -0.536916
2020-01-06 -0.455373 1.342904 -0.316548 0.145119
2020-01-07 -1.350214 -0.044642 0.501508 1.969973
2020-01-08 -0.474602 -0.384916 1.829222 0.853519
to_numpy() can be used for the swap as well.
df.loc[:, ['A', 'B']] = df[['B', 'A']].to_numpy()
df
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
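The contrast between label-aligned and positional assignment can be reproduced on a small frame. This is a sketch on made-up integer data (`demo`, `aligned` and `swapped` are illustrative names), not the article's random `df`:

```python
import pandas as pd

# Hypothetical data, for illustration only.
demo = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

# Label-aligned assignment: column "A" of the right-hand side is written
# back into column "A", so the frame is unchanged.
aligned = demo.copy()
aligned.loc[:, ["A", "B"]] = aligned[["B", "A"]]

# Positional assignment: the NumPy array has no labels, so its first
# column (old "B") lands in "A" and its second (old "A") lands in "B".
swapped = demo.copy()
swapped.loc[:, ["A", "B"]] = swapped[["B", "A"]].to_numpy()
```

After this runs, `aligned` still equals `demo`, while `swapped` has the two columns exchanged.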
Slicing
Get the first two rows
df[:2]
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
Setting a step
df[::2]
out:
A B C D
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
df[1::2]
out:
A B C D
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
Reversing the row order
df[::-1]
out:
A B C D
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-02 0.111821 -0.106293 1.165342 -1.378462
2020-01-01 -0.086456 0.336131 0.096903 -1.230599
Assigning through a slice
df[:2] = np.arange(8).reshape(2,4)
df
out:
A B C D
2020-01-01 0.000000 1.000000 2.000000 3.000000
2020-01-02 4.000000 5.000000 6.000000 7.000000
2020-01-03 0.898738 -0.933779 0.013194 -0.593243
2020-01-04 -1.108908 0.190229 0.597650 2.759475
2020-01-05 1.573537 -0.647080 1.357191 -0.536916
2020-01-06 1.342904 -0.455373 -0.316548 0.145119
2020-01-07 -0.044642 -1.350214 0.501508 1.969973
2020-01-08 -0.384916 -0.474602 1.829222 0.853519
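Integer slices inside [ ] act on rows by position, so they select the same rows as the equivalent .iloc slice. The sketch below checks this on a small, deterministic frame (the name `demo` and its values are assumptions for illustration), then assigns through the slice with an array whose shape matches the selected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x2 frame with a date index, for illustration only.
dates = pd.date_range("1/1/2020", periods=4)
demo = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=dates, columns=["A", "B"])

# A plain integer slice selects rows by position, like .iloc.
same_as_iloc = demo[:2].equals(demo.iloc[:2])

# The assigned array must match the shape of the selected block (2 rows x 2 columns).
demo[:2] = np.zeros((2, 2))
```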
Boolean indexing
df = pd.DataFrame(np.random.randn(8,4),index=dates,columns=list('abcd'))
df
out:
a b c d
2020-01-01 -1.749988 -0.249398 -1.165277 -0.806687
2020-01-02 0.026334 0.158118 0.341183 -1.042534
2020-01-03 0.513027 -0.127235 -0.454433 -0.162600
2020-01-04 1.719313 -1.417885 0.267647 -0.960537
2020-01-05 -0.259797 -0.851702 -0.873451 -0.476420
2020-01-06 -0.048619 -0.690095 0.759120 1.184295
2020-01-07 -0.748535 -1.252718 0.386220 -0.415996
2020-01-08 -0.497471 -0.550428 -0.867333 -0.109223
mask = df['a'] > 0
mask
out:
2020-01-01 False
2020-01-02 True
2020-01-03 True
2020-01-04 True
2020-01-05 False
2020-01-06 False
2020-01-07 False
2020-01-08 False
Freq: D, Name: a, dtype: bool
df[mask]
out:
a b c d
2020-01-02 0.026334 0.158118 0.341183 -1.042534
2020-01-03 0.513027 -0.127235 -0.454433 -0.162600
2020-01-04 1.719313 -1.417885 0.267647 -0.960537
Multiple conditions
mask2 = df['b'] < 0
df[mask & mask2]
out:
a b c d
2020-01-03 0.513027 -0.127235 -0.454433 -0.162600
2020-01-04 1.719313 -1.417885 0.267647 -0.960537
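Conditions are combined element-wise with `&` (and), `|` (or) and `~` (not). When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, because these operators bind more tightly than `>` and `<`. A minimal sketch on made-up data (`demo` is illustrative):

```python
import pandas as pd

# Hypothetical data, for illustration only.
demo = pd.DataFrame({"a": [-1.0, 0.5, 2.0, 3.0], "b": [1.0, -0.2, -3.0, 4.0]})

both = demo[(demo["a"] > 0) & (demo["b"] < 0)]        # a > 0 and b < 0
either = demo[(demo["a"] > 0) | (demo["b"] < 0)]      # a > 0 or b < 0
neither = demo[~((demo["a"] > 0) | (demo["b"] < 0))]  # negation of "either"
```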
Modifying data with a boolean index
df[mask & mask2] = np.arange(8).reshape(2,4)
df
out:
a b c d
2020-01-01 -1.749988 -0.249398 -1.165277 -0.806687
2020-01-02 0.026334 0.158118 0.341183 -1.042534
2020-01-03 0.000000 1.000000 2.000000 3.000000
2020-01-04 4.000000 5.000000 6.000000 7.000000
2020-01-05 -0.259797 -0.851702 -0.873451 -0.476420
2020-01-06 -0.048619 -0.690095 0.759120 1.184295
2020-01-07 -0.748535 -1.252718 0.386220 -0.415996
2020-01-08 -0.497471 -0.550428 -0.867333 -0.109223
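The assignment above works because the mask selects exactly two rows, matching the (2, 4) array. With random data the number of matching rows varies between runs, so one option is to size the array from the mask itself. This is a sketch on made-up data; `demo`, `n_rows` and `new_values` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
demo = pd.DataFrame({"a": [-1.0, 0.5, 2.0], "b": [1.0, -0.2, -3.0]})
mask = (demo["a"] > 0) & (demo["b"] < 0)

# Count the rows that will be overwritten and build an array of that shape.
n_rows = int(mask.sum())
n_cols = demo.shape[1]
new_values = np.arange(n_rows * n_cols, dtype=float).reshape(n_rows, n_cols)

demo[mask] = new_values
```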