.loc[]
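.loc is primarily label-based, but it also accepts a boolean array.
The following inputs are allowed:
- A single label, e.g. 5 or 'a' (here 5 is interpreted as an index label, not as an integer position);
- A list or array of labels, e.g. ['a', 'b', 'c'];
- A slice object with labels, e.g. 'a':'f' (both the start and the stop label are included);
- A boolean array;
- A callable.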
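As a rough sketch, the five input types listed above can be tried side by side on a small frame. The frame `demo` and its values are hypothetical and are not part of the iris data used in the rest of this section.

```python
import pandas as pd

# Hypothetical toy frame, used only to illustrate the accepted input types.
demo = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
                    index=list("abcd"))

single = demo.loc["b"]                    # single label: returns one row as a Series
listed = demo.loc[["a", "c"]]             # list of labels: returns a DataFrame
sliced = demo.loc["b":"d"]                # label slice: both endpoints are included
masked = demo.loc[demo["x"] > 2]          # boolean array aligned with the index
called = demo.loc[lambda d: d["y"] < 30]  # callable returning a valid indexer
```

Each result keeps the original labels, so the selections can be compared by looking at their index.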
import pandas as pd
import numpy as np
import seaborn as sns
iris = pd.read_csv('iris.csv',header=0).sample(10)
iris
out:
sepal_length sepal_width petal_length petal_width species
11 4.8 3.4 1.6 0.2 setosa
106 4.9 2.5 4.5 1.7 virginica
14 5.8 4.0 1.2 0.2 setosa
61 5.9 3.0 4.2 1.5 versicolor
138 6.0 3.0 4.8 1.8 virginica
132 6.4 2.8 5.6 2.2 virginica
97 6.2 2.9 4.3 1.3 versicolor
119 6.0 2.2 5.0 1.5 virginica
31 5.4 3.4 1.5 0.4 setosa
19 5.1 3.8 1.5 0.3 setosa
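Next, the index is relabeled with the letters a to j. The rows shown below differ from the sample above, presumably because the cell was re-run: sample() draws different rows on each call unless random_state is set.
iris.index = list('abcdefghij')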
iris
out:
sepal_length sepal_width petal_length petal_width species
a 5.6 2.5 3.9 1.1 versicolor
b 6.0 3.0 4.8 1.8 virginica
c 7.2 3.6 6.1 2.5 virginica
d 5.4 3.7 1.5 0.2 setosa
e 6.6 3.0 4.4 1.4 versicolor
f 6.4 2.8 5.6 2.1 virginica
g 4.8 3.4 1.9 0.2 setosa
h 5.7 2.9 4.2 1.3 versicolor
i 6.1 3.0 4.9 1.8 virginica
j 6.5 3.2 5.1 2.0 virginica
Series
species = iris.species.copy()
species.loc['b']
out:
'virginica'
species.loc['c':'e']
out:
c virginica
d setosa
e versicolor
Name: species, dtype: object
species.loc['h':]
out:
h versicolor
i virginica
j virginica
Name: species, dtype: object
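The slices above include the stop label, which differs from ordinary Python slicing. A minimal sketch of the contrast with positional slicing follows; the Series `s` is a hypothetical stand-in with made-up values, not the `species` data itself.

```python
import pandas as pd

# Hypothetical stand-in for a label-indexed Series.
s = pd.Series(["versicolor", "virginica", "virginica", "setosa", "versicolor"],
              index=list("abcde"))

by_label = s.loc["c":"e"]    # label slice: the stop label 'e' is included
by_position = s.iloc[2:4]    # positional slice: position 4 is excluded
```

Here `by_label` holds the labels c, d and e, while `by_position` holds only c and d.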
DataFrame
Accessing rows directly by label
iris.loc[['a','c','d'], :]
out:
sepal_length sepal_width petal_length petal_width species
a 5.6 2.5 3.9 1.1 versicolor
c 7.2 3.6 6.1 2.5 virginica
d 5.4 3.7 1.5 0.2 setosa
Accessing rows and columns by label slice
iris.loc['b':'f', 'sepal_length':'petal_length']
out:
sepal_length sepal_width petal_length
b 6.0 3.0 4.8
c 7.2 3.6 6.1
d 5.4 3.7 1.5
e 6.6 3.0 4.4
f 6.4 2.8 5.6
Using a single label
iris.loc['d']
out:
sepal_length 5.4
sepal_width 3.7
petal_length 1.5
petal_width 0.2
species setosa
Name: d, dtype: object
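A single label returns the row as a Series, as in the output above. If a one-row DataFrame is wanted instead, the label can be wrapped in a list. The sketch below illustrates this on a hypothetical two-row frame named `flowers`; it also shows that a row label plus a column label yields a single value.

```python
import pandas as pd

# Hypothetical two-row frame with values in the style of the iris data.
flowers = pd.DataFrame({"sepal_length": [5.4, 6.6],
                        "species": ["setosa", "versicolor"]},
                       index=["d", "e"])

row_as_series = flowers.loc["d"]      # Series whose name is the row label
row_as_frame = flowers.loc[["d"]]     # one-row DataFrame
cell = flowers.loc["d", "species"]    # scalar value at (row, column)
```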
Using a boolean array
iris.loc[iris.sepal_length > iris.sepal_length.mean()]
out:
sepal_length sepal_width petal_length petal_width species
c 7.2 3.6 6.1 2.5 virginica
e 6.6 3.0 4.4 1.4 versicolor
f 6.4 2.8 5.6 2.1 virginica
i 6.1 3.0 4.9 1.8 virginica
j 6.5 3.2 5.1 2.0 virginica
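Boolean masks can also be combined and paired with a column selection. The following is a small sketch on a hypothetical frame `flowers`; the threshold 5.5 is chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame with values in the style of the iris data.
flowers = pd.DataFrame(
    {"sepal_length": [5.6, 6.0, 7.2, 5.4],
     "species": ["versicolor", "virginica", "virginica", "setosa"]},
    index=list("abcd"))

# Each comparison needs its own parentheses because & binds tighter than > and ==.
mask = (flowers["sepal_length"] > 5.5) & (flowers["species"] == "virginica")

# Rows are chosen by the mask, columns by a list of labels.
subset = flowers.loc[mask, ["sepal_length"]]
```

Only rows b and c satisfy both conditions, so `subset` has two rows and one column.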
iris.index = np.random.randint(0,10,10)
iris
out:
sepal_length sepal_width petal_length petal_width species
8 5.6 2.5 3.9 1.1 versicolor
5 6.0 3.0 4.8 1.8 virginica
9 7.2 3.6 6.1 2.5 virginica
4 5.4 3.7 1.5 0.2 setosa
2 6.6 3.0 4.4 1.4 versicolor
0 6.4 2.8 5.6 2.1 virginica
3 4.8 3.4 1.9 0.2 setosa
7 5.7 2.9 4.2 1.3 versicolor
3 6.1 3.0 4.9 1.8 virginica
5 6.5 3.2 5.1 2.0 virginica
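When slicing with .loc, if both the start and the stop label are present in the index, the elements located between them (including the endpoints) are returned:
iris.loc[9:2]
out: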
sepal_length sepal_width petal_length petal_width species
9 7.2 3.6 6.1 2.5 virginica
4 5.4 3.7 1.5 0.2 setosa
2 6.6 3.0 4.4 1.4 versicolor
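The slice above works on an unsorted index because the labels 9 and 2 each occur only once. When a boundary label is duplicated in an unsorted index, pandas cannot decide where the slice starts and raises a KeyError. A minimal sketch with a hypothetical Series `s`:

```python
import pandas as pd

# Hypothetical Series with an unsorted index in which the label 5 appears twice.
s = pd.Series(range(5), index=[8, 5, 9, 5, 2])

ok = s.loc[9:2]          # both bounds are unique, so the slice is well defined

try:
    s.loc[5:2]           # the start label 5 is ambiguous
    raised = False
except KeyError:
    raised = True
```

Sorting the index first with sort_index() avoids the ambiguity.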
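If at least one of the two labels is absent, but the index is sorted and can be compared against the start and stop labels, slicing still works as expected by selecting the labels that rank between the two:
iris.sort_index()
out: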
sepal_length sepal_width petal_length petal_width species
0 6.4 2.8 5.6 2.1 virginica
2 6.6 3.0 4.4 1.4 versicolor
3 4.8 3.4 1.9 0.2 setosa
3 6.1 3.0 4.9 1.8 virginica
4 5.4 3.7 1.5 0.2 setosa
5 6.0 3.0 4.8 1.8 virginica
5 6.5 3.2 5.1 2.0 virginica
7 5.7 2.9 4.2 1.3 versicolor
8 5.6 2.5 3.9 1.1 versicolor
9 7.2 3.6 6.1 2.5 virginica
iris.sort_index().loc[3:7]
out:
sepal_length sepal_width petal_length petal_width species
3 4.8 3.4 1.9 0.2 setosa
3 6.1 3.0 4.9 1.8 virginica
4 5.4 3.7 1.5 0.2 setosa
5 6.0 3.0 4.8 1.8 virginica
5 6.5 3.2 5.1 2.0 virginica
7 5.7 2.9 4.2 1.3 versicolor
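In the example above both 3 and 7 happen to be present in the index. To illustrate the case where the bounds are absent, here is a sketch on a hypothetical Series `s` whose index contains neither 1 nor 6.

```python
import pandas as pd

# Hypothetical Series with an unsorted integer index; 1 and 6 are not labels.
s = pd.Series(range(6), index=[9, 4, 2, 0, 7, 5])

ordered = s.sort_index()       # index becomes 0, 2, 4, 5, 7, 9
between = ordered.loc[1:6]     # labels ranking between 1 and 6

try:
    s.loc[1:6]                 # unsorted index and a missing bound
    raised = False
except KeyError:
    raised = True
```

On the sorted index the slice returns the labels 2, 4 and 5, while the same slice on the unsorted index raises a KeyError.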
Selecting with a callable
df = pd.DataFrame(np.random.randn(6,4), index=list('abcdef'), columns=list('ABCD'))
df
out:
A B C D
a 0.737161 -0.514738 -1.457052 0.353337
b 0.801916 0.266375 -0.968714 -0.087611
c -0.799433 -1.250238 -0.598625 1.259859
d -0.780325 1.910598 -0.522512 -0.680966
e -1.167703 -0.234484 0.243291 -1.931064
f -0.147435 0.145292 -0.256636 -0.110757
df.loc[lambda df: df.index > 'c']
out:
A B C D
d -0.780325 1.910598 -0.522512 -0.680966
e -1.167703 -0.234484 0.243291 -1.931064
f -0.147435 0.145292 -0.256636 -0.110757
df.loc[lambda df: df.A<0]
out:
A B C D
c -0.799433 -1.250238 -0.598625 1.259859
d -0.780325 1.910598 -0.522512 -0.680966
e -1.167703 -0.234484 0.243291 -1.931064
f -0.147435 0.145292 -0.256636 -0.110757
df.loc[lambda df: df.A<0, lambda df: ['A', 'B']]
out:
A B
c -0.799433 -1.250238
d -0.780325 1.910598
e -1.167703 -0.234484
f -0.147435 0.145292
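Callables are most useful in method chains, where the intermediate frame has no name to refer to. The sketch below assumes a hypothetical frame `frame` with fixed values and a derived column `total` that does not appear in the examples above.

```python
import pandas as pd

# Hypothetical frame with fixed values so the result is reproducible.
frame = pd.DataFrame({"A": [0.7, -0.8, -1.2, 0.1],
                      "B": [-0.5, 1.9, -0.2, 0.3]},
                     index=list("abcd"))

result = (
    frame
    .assign(total=lambda d: d["A"] + d["B"])    # add a derived column
    .loc[lambda d: d["total"] > 0, ["total"]]   # filter on the new column
)
```

The callable passed to .loc receives the frame produced by assign(), so it can filter on `total` even though that column does not exist in `frame`.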