数据源:链接: https://pan.baidu.com/s/1EFqJFXf70t2Rubkh6D19aw 提取码: syqg
数据源示例:
探索风速数据
步骤1 导入必要的库
import pandas as pd
import numpy as np
import datetime
步骤2 从以下地址导入数据
path1='pandas_exercise\exercise_data\wind.data'
步骤3 将数据作存储并且设置前三列为合适的索引
data=pd.read_csv(path1,sep='\s+',parse_dates=[[0,1,2]])
print(data.head())
步骤4 2061年?我们真的有这一年的数据?创建一个函数并用它去修复这个bug
def fix_century(x):
year=x.year-100 if x.year>1989 else x.year
return datetime.date(year,x.month,x.day)
步骤5 将日期设为索引,注意数据类型,应该是datetime64[ns]
data['Yr_Mo_Dy'] = data['Yr_Mo_Dy'].apply(fix_century)
print(data.head())
data=data.set_index('Yr_Mo_Dy')
print(data.head())
步骤6 对应每一个location,一共有多少数据值缺失
print(data.isnull().sum())
步骤7 对应每一个location,一共有多少完整的数据值
print(data.shape[0]-data.isnull().sum())
步骤8 对于全体数据,计算风速的平均值
print(data.mean().mean())
步骤9 创建一个名为loc_stats的数据框去计算并存储每个location的风速最小值,最大值,平均值和标准差
loc_stats=pd.DataFrame()
loc_stats['min']=data.min()
loc_stats['max']=data.max()
loc_stats['mean']=data.mean()
loc_stats['std']=data.std()
print(loc_stats)
步骤10 创建一个名为day_stats的数据框去计算并存储所有location的风速最小值,最大值,平均值和标准差
day_stats=pd.DataFrame()
day_stats['min']=data.min(axis=1)
day_stats['max']=data.max(axis=1)
day_stats['mean']=data.mean(axis=1)
day_stats['std']=data.std(axis=1)
print(day_stats)
步骤11 对于每一个location,计算一月份的平均风速 注意,1961年的1月和1962年的1月应该区别对待
data['date']=data.index
data['month']=data['date'].apply(lambda x:x.month)
data['year']=data['date'].apply(lambda x:x.year)
data['day']=data['date'].apply(lambda x:x.day)
jan_wind=data.query('month==1')
print(jan_wind.mean())
步骤12 对于数据记录按照年为频率取样
print(jan_wind.query('month==1 and day==1'))
步骤13 对于数据记录按照月为频率取样
print(jan_wind.query('day==1'))
输出
# 步骤3
RPT VAL ROS KIL ... MUL CLO BEL MAL
Yr_Mo_Dy ...
1961-01-01 15.04 14.96 13.17 9.29 ... 10.83 12.58 18.50 15.04
1961-01-02 14.71 NaN 10.83 6.50 ... 9.79 9.67 17.54 13.83
1961-01-03 18.50 16.88 12.33 10.13 ... 8.50 7.67 12.75 12.71
1961-01-04 10.58 6.63 11.75 4.58 ... 5.83 5.88 5.46 10.88
1961-01-05 13.33 13.25 11.42 6.17 ... 10.92 10.34 12.92 11.83
[5 rows x 12 columns]
# 步骤5
Yr_Mo_Dy RPT VAL ROS KIL ... CLA MUL CLO BEL MAL
0 1961-01-01 15.04 14.96 13.17 9.29 ... 10.25 10.83 12.58 18.50 15.04
1 1961-01-02 14.71 NaN 10.83 6.50 ... 10.04 9.79 9.67 17.54 13.83
2 1961-01-03 18.50 16.88 12.33 10.13 ... NaN 8.50 7.67 12.75 12.71
3 1961-01-04 10.58 6.63 11.75 4.58 ... 1.79 5.83 5.88 5.46 10.88
4 1961-01-05 13.33 13.25 11.42 6.17 ... 6.54 10.92 10.34 12.92 11.83
[5 rows x 13 columns]
RPT VAL ROS KIL ... MUL CLO BEL MAL
Yr_Mo_Dy ...
1961-01-01 15.04 14.96 13.17 9.29 ... 10.83 12.58 18.50 15.04
1961-01-02 14.71 NaN 10.83 6.50 ... 9.79 9.67 17.54 13.83
1961-01-03 18.50 16.88 12.33 10.13 ... 8.50 7.67 12.75 12.71
1961-01-04 10.58 6.63 11.75 4.58 ... 5.83 5.88 5.46 10.88
1961-01-05 13.33 13.25 11.42 6.17 ... 10.92 10.34 12.92 11.83
[5 rows x 12 columns]
# 步骤6
RPT 6
VAL 3
ROS 2
KIL 5
SHA 2
BIR 0
DUB 3
CLA 2
MUL 3
CLO 1
BEL 0
MAL 4
dtype: int64
# 步骤7
RPT 6568
VAL 6571
ROS 6572
KIL 6569
SHA 6572
BIR 6574
DUB 6571
CLA 6572
MUL 6571
CLO 6573
BEL 6574
MAL 6570
dtype: int64
# 步骤8
10.227982360836938
# 步骤9
min max mean std
RPT 0.67 35.80 12.362987 5.618413
VAL 0.21 33.37 10.644314 5.267356
ROS 1.50 33.84 11.660526 5.008450
KIL 0.00 28.46 6.306468 3.605811
SHA 0.13 37.54 10.455834 4.936125
BIR 0.00 26.16 7.092254 3.968683
DUB 0.00 30.37 9.797343 4.977555
CLA 0.00 31.08 8.495053 4.499449
MUL 0.00 25.88 8.493590 4.166872
CLO 0.04 28.21 8.707332 4.503954
BEL 0.13 42.38 13.121007 5.835037
MAL 0.67 42.54 15.599079 6.699794
# 步骤10
min max mean std
Yr_Mo_Dy
1961-01-01 9.29 18.50 13.018182 2.808875
1961-01-02 6.50 17.54 11.336364 3.188994
1961-01-03 6.17 18.50 11.641818 3.681912
1961-01-04 1.79 11.75 6.619167 3.198126
1961-01-05 6.17 13.33 10.630000 2.445356
... ... ... ...
1978-12-27 8.08 40.08 16.708333 7.868076
1978-12-28 5.00 41.46 15.150000 9.687857
1978-12-29 8.71 29.58 14.890000 5.756836
1978-12-30 9.13 28.79 15.367500 5.540437
1978-12-31 9.59 27.29 15.402500 5.702483
[6574 rows x 4 columns]
# 步骤11
print(jan_wind.mean())
RPT 14.847325
VAL 12.914560
ROS 13.299624
KIL 7.199498
SHA 11.667734
BIR 8.054839
DUB 11.819355
CLA 9.512047
MUL 9.543208
CLO 10.053566
BEL 14.550520
MAL 18.028763
month 1.000000
year 1969.500000
day 16.000000
dtype: float64
# 步骤12
RPT VAL ROS KIL ... date month year day
Yr_Mo_Dy ...
1961-01-01 15.04 14.96 13.17 9.29 ... 1961-01-01 1 1961 1
1962-01-01 9.29 3.42 11.54 3.50 ... 1962-01-01 1 1962 1
1963-01-01 15.59 13.62 19.79 8.38 ... 1963-01-01 1 1963 1
1964-01-01 25.80 22.13 18.21 13.25 ... 1964-01-01 1 1964 1
1965-01-01 9.54 11.92 9.00 4.38 ... 1965-01-01 1 1965 1
1966-01-01 22.04 21.50 17.08 12.75 ... 1966-01-01 1 1966 1
1967-01-01 6.46 4.46 6.50 3.21 ... 1967-01-01 1 1967 1
1968-01-01 30.04 17.88 16.25 16.25 ... 1968-01-01 1 1968 1
1969-01-01 6.13 1.63 5.41 1.08 ... 1969-01-01 1 1969 1
1970-01-01 9.59 2.96 11.79 3.42 ... 1970-01-01 1 1970 1
1971-01-01 3.71 0.79 4.71 0.17 ... 1971-01-01 1 1971 1
1972-01-01 9.29 3.63 14.54 4.25 ... 1972-01-01 1 1972 1
1973-01-01 16.50 15.92 14.62 7.41 ... 1973-01-01 1 1973 1
1974-01-01 23.21 16.54 16.08 9.75 ... 1974-01-01 1 1974 1
1975-01-01 14.04 13.54 11.29 5.46 ... 1975-01-01 1 1975 1
1976-01-01 18.34 17.67 14.83 8.00 ... 1976-01-01 1 1976 1
1977-01-01 20.04 11.92 20.25 9.13 ... 1977-01-01 1 1977 1
1978-01-01 8.33 7.12 7.71 3.54 ... 1978-01-01 1 1978 1
[18 rows x 16 columns]
# 步骤13
RPT VAL ROS KIL ... date month year day
Yr_Mo_Dy ...
1961-01-01 15.04 14.96 13.17 9.29 ... 1961-01-01 1 1961 1
1962-01-01 9.29 3.42 11.54 3.50 ... 1962-01-01 1 1962 1
1963-01-01 15.59 13.62 19.79 8.38 ... 1963-01-01 1 1963 1
1964-01-01 25.80 22.13 18.21 13.25 ... 1964-01-01 1 1964 1
1965-01-01 9.54 11.92 9.00 4.38 ... 1965-01-01 1 1965 1
1966-01-01 22.04 21.50 17.08 12.75 ... 1966-01-01 1 1966 1
1967-01-01 6.46 4.46 6.50 3.21 ... 1967-01-01 1 1967 1
1968-01-01 30.04 17.88 16.25 16.25 ... 1968-01-01 1 1968 1
1969-01-01 6.13 1.63 5.41 1.08 ... 1969-01-01 1 1969 1
1970-01-01 9.59 2.96 11.79 3.42 ... 1970-01-01 1 1970 1
1971-01-01 3.71 0.79 4.71 0.17 ... 1971-01-01 1 1971 1
1972-01-01 9.29 3.63 14.54 4.25 ... 1972-01-01 1 1972 1
1973-01-01 16.50 15.92 14.62 7.41 ... 1973-01-01 1 1973 1
1974-01-01 23.21 16.54 16.08 9.75 ... 1974-01-01 1 1974 1
1975-01-01 14.04 13.54 11.29 5.46 ... 1975-01-01 1 1975 1
1976-01-01 18.34 17.67 14.83 8.00 ... 1976-01-01 1 1976 1
1977-01-01 20.04 11.92 20.25 9.13 ... 1977-01-01 1 1977 1
1978-01-01 8.33 7.12 7.71 3.54 ... 1978-01-01 1 1978 1
[18 rows x 16 columns]