## 1 Load the label
import numpy as np
import pandas as pd

# read the permeability labels from the Excel file
ts_label_adr = adr_input + '/permlty/ts_permeability.xlsx'
file_label = pd.ExcelFile(ts_label_adr)
labels = file_label.parse(header=None)
print(type(labels))

# data_indx = np.load('index.npy','r')
# randomly sample 1000 of the 8300 samples (without replacement, so no index repeats)
indx = np.random.choice(8300, size=1000, replace=False)
np.save('index.npy', indx)  # keep the indices so data and labels stay aligned
labels_sub = labels.iloc[indx]
- np.save just performs the save; it has no return value (printing the type of its result gives <class 'NoneType'>). labels itself is in DataFrame format.
- iloc is used on DataFrames (e.g., to select row i, use df.loc[[i]] or df.iloc[[i]]). Anything opened with pandas comes back as a DataFrame.
- The purpose of this block: randomly draw 1000 of the 8300 samples for the later training, testing and validation.
- Note: the indices are used and saved in index.npy because the structure data and the permeability data correspond one-to-one (the saved array is the set of sampled sample IDs).
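The commented-out np.load call above is the other half of the round trip; a minimal sketch of a later run that reloads the saved indices to keep structures and labels aligned (the array name `data` for the 8300 structures is a hypothetical placeholder):

```python
import numpy as np

indx = np.load('index.npy')      # indices saved by the sampling step above
data_sub = data[indx]            # hypothetical (8300, ...) array of structures
labels_sub = labels.iloc[indx]   # the matching permeability rows
```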
## 2 Input processing
Each sample has shape 100 × 100 × 100.
- Way 1: each sample = one vector (flatten the whole volume).
- Way 2: in each sample, each layer = a matrix of size [100*100]; calculate the void ratio for each layer (see the sketch below).
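A minimal sketch of Way 2, assuming each sample is a binary (100, 100, 100) array in which 1 marks void and 0 marks solid (the encoding is an assumption, not stated above). It computes the void fraction per layer; if the geotechnical void ratio e = V_void / V_solid is meant instead, it would be fraction / (1 - fraction):

```python
import numpy as np

def layer_void_ratios(sample):
    # sample: binary array of shape (100, 100, 100); 1 = void, 0 = solid (assumed)
    # the mean over each 100x100 layer is the fraction of void voxels in that layer
    return sample.reshape(100, -1).mean(axis=1)   # shape (100,), one value per layer

sample = (np.random.rand(100, 100, 100) > 0.6).astype(np.float32)  # dummy sample
print(layer_void_ratios(sample)[:5])  # void fraction of the first five layers
```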
## 4 Split data into train, test and validation sets
def shuffle_split(data, label):
    '''
    shuffle and split data into train, validation and test sets
    '''
    data = np.array(data)
    label = np.array(label)
    label = 10**11 * label  # permeability values are tiny; scale them up
    if len(data) == len(label):
        print('checked out')  # data and labels have matching lengths
    indx = np.random.permutation(len(data))   # shuffle the sample order
    test_size = int(0.2 * len(data))
    test_indx = indx[:test_size]              # first 20% -> test
    train_indx = indx[test_size:]
    val_indx = train_indx[:test_size]         # next 20% -> validation
    train_indx = train_indx[test_size:]       # remaining 60% -> training
    train_dat, train_tar = data[train_indx], label[train_indx]
    val_dat, val_tar = data[val_indx], label[val_indx]
    test_dat, test_tar = data[test_indx], label[test_indx]
    return train_dat, train_tar, val_dat, val_tar, test_dat, test_tar
train_dat, train_tar, val_dat, val_tar, test_dat, test_tar = shuffle_split(data, labels_sub)
del data, file_label, labels_sub  # free memory
- label = 10**11 * label: the permeability values are very small, so they are scaled up before training.
- The train, test and validation sets are split following a fixed rule (here 60% / 20% / 20%), as the check below illustrates.
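A quick sanity check of the split proportions with dummy arrays (shapes and values are illustrative only):

```python
import numpy as np

data = np.random.rand(50, 10)         # 50 dummy samples
label = np.random.rand(50) * 1e-11    # 50 dummy permeabilities

train_dat, train_tar, val_dat, val_tar, test_dat, test_tar = shuffle_split(data, label)
print(len(train_dat), len(val_dat), len(test_dat))  # 30 10 10 -> 60/20/20
```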
for epoch in range(num_epoch):
    print(epoch)
    optimizer.zero_grad()  # note: zero_grad() is usually called inside the batch loop, before each backward()
    for dat, tar in train_loader:
        structure = Variable(dat.view(-1, 100, 100, 100))
        permeability = Variable(tar.view(-1, 1))

- view works just like reshape in numpy (see the illustration below).
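A tiny illustration of that analogy (the tensors here are made-up examples):

```python
import numpy as np
import torch

a = np.arange(24).reshape(2, 3, 4)   # numpy: reshape
t = torch.arange(24).view(2, 3, 4)   # torch: view, same idea
flat = t.view(-1, 12)                # -1 lets torch infer the remaining dimension
print(flat.shape)                    # torch.Size([2, 12])
```

One difference worth knowing: torch's view never copies and therefore requires contiguous memory, while torch.reshape copies when needed.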
What I have been given so far: the extracted features and the npy index file => I only need to import the corresponding permeability data.
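A minimal sketch of that last step, reusing the paths from the code above (adr_input is assumed to be defined as before):

```python
import numpy as np
import pandas as pd

indx = np.load('index.npy')  # indices of the sampled structures
labels = pd.ExcelFile(adr_input + '/permlty/ts_permeability.xlsx').parse(header=None)
permeability = labels.iloc[indx].to_numpy()  # the labels matching the extracted features
```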