In the previous post, we built a regression model with statsmodels and hand-picked features, reaching a test-set MSE of 0.12064232053155005.
This time, let's build a neural network with PyTorch and see how well it predicts.
Neural networks have clear strengths here: they need almost no manual feature engineering, since the network selects features on its own, and their capacity to fit data is very strong.
Below, we build a regression model out of a neural network and compare the results.
1. Import the required modules
import numpy as np
import torch
import torch.nn as nn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import torch.utils.data as Data
2. Load, preprocess, and convert the data
data = pd.read_table('zhengqi_train.txt')
# split into training and test sets
x = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# standardize the features: fit on the training set only,
# then apply the same transform to the test set
ss = preprocessing.StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)
# numpy -> tensor
x_train = torch.from_numpy(x_train).float()
y_train = torch.from_numpy(y_train).float().view(-1, 1)
x_test = torch.from_numpy(x_test).float()
y_test = torch.from_numpy(y_test).float().view(-1, 1)
# wrap in a DataLoader with batch_size = 64
torch_dataset = Data.TensorDataset(x_train, y_train)
loader = Data.DataLoader(dataset=torch_dataset, batch_size=64,
                         shuffle=True, num_workers=2)
# print(next(iter(loader)))
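One detail worth emphasizing about the scaling step: the StandardScaler should be fitted on the training split only, and the test split transformed with those same statistics; refitting on the test data puts the two sets on inconsistent scales. A minimal sketch with made-up numbers:

```python
import numpy as np
from sklearn import preprocessing

# tiny synthetic "train" and "test" feature columns (values are made up)
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
x_test = np.array([[3.0], [4.0]])

ss = preprocessing.StandardScaler()
x_train_s = ss.fit_transform(x_train)  # learns mean/std from the train split
x_test_s = ss.transform(x_test)        # reuses the train statistics

# the test split is scaled with the *train* mean/std, so its own mean
# is generally nonzero -- that is expected and correct
print(x_train_s.mean(), x_test_s.mean())
```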
3. Build the neural network model
class LR(nn.Module):
    def __init__(self):
        super(LR, self).__init__()
        self.fc1 = nn.Linear(38, 48)
        self.fc2 = nn.Linear(48, 32)
        self.fc3 = nn.Linear(32, 16)
        self.fc4 = nn.Linear(16, 8)
        self.fc5 = nn.Linear(8, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = torch.relu(self.fc4(x))
        x = self.fc5(x)
        return x
- Five fully connected layers with ReLU activations (sigmoid seems to work about as well); the input layer takes the 38 features and the output layer emits a single value.
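As a quick sanity check on the architecture, the same layer sizes can be wired up as an nn.Sequential and probed with a dummy batch (the batch size of 5 is arbitrary) to confirm the output shape and count the trainable parameters:

```python
import torch
import torch.nn as nn

# same layer sizes as the LR model above, written as a Sequential
net = nn.Sequential(
    nn.Linear(38, 48), nn.ReLU(),
    nn.Linear(48, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)
out = net(torch.randn(5, 38))                        # dummy batch of 5 samples
n_params = sum(p.numel() for p in net.parameters())  # weights + biases
print(out.shape, n_params)  # torch.Size([5, 1]) and 4113 parameters
```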
4. Initialize the model
# initialize the model
net = LR()
criterion = nn.MSELoss()
optm = torch.optim.Adam(net.parameters(), lr=0.0001, weight_decay=0.01)
epochs = 300
- MSE is the loss function.
- Adam and SGD seem to perform about the same here; either one works.
- A smaller learning rate may give better results.
- To curb overfitting, L2 regularization is added with λ = 0.01.
- 300 training epochs.
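The weight_decay argument is how the L2 penalty enters: the optimizer adds λ·w to each parameter's gradient before the update step. A tiny check with SGD, where the arithmetic is easy to verify by hand (Adam also adds the λ·w term, but then rescales it adaptively, so the exact update differs):

```python
import torch

# a single scalar parameter starting at 1.0
w = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.01)

opt.zero_grad()
loss = (w * 0).sum()  # zero loss, so the gradient comes only from weight_decay
loss.backward()
opt.step()

# SGD update: w <- w - lr * (grad + weight_decay * w) = 1 - 0.1 * 0.01 * 1 = 0.999
print(w.item())
```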
5. Train the model
# train the model
for e in range(epochs):
    for i, (batch_x, batch_y) in enumerate(loader):
        y_hat = net(batch_x)
        # print(y_hat.shape)
        loss = criterion(y_hat, batch_y)
        optm.zero_grad()
        loss.backward()
        optm.step()
    if (e + 1) % 50 == 0:
        # evaluate on the full splits without tracking gradients
        with torch.no_grad():
            y = net(x_train)
            loss = criterion(y, y_train)
            y_pred = net(x_test)
            error = criterion(y_pred, y_test)
        print("Epoch:{}, trainLoss:{}, testLoss:{}".format(e + 1, loss.item(), error.item()))
- The output:
Epoch:50, trainLoss:0.1121646836400032,testLoss:0.11146756261587143
Epoch:100, trainLoss:0.098330557346344,testLoss:0.10170140862464905
Epoch:150, trainLoss:0.0903727188706398,testLoss:0.09786481410264969
Epoch:200, trainLoss:0.08492863923311234,testLoss:0.0965176373720169
Epoch:250, trainLoss:0.08087099343538284,testLoss:0.09639628231525421
Epoch:300, trainLoss:0.07784002274274826,testLoss:0.09672882407903671
The test-set MSE reaches 0.09672882407903671, clearly better than the 0.12064232053155005 of the previous linear regression model.
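For perspective, the two MSE figures translate into roughly a 20% relative reduction in error:

```python
prev_mse = 0.12064232053155005  # statsmodels linear regression (previous post)
nn_mse = 0.09672882407903671    # this neural network
improvement = (prev_mse - nn_mse) / prev_mse
print("{:.1%} relative reduction in test MSE".format(improvement))
```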
6. Generate the competition submission
test = pd.read_table('zhengqi_test.txt')
# reuse the scaler fitted on the training data -- do not refit here
x = torch.from_numpy(ss.transform(test.values)).float()
with torch.no_grad():
    y = net(x)
print(y.shape)
np.savetxt('result.txt', y.numpy())
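np.savetxt as used above writes one prediction per line as plain text, which can be verified with a quick round trip (the prediction values here are made up):

```python
import os
import tempfile
import numpy as np

# hypothetical single-column predictions, like the network's output
preds = np.array([0.1, -0.2, 0.35])
path = os.path.join(tempfile.mkdtemp(), 'result.txt')
np.savetxt(path, preds)    # one float per line

loaded = np.loadtxt(path)  # reads back a 1-D array
print(loaded)
```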
Finally, submit the output file to Tianchi and see how it scores!