In the previous post we designed a neural network ourselves and trained it from scratch for image classification. Since we only used 5000 images, we reached roughly 80% accuracy. In this post we instead use VGG16 as our base_model and train on top of it. The keras.applications module ships several pretrained base models that can be used directly for transfer learning. By setting include_top=False we obtain the convolutional base without the fully connected layers, and by appending our own custom layers we can adapt it to different classification tasks.
# finetune from the base model VGG16
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(150, 150, 3))
base_model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 150, 150, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 150, 150, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 150, 150, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 75, 75, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 75, 75, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 75, 75, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 37, 37, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 37, 37, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 37, 37, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 37, 37, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 18, 18, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 18, 18, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 9, 9, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 9, 9, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 4, 4, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
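VGG16 is not the only choice here: the other pretrained models in keras.applications can be dropped in the same way. A minimal sketch (note that each model enforces its own minimum input size, which varies across Keras versions; use a larger input_shape if the constructor complains):
# Sketch: swapping in other keras.applications bases follows the same pattern
from keras.applications.resnet50 import ResNet50
from keras.applications.inception_v3 import InceptionV3
resnet_base = ResNet50(include_top=False, weights='imagenet',
                       input_shape=(150, 150, 3))
inception_base = InceptionV3(include_top=False, weights='imagenet',
                             input_shape=(150, 150, 3))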
At this point we have two options. The first is to use base_model as a fixed feature extractor: it does not participate in training, and only the fully connected layers we add ourselves are trained. The second is to let base_model train as well, in which case we are training an end-to-end model. The second approach is somewhat harder to train, so let's look at the first one.
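For orientation, the only difference between the two options is which layers keep trainable=True. A rough sketch of the second, end-to-end option (reusing the base_model and tuneModel defined in the script below; the common practice is to unfreeze only the top convolutional block and use a smaller learning rate):
# Sketch of option two (end-to-end finetuning), for contrast with the script
# below: unfreeze only VGG16's top block and use a smaller learning rate so
# the pretrained weights are not wrecked by large early gradient updates
for layer in base_model.layers:
    layer.trainable = layer.name.startswith('block5')
tuneModel.compile(loss='binary_crossentropy',
                  optimizer=optimizers.RMSprop(lr=1e-5),
                  metrics=['acc'])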
VGG16 as feature extractor
In Keras, by setting each layer's trainable attribute we can control which layers are trainable and which are frozen. The basic code is the same as in the previous post; the difference is how we combine base_model and the newly added layers into our own model.
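Note that a change to trainable only takes effect when the model is compiled (or recompiled) afterwards. A quick sanity check, sketched with the base_model created above:
# Sketch: freeze all layers of the base model; a model compiled afterwards
# will list these weights under "Non-trainable params" in summary()
for layer in base_model.layers:
    layer.trainable = False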
import os
import numpy as np
from keras.models import Model
from keras import layers
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.applications.vgg16 import VGG16
# NOTE: scipy.misc.imread/imresize were removed in SciPy >= 1.2;
# on newer SciPy, use imageio.imread and PIL's Image.resize (or keras' load_img)
from scipy.misc import imread, imresize
import matplotlib.pyplot as plt
imgs = []
img_shape = (150, 150)
# read 1000 images to fit the generators' featurewise statistics
files = os.listdir('data/test')
for img_file in files[:1000]:
    img = imread('data/test/' + img_file).astype('float32')
    img = imresize(img, img_shape)
    imgs.append(img)
imgs = np.array(imgs)
train_gen = ImageDataGenerator(
    # rescale=1./255,
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
val_gen = ImageDataGenerator(
    # rescale=1./255,
    featurewise_center=True,
    featurewise_std_normalization=True)
# featurewise_center/std_normalization need dataset statistics, hence fit()
train_gen.fit(imgs)
val_gen.fit(imgs)
# 4500 training images
train_iter = train_gen.flow_from_directory('data/train', class_mode='binary',
                                           target_size=img_shape, batch_size=16)
# 501 validation images
val_iter = val_gen.flow_from_directory('data/val', class_mode='binary',
                                       target_size=img_shape, batch_size=16)
'''
# image generator debug
for x_batch, y_batch in train_iter:
    print(x_batch.shape)
    print(y_batch.shape)
    plt.imshow(x_batch[0])
    plt.show()
'''
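# (optional check) flow_from_directory infers labels from the subdirectory
# names under 'data/train'; train_iter.class_indices shows the mapping,
# e.g. {'cats': 0, 'dogs': 1} for a typical cats-vs-dogs layout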
# finetune from the base model VGG16
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(150, 150, 3))
base_model.summary()
out = base_model.layers[-1].output
out = layers.Flatten()(out)
out = layers.Dense(1024, activation='relu')(out)
# the dense features above are plentiful, so add dropout layers to curb overfitting
out = layers.Dropout(0.5)(out)
out = layers.Dense(512, activation='relu')(out)
out = layers.Dropout(0.3)(out)
out = layers.Dense(1, activation='sigmoid')(out)
tuneModel = Model(inputs=base_model.input, outputs=out)
# freeze the 19 layers of the base model so it only acts as a feature extractor
for layer in tuneModel.layers[:19]:
    layer.trainable = False
tuneModel.compile(loss='binary_crossentropy',
                  optimizer=optimizers.RMSprop(lr=1e-4),
                  metrics=['acc'])
history = tuneModel.fit_generator(
    generator=train_iter,
    steps_per_epoch=100,    # 100 batches of 16 = 1600 images per epoch
    epochs=100,
    validation_data=val_iter,
    validation_steps=32     # ~501 validation images / batch size 16
)
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1,101)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.legend()
plt.show()
# Output
Epoch 1/100
100/100 [==============================] - 677s 7s/step - loss: 0.4214 - acc: 0.8113 - val_loss: 0.1659 - val_acc: 0.9311
Epoch 2/100
100/100 [==============================] - 786s 8s/step - loss: 0.2618 - acc: 0.8900 - val_loss: 0.1576 - val_acc: 0.9351
As you can see, after just two epochs we essentially reach 93% accuracy. It feels like magic: when your own data and compute are limited, finetuning a pretrained model really is a very effective way to improve results.
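One practical note: because the convolutional base is frozen, every one of those ~700s epochs recomputes exactly the same VGG16 features. A common speedup is to run the images through base_model once, cache the resulting features, and train only the small dense classifier on them. A minimal sketch, under the assumption that we can give up per-epoch data augmentation (feat_iter, features and clf are illustrative names, not from the script above):
# Sketch: precompute the frozen VGG16 features once, then train only a small
# dense classifier on them; this removes the repeated VGG16 forward pass from
# every epoch, at the cost of giving up per-epoch data augmentation
feat_iter = val_gen.flow_from_directory('data/train', class_mode='binary',
                                        target_size=img_shape,
                                        batch_size=16, shuffle=False)
features = base_model.predict_generator(feat_iter, steps=len(feat_iter))
feat_labels = feat_iter.classes  # aligned with features because shuffle=False
inp = layers.Input(shape=features.shape[1:])
x = layers.Flatten()(inp)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(1, activation='sigmoid')(x)
clf = Model(inputs=inp, outputs=x)
clf.compile(loss='binary_crossentropy',
            optimizer=optimizers.RMSprop(lr=1e-4), metrics=['acc'])
clf.fit(features, feat_labels, epochs=30, batch_size=16)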