Cannot reproduce the result of cats dogs classification
I followed this tutorial to apply transfer learning with the Xception model to classify cats and dogs:
https://www.dataiku.com/learn/guide/visual/machine-learning/deep-learning-images.html
But the results I achieved are very poor.
Here is the architecture code:
from keras.layers import Input, Dense, Flatten
from keras.models import Model
from keras.applications import Xception
import os
import dataiku

def build_model(input_shapes, n_classes=None):
    #### DEFINING INPUT AND BASE ARCHITECTURE
    # You need to modify the name and shape of the "image_input"
    # according to the preprocessing and name of your
    # initial feature.
    # This feature should be preprocessed as an "Image", with a
    # custom preprocessing.
    image_input = Input(shape=(299, 299, 3), name="path_preprocessed")
    base_model = Xception(include_top=False, weights=None, input_tensor=image_input)

    #### LOADING WEIGHTS OF PRE TRAINED MODEL
    # To leverage this architecture, it is better to use weights
    # computed on a previous training on a large dataset (Imagenet).
    # To do so, you need to download the file containing the weights
    # and load them into your model.
    # You can do it by using the macro "Download pre-trained model"
    # of the "Deep Learning image" plugin (CPU or GPU version depending
    # on your setup) available in the plugin store. For this architecture,
    # you need to select:
    # "Xception trained on Imagenet"
    # This will download the weights and put them into a managed folder
    folder = dataiku.Folder("xception_weights")
    weights_path = "xception_imagenet_weights_notop.h5"
    base_model.load_weights(os.path.join(folder.get_path(), weights_path),
                            by_name=True, skip_mismatch=True)
    for layer in base_model.layers:
        layer.trainable = False

    #### ADDING FULLY CONNECTED CLASSIFICATION LAYER
    x = base_model.layers[-1].output
    x = Flatten()(x)
    predictions = Dense(n_classes, activation="softmax")(x)
    # Keras 2 names these arguments "inputs" and "outputs"
    model = Model(inputs=base_model.input, outputs=predictions)
    return model

def compile_model(model):
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy"
    )
    return model
Here is the code for training:
from dataiku.doctor.deep_learning.sequences import DataAugmentationSequence
from keras.preprocessing.image import ImageDataGenerator
from keras import callbacks

# A function that builds train and validation sequences.
# You can define your custom data augmentation based on the original train and validation sequences
#   build_train_sequence_with_batch_size - function that returns train data sequence depending on
#                                          batch size
#   build_validation_sequence_with_batch_size - function that returns validation data sequence depending on
#                                               batch size
def build_sequences(build_train_sequence_with_batch_size, build_validation_sequence_with_batch_size):
    batch_size = 16
    augmentator = ImageDataGenerator(
        zoom_range=0.2,
        shear_range=0.2,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True
    )
    train_sequence = build_train_sequence_with_batch_size(batch_size)
    validation_sequence = build_validation_sequence_with_batch_size(batch_size)
    augmented_sequence = DataAugmentationSequence(
        train_sequence,
        'path_preprocessed',
        augmentator,
        1
    )
    return augmented_sequence, validation_sequence

# A function that contains a call to fit a model.
#   model - compiled model
#   train_sequence - train data sequence, returned in build_sequence
#   validation_sequence - validation data sequence, returned in build_sequence
#   base_callbacks - a list of Dataiku callbacks, that are not to be removed. User callbacks can be added to this list
def fit_model(model, train_sequence, validation_sequence, base_callbacks):
    epochs = 10
    callback = callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.2,
        patience=5
    )
    base_callbacks.append(callback)
    model.fit_generator(train_sequence,
                        epochs=epochs,
                        callbacks=base_callbacks,
                        shuffle=True)
From the training curve, I do not think that increasing the number of epochs will help. What could be the problem?
Answers
-
@Haifeng: I'm having the same problem you had, and my code looks very similar to yours. Have you been able to fix the analysis?
-
@drdunga: Unfortunately, I was not able to fix this problem.
-
Hi.
This problem may occur if your input dataset is not shuffled.
Could you please post a snapshot of it here?
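To see why row order can matter, here is a minimal sketch in plain Python (it is not DSS or Keras code). The file names, the 48/48 class counts and the helper names are made up for illustration; only the batch size of 16 comes from the training code above. When rows are sorted by label, every consecutive batch holds a single class, whereas a shuffled copy yields mixed batches.

```python
import random

# Hypothetical toy dataset: 48 cat rows followed by 48 dog rows (sorted by label).
rows = [("cat.%d.jpg" % i, "cat") for i in range(48)] + \
       [("dog.%d.jpg" % i, "dog") for i in range(48)]


def batches(data, batch_size):
    """Yield consecutive slices of `data` of length `batch_size`."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def mixed_batch_count(data, batch_size=16):
    """Count the batches that contain more than one label."""
    return sum(1 for batch in batches(data, batch_size)
               if len({label for _, label in batch}) > 1)


# Sorted input: each batch of 16 is all cats or all dogs.
unshuffled_mixed = mixed_batch_count(rows)

# Shuffled copy (fixed seed so the run is repeatable): batches mix both classes.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
shuffled_mixed = mixed_batch_count(shuffled)

print(unshuffled_mixed, shuffled_mixed)
```

Note that, for a Keras Sequence, shuffle=True in fit_generator changes the order in which batches are drawn at each epoch, not which rows end up together in a given batch, so it may not compensate for a sorted input dataset.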
-
Thanks for the response, @Ludovic_Pénet. I'm new to DSS. Is there a particular way to take this snapshot?
-
I do have shuffle=True passed in my call to the generator.
My training and test sets look like this (partial, of course):
-
Hello,
TL;DR: the current version of the tutorial does yield poor results. If you switch to a ResNet50 architecture without freezing layers, you should obtain a much better classifier.
Following your feedback, we re-ran the tutorial and also found poor results. After some investigation, this comes from a combination of two unexpected behaviours of the Keras library:
* freezing layers (i.e. setting layer.trainable = False) for an architecture containing BatchNormalization layer(s) does not behave as expected and will most probably build an architecture with very poor performance. You can find more information in this GitHub issue.
* loading weights by name (i.e. setting by_name=True) does not work well for some architectures provided by Keras, including Xception. Loading by name is useful when you have modified some layers of the architecture. In the context of the tutorial, we use the full architecture, so it should not be necessary. You can find more information on this behaviour in this GitHub issue.
Knowing those behaviours, we instead used a ResNet50 architecture without freezing the layers (because this architecture also contains batch normalization layers) and obtained much better results.
To do so, you first need to download the ResNet weights into a managed folder, using the same macro that you used to download the Xception weights.
Then, the architecture code becomes the following (replace the folder name with the actual name of your folder):
from keras.layers import Input, Dense, Flatten
from keras.models import Model
from keras.applications import ResNet50
import os
import dataiku

def build_model(input_shapes, n_classes=None):
    #### DEFINING INPUT AND BASE ARCHITECTURE
    # You need to modify the name and shape of the "image_input"
    # according to the preprocessing and name of your
    # initial feature.
    # This feature should be preprocessed as an "Image", with a
    # custom preprocessing.
    image_shape = (299, 299, 3)
    image_input_name = "path_preprocessed"
    image_input = Input(shape=image_shape, name=image_input_name)
    base_model = ResNet50(include_top=False, weights=None, input_tensor=image_input)

    #### LOADING WEIGHTS OF PRE TRAINED MODEL
    # To leverage this architecture, it is better to use weights
    # computed on a previous training on a large dataset (Imagenet).
    # To do so, you need to download the file containing the weights
    # and load them into your model.
    # You can do it by using the macro "Download pre-trained model"
    # of the "Deep Learning image" plugin (CPU or GPU version depending
    # on your setup) available in the plugin store. For this architecture,
    # you need to select:
    # "Resnet trained on Imagenet"
    # This will download the weights and put them into a managed folder
    folder = dataiku.Folder("pretrained_models")
    weights_path = "resnet_imagenet_weights_notop.h5"
    # Weights are loaded by position (no by_name) and no layer is frozen
    base_model.load_weights(os.path.join(folder.get_path(), weights_path))

    #### ADDING FULLY CONNECTED CLASSIFICATION LAYER
    x = base_model.layers[-1].output
    x = Flatten()(x)
    predictions = Dense(n_classes, activation="softmax")(x)
    # Keras 2 names these arguments "inputs" and "outputs"
    model = Model(inputs=base_model.input, outputs=predictions)
    return model

def compile_model(model):
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy"
    )
    return model
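The load_weights call above no longer passes by_name=True. As a rough picture of why that can matter, here is a toy sketch in plain Python. It is not Keras's actual loader: the two helper functions, the layer names and the weight values are all hypothetical, and the name mismatch shown (auto-numbered layer names differing between the saved file and the rebuilt model) is only one plausible way for name-based loading to go wrong, not something confirmed in this thread.

```python
def load_by_name(model_layer_names, saved_weights):
    """Toy name-based loader: copy weights only when a saved entry has the
    same name as a model layer, and report the layers left untouched."""
    loaded, skipped = {}, []
    for name in model_layer_names:
        if name in saved_weights:
            loaded[name] = saved_weights[name]
        else:
            skipped.append(name)  # this layer keeps its random initialisation
    return loaded, skipped


def load_by_position(model_layer_names, saved_weights_in_order):
    """Toy positional loader: the i-th saved entry goes to the i-th layer,
    and a length mismatch is an error rather than a silent skip."""
    if len(model_layer_names) != len(saved_weights_in_order):
        raise ValueError("layer count mismatch")
    return dict(zip(model_layer_names, saved_weights_in_order))


# Hypothetical names: the rebuilt model numbered one layer differently
# from the file the weights were saved in.
model_layers = ["conv_1", "batch_normalization_5", "conv_2"]
saved = {"conv_1": [0.1], "batch_normalization_1": [0.2], "conv_2": [0.3]}

loaded, skipped = load_by_name(model_layers, saved)
positional = load_by_position(model_layers, list(saved.values()))

print(skipped)  # layers that silently received no pretrained weights
```

In this toy, name-based loading leaves one layer without pretrained weights and raises no error, while positional loading fills every layer. With skip_mismatch=True in the original Xception code, a similar silent skip would be hard to notice.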
After the training, running the evaluation recipe on the test set gives us an AUC of 0.9568, which is excellent (note that you may obtain a different value, as the training involves randomness).
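For readers wondering how freezing BatchNormalization layers could hurt, here is a small numeric sketch in plain Python. It rests on an assumption about the first behaviour listed above, namely that a frozen layer may still normalise with the statistics of the current batch during training while using the stored moving statistics at prediction time. The numbers are invented, and gamma and beta are fixed to 1 and 0 for simplicity.

```python
from statistics import mean, pvariance

EPS = 1e-3  # small constant added to the variance for numerical stability


def batch_norm(values, mu, var):
    """Normalise `values` with the given mean and variance (gamma=1, beta=0)."""
    return [(v - mu) / (var + EPS) ** 0.5 for v in values]


# Hypothetical statistics stored with the pretrained weights.
moving_mean, moving_var = 0.0, 1.0

# Hypothetical activations from a new dataset whose distribution is shifted.
activations = [4.0, 5.0, 6.0, 5.0]

# Training mode: normalise with the statistics of the current batch.
train_out = batch_norm(activations, mean(activations), pvariance(activations))

# Inference mode: normalise with the stored moving statistics.
infer_out = batch_norm(activations, moving_mean, moving_var)

print(mean(train_out), mean(infer_out))
```

Under this assumption, the layers after the normalisation see values centred on zero during training but values centred near 5 at prediction time, so a classifier trained on top of the frozen base is evaluated on inputs unlike those it was fitted on.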
Thanks a lot for reporting the issue. We will work on updating the tutorial to provide a working architecture.
Best regards,
Nicolas
-
Thanks @Nicolas_Servel! I'm re-training now with ResNet50. In the meantime, you might also clarify the part of the tutorial that specifies the Train/Test Set (see attachment). The default was to split the already-split training set, I believe.
-
Hello @drdunga, thanks for your feedback. The tutorial expects the following:
* use the Train dataset for the training, with an 80/20 split for train/test data. Using 2000 images for training and scoring the model should be sufficient, and much faster than using 4000 images.
* use the Test dataset only for the Evaluate recipe, to assess how the model behaves on data it has never seen before.
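As an illustration of the 80/20 split described above, here is a minimal sketch in plain Python. In DSS the split is configured in the Train/Test Set settings rather than coded by hand; the function name, the seed and the file names below are hypothetical and only show the idea on 2000 items.

```python
import random


def split_train_test(paths, train_ratio=0.8, seed=42):
    """Shuffle a copy of `paths` and split it into (train, test) lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for 2000 images of the Train dataset.
train_dataset = ["cat.%d.jpg" % i for i in range(1000)] + \
                ["dog.%d.jpg" % i for i in range(1000)]

train_part, test_part = split_train_test(train_dataset)

print(len(train_part), len(test_part))
```

The separate Test dataset is left out of this split entirely, so it remains unseen until the Evaluate recipe.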
We will work on making it clearer in the tutorial.
Hope this helps,
Best regards,
Nicolas