Cannot reproduce the results of the cats vs. dogs classification tutorial

Haifeng
Haifeng Registered Posts: 3 ✭✭✭✭

I followed the documentation to apply transfer learning on the Xception model to classify cats and dogs:

https://www.dataiku.com/learn/guide/visual/machine-learning/deep-learning-images.html

But the results I achieved are very poor.

Here is the architecture code:

from keras.layers import Input, Dense, Flatten
from keras.models import Model
from keras.applications import Xception
import os
import dataiku

def build_model(input_shapes, n_classes=None):
    #### DEFINING INPUT AND BASE ARCHITECTURE
    # You need to modify the name and shape of the "image_input"
    # according to the preprocessing and name of your
    # initial feature.
    # This feature should be preprocessed as an "Image", with a
    # custom preprocessing.
    image_input = Input(shape=(299, 299, 3), name="path_preprocessed")
    base_model = Xception(include_top=False, weights=None, input_tensor=image_input)

    #### LOADING WEIGHTS OF PRE-TRAINED MODEL
    # To leverage this architecture, it is better to use weights
    # computed on a previous training on a large dataset (Imagenet).
    # To do so, you need to download the file containing the weights
    # and load them into your model.
    # You can do it by using the macro "Download pre-trained model"
    # of the "Deep Learning image" plugin (CPU or GPU version depending
    # on your setup) available in the plugin store. For this architecture,
    # you need to select:
    #    "Xception trained on Imagenet"
    # This will download the weights and put them into a managed folder
    folder = dataiku.Folder("xception_weights")
    weights_path = "xception_imagenet_weights_notop.h5"
    base_model.load_weights(os.path.join(folder.get_path(), weights_path),
                            by_name=True, skip_mismatch=True)

    # Freeze the pre-trained layers so that only the classification head is trained
    for layer in base_model.layers:
        layer.trainable = False

    #### ADDING FULLY CONNECTED CLASSIFICATION LAYER
    x = base_model.layers[-1].output
    x = Flatten()(x)
    predictions = Dense(n_classes, activation="softmax")(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    return model

def compile_model(model):
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy"
    )
    return model

Here is the code for training:

from dataiku.doctor.deep_learning.sequences import DataAugmentationSequence
from keras.preprocessing.image import ImageDataGenerator
from keras import callbacks

# A function that builds the train and validation sequences.
# You can define your custom data augmentation based on the original train and validation sequences.
# build_train_sequence_with_batch_size - function that returns the train data sequence for a given batch size
# build_validation_sequence_with_batch_size - function that returns the validation data sequence for a given batch size
def build_sequences(build_train_sequence_with_batch_size, build_validation_sequence_with_batch_size):
    batch_size = 16

    augmentator = ImageDataGenerator(
        zoom_range=0.2,
        shear_range=0.2,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True
    )

    train_sequence = build_train_sequence_with_batch_size(batch_size)
    validation_sequence = build_validation_sequence_with_batch_size(batch_size)

    augmented_sequence = DataAugmentationSequence(
        train_sequence,
        'path_preprocessed',
        augmentator,
        1
    )
    return augmented_sequence, validation_sequence


# A function that contains the call to fit the model.
# model - compiled model
# train_sequence - train data sequence, returned by build_sequences
# validation_sequence - validation data sequence, returned by build_sequences
# base_callbacks - a list of Dataiku callbacks that must not be removed. User callbacks can be added to this list.
def fit_model(model, train_sequence, validation_sequence, base_callbacks):
    epochs = 10

    callback = callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.2,
        patience=5
    )
    base_callbacks.append(callback)

    # Pass the validation sequence so that val_loss is computed and
    # ReduceLROnPlateau has something to monitor
    model.fit_generator(train_sequence,
                        epochs=epochs,
                        validation_data=validation_sequence,
                        callbacks=base_callbacks,
                        shuffle=True)

From the training curve, I do not think increasing the number of epochs will help. What could be the problem?

Answers

  • Philipp
    Philipp Registered Posts: 2 ✭✭✭✭

    @Haifeng: I'm having the same problem you had and my code looks very similar to yours. Have you been able to fix the analysis?

  • drdunga
    drdunga Registered Posts: 4 ✭✭✭✭

    I am also getting results close to chance. Any suggestions? @Haifeng @Philipp

  • Philipp
    Philipp Registered Posts: 2 ✭✭✭✭

    @drdunga Unfortunately, I was not able to fix this problem.

  • Ludovic_Pénet
    Ludovic_Pénet Dataiker, Registered Posts: 7 Dataiker

    Hi.

    This problem may occur if your input dataset is not shuffled; one way to shuffle it is sketched below.

    Can you please post a snapshot of it here?
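
    For reference, a minimal sketch of one way to shuffle the rows of the input dataset in a Python recipe is shown below; the dataset names are placeholders for your own:

    import dataiku

    # Read the (hypothetical) training dataset, shuffle its rows, and write
    # the result to a (hypothetical) output dataset
    train_df = dataiku.Dataset("cats_dogs_train").get_dataframe()
    shuffled_df = train_df.sample(frac=1, random_state=42).reset_index(drop=True)

    output = dataiku.Dataset("cats_dogs_train_shuffled")
    output.write_with_schema(shuffled_df)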

  • drdunga
    drdunga Registered Posts: 4 ✭✭✭✭

    Thanks for the response @Ludovic_Pénet. I'm new to DSS. Is there a particular way to take this snapshot?

  • drdunga
    drdunga Registered Posts: 4 ✭✭✭✭

    I do have shuffle=True passed in my call to the generator.

    My training and test sets look like this (partial, of course):

    Attachments: cats_dogs_test.png, cats_dogs_train.png

  • Nicolas_Servel
    Nicolas_Servel Dataiker Posts: 37 Dataiker
    edited July 17

    Hello,

    TL;DR: the current version of the tutorial does indeed yield poor results. If you switch to a ResNet50 architecture without freezing layers, you should obtain a much better classifier.

    Following your feedback, we have re-run the tutorial and also found poor results. After some investigation, this comes from a combination of two unexpected behaviours of the Keras library:
    * freezing layers (i.e. setting layer.trainable = False) for an architecture containing BatchNormalization layer(s) does not behave as expected and will most probably build an architecture with very poor performance. You can find more information on this github issue.
    * loading weights by name (i.e. setting by_name=True) does not work well for some architectures provided by Keras, including Xception. Loading by name is useful when you have modified some layers of the architecture; in the context of the tutorial, we use the full architecture, so it should not be necessary. You can find more information on this behaviour in this github issue.
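
    For illustration only (not validated in the tutorial), here is a minimal sketch of how those two fixes would look if you kept the Xception architecture; the folder and file names are reused from the original post and are assumptions:

    from keras.layers import Input
    from keras.applications import Xception
    import os
    import dataiku

    # Folder and file names follow the original post and are assumptions
    folder = dataiku.Folder("xception_weights")
    weights_path = "xception_imagenet_weights_notop.h5"

    image_input = Input(shape=(299, 299, 3), name="path_preprocessed")
    base_model = Xception(include_top=False, weights=None, input_tensor=image_input)

    # Fix 1: load the full set of weights positionally; by_name=True can
    # silently skip weights for some Keras architectures such as Xception
    base_model.load_weights(os.path.join(folder.get_path(), weights_path))

    # Fix 2: do NOT freeze the layers; freezing an architecture that contains
    # BatchNormalization layers does not behave as expected, so all layers
    # are left trainable here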

    Knowing these behaviours, we have instead used a ResNet50 architecture without freezing the layers (because this architecture also contains BatchNormalization layers) and have obtained much better results.

    To do so, you first need to download the ResNet weights into a managed folder, using the same macro that you used to download the Xception weights.

    Then the architecture code becomes (replacing the folder name with the actual name of your folder):

    from keras.layers import Input, Dense, Flatten
    from keras.models import Model
    from keras.applications import ResNet50
    import os
    import dataiku
    
    def build_model(input_shapes, n_classes=None):
        #### DEFINING INPUT AND BASE ARCHITECTURE
        # You need to modify the name and shape of the "image_input"
        # according to the preprocessing and name of your
        # initial feature.
        # This feature should be preprocessed as an "Image", with a
        # custom preprocessing.
        image_shape = (299, 299, 3)
        image_input_name = "path_preprocessed"
        image_input = Input(shape=image_shape, name=image_input_name)
        base_model = ResNet50(include_top=False, weights=None, input_tensor=image_input)
        
        #### LOADING WEIGHTS OF PRE TRAINED MODEL
        # To leverage this architecture, it is better to use weights
        # computed on a previous training on a large dataset (Imagenet).
        # To do so, you need to download the file containing the weights
        # and load them into your model.
        # You can do it by using the macro "Download pre-trained model"
        # of the "Deep Learning image" plugin (CPU or GPU version depending
        # on your setup) available in the plugin store. For this architecture,
        # you need to select:
        #    "Resnet trained on Imagenet"
        # This will download the weights and put them into a managed folder
        folder = dataiku.Folder("pretrained_models")
        weights_path = "resnet_imagenet_weights_notop.h5"
        base_model.load_weights(os.path.join(folder.get_path(), weights_path))
        
        #### ADDING FULLY CONNECTED CLASSIFICATION LAYER
        x = base_model.layers[-1].output
        x = Flatten()(x)
        predictions = Dense(n_classes, activation="softmax")(x)
        model = Model(inputs=base_model.input, outputs=predictions)
        return model
    
    def compile_model(model):
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy"
        )
        return model

    After training, running the Evaluate recipe on the test set gives an AUC of 0.9568, which is excellent (note that you may obtain different values, as training involves randomness).
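
    If you want to cross-check that number outside of the Evaluate recipe, a rough sketch with scikit-learn is shown below; the scored dataset name and the "target" / "proba_dog" column names are hypothetical and depend on your flow:

    import dataiku
    from sklearn.metrics import roc_auc_score

    # Hypothetical scored test dataset containing the model's probability column
    scored = dataiku.Dataset("cats_dogs_test_scored").get_dataframe()

    y_true = (scored["target"] == "dog").astype(int)  # 1 for dog, 0 for cat
    y_score = scored["proba_dog"]                      # predicted probability of "dog"

    print("AUC:", roc_auc_score(y_true, y_score))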

    Thanks a lot for reporting the issue; we will work on updating the tutorial to provide a working architecture.

    Best regards,

    Nicolas

  • drdunga
    drdunga Registered Posts: 4 ✭✭✭✭

    Thanks @Nicolas_Servel! I'm re-training now with ResNet50. In the meantime, you might also add this clarification to the tutorial where it specifies the Train/Test set (see attachment): the default was to split the already-split training set, I believe.

  • Nicolas_Servel
    Nicolas_Servel Dataiker Posts: 37 Dataiker

    Hello @drdunga, thanks for your feedback.

    The tutorial expects the following:

    * use the Train dataset for training, with an 80/20 split for train/test data. Using 2,000 images for both training and scoring the model should be sufficient, and much faster than using 4,000 images.

    * use the Test dataset only for the Evaluate recipe, to assess how the model behaves on data it has never seen before.
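
    Purely as an illustration of the intent (the actual split is configured in the visual ML "Train/Test set" settings, not in code), an 80/20 split in pandas would look roughly like this; the dataset name is hypothetical:

    import dataiku

    # Hypothetical dataset name; in the tutorial this is the Train dataset
    train_df = dataiku.Dataset("cats_dogs_train").get_dataframe()

    # Shuffle, then keep 80% for training and 20% for testing during training
    shuffled = train_df.sample(frac=1, random_state=42)
    cut = int(len(shuffled) * 0.8)
    train_split, test_split = shuffled.iloc[:cut], shuffled.iloc[cut:]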

    We will work on making it clearer in the tutorial.

    Hope this helps,

    Best regards,

    Nicolas
