Cannot reproduce the result of cats dogs classification

Haifeng
Level 1

I followed the documentation to apply transfer learning with the Xception model to classify cats and dogs:



https://www.dataiku.com/learn/guide/visual/machine-learning/deep-learning-images.html



But the results I achieved are very poor.



Here is the architecture code:



from keras.layers import Input, Dense, Flatten
from keras.models import Model
from keras.applications import Xception
import os
import dataiku


def build_model(input_shapes, n_classes=None):

    #### DEFINING INPUT AND BASE ARCHITECTURE
    # You need to modify the name and shape of the "image_input"
    # according to the preprocessing and name of your
    # initial feature.
    # This feature should be preprocessed as an "Image", with a
    # custom preprocessing.
    image_input = Input(shape=(299, 299, 3), name="path_preprocessed")

    base_model = Xception(include_top=False, weights=None, input_tensor=image_input)

    #### LOADING WEIGHTS OF PRE TRAINED MODEL
    # To leverage this architecture, it is better to use weights
    # computed on a previous training on a large dataset (Imagenet).
    # To do so, you need to download the file containing the weights
    # and load them into your model.
    # You can do it by using the macro "Download pre-trained model"
    # of the "Deep Learning image" plugin (CPU or GPU version depending
    # on your setup) available in the plugin store. For this architecture,
    # you need to select:
    #    "Xception trained on Imagenet"
    # This will download the weights and put them into a managed folder
    folder = dataiku.Folder("xception_weights")
    weights_path = "xception_imagenet_weights_notop.h5"
    base_model.load_weights(os.path.join(folder.get_path(), weights_path),
                            by_name=True, skip_mismatch=True)

    for layer in base_model.layers:
        layer.trainable = False

    #### ADDING FULLY CONNECTED CLASSIFICATION LAYER
    x = base_model.layers[-1].output
    x = Flatten()(x)
    predictions = Dense(n_classes, activation="softmax")(x)

    model = Model(input=base_model.input, output=predictions)
    return model


def compile_model(model):
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy"
    )
    return model



Here is the code for training:



from dataiku.doctor.deep_learning.sequences import DataAugmentationSequence
from keras.preprocessing.image import ImageDataGenerator
from keras import callbacks


# A function that builds train and validation sequences.
# You can define your custom data augmentation based on the original train and validation sequences.
#
#   build_train_sequence_with_batch_size        - function that returns the train data sequence depending on
#                                                 batch size
#   build_validation_sequence_with_batch_size   - function that returns the validation data sequence depending on
#                                                 batch size
def build_sequences(build_train_sequence_with_batch_size, build_validation_sequence_with_batch_size):
    batch_size = 16

    augmentator = ImageDataGenerator(
        zoom_range=0.2,
        shear_range=0.2,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True
    )

    train_sequence = build_train_sequence_with_batch_size(batch_size)
    validation_sequence = build_validation_sequence_with_batch_size(batch_size)

    augmented_sequence = DataAugmentationSequence(
        train_sequence,
        'path_preprocessed',
        augmentator,
        1
    )
    return augmented_sequence, validation_sequence


# A function that contains a call to fit a model.
#
#   model                 - compiled model
#   train_sequence        - train data sequence, returned by build_sequences
#   validation_sequence   - validation data sequence, returned by build_sequences
#   base_callbacks        - a list of Dataiku callbacks that should not be removed. User callbacks can be added to this list
def fit_model(model, train_sequence, validation_sequence, base_callbacks):
    epochs = 10

    callback = callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.2,
        patience=5
    )
    base_callbacks.append(callback)

    model.fit_generator(train_sequence,
                        epochs=epochs,
                        callbacks=base_callbacks,
                        shuffle=True)



From the training curve, I do not think increasing the number of epochs will help. What could be the problem?

9 Replies
Philipp
Level 1

@Haifeng : I'm having the same problem you had and my code looks very similar to yours. Have you been able to fix the analysis?

drdunga
Level 2

I also am getting results close to chance.  Any suggestions? @Haifeng @Philipp 

Philipp
Level 1

@drdunga Unfortunately, I was not able to fix this problem.

Ludovic_Pénet
Dataiker

Hi.

This problem may occur if your input dataset is not shuffled.

Can you please post a snapshot of it here?
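
If it turns out your dataset is ordered (e.g. all cats first, then all dogs), a minimal sketch of shuffling it in a Python recipe before training could look like this (the dataset names below are hypothetical, adjust them to your Flow):

import dataiku

# Read the unshuffled dataset (hypothetical name)
df = dataiku.Dataset("cats_dogs_train").get_dataframe()

# Shuffle the rows with a fixed seed so the result is reproducible
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Write the shuffled rows to an output dataset (hypothetical name)
dataiku.Dataset("cats_dogs_train_shuffled").write_with_schema(df)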

drdunga
Level 2

Thanks for the response @Ludovic_Pénet .  I'm new to DSS.  Is there a particular method to do this snapshot?  

drdunga
Level 2

I do have shuffle=True passed in my call to the generator.  

My training and test sets look like this (partial, of course):

(attachments: cats_dogs_test.png, cats_dogs_train.png)

 

Nicolas_Servel
Dataiker

Hello,

TL;DR the current version of the tutorial does yield poor results. If you switch to a ResNet50 architecture without freezing layers, you should obtain a much better classifier.

Following your feedback, we have re-run the tutorial and also found poor results. After some investigation, this comes from a combination of two unexpected behaviours of the Keras library:
* freezing layers (i.e. setting layer.trainable = False) for an architecture containing BatchNormalization layer(s) does not behave as expected and will most probably build an architecture with very poor performance. You can find more information in this github issue.
* loading weights by name (i.e. setting by_name=True) does not work well for some architectures provided by Keras, including Xception. Loading by name is interesting when you have modified some layers of the architecture; in the context of the tutorial, we use the full architecture, so it should not be necessary. You can find more information on this behaviour in this github issue. A quick way to check whether the weights were actually applied is shown in the sketch after this list.
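
To illustrate the second point, here is a minimal diagnostic sketch (not part of the tutorial; it reuses the folder and file names from the first post in this thread, so adjust them to your setup). It compares the weights of one Xception layer before and after the by_name load; if they are unchanged, the pretrained weights were silently skipped:

import os
import numpy as np
import dataiku
from keras.layers import Input
from keras.applications import Xception

image_input = Input(shape=(299, 299, 3), name="path_preprocessed")
base_model = Xception(include_top=False, weights=None, input_tensor=image_input)

# Snapshot one convolutional layer before loading the pretrained weights
layer = base_model.get_layer("block1_conv1")
before = [w.copy() for w in layer.get_weights()]

folder = dataiku.Folder("xception_weights")
weights_path = os.path.join(folder.get_path(), "xception_imagenet_weights_notop.h5")
base_model.load_weights(weights_path, by_name=True, skip_mismatch=True)

# If by_name silently skipped the layer, "changed" will be False
after = layer.get_weights()
changed = any(not np.array_equal(b, a) for b, a in zip(before, after))
print("Pretrained weights applied to block1_conv1:", changed)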

Knowing those behaviours, we have instead used a ResNet50 architecture without freezing the layers (because this architecture also contains batch normalization layers) and have obtained much better results.

To do so, you first need to download the ResNet weights into a managed folder, using the same macro that you used to download the Xception weights.

Then, the architecture code becomes (replacing the folder name with the actual name of your folder):

 

from keras.layers import Input, Dense, Flatten
from keras.models import Model
from keras.applications import ResNet50
import os
import dataiku

def build_model(input_shapes, n_classes=None):
    #### DEFINING INPUT AND BASE ARCHITECTURE
    # You need to modify the name and shape of the "image_input"
    # according to the preprocessing and name of your
    # initial feature.
    # This feature should be preprocessed as an "Image", with a
    # custom preprocessing.
    image_shape = (299, 299, 3)
    image_input_name = "path_preprocessed"
    image_input = Input(shape=image_shape, name=image_input_name)
    base_model = ResNet50(include_top=False, weights=None, input_tensor=image_input)
    
    #### LOADING WEIGHTS OF PRE TRAINED MODEL
    # To leverage this architecture, it is better to use weights
    # computed on a previous training on a large dataset (Imagenet).
    # To do so, you need to download the file containing the weights
    # and load them into your model.
    # You can do it by using the macro "Download pre-trained model"
    # of the "Deep Learning image" plugin (CPU or GPU version depending
    # on your setup) available in the plugin store. For this architecture,
    # you need to select:
    #    "Resnet trained on Imagenet"
    # This will download the weights and put them into a managed folder
    folder = dataiku.Folder("pretrained_models")
    weights_path = "resnet_imagenet_weights_notop.h5"
    base_model.load_weights(os.path.join(folder.get_path(), weights_path))
    
    #### ADDING FULLY CONNECTED CLASSIFICATION LAYER
    x = base_model.layers[-1].output
    x = Flatten()(x)
    predictions = Dense(n_classes, activation="softmax")(x)
    model = Model(input=base_model.input, output=predictions)
    return model

def compile_model(model):
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy"
    )
    return model

 

After the training, running the Evaluate recipe on the test set gives an AUC of 0.9568, which is excellent (note that you may obtain different values, as the training involves randomness).
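
If you want to double-check that number outside DSS, an illustrative sketch computing the AUC from an exported scored test set with scikit-learn could look like this (the file and column names are hypothetical):

import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical export of the scored test set, with the true label and the
# predicted probability of the "dog" class
scored = pd.read_csv("scored_test_set.csv")
auc = roc_auc_score(scored["target"] == "dog", scored["proba_dog"])
print("AUC:", auc)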

Thanks a lot for reporting the issue; we will work on updating the tutorial to provide a working architecture.

Best regards,

Nicolas

drdunga
Level 2

Thanks @Nicolas_Servel! I'm re-training now with ResNet50. In the meantime, you might also make this clarification in the tutorial, which specifies the Train/Test Set (see attachment). The default was to split the already split training set, I believe.

 

Nicolas_Servel
Dataiker

Hello @drdunga , thanks for your feedback.

The tutorial expects the following:

* use the Train dataset for the training, with an 80/20 split for train/test data. Using 2000 images for training and scoring the model should be sufficient, and much faster than using 4000 images (see the sketch after this list).

* use the Test dataset only for the Evaluate recipe, to assess how the model behaves on data it has never seen before.
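
For illustration only, the 80/20 split from the first point amounts to something like the following outside DSS (in DSS it is configured in the model's Train/Test Set settings; the file name below is hypothetical):

import pandas as pd

df = pd.read_csv("cats_dogs_train.csv")           # the Train dataset
df = df.sample(frac=1, random_state=42)           # shuffle before splitting
cut = int(0.8 * len(df))
train_df, test_df = df.iloc[:cut], df.iloc[cut:]  # 80% for training, 20% for testing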

We will work on making it clearer in the tutorial.

Hope this helps,

Best regards,

Nicolas