Dataiku Deep Learning Tutorial: Emotion Classification In Videos
This post was originally published by phData. Please be sure to check out Emotion Classification on Video with phData for more.
INTRODUCTION
At phData, we have seen the value that deep learning brings to organizations that can successfully harness it. From reducing diagnostic errors in radiology to more accurately detecting manufacturing defects, we’ve certainly seen our share of wins, but not without pain. Most organizations will fail to adopt deep learning because of the complexity of its systems and development.
In this blog, we’ll show you how we have used the Dataiku Data Science Studio (Dataiku DSS), a tool that we already use to simplify data science, to build out deep learning applications with relative ease. More specifically, we will show how to build an emotion classification system on videos using the Dataiku deep learning for images plugin. Our emotion classification dataset will be the speech video data from RAVDESS. An example of one of the videos can be found on YouTube.
We will leverage the Dataiku deep learning for images plugin, which allows us to download pre-trained deep learning networks and provides recipes such as image classification retraining and scoring to classify emotions for the videos. Dataiku’s deep learning plugin uses Tensorflow and Keras on Python for image classification.
The approach that we will take for video classification is to break each emotion video into a fixed number of frames and then use these images to train a deep residual neural network (ResNet) to classify the emotion within each image. This ResNet has been pre-trained on the ImageNet dataset, so we do not have to start from scratch. Finally, we'll evaluate the predicted emotion for a video by taking a majority vote over the labels predicted across all of its frames.
DATAIKU DEEP LEARNING TUTORIAL OVERVIEW
OBTAINING DATAIKU DSS
The Dataiku free trial version can be used on your local machine, or you can deploy Dataiku to AWS using their AWS Marketplace image.
USING THE DATAIKU DEEP LEARNING FOR IMAGES PLUGIN
Before we dive into the tutorial, let’s get familiar with some of the recipes provided by the Dataiku deep learning plugin and their requirements.
The retraining image classification model recipe takes a previously trained Tensorflow neural network and retrains one or more layers on a new set of images.
Inputs:
- A folder containing the previously trained weights (in Tensorflow h5 format)
- A folder containing images to use for training
- A dataset containing the relative path of each image in the folder along with the label to use for that image
Outputs:
- A folder containing the new weights and information about the network structure
The image classification recipe uses a trained neural network to generate classification scores for images.
Inputs:
- A folder containing the network weights and information about the network structure
- A folder containing the images to classify
Outputs:
- A dataset containing classes and their respective scores for each image
STEPS REQUIRED FOR THE PROJECT
Refer to the diagram and the plan below, which indicate where each step fits into the project's workflow.
STEP 1: PREPARE THE EMOTION CLASSIFICATION DATASET
- Download emotion video data using a shell code recipe.
- Extract frames from videos and create a dataset using a Python code recipe.
- Split the data into training and testing sets using a split visual recipe.
STEP 2: TRAIN THE MODEL
- Download resnet weights using the download pre-trained model macro.
- Create, configure, and run the retraining image classification model recipe.
STEP 3: SCORE FRAMES AND EVALUATE LABELS FOR EMOTION VIDEOS
- Create a folder containing all frames/images of videos to be tested.
- Use the image classification recipe to score test images.
- Extract labels for images.
- Evaluate and assign labels to videos.
STEP 4: VISUALIZE RESULTS
- Analyze results.
- Build a Dataiku dashboard displaying the confusion matrix and accuracy analysis.
PREREQUISITES
A: INSTALL THE DEEP LEARNING FOR IMAGES PLUGIN
Open up Plugins settings.
Install the appropriate Dataiku deep learning for images plugin, depending on whether you are using a CPU or a GPU.
B: CREATE A PYTHON CODE ENVIRONMENT CONTAINING OPENCV
Open Administration and go to Code Envs.
Create a new Python env called py27opencv. Install the Jupyter notebook packages if you want to be able to use a notebook to experiment with code.
Open this environment, go to Packages to install, and add:
opencv-python==4.2.0.32
Select Save and update to install the package.
STEP 1: PREPARE THE EMOTION CLASSIFICATION DATASET
A: DOWNLOAD EMOTION VIDEO DATASET
Create a shell recipe to download videos.
For the output, create a new folder called Emotion Videos.
Use this bash script to download and extract the video files (note that the extracted videos take up 6.31 GB):
#!/bin/bash
for i in $(seq -f "%02g" 1 24); do
  curl "https://zenodo.org/record/1188976/files/Video_Speech_Actor_$i.zip?download=1" -o "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip"
  unzip "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip" -d "$DKU_OUTPUT_0_FOLDER_PATH"
  mv "$DKU_OUTPUT_0_FOLDER_PATH"/Actor_$i/02-* "$DKU_OUTPUT_0_FOLDER_PATH"
  rm -rf "$DKU_OUTPUT_0_FOLDER_PATH/Actor_$i"
  rm "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip"
done
Note that we only keep videos starting with 02-*; the 01-* videos are duplicates that include audio. Run the recipe.
B: EXTRACT FRAMES FROM VIDEOS
Create a Python code recipe.
For the input, select Emotion Videos. For the output, create a local folder called Emotion Images. We also want to create a new dataset, EmotionImagesCSV, to hold information about the images.
In the Advanced settings menu, select the code environment containing OpenCV (py27opencv).
Save and run the frame-extraction code in the recipe (a sketch follows below). It iterates through all video files in Emotion Videos, extracting frames from each video at 5% intervals (i.e. the frame that occurs when the video is 5% complete, 10% complete, etc.) and storing them as PNG files in Emotion Images. It also records information about every frame, such as the video interval for the frame, image file name, and emotion, and writes it to EmotionImagesCSV.
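Here is a minimal sketch of such a recipe, assuming hypothetical folder IDs EmotionVideos and EmotionImages (substitute the identifiers from your own project) and an image naming scheme of <video>_<label>_<interval>.png, which the scoring step later parses:

import os
import cv2
import dataiku
import pandas as pd

# Hypothetical folder IDs -- replace with the ones from your project.
videos = dataiku.Folder("EmotionVideos")
images = dataiku.Folder("EmotionImages")
video_dir = videos.get_path()
image_dir = images.get_path()

# RAVDESS emotion codes (third hyphen-separated field of the file name).
emotions = {'01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
            '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'}

rows = []
for name in os.listdir(video_dir):
    if not name.endswith('.mp4'):
        continue
    base = name.split('.')[0]
    parts = base.split('-')  # e.g. 02-01-06-01-02-01-12
    label, actor = emotions[parts[2]], parts[6]
    cap = cv2.VideoCapture(os.path.join(video_dir, name))
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for pct in range(5, 100, 5):  # the frame at 5%, 10%, ..., 95% of the video
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(total * pct / 100.0))
        ok, frame = cap.read()
        if not ok:
            continue
        image_name = '{}_{}_{}.png'.format(base, label, pct)
        cv2.imwrite(os.path.join(image_dir, image_name), frame)
        rows.append({'image_path': '/' + image_name, 'label': label,
                     'actor': actor, 'interval': pct, 'video_path': '/' + name})
    cap.release()

# Write the frame metadata to the output dataset.
dataiku.Dataset("EmotionImagesCSV").write_with_schema(pd.DataFrame(rows))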
You should now be able to see frames sampled from each video in Emotion Images.
EmotionImagesCSV should contain metadata on each frame, with columns like image path, label, and actor. These values are parsed from the RAVDESS file name, which encodes seven hyphen-separated fields (e.g. 02-01-06-01-02-01-12.mp4): modality, vocal channel, emotion, emotional intensity, statement, repetition, and actor. The emotion codes map to labels as follows: 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised.
C: SPLIT DATA INTO TRAIN AND TEST
Apply the split recipe on EmotionImagesCSV to create TestingImages and TrainingImages. Group the rows on video_path so that all the frames of a video land in either the training set or the test set, never both. This ensures that our network learns to recognize emotion rather than the specific characteristics of each video.
Then, run the recipe.
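For intuition, the grouped split is equivalent to something like this scikit-learn sketch (the CSV path and the 80/20 ratio are illustrative; the visual recipe does this for you inside DSS):

import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Illustrative only -- the split visual recipe performs this grouping in DSS.
df = pd.read_csv("EmotionImagesCSV.csv")  # hypothetical export of the dataset
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(df, groups=df["video_path"]))
training_images, testing_images = df.iloc[train_idx], df.iloc[test_idx]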
STEP 2: TRAIN THE MODEL
A: DOWNLOAD THE DATAIKU TENSORFLOW RESNET MODEL USING MACRO
Go to Macros.
Click on Download pre-trained model.
Set the output folder name to Original Resnet and the pre-trained model to download to Resnet Trained on ImageNet.
Run the macro. You should now see .h5 files in the Original Resnet model folder.
B: CREATE THE RETRAINING RECIPE
Create a new folder TrainedResnet. This is where we will store the weights after retraining the model on our dataset.
Go to add Recipe. Under Plugins, choose Deep Learning on images.
Choose Retraining Image Classification model.
For input, set Label Dataset to TrainingImages, Image Folder to Emotion Images, and Model Folder to Original Resnet. Set the output model folder to TrainedResnet.
Click Create Recipe. This will open a window that will allow you to set training ratio, hyperparameters, and columns that contain the path and label information on training images.
Set the Image filename column to path and the Label column to label.
Set the rest of the configuration as shown below.
The optimization settings use the Keras stochastic gradient descent optimizer, and we use custom parameters to set the momentum value and to use Nesterov momentum. Details about the algorithm and parameters can be found in the Keras documentation.
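Under the hood, this corresponds roughly to a Keras call along these lines (the values shown are illustrative, not the recipe's defaults):

from keras.optimizers import SGD

# Illustrative values -- use whatever you configure in the recipe UI.
optimizer = SGD(lr=0.001, momentum=0.9, nesterov=True)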
We also use data augmentation to apply random transformations to each image to increase the number of training images. The details of each of these parameters can be found in the Keras documentation for ImageDataGenerator.
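Similarly, the augmentation settings map onto a Keras ImageDataGenerator roughly as follows (parameter values are again illustrative):

from keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation -- mirror the values you set in the recipe.
augmenter = ImageDataGenerator(rotation_range=10,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1,
                               horizontal_flip=True)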
If you are using the GPU version of the Dataiku deep learning for images plugin, also configure the GPU settings (e.g. the number of GPUs to use) in this window.
Run the Retraining Recipe.
Retrained model files will be available in the output folder TrainedResnet.
STEP 3: SCORE FRAMES AND EVALUATE LABELS FOR EMOTION VIDEOS
The Dataiku image classification (score) recipe takes in a Dataiku Tensorflow model folder and a folder with images to be scored. It returns a dataset with the path of each image and a JSON object containing the predicted labels as keys and their respective prediction probabilities as values.
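For example, the prediction object for a single frame might look like this (the scores are made up for illustration; the keys come from your training labels):

{"calm": 0.81, "neutral": 0.09, "sad": 0.04, "happy": 0.02,
 "angry": 0.01, "fearful": 0.01, "disgust": 0.01, "surprised": 0.01}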
A: PREPARE THE TEST IMAGES FOLDER
Add a Python recipe. Set the inputs to the TestingImages dataset and the Emotion Images folder. Set the output to a new folder, Testing Image Files.
Add the following code snippet and run the recipe. It will copy every image from Emotion Images whose path appears in the TestingImages dataset into the Testing Image Files folder.
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandas_utils as pdu

# Read recipe inputs -- CHANGE THIS IDENTIFIER TO MATCH YOUR INPUT FOLDER
emotion_test = dataiku.Dataset("TestingImages")
emotion_test_df = emotion_test.get_dataframe()
local_images = dataiku.Folder("StYjUWGk")
local_images_info = local_images.get_info()

# Write recipe outputs -- CHANGE THIS IDENTIFIER TO MATCH YOUR OUTPUT FOLDER
test_images = dataiku.Folder("cUcmzSP8")
test_images_info = test_images.get_info()

# Clear recipe output before writing
test_images.clear()

for ind, row in emotion_test_df.iterrows():
    stream = local_images.get_download_stream(row.image_path)
    test_images.upload_stream(row.image_path, stream)
B: USE PLUGIN RECIPE TO SCORE
Go to the Dataiku deep learning for images plugin. Click on Image Classification.
Set the input to Testing Image Files and the new output dataset name to ScoredImages.
Run the recipe. The ScoredImages dataset should contain one row per image, with the image path and its prediction scores.
C: EXTRACT PREDICTED LABELS FOR IMAGES
To get a final class for a frame/image, we will use the label that has been predicted with maximum confidence.
Add a Prepare recipe. Set the input to ScoredImages and the output to a new dataset, ScoredImagesPrepared.
Add a New Step in the Script. Use the Python function.
Set the Mode to row: return a row for each row.
Click on Edit Python Source Code and add the following snippet of code:
def process(row):
    max_val = 0
    max_label = None
    emotions = ['calm', 'sad', 'surprised', 'neutral',
                'fearful', 'angry', 'happy', 'disgust']
    for e in emotions:
        p = float(row['prediction_{}'.format(e)])
        if p > max_val:
            max_val = p
            max_label = e
    row['max_prediction'] = max_val
    row['max_label'] = max_label
    row['label'] = row['images'].split('_')[1]
    row['correct'] = 1 if row['label'] == row['max_label'] else 0
    row['video_path'] = '/{}.mp4'.format(row['images'].split('_')[0])
    return row
Run the recipe.
ScoredImagesPrepared should now have a new column, max_label (the label predicted with maximum probability). This is the final label assigned to each frame/image, and we will also use it to classify labels at the video level.
There is also a column called correct, which is 1 or 0 depending on whether the original label of the image matches max_label.
EVALUATE LABELS FOR EMOTION VIDEOS
Add a Group recipe. Set the input to ScoredImagesPrepared and call the new output dataset ScoredImages_prepared_by_video_path. Create the recipe.
Add the column video_path to Group Keys.
Select Concat for max_label and First for label (the next step uses the label_first column).
You can leave the other columns as they are. Run the Group recipe.
The next step is the final one. This is when we finally assign labels to the videos!
Use the Prepare recipe once again with ScoredImages_prepared_by_video_path as input. Set the output to a new dataset, ScoredVideos.
Add a New Step in the Script. Use the Python function. Set the Mode to row: return a row for each row. Edit the Python source code and add this:
from collections import Counter

def most_frequent(labels):
    occurence_count = Counter(labels)
    return occurence_count.most_common(1)[0][0]

def process(row):
    row['label'] = row['label_first']
    row['most_freq_label'] = most_frequent(row['max_label_concat'].split(','))
    row['most_freq_correct'] = 1 if row['label'] == row['most_freq_label'] else 0
    return row
This code will take the most frequent label for every video and store it in the most_freq_label column. We also create the most_freq_correct column that compares the original label with the most frequent label and gives 1 or 0 accordingly.
STEP 4: VISUALIZE THE PREDICTIONS
A: ANALYZING THE RESULTS
Click Explore on ScoredVideos. Go to the most_freq_correct column and click on Analyze.
You should see the percentage of correct and incorrect predictions.
B: BUILDING A CONFUSION MATRIX
Go to Charts -> Tables -> Colored.
Set rows to label, columns to most_freq_label, and content to Count of records.
Publish this chart to a Dataiku dashboard by clicking the Publish button on the top right corner.
C: UNIVARIATE ANALYSIS
Click Explore on the ScoredVideos dataset. Go to the Statistics tab.
Click on New Card at the top right corner. Then choose Univariate Analysis.
Drag and drop the most_freq_correct variable into Variables to describe. Then, click Create Card.
Click on the top right corner of the card and publish to the Dataiku dashboard.
CONCLUSION
In this post, we examined how Dataiku’s deep learning for images plugin can be used to perform advanced deep learning with relative ease. Of the 287 videos in our test set, we were able to correctly classify the emotion within the video 97% of the time. Dataiku DSS allows advanced users to directly apply custom code and Python packages like OpenCV, while the Dataiku deep learning for images plugin uses Keras and Tensorflow to simplify the usage of deep learning networks. The same technique we’ve used here for emotion video classification can be applied to many different video classification tasks, from determining if retail customers in surveillance footage are enjoying themselves to identifying defective parts on a manufacturing line. Looking for more ideas about how you can use cutting-edge tools to advance your business? Need deep learning consulting for your advanced machine learning projects? phData’s Machine Learning practice is here to help!
The code for this project has been made available on phData's GitHub.
Answers
- Following the suggested steps, I hit the following error when running the CPU legacy plugin:
Job failed: Error in Python process: At line 135: <type 'exceptions.IOError'>: [Errno None] None: 'None'
Any comments on this issue?
Thanks,