Model deploy

RohitRanga
Level 3

For deploying a model from DSS, is it currently mandatory to go via the Lab feature? As in, do we have to train a model from inside the Lab? 

tgb417

@RohitRanga ,

I see you found this thread.

https://community.dataiku.com/t5/Using-Dataiku/How-to-create-a-model-to-deploy-from-a-pickle/m-p/105... 

You did not mention what kind of model you want to use. Is it something like a scikit-learn model, or are you trying to do something like transfer learning? You also did not mention how integrated you would like the model to be with other standard Dataiku DSS features. You can call almost any existing model from a Jupyter notebook or a Python or R recipe; however, those models may not fully integrate with DSS's model management tools.

There are definitely some ways to import models trained elsewhere and get more of that functionality. I'm not an expert on this topic.

However check out this thread.

https://community.dataiku.com/t5/Using-Dataiku/import-sklearn-model-trained-outside-of-Dataiku-into-...

This references this example flow.

https://community.dataiku.com/t5/Using-Dataiku/import-sklearn-model-trained-outside-of-Dataiku-into-...

Here is some documentation on evaluating external models. I note that the documentation calls this out as a "very advanced topic".

https://doc.dataiku.com/dss/latest/python-api/saved_models.html

Let us know a bit more about what you are trying to do; there may be some folks here who are able to help you. Also, if you are using the paid version of Dataiku DSS, you may want to reach out to the support team. They can be very helpful.

 

--Tom
RohitRanga
Level 3
Author

Thanks @tgb417 for the prompt support once again!

Yes I was referring to a scikit-learn model. I have it trained using my own code logic (usual stuff, nothing fancy) without using any 'Lab' features. I have it saved inside a Managed folder. I would like to deploy the model as a REST api. Let me know if you need more info. Thanks!
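As context for the save step described above, pickling a trained scikit-learn model and reloading it for prediction looks like this. A minimal sketch with made-up data; in DSS the path would come from the managed folder (e.g. `dataiku.Folder("...").get_path()`), while here a temporary directory stands in for it:

```python
import os
import pickle
import tempfile

from sklearn.linear_model import LogisticRegression

# Stand-in for "my own code logic": train a small model on toy data
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# In DSS, write into the managed folder's path instead of a temp dir
folder_path = tempfile.mkdtemp()
model_path = os.path.join(folder_path, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload and score, as the API endpoint would at request time
with open(model_path, "rb") as f:
    restored = pickle.load(f)
prediction = restored.predict([[2.5]])
```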

tgb417

This describes how to integrate a different scikit-learn model into DSS.

https://knowledge.dataiku.com/latest/courses/advanced-code/custom-models/custom-model.html

But with this approach you are likely to have to retrain inside DSS.

Here is some more info.

https://doc.dataiku.com/dss/latest/machine-learning/custom-models.html

This is about deploying arbitrary code via the api.

https://doc.dataiku.com/dss/latest/machine-learning/custom-models.html

And this may be the info you need.

https://doc.dataiku.com/dss/latest/apinode/endpoint-python-prediction.html 

I would also open a support ticket with your specific use case; the team at Dataiku has been particularly supportive.

 

--Tom
RohitRanga
Level 3
Author

@tgb417  I have been working on this recently and there are some follow-up points:

-  The trained scikit-learn model was saved inside a managed folder as a .pkl file

- Creating custom models (https://knowledge.dataiku.com/latest/courses/advanced-code/custom-models/custom-model.html) does not help, since I am training a multi-label classifier, which is not supported in the Lab. This means there would be no "Saved Model" in the flow, and I would not be able to use the native Predict, Train, Score and Evaluate recipes. Please correct me if I am wrong or missed something here.

- I had to use the other link you shared (https://doc.dataiku.com/dss/latest/apinode/endpoint-python-prediction.html)
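For reference, a multi-label classifier of the kind described above can be trained and pickled with plain scikit-learn outside the Lab. This is only a minimal sketch with made-up toy data, using `MultiOutputClassifier` as one possible multi-label approach:

```python
import io
import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Toy multi-label data: each sample can carry several labels at once
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]  # two independent binary labels

# One binary classifier is fit per label column
clf = MultiOutputClassifier(LogisticRegression()).fit(X, Y)

# Pickle round trip, as when storing the model in a managed folder
buf = io.BytesIO()
pickle.dump(clf, buf)
buf.seek(0)
restored = pickle.load(buf)

# Prediction returns one row per sample, one column per label
pred = restored.predict([[1, 1]])
```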

AlexandreL
Dataiker

Hi,

If your model is already trained and saved as a pickle file, the quickest option would be to use a Python prediction endpoint. To do so, you need to add a managed folder to your flow, with your .pkl file in it. You could even create a Python recipe that takes a dataset as input, uses your code to train the model, and then saves it in the managed folder; that would allow you to retrain the model easily.
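Such a retraining recipe could be sketched as follows. This only runs inside DSS as a Python recipe, and the dataset name "train_data", the column "target", and the folder name "models" are all made-up placeholders to adapt to your own flow:

```python
import io
import pickle

import dataiku
from sklearn.linear_model import LogisticRegression

# Read the training dataset from the flow (name is hypothetical)
df = dataiku.Dataset("train_data").get_dataframe()
X = df.drop(columns=["target"])
y = df["target"]

# Your own training logic goes here; a plain estimator as a stand-in
model = LogisticRegression().fit(X, y)

# Write the pickle into the managed folder so the endpoint can load it
folder = dataiku.Folder("models")
buf = io.BytesIO(pickle.dumps(model))
folder.upload_stream("model.pkl", buf)
```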

Once this is done, you can create a Python API endpoint; this endpoint will load the model from the managed folder and use it to score the request data (https://doc.dataiku.com/dss/latest/apinode/endpoint-python-function.html#using-managed-folders). In the working folders setting, add the managed folder that contains your model.

Then in your endpoint code you can adopt the following structure:

import os
import pickle

# "folders" is provided by the API node runtime: it lists the local
# paths of the working folders configured for this endpoint
model_folder = folders[0]
model_path = os.path.join(model_folder, "model.pkl")
with open(model_path, "rb") as input_file:
    model = pickle.load(input_file)

def predict(features):
    # Preprocess the input features as needed, then compute the prediction
    prediction = model.predict(features)
    return prediction
Ajinkya_Bankar
Level 1

Hi @RohitRanga 

Can you please share a use case for saving the trained scikit-learn model inside a managed folder as a .pkl file? Also, how did you load the saved model for a prediction? It would be great if you could share some Python code examples. Thanks in advance.

RohitRanga
Level 3
Author

Thanks a lot!

RohitRanga
Level 3
Author

I was able to get this done, and the challenging aspect for me now is everything after deployment: monitoring the model's performance, capturing the inputs it receives, detecting data drift and concept drift, etc. It would have been much simpler if multi-label classification were supported within the Lab, as I could have followed the Dataiku Academy videos.

hmjk347
Level 2

@RohitRanga , could you share your sample code and how you were able to achieve this? Eager to know, as I also have a similar requirement for my upcoming project.

 

Thanks
