Model deploy
For deploying a model from DSS, is it currently mandatory to go via the Lab feature? As in, do we have to train a model from inside the Lab?
Answers
-
tgb417 (Neuron)
I see you found this thread.
You did not mention anything about what kind of model you want to use. Is it something like a scikit-learn model, or are you trying to do something like transfer learning? You also did not mention how integrated you would like the model to be with other standard Dataiku DSS features. You can call almost any existing model from a Jupyter notebook or a Python or R recipe; however, those models may not be able to fully integrate with DSS's model management tools.
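For instance, a Python recipe can load an externally trained model and score a DSS dataset directly. A minimal sketch, assuming a pickled scikit-learn model in a managed folder named "models" and input/output datasets named "to_score" and "scored" (all hypothetical names):

import pickle
import dataiku

# Load the externally trained model from a managed folder
folder = dataiku.Folder("models")
with folder.get_download_stream("model.pkl") as stream:
    model = pickle.loads(stream.read())

# Score the input dataset (assumes every column is a feature)
df = dataiku.Dataset("to_score").get_dataframe()
df["prediction"] = model.predict(df)
dataiku.Dataset("scored").write_with_schema(df)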
There are definitely some ways to import models trained elsewhere, and get more of that functionality. I’m not an expert on this topic.
However, check out this thread.
This references this example flow.
Here is some documentation on evaluating external models. I note that the documentation calls this out as a "very advanced topic":
https://doc.dataiku.com/dss/latest/python-api/saved_models.html
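For completeness: recent DSS versions also document importing an externally trained model as a real Saved Model through MLflow. A rough sketch, assuming your DSS version exposes the dataikuapi methods create_mlflow_pyfunc_model and import_mlflow_version_from_path, and that the model directory was produced by mlflow.sklearn.save_model (the project key, names, and paths below are placeholders). Note that this route only covers the prediction types the visual ML supports, so it would not help for the multi-label case discussed further down:

import dataiku

# Get a public-API handle on a project (placeholder project key)
client = dataiku.api_client()
project = client.get_project("MY_PROJECT")

# Create an empty MLflow-backed saved model, then import a version
# from a local MLflow model directory (assumed method signatures;
# check the saved_models documentation for your DSS version)
sm = project.create_mlflow_pyfunc_model("external_model", "BINARY_CLASSIFICATION")
sm.import_mlflow_version_from_path("v1", "/path/to/mlflow_model",
                                   code_env_name="py_env_with_sklearn")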
Let us know a bit more about what you are trying to do. There may be some folks here who are able to help you. Also, if you are using the paid version of Dataiku DSS, you may want to reach out to the support team. They can be very helpful.
-
Thanks @tgb417
for the prompt support once again! Yes, I was referring to a scikit-learn model. I have it trained using my own code logic (usual stuff, nothing fancy) without using any 'Lab' features. I have it saved inside a managed folder. I would like to deploy the model as a REST API. Let me know if you need more info. Thanks!
-
tgb417 (Neuron)
This describes how to integrate a different scikit-learn model into DSS.
https://knowledge.dataiku.com/latest/courses/advanced-code/custom-models/custom-model.html
But in this use case you are likely to retrain. Here is some more info about deploying arbitrary model code via the visual ML interface:
https://doc.dataiku.com/dss/latest/machine-learning/custom-models.html
And this may be the info you need.
https://doc.dataiku.com/dss/latest/apinode/endpoint-python-prediction.html
I would also open a support ticket with your specific use case; the team at Dataiku has been particularly supportive.
-
@tgb417
I have been working on this recently and there are some follow-up points:
- The trained scikit-learn model was saved inside a managed folder as a .pkl file
- Creating custom models (https://knowledge.dataiku.com/latest/courses/advanced-code/custom-models/custom-model.html) does not help, since I am training a multi-label classifier, which is not supported from the Lab (see the sketch after this list). This means there would be no "Saved Model" in the flow, and I would not be able to use the native Predict, Train, Score and Evaluate recipes. Please correct me if I am wrong or missed something here.
- I had to use the other link you shared (https://doc.dataiku.com/dss/latest/apinode/endpoint-python-prediction.html)
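To illustrate the multi-label setup mentioned above, here is a minimal, self-contained scikit-learn sketch (toy data, hypothetical estimator choice) of the kind of model that cannot be trained in the Lab but can be pickled and served through a Python endpoint:

import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Toy multi-label problem: each sample can carry several labels at once,
# so Y has one 0/1 column per label
X = [[0, 1], [1, 1], [1, 0], [0, 0]]
Y = [[1, 0], [1, 1], [0, 1], [0, 0]]

clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=50))
clf.fit(X, Y)
print(clf.predict([[0, 1]]))  # one 0/1 prediction per label

# The fitted model pickles like any other estimator
blob = pickle.dumps(clf)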
-
Hi,
If your model is already trained and saved as a pickle file, the quickest option is to use a Python prediction endpoint. To do so, you need to add a managed folder to your flow, with your .pkl file in it. You could even create a Python recipe that takes a dataset as input, uses your code to train the model, and then saves it in the managed folder; that would allow you to retrain the model easily.
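A minimal sketch of such a training recipe, assuming a managed folder named "models", an input dataset named "training_data", and a label column named "target" (all hypothetical names):

import pickle
import dataiku
from sklearn.linear_model import LogisticRegression

# Read the training data from the input DSS dataset
train_df = dataiku.Dataset("training_data").get_dataframe()
X = train_df.drop(columns=["target"])
y = train_df["target"]

# Train the scikit-learn model (any estimator would work here)
model = LogisticRegression()
model.fit(X, y)

# Serialize the fitted model into the managed folder
folder = dataiku.Folder("models")
with folder.get_writer("model.pkl") as writer:
    writer.write(pickle.dumps(model))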
Once this is done, you can create a Python API endpoint; this endpoint will load the model from the managed folder and use it to score the request data (https://doc.dataiku.com/dss/latest/apinode/endpoint-python-function.html#using-managed-folders). In the endpoint's working folders setting, add the managed folder that contains your model.
Then in your endpoint code you can adopt the following structure:

import os
import pickle

# The API node exposes the working folders as a list of local paths
# in the global variable `folders`; the model folder is the first one
model_folder = folders[0]
model_path = os.path.join(model_folder, "model.pkl")
with open(model_path, "rb") as input_file:
    model = pickle.load(input_file)

def predict(input):
    # Preprocess the input as needed, then compute the prediction
    prediction = model.predict(input)
    return prediction
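Once the service is deployed to an API node, the endpoint can be queried from any client. A minimal sketch using the dataikuapi package, assuming a Python function endpoint and placeholder host, service and endpoint IDs (check the API node documentation for the exact call for your endpoint type):

import dataikuapi

# Placeholders: your API node URL and the id of the deployed service
client = dataikuapi.APINodeClient("https://my-apinode:12000", "my_service")

# The keyword argument name must match the predict() parameter
result = client.run_function("my_endpoint", input=[[1.0, 2.0, 3.0]])
print(result)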
-
Thanks a lot!
-
I was able to get this done, and the challenging aspect for me now is everything after deployment: monitoring the performance of the model, capturing the inputs received by the model, detecting data drift and concept drift, etc. It would have been much simpler if multi-label classification were supported within the Lab; I could have followed the Dataiku Academy videos.
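For the drift part specifically, one generic approach (not a Dataiku-specific feature) is to keep a reference sample of the training data and periodically compare it, feature by feature, with the inputs captured at scoring time, for example with a population stability index:

import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a new
    sample of one numeric feature; values above ~0.2 are often read as
    meaningful drift (a rule of thumb, not a Dataiku-specific threshold)."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid log(0) on empty bins
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))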
-
Hi @RohitRanga
Can you please share a use case for saving the trained scikit-learn model inside a managed folder as a .pkl file? Also, how did you load the saved model for a prediction? It would be great if you could share some Python code examples. Thanks in advance.
-
@RohitRanga
, could you share your sample code and how you were able to achieve this? Eager to know, as I also have a similar requirement for an upcoming project. Thanks!