I see you found this thread.
You did not mention what kind of model you want to use. Is it something like a scikit-learn model, or are you trying to do something like transfer learning? You also did not mention how tightly you would like the model to integrate with other standard Dataiku DSS features. You can call almost any existing model from a Jupyter notebook or a Python or R recipe; however, those models may not fully integrate with DSS's model management tools.
There are definitely some ways to import models trained elsewhere and get more of that functionality, though I'm not an expert on this topic.
However, check out this thread.
It references this example flow.
Here is some documentation on evaluating external models. Note that the documentation calls this out as a "very advanced topic."
First, let us know a bit more about what you are trying to do; there may be some folks here who are able to help you. Second, if you are using the paid version of Dataiku DSS, you may want to reach out to the support team. They can be very helpful.
Thanks @tgb417 for the prompt support once again!
Yes, I was referring to a scikit-learn model. I trained it using my own code logic (the usual stuff, nothing fancy) without using any Lab features, and saved it inside a managed folder. I would like to deploy the model as a REST API. Let me know if you need more info. Thanks!
This describes how to integrate a different scikit-learn model into DSS.
But in this use case you are likely to retrain.
Here is some more info.
This is about deploying arbitrary code via the API.
And this may be the info you need.
I would also open a support ticket with your specific use case; the team at Dataiku has been particularly supportive.
@tgb417 I have been working on this recently and there are some follow-up points:
- The trained scikit-learn model was saved inside a managed folder as a .pkl file
- Creating custom models (https://knowledge.dataiku.com/latest/courses/advanced-code/custom-models/custom-model.html) does not help, since I am training a multi-label classifier, which is not supported in the Lab. This means there would be no "Saved Model" in the flow, and I would not be able to use the native Predict, Train, Score and Evaluate recipes. Please correct me if I am wrong or missed something here.
- I had to use the other link you shared (https://doc.dataiku.com/dss/latest/apinode/endpoint-python-prediction.html)
If your model is already trained and saved as a pickle file, the quickest option is to use a Python prediction endpoint. To do so, add a managed folder to your flow with your .pkl file in it. You could even create a Python recipe that takes a dataset as input, uses your code to train the model, and then saves it in the managed folder; that would let you retrain the model easily.
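Such a training recipe could look roughly like the sketch below. The model class, folder path, and helper names here are purely illustrative; in an actual DSS recipe the folder path would typically come from something like `dataiku.Folder("models").get_path()`, but the pickle save/load pattern is the same.

```python
# Illustrative sketch of "train, then save a .pkl into a folder".
# TinyModel stands in for any trained scikit-learn estimator;
# tempfile.mkdtemp() stands in for the managed folder's local path.
import os
import pickle
import tempfile

class TinyModel:
    """Stand-in for a trained estimator with a predict() method."""
    def predict(self, rows):
        return [sum(r) for r in rows]

def save_model(model, folder_path, name="model.pkl"):
    path = os.path.join(folder_path, name)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path

def load_model(folder_path, name="model.pkl"):
    with open(os.path.join(folder_path, name), "rb") as f:
        return pickle.load(f)

folder = tempfile.mkdtemp()   # in DSS: the managed folder's path
save_model(TinyModel(), folder)
reloaded = load_model(folder)
print(reloaded.predict([[1, 2], [3, 4]]))  # [3, 7]
```

Rerunning the recipe simply overwrites `model.pkl`, which is what makes retraining easy.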
Once this is done, you can create a Python API endpoint that loads the model from the managed folder and uses it to score the request data (https://doc.dataiku.com/dss/latest/apinode/endpoint-python-function.html#using-managed-folders). In the working folders setting, add the managed folder that contains your model.
Then in your endpoint code you can adopt the following structure:
```python
import os
import pickle

# The API node exposes the working folders as a list of local paths
# in the `folders` variable; take the one containing the model.
model_folder = folders[0]
model_path = os.path.join(model_folder, "model.pkl")
with open(model_path, "rb") as input_file:
    model = pickle.load(input_file)

def predict(features):
    # Preprocess the features as needed, then compute the prediction
    prediction = model.predict(features)
    return prediction
```
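Before deploying, the endpoint logic can be smoke-tested outside DSS by faking the `folders` list that the API node normally provides. Everything here other than the `folders` variable and the `model.pkl` filename is an illustrative stand-in:

```python
# Smoke-test the endpoint's load-and-predict logic without an API node.
import os
import pickle
import tempfile

class TinyModel:
    """Stand-in for the real trained estimator."""
    def predict(self, rows):
        return [len(r) for r in rows]

# Fake what the API node provides: a list of local working-folder paths.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "model.pkl"), "wb") as f:
    pickle.dump(TinyModel(), f)
folders = [folder]

# Same loading pattern as the endpoint code.
with open(os.path.join(folders[0], "model.pkl"), "rb") as fh:
    model = pickle.load(fh)

def predict(features):
    return model.predict(features)

print(predict([[0, 1, 2]]))  # [3]
```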
Can you please share a use case for saving the trained scikit-learn model inside a managed folder as a .pkl file? Also, how did you load the saved model for a prediction? It would be great if you could share some Python code examples. Thanks in advance.
I was able to get this done. The challenging aspect for me now is everything after deployment: monitoring the model's performance, capturing the inputs it receives, detecting data drift and concept drift, and so on. It would have been much simpler if multi-label classification were supported within the Lab; then I could have followed the Dataiku Academy videos.