I have pre-trained models that I would like to use in a Dataiku recipe.
Hi, I have pre-trained models on my local machine that I would like to use in a recipe. One model is trained using the alibi-detect library, and the other is the popular SAM (Segment Anything) model. I'd appreciate any tips on how to use these models in a Dataiku recipe.
Best Answer
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296
Hi @pbena64,
You mentioned in your original post that the model is on your local machine. Please see the following documentation for the difference between the read/write APIs for local vs. non-local files: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html#local-vs-non-local
You'll need to use the get_download_stream() API: https://developer.dataiku.com/latest/api-reference/python/managed-folders.html#dataiku.Folder.get_download_stream
import dataiku
import pickle

remote_folder = dataiku.Folder("pkl-models")  # This is a managed folder on an S3 connection

# Save pickle
with remote_folder.get_writer("/test-model.pkl") as writer:
    pickle.dump(clf, writer)  # Assuming clf is a sklearn object

# Load pickle
with remote_folder.get_download_stream('test-model.pkl') as f:
    clf_loaded = pickle.load(f)
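If you also need to load the SAM checkpoint (a PyTorch .pth file), the same stream API applies; since SAM's model builders expect a checkpoint path rather than a file-like object, copy the stream to a temporary file first. A minimal sketch, assuming the segment-anything package is installed in your code env and the checkpoint was uploaded to the managed folder (the folder and file names below are illustrative):

import shutil
import tempfile
import dataiku
from segment_anything import sam_model_registry  # facebookresearch/segment-anything

sam_folder = dataiku.Folder("pkl-models")  # illustrative: the folder holding the checkpoint

# SAM's builders expect a checkpoint *path*, so stream the file to a local temp file first
with sam_folder.get_download_stream("sam_vit_b_01ec64.pth") as stream, \
        tempfile.NamedTemporaryFile(suffix=".pth", delete=False) as tmp:
    shutil.copyfileobj(stream, tmp)

sam = sam_model_registry["vit_b"](checkpoint=tmp.name)  # "vit_b" must match the checkpoint variant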
Thanks,
Jordan
Answers
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296
Hi @pbena64,
The DSS Developer Docs include multiple examples of loading and re-using pre-trained models: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/index.html
You can upload your model to a DSS managed folder on the local filesystem and load it into a Python notebook or recipe: https://developer.dataiku.com/latest/api-reference/python/managed-folders.html#dataiku.Folder.get_path
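For example, since alibi-detect detectors are saved as a directory rather than a single file, a filesystem managed folder plus get_path() can be convenient. A minimal sketch, assuming a recent alibi-detect version and that the detector was written with alibi_detect's save_detector into a subfolder (the folder and subfolder names are illustrative):

import os
import dataiku
from alibi_detect.saving import load_detector  # location in recent alibi-detect versions

folder = dataiku.Folder("local-models")  # illustrative: a managed folder on the local filesystem

# get_path() only works for filesystem-backed folders; it returns the folder's path on disk
detector_dir = os.path.join(folder.get_path(), "drift-detector")  # directory written by save_detector
detector = load_detector(detector_dir)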
Thanks,
Jordan
-
Hi @JordanB,
Thanks for the help. My situation is the one described in the second paragraph of your reply. However, the link takes me to a page with the following statement:
"This method can only be called for managed folders that are stored on the local filesystem of the DSS server. For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs."
which is the case for me. I would appreciate it if you could point me to the read/write APIs mentioned in the quote.
Thanks!
-
Hi @JordanB,
Thanks for the response. It solved my problem. However, the issue I have now is that pickling is only supported for TensorFlow (specifically Keras) >= 2.13.0, while the latest version I can get with the code env installation is 2.12.0.
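In the meantime, the workaround I'm trying is to skip pickle and save the model in Keras's native format through a temporary file, using the managed-folder upload/download APIs (just a sketch; the folder and file names are placeholders):

import shutil
import tempfile
import dataiku
from tensorflow import keras

folder = dataiku.Folder("pkl-models")  # placeholder managed folder name

# Save: write the model in Keras's native HDF5 format, then upload the file
with tempfile.NamedTemporaryFile(suffix=".h5") as tmp:
    model.save(tmp.name)  # model is an existing keras.Model
    folder.upload_file("/keras-model.h5", tmp.name)

# Load: stream the file back to a temp path and use keras.models.load_model
with folder.get_download_stream("keras-model.h5") as stream, \
        tempfile.NamedTemporaryFile(suffix=".h5", delete=False) as tmp:
    shutil.copyfileobj(stream, tmp)
loaded = keras.models.load_model(tmp.name)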
I will start a new question for this.
Regards,