Hi, I have two pre-trained models on my local machine that I would like to use in a recipe. One was trained using the alibi-detect library and the other is the popular SAM model. I'd appreciate any tips on how to use these models in a Dataiku recipe.
Hi @pbena64,
The DSS Developer Docs include multiple examples of loading and reusing pretrained models: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/index.html
You can upload your model to a DSS managed folder on the local filesystem and load it into a Python notebook or recipe: https://developer.dataiku.com/latest/api-reference/python/managed-folders.html#dataiku.Folder.get_path
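For example, here is a minimal sketch assuming a local-filesystem managed folder named "models" containing a pickled scikit-learn estimator (both names are placeholders):
import os
import pickle
import dataiku
# get_path() only works for managed folders stored on the local
# filesystem of the DSS server
models_folder = dataiku.Folder("models")
folder_path = models_folder.get_path()
with open(os.path.join(folder_path, "model.pkl"), "rb") as f:
    clf = pickle.load(f)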
Thanks,
Jordan
Hi @JordanB,
Thanks for the help. My problem is similar to the second paragraph of your reply. However, the link takes me to a page with the following statement:
"This method can only be called for managed folders that are stored on the local filesystem of the DSS server. For non-filesystem managed folders (HDFS, S3, โฆ), you need to use the various read/download and write/upload APIs."
which is the case for me. I would appreciate it if you could please point me to the read/write APIs mentioned in the quote.
Thanks!
Hi @pbena64,
You mentioned in your original post that the model is on your local machine. Please see the following documentation for the difference between the read/write APIs for local vs. non-local managed folders: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html
You'll need to use the get_download_stream() API: https://developer.dataiku.com/latest/api-reference/python/managed-folders.html#dataiku.Folder.get_download_stream
import dataiku
import pickle
# This is a managed folder on an S3 connection
remote_folder = dataiku.Folder("pkl-models")
# Save pickle
with remote_folder.get_writer("/test-model.pkl") as writer:
    pickle.dump(clf, writer)  # Assuming clf is a scikit-learn estimator
# Load pickle
with remote_folder.get_download_stream("/test-model.pkl") as f:
    clf_loaded = pickle.load(f)
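For models that aren't pickled, such as the SAM checkpoint you mentioned, the same streaming API applies. Since many libraries expect a checkpoint file path rather than a stream, one approach is to copy the stream to a temporary local file first. Here is a sketch continuing from the snippet above; the checkpoint file name is a placeholder:
import shutil
import tempfile
# Copy the checkpoint out of the managed folder into a local temp file,
# then pass that path to the library's loading function
with remote_folder.get_download_stream("/sam-checkpoint.pth") as stream:
    with tempfile.NamedTemporaryFile(suffix=".pth", delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        checkpoint_path = tmp.name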
Thanks,
Jordan
Hi @JordanB,
Thanks for the response. It solved my problem. However, the issue I have now is that pickling Keras models is supported only for TensorFlow (Keras, specifically) >= 2.13.0, while the latest version I can get with the code env installation is 2.12.0.
I will start a new question for this.
Regards,