Saving model with HDFS
haoxian
Hello,
My company has blocked access to the local server filesystem.
I would like to know how to save scikit-learn, gensim, and PySpark models in this situation.
Is this still possible?
I was trying to save the model this way:
import os
import joblib
import dataiku

# Recipe outputs
model_scikit = dataiku.Folder("PRNy6bsT").get_path()

# Clear any files left over from a previous run
for file in os.listdir(model_scikit):
    try:
        os.remove(os.path.join(model_scikit, file))
    except OSError:
        pass

# Objects to serialize and their target file names
serials = [
    {'pkl': 'schema.pkl', 'obj': SCHEMA},
    {'pkl': 'trf_num.pkl', 'obj': trf_num},
    {'pkl': 'trf_cat.pkl', 'obj': trf_cat},
    {'pkl': 'model.pkl', 'obj': gs.best_estimator_},
]

for serial in serials:
    fp = os.path.join(model_scikit, serial['pkl'])
    joblib.dump(serial['obj'], fp)
Thank you very much.
Sincerely,
HW
Best Answer
Hi,
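If you want to write to a managed folder that is not hosted on the local filesystem (for example, one on HDFS), get_path() will not work, because it is only available for local-filesystem folders. Use the upload_stream and get_download_stream methods of the Folder object instead. They let you write and read data through DSS without touching the server's disk.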
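As a rough illustration of that approach, here is a minimal sketch that serializes an object into an in-memory buffer and pushes it through upload_stream, then reads it back with get_download_stream. The helper names save_object and load_object and the InMemoryFolder class are hypothetical, introduced only so the example can run outside DSS. It uses pickle from the standard library; joblib.dump and joblib.load also accept file objects, so they should be usable the same way, but that is an assumption to verify in your environment.

```python
import io
import pickle


def save_object(folder, path, obj):
    """Serialize obj in memory and upload it to the managed folder."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)  # rewind so the upload starts at the first byte
    folder.upload_stream(path, buffer)


def load_object(folder, path):
    """Download a file from the managed folder and deserialize it."""
    with folder.get_download_stream(path) as stream:
        data = stream.read()
    return pickle.loads(data)


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder, for testing outside DSS.

    It only mimics the two methods used above.
    """

    def __init__(self):
        self.files = {}

    def upload_stream(self, path, f):
        self.files[path] = f.read()

    def get_download_stream(self, path):
        return io.BytesIO(self.files[path])


# Inside a DSS recipe you would pass the real folder instead, e.g.:
#   folder = dataiku.Folder("PRNy6bsT")
#   save_object(folder, "model.pkl", gs.best_estimator_)
#   model = load_object(folder, "model.pkl")
```

Because nothing is written to a local path, the same helpers should work whatever storage backs the managed folder. Only load pickled files from sources you trust.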
Regards