
Saving model with HDFS



My company has blocked access to the local server filesystem.
I would like to know how to save scikit-learn, gensim, and PySpark models in this situation.
Is this still possible?

I was trying to save the model this way: 

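import os

import dataiku
import joblib

# Recipe outputs: local path of the managed folder
model_scikit = dataiku.Folder("PRNy6bsT").get_path()

# Remove files left over from a previous run
for file in os.listdir(model_scikit):
    try:
        os.remove(os.path.join(model_scikit, file))
    except OSError:
        pass

# Objects to serialize and their target file names
# (SCHEMA, trf_num, trf_cat and gs are defined earlier in the recipe)
serials = [
    {'pkl': 'schema.pkl', 'obj': SCHEMA},
    {'pkl': 'trf_num.pkl', 'obj': trf_num},
    {'pkl': 'trf_cat.pkl', 'obj': trf_cat},
    {'pkl': 'model.pkl', 'obj': gs.best_estimator_},
]

for serial in serials:
    fp = os.path.join(model_scikit, serial['pkl'])
    joblib.dump(serial['obj'], fp)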

Thank you very much. 





If you want to write to a managed folder that is not backed by the local filesystem, you need to use the upload_stream and get_download_stream methods of the Folder object. They let you write and read data through DSS, so your code never needs a local path.
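As a rough sketch of what that could look like for the scikit-learn objects in the question: serialize each object into an in-memory buffer, then hand the buffer to upload_stream. The helper names save_object and load_object are made up for this illustration, and pickle is used only to keep the sketch dependency-free (joblib.dump and joblib.load also accept file-like objects). The folder id "PRNy6bsT" and gs.best_estimator_ are taken from the question.

```python
import io
import pickle


def save_object(folder, path, obj):
    """Serialize obj in memory and upload it to the managed folder through DSS.

    `folder` is expected to be a dataiku.Folder (anything exposing
    upload_stream(path, file_like) works); save_object is a hypothetical helper.
    """
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)  # rewind so the upload reads from the beginning
    folder.upload_stream(path, buffer)


def load_object(folder, path):
    """Download a file from the managed folder and deserialize it.

    load_object is a hypothetical helper; it only relies on
    get_download_stream(path) returning a readable stream.
    """
    with folder.get_download_stream(path) as stream:
        return pickle.loads(stream.read())


# Inside a DSS recipe, usage might look like this (untested here):
# import dataiku
# folder = dataiku.Folder("PRNy6bsT")
# save_object(folder, "model.pkl", gs.best_estimator_)
# model = load_object(folder, "model.pkl")
```

Libraries that insist on writing to a path rather than a file object are not covered by this sketch; one possible workaround is to save into a temporary directory first and then upload each resulting file with upload_stream.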


Andrey Avtomonov
R&D Engineer @ Dataiku