Saving model with HDFS

Hello, 

My company has blocked access to the local server filesystem.
I would like to know how to save sklearn, gensim, and pyspark models in this situation.
Is this still possible?

I was trying to save the model this way: 

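import os
import joblib
import dataiku

# Recipe outputs
model_scikit = dataiku.Folder("PRNy6bsT").get_path()

# Remove any files left over from a previous run
for file in os.listdir(model_scikit):
    try: os.remove(os.path.join(model_scikit, file))
    except OSError: pass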

serials = [
    {'pkl': 'schema.pkl', 'obj': SCHEMA},
    {'pkl': 'trf_num.pkl', 'obj': trf_num},
    {'pkl': 'trf_cat.pkl', 'obj': trf_cat},
    {'pkl': 'model.pkl', 'obj': gs.best_estimator_},
]

for serial in serials:
    fp = os.path.join(model_scikit, serial['pkl'])
    joblib.dump(serial['obj'], fp)

Thank you very much. 

Sincerely, 

HW


Hi,

If you want to write to a managed folder that is not backed by the local filesystem (HDFS, for example), get_path() is not available, so you need to use the upload_stream and get_download_stream methods of the Folder object instead. They let you write and read the folder's files through DSS.
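A minimal sketch of that approach, under a few assumptions: the helper names save_object and load_object are made up for illustration, and pickle is used only to keep the example dependency-free (joblib.dump also accepts a file object, so it could be swapped in). The object is serialized into an in-memory buffer, which is then handed to upload_stream. Reading goes through get_download_stream.

```python
import io
import pickle


def save_object(folder, path, obj):
    """Serialize obj in memory and upload it to the managed folder at path."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)  # rewind so the upload starts from the first byte
    folder.upload_stream(path, buffer)


def load_object(folder, path):
    """Download the file at path from the managed folder and deserialize it."""
    with folder.get_download_stream(path) as stream:
        return pickle.loads(stream.read())


# Intended use inside a DSS recipe (not executed here; folder id taken from the question):
#   import dataiku
#   folder = dataiku.Folder("PRNy6bsT")
#   save_object(folder, "model.pkl", gs.best_estimator_)
#   model = load_object(folder, "model.pkl")
```

Because nothing touches a local path, the same helpers should work whatever storage backs the folder. Very large models may not fit comfortably in an in-memory buffer, in which case another strategy would be needed.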

Regards

Andrey Avtomonov
R&D Engineer @ Dataiku