Saving a model to HDFS

haoxian
edited July 16 in Using Dataiku

Hello,

My company has blocked access to the local server filesystem.
How can I save scikit-learn, gensim, and PySpark models in this situation?
Is this still possible?

I was trying to save the model this way:

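import os

import dataiku
import joblib

# Recipe outputs (get_path() only works for a folder on the local filesystem)
model_scikit = dataiku.Folder("PRNy6bsT").get_path()

# Remove files left over from previous runs
for file in os.listdir(model_scikit):
    try:
        os.remove(os.path.join(model_scikit, file))
    except OSError:
        pass

# SCHEMA, trf_num, trf_cat and gs are defined earlier in the recipe
serials = [
    {'pkl': 'schema.pkl', 'obj': SCHEMA},
    {'pkl': 'trf_num.pkl', 'obj': trf_num},
    {'pkl': 'trf_cat.pkl', 'obj': trf_cat},
    {'pkl': 'model.pkl', 'obj': gs.best_estimator_},
]

# Serialize each object to a file inside the folder
for serial in serials:
    fp = os.path.join(model_scikit, serial['pkl'])
    joblib.dump(serial['obj'], fp)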

Thank you very much.

Sincerely,

HW

Best Answer

  • Andrey (Dataiker Alumni)

    Hi,

    If you want to write to a managed folder that is not stored on the local filesystem, you need to use the upload_stream and get_download_stream methods of the Folder object instead of get_path(). They let you write and read data through DSS rather than through a path on the server's disk.
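    A minimal sketch of that approach, assuming a folder object that exposes upload_stream(path, file_obj) and get_download_stream(path) as dataiku.Folder does. The helper names save_objects and load_object are hypothetical, and the sketch uses the standard-library pickle module so it runs anywhere; joblib.dump and joblib.load also accept file objects and could be swapped in. Each object is serialized into an in-memory buffer, which is then uploaded, so nothing touches the local disk.

```python
import io
import pickle


def save_objects(folder, objects):
    """Serialize each object and upload it to a managed folder.

    Assumption: `folder` exposes upload_stream(path, file_obj), as
    dataiku.Folder does. `objects` maps file names to Python objects.
    """
    for name, obj in objects.items():
        buf = io.BytesIO()
        pickle.dump(obj, buf)  # serialize into memory, not onto disk
        buf.seek(0)  # rewind so the upload reads from the start
        folder.upload_stream(name, buf)


def load_object(folder, name):
    """Download one file from the managed folder and unpickle it.

    Assumption: get_download_stream(path) returns a file-like object
    usable as a context manager.
    """
    with folder.get_download_stream(name) as stream:
        data = stream.read()
    return pickle.loads(data)


# Hypothetical usage inside a DSS recipe (requires the dataiku package):
# import dataiku
# folder = dataiku.Folder("PRNy6bsT")
# save_objects(folder, {"schema.pkl": SCHEMA, "model.pkl": gs.best_estimator_})
# model = load_object(folder, "model.pkl")
```

    Because the helpers only depend on the two stream methods, the same code would work whether the managed folder is backed by HDFS, S3, or another remote connection, as long as DSS exposes it as a managed folder.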

    Regards
