Code Studio to load HDF5 File

kai_fang Registered Posts: 7 ✭✭

We recently started integrating Code Studio with our instance. We used to save HDF5 files in a managed folder and read them with code like:
    with pd.HDFStore(pathToFile, mode='r', complevel=COMPRESSION_LEVEL, complib=COMPRESSION_METHOD) as hdf:
        hdf_keys = hdf.keys()

This works well when pathToFile is built from the managed folder's local path, which we get by calling managed_folder.get_path().

After moving to Code Studio, it turns out that I have to use managed_folder.get_download_stream() to access the data in my managed folder. However, I have not found a good way to load the HDFStore from the stream data.

Has anyone faced the same issue before? I would really appreciate it if you could share your approach to loading the HDFStore file.

Best,

Kai

Operating system used: Linux

Answers

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker

    Hi @kai_fang,

    pandas.HDFStore requires a local file path, so in order to load it from a managed folder, we first need to copy the file from the folder to a temporary local file.

    For example:

    import os
    import shutil
    import tempfile

    import dataiku
    import pandas as pd

    folder = dataiku.Folder("MY_FOLDER")

    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = os.path.join(temp_dir, "temp.h5")

        # Copy the file from the remote folder to a temp local file
        with folder.get_download_stream("FILE_IN_FOLDER.h5") as folder_stream:
            with open(temp_path, "wb") as temp_stream:
                shutil.copyfileobj(folder_stream, temp_stream)

        # Load the temp HDF file
        with pd.HDFStore(temp_path, mode="r") as hdf:
            hdf_keys = hdf.keys()
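
    If you need this in several places, the copy step can be wrapped in a small context manager. This is only a sketch: local_copy and its filename parameter are hypothetical names, not part of the Dataiku or pandas API. It assumes the stream is a readable binary file-like object, which is what get_download_stream() is used as in the example above.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def local_copy(stream, filename="temp.h5"):
    """Copy a readable binary stream to a temporary local file and yield its path.

    `local_copy` is a hypothetical helper, not a Dataiku or pandas function.
    The temporary directory, and the file inside it, is deleted on exit.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = os.path.join(temp_dir, filename)
        # Stream the contents to disk in chunks instead of reading it all into memory
        with open(temp_path, "wb") as temp_stream:
            shutil.copyfileobj(stream, temp_stream)
        yield temp_path


# Intended usage inside DSS (assumes the same folder and file names as above):
#
#   with folder.get_download_stream("FILE_IN_FOLDER.h5") as folder_stream:
#       with local_copy(folder_stream) as path:
#           with pd.HDFStore(path, mode="r") as hdf:
#               hdf_keys = hdf.keys()
```

    The local path is only valid inside the with block, so read everything you need from the store before leaving it.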
