Code Studio to load HDF5 file
We recently started integrating Code Studio with our instance. We used to save HDF5 files in a managed folder and read them with code like:
with pd.HDFStore(pathToFile, mode='r', complevel=COMPRESSION_LEVEL, complib=COMPRESSION_METHOD) as hdf:
    hdf_keys = hdf.keys()
This works well when pathToFile is built from the path returned by managed_folder.get_path().
Now that I am moving to Code Studio, it turns out that I have to use managed_folder.get_download_stream() to access the data in my managed folder. However, I have not found a good way to load an HDFStore from the stream data.
Has anyone faced the same issue before? I would really appreciate it if you could share your approach to loading the HDFStore file.
Best,
Kai
Operating system used: Linux
Answers
Hi @kai_fang,
pandas.HDFStore requires a local file path, so to load the file from a managed folder, we first need to copy it from the folder to a temporary local file. For example:
import os
import shutil
import tempfile

import dataiku
import pandas as pd

folder = dataiku.Folder("MY_FOLDER")

with tempfile.TemporaryDirectory() as temp_dir:
    temp_path = os.path.join(temp_dir, "temp.h5")

    # Copy the file from the remote folder to a temp local file
    with folder.get_download_stream("FILE_IN_FOLDER.h5") as folder_stream:
        with open(temp_path, "wb") as temp_stream:
            shutil.copyfileobj(folder_stream, temp_stream)

    # Load the temp HDF file
    with pd.HDFStore(temp_path, mode="r") as hdf:
        hdf_keys = hdf.keys()
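If you need this copy-then-open pattern in several places, it could be wrapped in a small reusable helper. The following is only a minimal sketch: local_copy is a hypothetical name (it is not part of dataiku or pandas), and it assumes the stream is a readable binary file-like object, such as the one returned by get_download_stream().

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def local_copy(stream, filename="temp.h5"):
    """Copy a readable binary stream to a temporary local file and yield its path.

    Hypothetical helper: the temporary directory, and the copy inside it,
    is deleted as soon as the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = os.path.join(temp_dir, filename)
        with open(temp_path, "wb") as temp_file:
            shutil.copyfileobj(stream, temp_file)
        yield temp_path


# Assumed usage, reusing the placeholder folder and file names from this answer:
#
#   folder = dataiku.Folder("MY_FOLDER")
#   with folder.get_download_stream("FILE_IN_FOLDER.h5") as folder_stream:
#       with local_copy(folder_stream) as path:
#           with pd.HDFStore(path, mode="r") as hdf:
#               hdf_keys = hdf.keys()
```

Keep in mind that the temporary file only exists inside the with block, so anything you need from the store has to be read into memory before the block ends.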