Hello,
I have a partitioned folder and I need a Python recipe to create the associated partitioned dataset.
In a notebook, I can use the following code:
*******
import json
import dataiku
import pandas as pd

files = dataiku.Folder("xxxx")
files_info = files.get_info()

# file paths
paths = files.list_paths_in_partition()

df_measure = pd.DataFrame()
for itemName in paths:
    with files.get_download_stream(itemName) as j:
        contents = j.read()
    parsed_json = json.loads(contents)
    ...
    df_measure = df_measure.append(sub_df, ignore_index=True)

# Write recipe outputs
measurement = dataiku.Dataset("Measurement")
measurement.write_with_schema(df_measure)
*******
But back in a recipe, the partitions are managed by DSS.
I also need to remove files.list_paths_in_partition() and the for itemName in paths loop.
How can I load the right file in files.get_download_stream(itemName)?
Thanks a lot
Best regards
In the Actions menu of your partitioned folder, you can pick "Create dataset". This creates a dataset that is merely a view of the files in the folder. You can then activate partitioning on this dataset.
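As a side note, the notebook loop above can also be tightened: DataFrame.append is deprecated in recent pandas, so collecting one small DataFrame per file and calling pd.concat once is the safer pattern. Below is a minimal, self-contained sketch of that pattern; plain io.BytesIO objects stand in for the folder's download streams, and the "measures" field name is a made-up example, not part of the original data:

```python
import io
import json
import pandas as pd

# Hypothetical stand-ins for files.get_download_stream(itemName):
# each stream holds one JSON document with a made-up "measures" list.
streams = [
    io.BytesIO(json.dumps({"measures": [{"sensor": "a", "value": 1.0}]}).encode()),
    io.BytesIO(json.dumps({"measures": [{"sensor": "b", "value": 2.5}]}).encode()),
]

frames = []
for stream in streams:
    with stream as j:
        parsed_json = json.loads(j.read())
    # One small DataFrame per file; concatenate once at the end
    frames.append(pd.DataFrame(parsed_json["measures"]))

df_measure = pd.concat(frames, ignore_index=True)
```

In a real recipe the loop body would stay the same; only the source of the streams changes.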