Managing folder partitions in Python recipes
Hello,
I have a partitioned folder, and I need a Python recipe to create the associated partitioned dataset.
In the notebook, I can use the following code:
*******
import json

import dataiku
import pandas as pd

files = dataiku.Folder("xxxx")
files_info = files.get_info()

# File paths in the folder partition
paths = files.list_paths_in_partition()

frames = []
for itemName in paths:
    with files.get_download_stream(itemName) as j:
        contents = j.read()
    parsed_json = json.loads(contents)
    ...
    frames.append(sub_df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_measure = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Write recipe outputs
measurement = dataiku.Dataset("Measurement")
measurement.write_with_schema(df_measure)
*******
But back in the recipe, the partitions are managed by DSS, so I need to remove files.list_paths_in_partition() and the for itemName in paths loop.
How can I then load the right file in files.get_download_stream(itemName)?
Thanks a lot
Best regards
Answers
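In the actions of your partitioned folder, pick "Create dataset". This creates a dataset that is merely a view of the files in the folder. You can then activate partitioning on this dataset and use it as the input of your Python recipe, so that DSS passes the recipe the partition(s) required by its partition dependencies instead of you selecting files by hand.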
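Alternatively, if you want to keep reading the folder directly from the Python recipe, a minimal sketch is below. It assumes several things not stated in the thread: the output dataset has a single partitioning dimension (called "date" here, a hypothetical name), DSS exposes the partition being built as the flow variable DKU_DST_<dimension> in dataiku.dku_flow_variables, the folder uses the same partition identifiers as the output, and list_paths_in_partition accepts that identifier as its argument. The helper read_partition_records is hypothetical; it takes the folder and the flow variables as parameters so it can be tried outside DSS with a stand-in object.

```python
import json
from typing import Any, Dict, List, Mapping


def read_partition_records(folder: Any, flow_vars: Mapping[str, str],
                           dimension: str = "date") -> List[Dict[str, Any]]:
    """Load the JSON files of the partition currently being built.

    Hypothetical assumptions, adapt them to your project:
    - the output has a single partitioning dimension named `dimension`;
    - the target partition is available as flow variable "DKU_DST_<dimension>";
    - the input folder uses the same partition identifiers as the output.
    """
    # Partition identifier chosen by DSS for this run (assumed variable name)
    partition_id = flow_vars["DKU_DST_" + dimension]

    records: List[Dict[str, Any]] = []
    # Only list the files belonging to that partition of the folder
    for path in folder.list_paths_in_partition(partition_id):
        with folder.get_download_stream(path) as stream:
            parsed = json.loads(stream.read())
        # A file may hold a single JSON object or a list of objects
        if isinstance(parsed, list):
            records.extend(parsed)
        else:
            records.append(parsed)
    return records
```

Inside the recipe this would be called as read_partition_records(dataiku.Folder("xxxx"), dataiku.dku_flow_variables), and pd.DataFrame(records) then gives the frame to pass to write_with_schema. The records may still need whatever transformation the original "..." step applies to build sub_df.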