Hello,
I have a partitioned folder and I need a Python recipe to create the associated partitioned dataset.
In a notebook, I can use the following code:
*******
import json
import dataiku
import pandas as pd

files = dataiku.Folder("xxxx")
files_info = files.get_info()

# file paths
paths = files.list_paths_in_partition()

df_measure = pd.DataFrame()
for itemName in paths:
    with files.get_download_stream(itemName) as j:
        contents = j.read()
    parsed_json = json.loads(contents)
    ...
    df_measure = df_measure.append(sub_df, ignore_index=True)

# Write recipe outputs
measurement = dataiku.Dataset("Measurement")
measurement.write_with_schema(df_measure)
*******
But back in a recipe, the partitions are managed by DSS.
I also need to remove files.list_paths_in_partition() and the for itemName in paths loop.
How can I load the right file in files.get_download_stream(itemName)?
Thanks a lot
Best regards
In the Actions menu of your partitioned folder, you can pick "Create dataset". This creates a dataset that is merely a view of the files in the folder. You can then activate partitioning on this dataset.
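As a side note, the notebook loop above can also be tightened: DataFrame.append is deprecated in recent pandas, so collecting one small DataFrame per file and calling pd.concat once is the safer pattern. Below is a minimal, self-contained sketch of that pattern; plain io.BytesIO objects stand in for the folder's download streams, and the "measures" field name is a made-up example, not part of the original data:

```python
import io
import json
import pandas as pd

# Hypothetical stand-ins for files.get_download_stream(itemName):
# each stream holds one JSON document with a made-up "measures" list.
streams = [
    io.BytesIO(json.dumps({"measures": [{"sensor": "a", "value": 1.0}]}).encode()),
    io.BytesIO(json.dumps({"measures": [{"sensor": "b", "value": 2.5}]}).encode()),
]

frames = []
for stream in streams:
    with stream as j:
        parsed_json = json.loads(j.read())
    # One small DataFrame per file; concatenate once at the end
    frames.append(pd.DataFrame(parsed_json["measures"]))

df_measure = pd.concat(frames, ignore_index=True)
```

In a real recipe the loop body would stay the same; only the source of the streams changes.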