Hello!
I'm trying to export a partitioned dataset (partitioned by month over a date range) to multiple CSV files.
However, I would like them all to end up in the same directory, with different file names, instead of ending up in sub folders, named according to the partitions.
In other words, I would want the folders in the following image to be files.
The current pattern I use in the output folder settings is: %Y-%M/.* - which is why I get the above result.
However, when I try to use something like this "%Y-%M.*" instead to get the result I would like, DSS throws the following error when running the export recipe.
Thank you very much in advance for your answer(s)!
Rémi
Hi @Rémi,
This is not currently possible with visual recipes, hence the error "Partitioning scheme is not representable as folders". You will have to use a Python recipe to define the file name used for each exported partition.
Here is a sample Python recipe that writes one file per partition of a dataset, keeping only the partitions that match a filter (e.g. 2022):
# -*- coding: utf-8 -*-
import dataiku

# Recipe input and output: replace both placeholders with your own values
output_folder_id = "replace_folder_id"
dataset_name = "replace_dataset_name"

dataset = dataiku.Dataset(dataset_name, ignore_flow=True)
output_folder = dataiku.Folder(output_folder_id)

# List every partition, then export the matching ones one by one
partitions = dataset.list_partitions(raise_if_empty=True)
for partition in partitions:
    # Filter here, e.g. keep only the 2022 partitions
    if partition.startswith('2022'):
        # Use a fresh Dataset handle so that only this partition is read
        partition_dataset = dataiku.Dataset(dataset_name, ignore_flow=True)
        partition_dataset.add_read_partitions(partition)
        partition_df = partition_dataset.get_dataframe()
        # Upload the CSV to the managed folder, named after the partition
        output_folder.upload_stream(partition + ".csv", partition_df.to_csv(index=False).encode("utf-8"))
This creates one file per partition in the managed folder, each named after the full partition name.
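If you want to adjust the file names or the filter, the naming step can be kept separate from the Dataiku calls. The sketch below is only an illustration: select_partitions and partition_file_name are hypothetical helpers, not part of the dataiku package, and replacing "|" and "/" with "_" is an assumption about which characters you would not want in a file name.

```python
def select_partitions(partitions, prefix):
    """Keep the partition identifiers that start with the given prefix."""
    return [p for p in partitions if p.startswith(prefix)]


def partition_file_name(partition, extension=".csv"):
    """Build a flat file name from a partition identifier.

    Assumption: "|" and "/" are replaced with "_" so that the name
    never creates a sub-folder in the output folder.
    """
    safe = partition.replace("|", "_").replace("/", "_")
    return safe + extension


if __name__ == "__main__":
    # Hypothetical partition identifiers, for illustration only
    example = ["2021-12", "2022-01", "2022-02"]
    for partition in select_partitions(example, "2022"):
        print(partition, "->", partition_file_name(partition))
```

In the recipe, the result of partition_file_name(partition) would take the place of partition + ".csv" in the upload call.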
Let me know if that helps!
Thank you very much for your answer Alex!
Your Python code is very similar to what I ended up doing to bypass the problem 😉
Have a great day & weekend!
Rémi