Automating dataset exports on a monthly basis
@Turribeach
Apologies for tagging you, but I noticed you are the most experienced Dataiku user I've found so far, and I would really appreciate your support on this.
I'm currently trying to figure out a way to export a specific dataset in .xlsx format on a monthly basis, triggered by an automated scenario or something similar, and to save the exports either locally on my desktop or in a folder I would create in Dataiku.
I already tried creating an automated scenario and adding a trigger, but did not find a way to link it to the dataset. I also tried a Python script, without success: it asked me to install a module, I have no admin rights, and even then I'm not sure it would have worked in the end.
Unfortunately, I'm not an experienced user. I come from a different background, I'm currently exploring Dataiku on a project, and I have no one else to go to for support.
All the best,
Alin
Answers
-
tgb417
Welcome to the Dataiku Community. We are so glad to have you join us.
First, I regularly export DSS datasets to files, both on local drives and on Google Drive. I've also exported to SFTP sites.
There are a number of code-based and visual approaches.
Here is a discussion of the Export to Folder visual recipe that can be used.
Note that Export to Folder is further down on the right-hand side, under “Other Recipes”, not at the top right.
If you are using a partitioned dataset, you can also, for example, create a new file in its own separate monthly folder. I use this approach to archive some of my datasets.
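As a rough illustration of the monthly-folder idea, the sketch below builds a path such as `2024/06/my_dataset.xlsx` from a run date, so each month's file lands in its own folder and earlier months are kept. The year/month layout and the function name `monthly_archive_path` are hypothetical; with a partitioned Dataiku dataset or folder, the real layout is governed by its partitioning settings, not by code like this.

```python
from datetime import date
from pathlib import PurePosixPath


def monthly_archive_path(dataset_name: str, run_date: date, ext: str = "xlsx") -> str:
    """Return a path such as '2024/06/my_dataset.xlsx' for the month of run_date.

    The year/month folder layout is a hypothetical convention chosen for
    illustration; adapt it to the folder structure you actually use.
    """
    month_folder = PurePosixPath(f"{run_date:%Y}") / f"{run_date:%m}"
    return str(month_folder / f"{dataset_name}.{ext}")


if __name__ == "__main__":
    # Each monthly run writes into its own folder, so earlier months are kept.
    for d in (date(2024, 5, 1), date(2024, 6, 1)):
        print(monthly_archive_path("my_dataset", d))
```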
-
Turribeach
Hi, in general exporting to Excel is an anti-pattern, especially if you are doing it on a scheduled basis, which indicates that some business or system process depends on this file. It would be much better to bring those users into Dataiku and have them check the data there. Your company has spent significant money on Dataiku licenses to move away from Excel-based solutions; if you end up creating Excel files as output, I think you are going against that.
In any case, it's not my call how you should use Dataiku. Tom has already covered the main options available: either use an Export to Folder recipe or a Python recipe.
To add a dataset to a scenario so that it's "refreshed", you need to add it to a Build step in the scenario. This also works for Dataiku folders: if you have an Export to Folder recipe that writes the Excel file into a Dataiku folder, simply add the folder to the scenario's Build step and Dataiku will re-export the dataset to Excel every time the scenario runs.
As noted in the post Tom linked, "The Export to Folder recipe does use the input dataset name as the exported filename for any files exported to the folder". So if you need to save the Excel file with a dynamic file name (file_2024-06-01.xls), you will need to use a Python recipe. To avoid needing special packages in the Python code environment, you could first use the Export to Folder recipe to export the dataset to Excel in a folder, and then use a Python recipe to simply copy the file from one folder to the next; no special packages are needed. Below is a screenshot of a proposed flow.
With regards to using a local folder on your computer, this is not something you can easily do, as Dataiku will not normally have access to your local folders. I suggest you use a Dataiku folder.
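The copy step described above could look roughly like the sketch below. It uses only the Python standard library, so no extra packages are needed, and it copies `<name>.xlsx` to `<name>_<YYYY-MM-DD>.xlsx`. It assumes both Dataiku folders are stored on the local filesystem so that plain paths are available; in a Python recipe those paths could come from `dataiku.Folder("...").get_path()`, where the folder names shown are hypothetical. For folders on cloud storage, the folder's stream methods (`get_download_stream` / `upload_stream`) would be needed instead; check the API docs for your DSS version.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def copy_with_dated_name(src_dir: str, dst_dir: str, filename: str,
                         run_date: Optional[date] = None) -> Path:
    """Copy src_dir/filename to dst_dir as <stem>_<YYYY-MM-DD><suffix>.

    Only the standard library is used, so the recipe needs no extra packages.
    """
    run_date = run_date or date.today()
    src = Path(src_dir) / filename
    if not src.is_file():
        raise FileNotFoundError(f"Exported file not found: {src}")
    dst = Path(dst_dir) / f"{src.stem}_{run_date.isoformat()}{src.suffix}"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)  # overwrites a file of the same name, if any
    return dst


# In a Dataiku Python recipe (hypothetical folder names, filesystem-backed
# folders only), the directories could be obtained like this:
#   import dataiku
#   src_dir = dataiku.Folder("excel_export").get_path()
#   dst_dir = dataiku.Folder("excel_archive").get_path()
#   copy_with_dated_name(src_dir, dst_dir, "my_dataset.xlsx")
```

Run monthly from the same scenario, after the Build step for the export folder, each run would leave a new dated copy next to the previous ones.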