Create output dataset with current date as suffix to its name

sonal_18 · February 2022

Hello Team,

We are creating datasets through Dataiku flows and storing them in an S3 bucket.

Our use case is that we want to name our datasets with the current month and year (MM_YY) as a suffix.

Suppose we build a dataset called "xyz" in January 2022; the dataset's name should then be "xyz_01_22".

We are able to do this by manually storing the dates in global variables and referencing them in the dataset's settings with the help of "$".

Could you please help us automate this?

Sarina · February 2022

Hi @sonal_18,

If you would be up for a slightly different structure, you could take advantage of file partitioning and write to an output partition of MM_YY. For example, with a dataset that is set up in this format, the output files will have the structure of MM_YY/out-s0.csv.gz:

[Screenshot: Screen Shot 2022-02-21 at 4.50.11 PM.png]

However, it is not possible to change the actual filename using this method; you can only set the parent subfolders. If setting the filename itself is important to you, then the main option would be to create the file through a Python recipe, so that you can specify the output filename. You could use the managed folder APIs to write to an output managed folder on S3.
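As a rough illustration of the Python recipe option, here is a minimal sketch that builds an MM_YY suffix and writes a CSV with that name into a managed folder. The helper names (`month_year_suffix`, `dated_filename`, `write_dated_csv`) and the dataset and folder names are hypothetical, and the sketch assumes the input fits in memory as a pandas dataframe. The `dataiku` package is only available inside DSS, so it is imported lazily.

```python
from datetime import date
from typing import Optional


def month_year_suffix(day: Optional[date] = None) -> str:
    """Return the MM_YY suffix for the given date (today if omitted)."""
    day = day or date.today()
    return day.strftime("%m_%y")


def dated_filename(base_name: str, day: Optional[date] = None, ext: str = "csv") -> str:
    """Build a name such as xyz_01_22.csv from a base name and a date."""
    return f"{base_name}_{month_year_suffix(day)}.{ext}"


def write_dated_csv(input_dataset: str, output_folder: str, base_name: str) -> str:
    """Write a dataset as a dated CSV into a managed folder (inside DSS only)."""
    import dataiku  # lazy import: this package exists only inside Dataiku DSS

    # Assumption: the dataset is small enough to load fully into memory.
    df = dataiku.Dataset(input_dataset).get_dataframe()
    filename = dated_filename(base_name)

    # The managed folder (e.g. one backed by S3) must be an output of the recipe.
    folder = dataiku.Folder(output_folder)
    with folder.get_writer(filename) as writer:
        writer.write(df.to_csv(index=False).encode("utf-8"))
    return filename


# The naming helper can be checked without DSS:
print(dated_filename("xyz", date(2022, 1, 15)))  # xyz_01_22.csv
```

Inside a Python recipe, a call such as `write_dated_csv("xyz", "s3_output", "xyz")` would then produce a file for the current month, assuming a dataset named "xyz" and a managed folder named "s3_output" exist in your project.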

Please let me know if you have any questions about these approaches. It might help to see some screenshots of your flow as well, in order to understand your exact use case.

Thank you!
Sarina
