Defining a global variable in the base name of the output file for a dataset

BrianC Registered Posts: 2 ✭✭

Hello

I am working on a flow that has a Python recipe that sets global variables. In the recipe's output dataset, a couple of these variables are used to set the path and filename of the dataset, which is stored in Azure.

From researching how to define the filename, the guidance is to check "Force single output file" and set "Output file name" to a value in the Settings/Advanced section. When the Output file name is hardcoded to a value, it works fine.

Ideally, the filename would also include a datetime. It appears one can reference a variable as something like ${current_date} in the filename, so the base name would be defined as filename_${current_date}. Unfortunately, when running the recipe, the variable value is not substituted, and the output file comes out as filename_${current_date}.csv.gz instead of something like filename_20241219192359.csv.gz. I also saw answers suggesting {{ }} or $${ } around the variable, but the result is the same. I have also tried it with a dataset on a managed filesystem, with the same result. It appears the output file base name is simply being treated as a literal string.

Is there a way to do this here? Is there a setting I am missing that would allow this to work? I suppose I could do this in the Python recipe with a "dummy" output dataset, but I am wondering whether it can be done as described above first. Also, the same variable (${current_date}) is used in the path of the container and works fine there; the value is replaced.

Thanks


Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
    edited December 19

    I think you are going about this the wrong way. Dataiku datasets aren't meant to be used to publish a file that feeds an external system; they are for Dataiku's exclusive use. You have no control over the format of the dataset file, and it is something Dataiku could change at any point. And as you have found out, you can't control the file name dynamically. What you should do instead is create a managed folder on your Azure connection and have your Python recipe write the file directly to that managed folder. You will be able to fully control not only the file format but also the file name. Sample code below, which you can obviously customise to use your variables:

    import dataiku

    output_folder = dataiku.Folder("folder_name")
    filename = "my_file.csv"
    # df is the pandas DataFrame you want to write out
    output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))
    
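    For example, if you want the date/time variable in the file name, you can resolve it inside the recipe and build the name yourself. This is just a sketch; it assumes your variable is called current_date, your input dataset is called input_dataset, and the managed folder is folder_name, so adjust those to your project:

    import dataiku
    from datetime import datetime

    # Read the data you want to publish (assumed input dataset name)
    df = dataiku.Dataset("input_dataset").get_dataframe()

    # get_custom_variables() returns the resolved project/global variables,
    # so a current_date variable set earlier in the flow is available here.
    variables = dataiku.get_custom_variables()
    current_date = variables.get("current_date", datetime.now().strftime("%Y%m%d%H%M%S"))

    # Write the file to the managed folder with the dynamic name
    output_folder = dataiku.Folder("folder_name")
    filename = "filename_{}.csv".format(current_date)
    output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))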

  • BrianC
    BrianC Registered Posts: 2 ✭✭

    Thanks for the follow-up. I set it up in a Python recipe and it works fine. Thanks.
