Defining a global variable in the base name of the output file for a dataset

BrianC Registered Posts: 2 ✭✭

Hello

I am working on a flow that has a Python recipe that sets global variables. In the recipe's output dataset, a couple of these variables are used to set the path and filename of the dataset, which is stored in Azure.

From researching how to define the filename, the guidance is to check "Force single output file" and set "Output file name" to a value in the Settings/Advanced section. When the Output file name is hardcoded to a value, it works fine.

Ideally, the filename would also include a datetime. It appears one can reference a variable as something like ${current_date} in the filename, so the base name would be defined as filename_${current_date}. Unfortunately, when running the recipe, the variable value is not substituted, and the output file comes out as filename_${current_date}.csv.gz instead of something like filename_20241219192359.csv.gz. I also saw answers suggesting {{ }} or $${ } around the variable, but the result is the same. I have also tried it with a dataset on a managed filesystem, with the same result. It appears the output file base name is simply being treated as a literal string.

Is there a way to do this here? Is there a setting I am missing that would allow this to work? I suppose I could do this in the Python recipe with a "dummy" output dataset, but I am wondering whether it can be done as described above first. Also, the same variable (${current_date}) is used in the path of the container and works fine there; the value is replaced.

Thanks


Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
    edited December 19

    I think you are going about this the wrong way. Dataiku datasets aren't meant to be used to publish a file that feeds an external system; they are for Dataiku's exclusive use. You have no control over the format of the dataset file, and it is something Dataiku could change at any point. And as you have found out, you can't control the file name dynamically. What you should do instead is create a managed folder on your Azure connection and have your Python recipe write the file directly to that managed folder. You will be able to fully control not only the file format but also the file name. Sample code below, which you can obviously customise to use your variables:

    import dataiku

    output_folder = dataiku.Folder("folder_name")
    filename = "my_file.csv"
    # df is the pandas DataFrame you want to write out
    output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))
    
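    For example, if you want the date/time variable in the file name, you can resolve it inside the recipe and build the name yourself. This is just a sketch; it assumes your variable is called current_date, your input dataset is called input_dataset, and the managed folder is folder_name, so adjust those to your project:

    import dataiku
    from datetime import datetime

    # Read the data you want to publish (assumed input dataset name)
    df = dataiku.Dataset("input_dataset").get_dataframe()

    # get_custom_variables() returns the resolved project/global variables,
    # so a current_date variable set earlier in the flow is available here.
    variables = dataiku.get_custom_variables()
    current_date = variables.get("current_date", datetime.now().strftime("%Y%m%d%H%M%S"))

    # Write the file to the managed folder with the dynamic name
    output_folder = dataiku.Folder("folder_name")
    filename = "filename_{}.csv".format(current_date)
    output_folder.upload_data(filename, df.to_csv(index=False).encode("utf-8"))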

  • BrianC
    BrianC Registered Posts: 2 ✭✭

    Thanks for the follow-up. I set it up in a Python recipe and it works fine. Thanks.
