How to automate multiple runs of a flow with a different source file each time

Darren_Gage
Darren_Gage Registered Posts: 11 ✭✭✭✭
Hi

I have a flow that I want to run multiple times (manually triggered), and each run requires a new source file. I'd like to load the files into a folder (file_A, file_X, file_c, etc.) and then kick off the flow so that it automatically runs through each file in turn: run_1 uses file_A and, once it completes, run_2 starts with file_X, and so on.

What is the best way to achieve this? I was thinking of uploading all the files to a folder and then passing in a list of file names (in a CSV, an editable dataset, or pasted into a script). For each file name, the flow would run and look for that file in the folder, perhaps with each file name passed in turn as a variable. I just don't know how to achieve this, or whether there is a better way.

Any advice (with code tips if possible) would be greatly appreciated.

Thank you

Answers

  • Mattsco
    Mattsco Dataiker, Registered Posts: 125 Dataiker

    Hi Darren,

    This looks like a good use case for the partitioning features of DSS.

    Partitioning lets you run the same recipes in parallel, specifying which partitions you want to run.

    In your case, you would have a folder containing the files.

    From the folder's settings, you can define a partitioning pattern.

    Then, if you create a recipe whose output is partitioned by the same dimension, you can run the recipe in parallel across the partitions.

    You can check the documentation on this topic; it's pretty advanced!

    https://doc.dataiku.com/dss/latest/partitions/index.html

    Matt
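
To make the idea of a partitioning pattern more concrete, here is a minimal Python sketch of how a pattern with a named placeholder can map file names to partition identifiers. It is only an illustration, not DSS code: the `%{name}` placeholder syntax, the `.csv` extension, and the helper names `pattern_to_regex` and `partition_ids` are assumptions, so check the partitioning documentation linked above for the pattern syntax DSS actually accepts.

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern such as 'file_%{name}.csv' into a compiled regex.

    Hypothetical helper: each %{dimension} placeholder becomes a named group.
    """
    # re.split with one capture group alternates literal text and placeholder names.
    parts = re.split(r"%\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)          # literal text, matched as-is
        else:
            regex += f"(?P<{part}>[^/]+)"     # placeholder, captured by name
    return re.compile(regex)


def partition_ids(file_names, pattern, dimension="name"):
    """Map each partition identifier to the file it was extracted from."""
    matcher = pattern_to_regex(pattern)
    result = {}
    for file_name in file_names:
        match = matcher.fullmatch(file_name)
        if match:                             # files that do not fit the pattern are skipped
            result[match.group(dimension)] = file_name
    return result


files = ["file_A.csv", "file_X.csv", "file_c.csv", "notes.txt"]
partitions = partition_ids(files, "file_%{name}.csv")
print(partitions)
```

Each key of `partitions` plays the role of one partition, so a recipe partitioned by the same dimension could be run once per key.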

  • Darren_Gage
    Darren_Gage Registered Posts: 11 ✭✭✭✭
    Hi Matt

    Thanks for this detailed reply. I'll take a look at partitioning to see if it will help, although it does look advanced for me at this time. You mentioned "parallel" in this solution, but I believe I need to run these sequentially: the output from the first run with File_A is then used as one of the inputs for the next run, e.g. with File_B (I'm identifying changes between subsequent files). Apologies for not making this clear in the question. Do you have any additional ideas? Thanks
  • Mattsco
    Mattsco Dataiker, Registered Posts: 125 Dataiker
    So for that, I would just import file_A and file_B as two datasets, do stuff on the first dataset, and maybe join it with the second one.
    Sorry, it's difficult to understand the use case from my side.
    If you want to give me more details, you can contact me at matt@dataiku.com
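
For the sequential case described above, where each file is compared with the one before it, the logic might look like the following plain-Python sketch. It assumes the files are CSVs in a local folder and that "changes" means rows added or removed; the function names `read_rows` and `compare_in_sequence` are hypothetical, and no DSS API is used. Inside DSS, a similar loop could presumably live in a Python step that passes each file name to the flow in turn, but that wiring is not shown here.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a set of row tuples (row order is ignored)."""
    with open(path, newline="") as handle:
        return {tuple(row) for row in csv.reader(handle)}


def compare_in_sequence(folder, file_names):
    """Process files strictly one after another.

    Each file is compared with the previous one in the list, so the
    result of run N is an input of run N+1.
    """
    folder = Path(folder)
    previous = None
    reports = []
    for name in file_names:
        current = read_rows(folder / name)
        if previous is not None:
            reports.append({
                "file": name,
                "added": sorted(current - previous),    # rows new in this file
                "removed": sorted(previous - current),  # rows missing from this file
            })
        previous = current  # carry this run's data into the next run
    return reports
```

Calling `compare_in_sequence("my_folder", ["file_A.csv", "file_X.csv", "file_c.csv"])` would return one report per file after the first, in the order given. The list of file names controls the run order, which matches the idea of supplying the names from a CSV or an editable dataset.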