Is it possible to set a dynamic number of output roles? Perhaps by using python to create the recipe.json file?
Hi,
What are you trying to achieve?
Input and output roles in a plugin recipe are fixed in number and type (this is by design). But there may be other ways to achieve your goal, depending on the context.
Cheers,
Alex
We want to sync an SFTP folder to Redshift.
Right now we are downloading from SFTP to a project folder, then creating a recipe to sync all files to S3 and then to Redshift. (We are then going to package this flow into a macro.)
Currently we have to add a specific output for each file and rename it accordingly. I want to use Python to read the folder contents and create outputs based on the file count and file names. Is this possible?
Hi,
I would advise packaging this code as a macro, not as a recipe. You can have the macro take the folder as input and then automatically create the Sync recipes and outputs dynamically.
Alternatively, it is possible from a recipe to write to outputs which are not declared. However, you would lose the data lineage which the Flow provides, so I wouldn't recommend it.
Additional question: what are the different files in the input folder? Are we talking about files with the same schema recorded at different dates? In that case, there may be a better solution based on partitioning. I can explain further if needed.
Hope it helps,
Alex
Hi,
Thanks for the help. The folder contents are mapping codes/helper tables that are subject to change by the user.
Yes, I think the macro is the way to go, as we would be using this in 75% of our projects going forward. Do you recommend any Python DSS methods for download from SFTP > sync to S3 > sync to Redshift?
Hi,
Gotcha. In that case, the macro is indeed the best way to go. Given that your use case is about syncing data across different connections, I recommend that you:
1. Create datasets dynamically using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#creating-datasets.
2. Link these datasets with Sync recipes using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/recipes.html#example-creating-a-sync-r...
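A minimal sketch of the loop described in the two steps above. This is an assumption-heavy illustration, not confirmed code from the thread: `project` is assumed to be a `dataikuapi` DSSProject handle, one source dataset per file is assumed to already exist (created as in step 1), and the connection names and naming scheme are placeholders.

```python
def create_sync_chain(project, file_names, s3_connection, redshift_connection):
    """Sketch: for each file, chain two Sync recipes (source -> S3 -> Redshift).

    Assumptions (not from the thread): `project` behaves like a dataikuapi
    DSSProject, `new_recipe("sync")` returns a recipe creator exposing
    with_input / with_new_output / build, and a source dataset named after
    each file already exists.
    """
    for file_name in file_names:
        # Derive dataset names from the file name; this scheme is a placeholder.
        base = file_name.rsplit(".", 1)[0].lower()
        s3_name = base + "_s3"
        redshift_name = base + "_redshift"

        # Sync recipe from the source dataset to a new dataset on the S3 connection.
        builder = project.new_recipe("sync")
        builder.with_input(base)
        builder.with_new_output(s3_name, s3_connection)
        builder.build()

        # Second Sync recipe from the S3 dataset to a new Redshift dataset.
        builder = project.new_recipe("sync")
        builder.with_input(s3_name)
        builder.with_new_output(redshift_name, redshift_connection)
        builder.build()
```

Creating recipes and datasets in one pass like this preserves the Flow's lineage, which is what you lose with undeclared outputs.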
Hope it helps,
Alex
I notice that recipe = builder.build() only creates the recipe and doesn't actually run it?
I'm left with empty datasets that still need to be built.
Am I missing a final method?
Hi,
By design, creating a recipe does not run it. After creating the recipe and its outputs, you need to start a job which builds them.
As you are developing a macro, you can use this API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/jobs.html#starting-new-jobs
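A small sketch of what starting such a job could look like, under the same assumptions as before (`project` is a `dataikuapi` DSSProject handle; dataset names are placeholders; the job definition dict follows the format shown in the linked docs):

```python
def build_outputs(project, dataset_names):
    """Sketch: start a single job that builds the given datasets.

    Assumption: `project` behaves like a dataikuapi DSSProject exposing
    start_job_and_wait, which takes a job definition dict and blocks
    until the job finishes.
    """
    definition = {
        # Build only the listed outputs, forcing a rebuild even if up to date.
        "type": "NON_RECURSIVE_FORCED_BUILD",
        "outputs": [{"id": name} for name in dataset_names],
    }
    return project.start_job_and_wait(definition)
```

Using a single job for all outputs lets DSS schedule the builds together instead of firing one job per dataset.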
Hope it helps,
Alex
This worked beautifully thanks!
My last step would be dynamically running a download recipe within the macro and parameterizing the folder path, etc. Open to workarounds as well.
Hi,
The DownloadRecipeCreator class should come in handy. Do not hesitate to create some recipes manually and use get_definition_and_payload to understand the expected structure, as every type of recipe expects a specific definition dictionary.
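The inspection workflow described above could look roughly like this. It is a sketch under assumptions: `project` is a `dataikuapi` DSSProject handle, and "my_download_recipe" is a placeholder for a download recipe you created manually in the Flow.

```python
def dump_recipe_definition(project, recipe_name):
    """Sketch: fetch an existing recipe and print its raw definition, so you
    can mimic the structure when creating similar recipes programmatically.

    Assumption: `project.get_recipe` returns a recipe handle whose
    get_definition_and_payload() exposes get_recipe_raw_definition()
    and get_payload(), as in the dataikuapi client.
    """
    recipe = project.get_recipe(recipe_name)
    dp = recipe.get_definition_and_payload()
    print(dp.get_recipe_raw_definition())  # inputs, outputs, recipe params
    print(dp.get_payload())                # type-specific payload, if any
    return dp
```

Once you see the structure a manually created download recipe produces, you can parameterize the folder path and other fields when creating it from the macro.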
Hope it helps,
Alex