Dynamic Number of Output Roles

gblack686
Level 4

Is it possible to set a dynamic number of output roles? Perhaps by using Python to create the recipe.json file?

9 Replies
Alex_Combessie
Dataiker Alumni

Hi,

What are you trying to achieve?

Input and output roles in a plugin recipe are fixed in number and type (this is by design). But there may be other ways to achieve your goal, depending on the context.

Cheers,

Alex

gblack686
Level 4
Author

We want to sync an SFTP folder to Redshift.

Right now we download from SFTP into a project folder, then create a recipe to sync all files to S3 and then to Redshift. (We then plan to package this flow into a macro.)

Right now we have to add a specific output for each file and rename it accordingly. I want to use Python to read the folder contents and create outputs based on the file counts and names. Is this possible?

Alex_Combessie
Dataiker Alumni

Hi,

I would advise packaging this code as a macro, not as a recipe. The macro can take the folder as input and then create the Sync recipes and outputs dynamically.

Alternatively, it is possible for a recipe to write to outputs which are not declared. However, you would lose the data lineage which the Flow provides, so I wouldn't recommend it.

Additional question: what are the different files in the input folder? Are we talking about files of the same schema recorded at different dates? In that case, there may be a better solution based on partitioning. I can explain further if needed.

Hope it helps,

Alex

gblack686
Level 4
Author

Hi,

Thanks for the help. The folder contents are mapping codes/helper tables that are subject to change by the user. 

Yes, I think the macro is the way to go, as we would be using this in 75% of our projects going forward. Do you recommend any Python DSS methods for download from SFTP > sync to S3 > sync to Redshift?

Alex_Combessie
Dataiker Alumni

Hi,

Gotcha. In that case, the macro is indeed the best way to go. Given that your use case is about syncing data to different connections, I recommend that you:

1. Create datasets dynamically using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#creating-datasets.

2. Link these datasets with Sync recipes using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/recipes.html#example-creating-a-sync-r... (a minimal sketch combining both steps follows below).
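
For illustration, here is a minimal sketch of what such a macro could look like, assuming the SFTP files have already been downloaded into a managed folder. The project key, folder id, connection names, and format parameters are hypothetical placeholders to adapt to your instance:

    import dataiku
    from dataikuapi.dss.recipe import SyncRecipeCreator

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT")  # hypothetical project key

    # List the files landed in the managed folder (hypothetical folder id)
    folder = project.get_managed_folder("sftp_downloads")
    paths = [item["path"] for item in folder.list_contents()["items"]]

    for path in paths:
        name = path.strip("/").rsplit(".", 1)[0]  # dataset name derived from file name

        # 1. Create a dataset reading the file (hypothetical connection and format)
        project.create_dataset(name, "Filesystem",
                               params={"connection": "filesystem_managed",
                                       "path": "sftp_downloads" + path},
                               formatType="csv",
                               formatParams={"separator": ",", "style": "excel",
                                             "parseHeaderRow": True})

        # 2. Chain two Sync recipes: dataset -> S3 -> Redshift
        SyncRecipeCreator("sync_%s_to_s3" % name, project) \
            .with_input(name) \
            .with_new_output(name + "_s3", "my-s3-connection") \
            .build()

        SyncRecipeCreator("sync_%s_to_redshift" % name, project) \
            .with_input(name + "_s3") \
            .with_new_output(name + "_redshift", "my-redshift-connection") \
            .build()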

Hope it helps,

Alex

gblack686
Level 4
Author

I notice that recipe = builder.build() only creates the recipe and doesn't actually run it?

I'm left with empty datasets that still need to be built? 

Am I missing a final method?

Alex_Combessie
Dataiker Alumni

Hi,

By design, creating a recipe does not run it. After creating the recipe and its outputs, you need to start a job that builds them.

As you are developing a macro, you can use this API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/jobs.html#starting-new-jobs (a sketch follows below).
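
For example, a minimal sketch following that page's pattern; the dataset name is a hypothetical placeholder and the polling loop is optional:

    import time

    # Force-build one of the output datasets created earlier (hypothetical name)
    definition = {
        "type": "NON_RECURSIVE_FORCED_BUILD",
        "outputs": [{"id": "mapping_codes_redshift", "partition": "NP"}]
    }
    job = project.start_job(definition)

    # Poll until the job reaches a terminal state
    while True:
        state = job.get_status()["baseStatus"]["state"]
        if state in ("DONE", "FAILED", "ABORTED"):
            break
        time.sleep(2)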

Hope it helps,

Alex

gblack686
Level 4
Author

This worked beautifully, thanks!

My last step would be to dynamically run a Download recipe within the macro, parameterizing the folder path, etc. Open to workarounds as well.

Alex_Combessie
Dataiker Alumni

Hi,

The DownloadRecipeCreator class should come in handy. Do not hesitate to create some recipes manually and use get_definition_and_payload to understand the expected structure, as every type of recipe expects a specific definition dictionary. A rough sketch is below.
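
For example, a rough sketch under stated assumptions: the recipe and folder names are hypothetical, and with_new_output_folder is assumed from the generic recipe-builder pattern, so verify the methods available in your DSS version:

    from dataikuapi.dss.recipe import DownloadRecipeCreator

    # Create a Download recipe writing into a new managed folder
    # (with_new_output_folder is an assumption; check your dataikuapi version)
    recipe = DownloadRecipeCreator("download_from_sftp", project) \
        .with_new_output_folder("sftp_downloads") \
        .build()

    # Inspect a recipe configured by hand to learn the expected structure,
    # then edit it and push it back with set_definition_and_payload
    dp = recipe.get_definition_and_payload()
    print(dp.get_recipe_raw_definition())
    print(dp.get_payload())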

Hope it helps,

Alex