
Dynamic Number of Output Roles

Level 3

Is it possible to set a dynamic number of output roles? Perhaps by using python to create the recipe.json file? 

 

9 Replies
Dataiker

Hi,

What are you trying to achieve?

Input and output roles in a plugin recipe are fixed in number and type (this is by design). But there may be other ways to achieve your goal, depending on the context.

Cheers,

Alex

Level 3
Author

We want to sync an SFTP folder to Redshift.

Right now we are downloading from SFTP to a project folder. Then we are creating a recipe to sync all files to S3 and then to Redshift. (We are then going to package this flow into a macro.)

Right now we have to add a specific output for each file, as well as rename each accordingly. I want to use Python to read the folder contents and create outputs based on the counts and file names. Is this possible?

Dataiker

Hi,

I would advise packaging this code as a macro, not as a recipe. The macro can take the folder as input and then create the Sync recipes and outputs dynamically.
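For reference, a macro is a plugin "runnable" component: a Python class extending Runnable. A minimal sketch could look like the following, assuming the macro receives the managed folder through a hypothetical input_folder_id parameter declared in its runnable.json:

# Minimal macro (plugin runnable) skeleton -- the config key "input_folder_id" is illustrative
from dataiku.runnables import Runnable
import dataiku

class SyncFolderRunnable(Runnable):
    def __init__(self, project_key, config, plugin_config):
        self.project_key = project_key
        self.config = config              # parameters defined in runnable.json
        self.plugin_config = plugin_config

    def get_progress_target(self):
        return None

    def run(self, progress_callback):
        # List the files in the managed folder passed as a parameter
        folder = dataiku.Folder(self.config["input_folder_id"], project_key=self.project_key)
        paths = folder.list_paths_in_partition()
        # ... create one output dataset and one Sync recipe per file (see below)
        return "Found %d files" % len(paths)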

Alternatively, it is possible for a recipe to write to outputs which are not declared. However, you would lose the data lineage that the Flow provides, so I wouldn't recommend it.

Additional question: what are the different files in the input folder? Are we talking about files with the same schema recorded at different dates? In that case, there may be a better solution based on partitioning. I can explain further if needed.

Hope it helps,

Alex

Level 3
Author

Hi,

Thanks for the help. The folder contents are mapping codes/helper tables that are subject to change by the user. 

Yes, I think the macro is the way to go, as we would be using this in 75% of our projects going forward. Do you recommend any Python DSS methods for download from SFTP > sync to S3 > sync to Redshift?

 

Dataiker

Hi,

Gotcha. In that case, the macro is indeed the best way to go. Given that your use case is about syncing data to different connections, I recommend the following:

1. Create datasets dynamically using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#creating-datasets.

2. Link these datasets with Sync recipes using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/recipes.html#example-creating-a-sync-r...
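As a rough sketch of those two steps (the project key, dataset names, connection names and format settings below are placeholders to adapt to your S3/Redshift setup):

import dataiku
from dataikuapi.dss.recipe import SyncRecipeCreator

client = dataiku.api_client()
project = client.get_project("MY_PROJECT")            # hypothetical project key

# 1. Create a dataset programmatically (type and params depend on your connection)
project.create_dataset("helper_table_s3", "S3",
                       params={"connection": "s3-connection", "path": "/helper_table.csv"},
                       formatType="csv",
                       formatParams={"separator": ",", "parseHeaderRow": True})

# 2. Link it to a new Redshift output with a Sync recipe
builder = SyncRecipeCreator("sync_helper_table", project)
builder.with_input("helper_table_s3")
builder.with_new_output("helper_table_redshift", "redshift-connection")
recipe = builder.build()                               # creates the recipe, does not run it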

Hope it helps,

Alex

Level 3
Author

I notice that recipe = builder.build() only creates the recipe and doesn't actually run it?

I'm left with empty datasets that still need to be built? 

Am I missing a final method?

Dataiker

Hi,

By design, you cannot run a recipe from the recipe itself. After creating the recipe and its outputs, you need to create a job that builds them.

As you are developing a macro, you can use this API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/jobs.html#starting-new-jobs
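For instance, a minimal sketch that builds the Redshift dataset created earlier (the project key and dataset name are placeholders):

import dataiku

client = dataiku.api_client()
project = client.get_project("MY_PROJECT")             # hypothetical project key

# Build the output dataset and wait for the job to finish
project.start_job_and_wait({
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{"id": "helper_table_redshift", "partition": "NP"}]
})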

Hope it helps,

Alex

Level 3
Author

This worked beautifully thanks!

My last step would be dynamically running a download recipe within the macro and parameterizing the folder path, etc. I'm open to workarounds as well.

Dataiker

Hi,

The DownloadRecipeCreator class should come in handy. Do not hesitate to create some recipes manually and use get_definition_and_payload to understand the expected structure, as every type of recipe expects specific definition dictionaries.
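For example, a rough way to inspect a download recipe that was created manually in the Flow (the project key and recipe name are placeholders):

import dataiku

client = dataiku.api_client()
project = client.get_project("MY_PROJECT")              # hypothetical project key

recipe = project.get_recipe("download_from_sftp")       # hypothetical recipe name
dnp = recipe.get_definition_and_payload()
print(dnp.get_recipe_raw_definition())                  # inputs, outputs and recipe settings
print(dnp.get_payload())                                # type-specific payload (e.g. source paths)

# After adjusting the definition or payload, save it back
recipe.set_definition_and_payload(dnp)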

Hope it helps,

Alex