Dynamic Number of Output Roles
Is it possible to set a dynamic number of output roles? Perhaps by using Python to create the recipe.json file?
Answers
-
Hi,
What are you trying to achieve?
Input and output roles in a plugin recipe are fixed in number and type (this is by design). But there may be other ways to achieve your goal, depending on the context.
Cheers,
Alex
-
We want to sync an SFTP folder to Redshift.
Right now we are downloading from SFTP to a project folder. Then we create a recipe to sync all files to S3 and then to Redshift. (We are then going to package this flow into a macro.)
Right now we have to add a specific output for each file, as well as rename it accordingly. I want to use Python to read the folder contents and create outputs based on the file count and file names. Is this possible?
-
Hi,
I would advise packaging this code as a macro, not as a recipe. You can have the macro take the folder as input, and then create the Sync recipes and outputs dynamically.
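For illustration, here is a minimal sketch of what such a plugin macro (python-runnable) could look like. The class name and the "input_folder_id" parameter are hypothetical; the actual parameters would be declared in the macro's runnable.json.

```python
# Hypothetical sketch of a plugin macro (python-runnable).
# The class name and the "input_folder_id" parameter are illustrative only.
from dataiku.runnables import Runnable

class SftpSyncRunnable(Runnable):
    """Macro that reads a folder and creates Sync recipes dynamically."""

    def __init__(self, project_key, config, plugin_config):
        self.project_key = project_key
        self.config = config          # parameters defined in runnable.json
        self.plugin_config = plugin_config

    def get_progress_target(self):
        # No progress reporting in this sketch
        return None

    def run(self, progress_callback):
        # The folder to scan would be passed as a macro parameter
        folder_id = self.config.get("input_folder_id")
        # ... list the folder contents, then create datasets and
        # Sync recipes dynamically for each file ...
        return "Created outputs for folder %s" % folder_id
```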
Alternatively, it is possible for a recipe to write to outputs that are not declared. However, you would lose the data lineage that the Flow provides, so I wouldn't recommend it.
Additional question: what are the different files in the input folder? Are we talking about files with the same schema recorded on different dates? In that case, there may be a better solution based on partitioning. I can explain further if needed.
Hope it helps,
Alex
-
Hi,
Thanks for the help. The folder contents are mapping codes/helper tables that are subject to change by the user.
Yes, I think the macro is the way to go, as we would be using this in 75% of our projects going forward. Do you recommend any Python DSS methods for downloading from SFTP > syncing to S3 > syncing to Redshift?
-
Hi,
Gotcha. In that case, the macro is indeed the best way to go. Given that your use case is about syncing data across different connections, I recommend that you:
1. Create datasets dynamically using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#creating-datasets.
2. Link these datasets with Sync recipes using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/recipes.html#example-creating-a-sync-recipe, as in the sketch below.
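Here is a hedged sketch combining both steps for one file. All dataset, recipe and connection names ("filesystem_managed", "s3_conn", "redshift_conn") are placeholders, and the params/formatParams dictionaries depend on your connection types (see the datasets doc page above).

```python
# Hedged sketch: create an input dataset, then chain two Sync recipes
# (to S3, then to Redshift). Names and connection ids are placeholders.
import dataiku
import dataikuapi

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

name = "mapping_codes"  # in practice, one iteration per file in the folder

# Step 1: create the input dataset pointing at the file
project.create_dataset(
    name,
    type="Filesystem",
    params={"connection": "filesystem_managed", "path": "/sftp_files/" + name},
    formatType="csv",
    formatParams={"separator": ",", "style": "unix", "parseHeaderRow": True},
)

# Step 2: Sync recipe to S3, letting the recipe create its managed output
builder = dataikuapi.SyncRecipeCreator("sync_%s_to_s3" % name, project)
builder = builder.with_input(name)
builder = builder.with_new_output(name + "_s3", "s3_conn")
sync_to_s3 = builder.build()

# Step 3: another Sync recipe from the S3 dataset to Redshift
builder = dataikuapi.SyncRecipeCreator("sync_%s_to_redshift" % name, project)
builder = builder.with_input(name + "_s3")
builder = builder.with_new_output(name + "_redshift", "redshift_conn")
sync_to_redshift = builder.build()
```

Inside the macro, you could loop over the folder contents and repeat these calls for each file.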
Hope it helps,
Alex
-
I notice that recipe = builder.build() only creates the recipe and doesn't actually run it?
I'm left with empty datasets that still need to be built?
Am I missing a final method?
-
Hi,
By design, building a recipe only creates it; it does not run it. After creating the recipe and its outputs, you need to start a job which builds them.
As you are developing a macro, you can use this API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/jobs.html#starting-new-jobs
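For example, a minimal sketch building the Redshift output created earlier (the dataset name "mapping_codes_redshift" is a placeholder):

```python
# Hedged sketch: start a job that builds the final output dataset.
import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

job_definition = {
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{"id": "mapping_codes_redshift", "partition": "NP"}],
}

# start_job returns immediately with a job handle;
# start_job_and_wait blocks until the build finishes.
project.start_job_and_wait(job_definition)
```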
Hope it helps,
Alex
-
This worked beautifully thanks!
My last step would be dynamically running a download recipe within the macro and parameterizing the folder path, etc. Open to workarounds as well.
-
Hi,
The DownloadRecipeCreator class should come in handy. Do not hesitate to create some recipes manually and use get_definition_and_payload to understand the expected structure, as every type of recipe expects specific definition dictionaries.
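For instance, a small sketch of that inspection step (the recipe name "download_from_sftp" is a placeholder for a Download recipe you created by hand in the Flow):

```python
# Hedged sketch: inspect an existing Download recipe to learn the
# definition and payload structure before creating one programmatically.
import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

recipe = project.get_recipe("download_from_sftp")
dp = recipe.get_definition_and_payload()

# Raw definition: inputs/outputs and, for a Download recipe, the sources
print(dp.get_recipe_raw_definition())
# Payload (empty for some recipe types, JSON or code for others)
print(dp.get_payload())
```

Once the structure is clear, DownloadRecipeCreator can be used in the same builder style as SyncRecipeCreator, with the managed folder as output and the folder path passed in as a macro parameter.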
Hope it helps,
Alex