Is it possible to set a dynamic number of output roles? Perhaps by using python to create the recipe.json file?
Hi,
What are you trying to achieve?
Input and output roles in a plugin recipe are fixed in number and type (this is by design). But there may be other ways to achieve your goal, depending on the context.
Cheers,
Alex
We want to sync an SFTP folder to Redshift.
Right now we are downloading from SFTP to a project folder, then creating a recipe to sync all files to S3 and then to Redshift. (We are then going to package this flow into a macro.)
Currently we have to add a specific output for each file and rename it accordingly. I want to use Python to read the folder contents and create outputs based on the file count and file names. Is this possible?
Hi,
I would advise packaging this code as a macro, not as a recipe. You can have the macro take the folder as input and then automatically create the Sync recipes and outputs dynamically.
Alternatively, it is possible from a recipe to write to outputs which are not declared. However, you would lose the data lineage which the Flow provides, so I wouldn't recommend it.
Additional question: what are the different files in the input folder? Are we talking about files with the same schema recorded at different dates? In that case, there may be a better solution based on partitioning. I can explain further if needed.
Hope it helps,
Alex
Hi,
Thanks for the help. The folder contents are mapping codes/helper tables that are subject to change by the user.
Yes, I think the macro is the way to go, as we would be using this in 75% of our projects going forward. Do you recommend any Python DSS methods for download from SFTP > sync to S3 > sync to Redshift?
Hi,
Gotcha. In that case, the macro is indeed the best way to go. Given that your use case is about syncing data across different connections, I recommend that you:
1. Create datasets dynamically using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#creating-datasets.
2. Link these datasets with Sync recipes using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/recipes.html#example-creating-a-sync-r...
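A minimal sketch of the loop described in the two steps above. This is an assumption-heavy illustration, not confirmed code from the thread: `project` is assumed to be a `dataikuapi` DSSProject handle, one source dataset per file is assumed to already exist (created as in step 1), and the connection names and naming scheme are placeholders.

```python
def create_sync_chain(project, file_names, s3_connection, redshift_connection):
    """Sketch: for each file, chain two Sync recipes (source -> S3 -> Redshift).

    Assumptions (not from the thread): `project` behaves like a dataikuapi
    DSSProject, `new_recipe("sync")` returns a recipe creator exposing
    with_input / with_new_output / build, and a source dataset named after
    each file already exists.
    """
    for file_name in file_names:
        # Derive dataset names from the file name; this scheme is a placeholder.
        base = file_name.rsplit(".", 1)[0].lower()
        s3_name = base + "_s3"
        redshift_name = base + "_redshift"

        # Sync recipe from the source dataset to a new dataset on the S3 connection.
        builder = project.new_recipe("sync")
        builder.with_input(base)
        builder.with_new_output(s3_name, s3_connection)
        builder.build()

        # Second Sync recipe from the S3 dataset to a new Redshift dataset.
        builder = project.new_recipe("sync")
        builder.with_input(s3_name)
        builder.with_new_output(redshift_name, redshift_connection)
        builder.build()
```

Creating recipes and datasets in one pass like this preserves the Flow's lineage, which is what you lose with undeclared outputs.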
Hope it helps,
Alex
I notice that recipe = builder.build() only creates the recipe and doesn't actually run it?
I'm left with empty datasets that still need to be built.
Am I missing a final method?
Hi,
By design, creating a recipe does not run it. After creating the recipe and its outputs, you need to start a job which builds them.
As you are developing a macro, you can use this API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/jobs.html#starting-new-jobs
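A small sketch of what starting such a job could look like, under the same assumptions as before (`project` is a `dataikuapi` DSSProject handle; dataset names are placeholders; the job definition dict follows the format shown in the linked docs):

```python
def build_outputs(project, dataset_names):
    """Sketch: start a single job that builds the given datasets.

    Assumption: `project` behaves like a dataikuapi DSSProject exposing
    start_job_and_wait, which takes a job definition dict and blocks
    until the job finishes.
    """
    definition = {
        # Build only the listed outputs, forcing a rebuild even if up to date.
        "type": "NON_RECURSIVE_FORCED_BUILD",
        "outputs": [{"id": name} for name in dataset_names],
    }
    return project.start_job_and_wait(definition)
```

Using a single job for all outputs lets DSS schedule the builds together instead of firing one job per dataset.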
Hope it helps,
Alex
This worked beautifully thanks!
My last step would be dynamically running a download recipe within the macro and parameterizing the folder path, etc. Open to workarounds as well.
Hi,
The DownloadRecipeCreator class should come in handy. Do not hesitate to create some recipes manually and use get_definition_and_payload to understand the expected structure, as every type of recipe expects a specific definition dictionary.
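The inspection workflow described above could look roughly like this. It is a sketch under assumptions: `project` is a `dataikuapi` DSSProject handle, and "my_download_recipe" is a placeholder for a download recipe you created manually in the Flow.

```python
def dump_recipe_definition(project, recipe_name):
    """Sketch: fetch an existing recipe and print its raw definition, so you
    can mimic the structure when creating similar recipes programmatically.

    Assumption: `project.get_recipe` returns a recipe handle whose
    get_definition_and_payload() exposes get_recipe_raw_definition()
    and get_payload(), as in the dataikuapi client.
    """
    recipe = project.get_recipe(recipe_name)
    dp = recipe.get_definition_and_payload()
    print(dp.get_recipe_raw_definition())  # inputs, outputs, recipe params
    print(dp.get_payload())                # type-specific payload, if any
    return dp
```

Once you see the structure a manually created download recipe produces, you can parameterize the folder path and other fields when creating it from the macro.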
Hope it helps,
Alex