Dynamic Number of Output Roles

gblack686 Partner, Registered Posts: 62 Partner

Is it possible to set a dynamic number of output roles? Perhaps by using python to create the recipe.json file?

Answers

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    What are you trying to achieve?

    Input and output roles in a plugin recipe are fixed in number and type (this is by design). But there may be other ways to achieve your goal, depending on the context.

    Cheers,

    Alex

  • gblack686
    gblack686 Partner, Registered Posts: 62 Partner

    We want to sync an SFTP folder to Redshift.

    Right now we are downloading from SFTP to a project folder. Then we are creating a recipe to sync all files to S3 then Redshift. (We are then going to package this flow into a macro)

    Right now we have to add a specific output for each file and rename each one accordingly. I want to use python to read the folder contents and create outputs based on the file count and file names. Is this possible?

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    I would advise packaging this code as a macro, not as a recipe. The macro can take the folder as input and then create the Sync recipes and their outputs dynamically.
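
As a rough illustration of the "one output per file" part, here is a minimal sketch that derives one dataset name per file path. The helper names (`dataset_name_from_path`, `plan_outputs`) and the naming rule are assumptions made for this example, not part of the DSS API.

```python
import posixpath
import re


def dataset_name_from_path(path):
    """Turn a folder path such as '/codes/Region Map.csv' into 'region_map'."""
    # Folder paths are assumed to be '/'-separated, hence posixpath.
    stem = posixpath.splitext(posixpath.basename(path))[0]
    # Keep letters, digits and underscores; collapse anything else into '_'.
    name = re.sub(r"[^0-9A-Za-z_]+", "_", stem).strip("_").lower()
    if not name:
        raise ValueError("cannot derive a dataset name from %r" % path)
    return name


def plan_outputs(paths, prefix=""):
    """Map each file path to a unique dataset name, failing on collisions."""
    plan = {}
    owner = {}  # dataset name -> path that claimed it first
    for path in sorted(paths):
        name = prefix + dataset_name_from_path(path)
        if name in owner:
            raise ValueError(
                "%r and %r both map to %r" % (owner[name], path, name)
            )
        owner[name] = path
        plan[path] = name
    return plan
```

Inside DSS, the list of paths could come from something like `dataiku.Folder("my_folder").list_paths_in_partition()`, where `my_folder` is a placeholder name. Failing on name collisions is a deliberate choice here, so that two files never end up silently sharing one output.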

    Alternatively, it is possible from a recipe to write to outputs which are not declared. However you would lose the data lineage which the flow provides, so I wouldn’t recommend it.

    Additional question: what are the different files in the input folder? Are we talking about files with the same schema recorded at different dates? In that case, there may be a better solution based on partitioning. I can explain further if needed.

    Hope it helps,

    Alex

  • gblack686
    gblack686 Partner, Registered Posts: 62 Partner

    Hi,

    Thanks for the help. The folder contents are mapping codes/helper tables that are subject to change by the user.

    Yes I think the macro is the way to go, as we would be using this in 75% of our projects going forward. Do you recommend any python dss methods for download from sftp > sync to s3 > sync to redshift?

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    Gotcha. In that case, the macro is indeed the best way to go. Given that your use case is about syncing data to different connections, I recommend that you:

    1. Create datasets dynamically using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/datasets.html#creating-datasets.

    2. Link these datasets with Sync recipes using https://doc.dataiku.com/dss/latest/python-api/rest-api-client/recipes.html#example-creating-a-sync-recipe
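
The two steps above can be sketched roughly as follows. This assumes the source dataset already exists, and that `SyncRecipeCreator` from `dataikuapi.dss.recipe` exposes `with_input`, `with_new_output(name, connection)` and `build()` as in the linked documentation. The function name `create_sync_chain`, the `_s3` / `_redshift` suffixes and the `builder_factory` parameter are hypothetical choices for this example.

```python
def default_builder(recipe_name, project):
    # Imported lazily so this module also loads where dataikuapi is absent.
    from dataikuapi.dss.recipe import SyncRecipeCreator
    return SyncRecipeCreator(recipe_name, project)


def create_sync_chain(project, source_dataset, s3_connection,
                      redshift_connection, builder_factory=default_builder):
    """Create source -> S3 -> Redshift Sync recipes and return output names."""
    s3_dataset = source_dataset + "_s3"              # hypothetical naming rule
    redshift_dataset = source_dataset + "_redshift"  # hypothetical naming rule
    steps = [
        (source_dataset, s3_dataset, s3_connection),
        (s3_dataset, redshift_dataset, redshift_connection),
    ]
    for input_name, output_name, connection in steps:
        builder = builder_factory("sync_to_" + output_name, project)
        builder = builder.with_input(input_name)
        # Asks DSS to create the output dataset on the given connection.
        builder = builder.with_new_output(output_name, connection)
        # Creates the recipe in the flow; it does not run it.
        builder.build()
    return s3_dataset, redshift_dataset
```

A macro could call this once per file-derived dataset, passing a project handle obtained from the API client and the names of its own S3 and Redshift connections.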

    Hope it helps,

    Alex

  • gblack686
    gblack686 Partner, Registered Posts: 62 Partner

    I notice how recipe = builder.build() only creates the recipe, and doesn't actually run it?

    I'm left with empty datasets that still need to be built?

    Am I missing a final method?

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    By design, you cannot run a recipe from the recipe itself. After creating the recipe and its outputs, you need to create a job which builds them.

    Since you are developing a macro, you can use this API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/jobs.html#starting-new-jobs
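
A minimal sketch of that build step, assuming `project` is a project handle from the API client and that `start_job` and `get_status` behave as in the linked page. The helper name `build_dataset` and the polling interval are choices made for this example, and the non-partitioned marker `"NP"` assumes the dataset is not partitioned.

```python
import time

TERMINAL_STATES = ("DONE", "FAILED", "ABORTED")


def build_dataset(project, dataset_name, poll_seconds=1.0):
    """Start a job that builds one dataset and block until it finishes."""
    definition = {
        # Forces a rebuild of this dataset only, not of its upstream datasets.
        "type": "NON_RECURSIVE_FORCED_BUILD",
        "outputs": [
            {"id": dataset_name, "type": "DATASET", "partition": "NP"}
        ],
    }
    job = project.start_job(definition)
    while True:
        state = job.get_status()["baseStatus"]["state"]
        if state in TERMINAL_STATES:
            return state  # the caller decides how to react to a failure
        time.sleep(poll_seconds)
```

For a chain such as source -> S3 -> Redshift, the datasets would need to be built in flow order with this job type, or the job type swapped for a recursive one.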

    Hope it helps,

    Alex

  • gblack686
    gblack686 Partner, Registered Posts: 62 Partner

    This worked beautifully thanks!

    My last step would be dynamically running a download recipe within the macro and parameterizing the folder path, etc. Open to workarounds as well.

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    The DownloadRecipeCreator class should come in handy. Feel free to create some recipes manually and call get_definition_and_payload on them to understand the expected structure, since every type of recipe expects its own definition dictionary.
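
One way to do that inspection, as a sketch: dump the definition and payload of a manually created recipe as JSON and read through it. This assumes the object returned by `get_definition_and_payload` offers `get_recipe_raw_definition()` and `get_payload()`; the helper name `dump_recipe_structure` is made up for this example.

```python
import json


def dump_recipe_structure(project, recipe_name):
    """Return the definition and payload of an existing recipe as JSON text."""
    recipe = project.get_recipe(recipe_name)
    definition_and_payload = recipe.get_definition_and_payload()
    structure = {
        "definition": definition_and_payload.get_recipe_raw_definition(),
        "payload": definition_and_payload.get_payload(),
    }
    # Sorted keys make it easier to compare two recipes side by side.
    return json.dumps(structure, indent=2, sort_keys=True)
```

Printing the result for a Download recipe built by hand in the flow shows which keys a macro would need to fill in for the folder path and other parameters.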

    Hope it helps,

    Alex
