Variable inputs for application

rump

Hi all,

Context: I have an application that uploads a dataset to a server through an API. It's effectively the last step of the flow, as I need the data there and not in Dataiku, so I don't need the output of the application-as-recipe. I've been told that the recipe must have an output, so I'm using it as an "audit" report for the uploading work. The thing is that I have quite a few datasets to upload and the flow gets messy, so I was thinking of modifying the recipe a bit to take multiple datasets at once and upload them together.

Problem: How can I configure an application so that it takes N (undefined) datasets? I would need to specify the target table (on the server) for each of the N datasets. Is it possible to do something like this?

Thanks,

Ru

Answers

  • tgb417

    @rump

    There may be others who have better ideas. I still can't fully understand your use case, so I'll have to be a bit general in my response, and I may be missing the point altogether. However, my guess is that you will be using:

    • Some kind of code recipe (Python, R, shell script) to gather up the data into a partitioned or single labeled dataset (every row has a value in a column identifying its source). You may end up using the append option when you get new rows, to append records from the N (undefined) source datasets.
    • Then do your processing in Dataiku. This process will have to either retain or create the identification of the destination API.
    • Then use another code recipe (Python, R, or shell script) to send the data to its appropriate destination. I don't think the "API Connect" plugin will be able to deal with an undefined number of destination API endpoints. If the APIs are on the same host, the API Connect plugin may be helpful.
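The first bullet (stacking N inputs into one labeled dataset) can be sketched in plain pandas. This is only an illustration of the pattern, not Dataiku-specific code: inside a DSS Python recipe you would read each input with the Dataiku dataset API instead of building the frames by hand, and the dataset names below are made up.

```python
import pandas as pd

def stack_with_source(named_frames):
    """Combine several dataframes into one, tagging each row with the
    name of its source dataset so the destination can be recovered later."""
    parts = []
    for name, df in named_frames.items():
        tagged = df.copy()
        tagged["source_dataset"] = name  # every row records where it came from
        parts.append(tagged)
    # append all rows into a single dataset, as described in the first bullet
    return pd.concat(parts, ignore_index=True)

# Stand-ins for the N input datasets (hypothetical names):
frames = {
    "rd_sales": pd.DataFrame({"value": [1, 2]}),
    "rd_costs": pd.DataFrame({"value": [3]}),
}
combined = stack_with_source(frames)
```

The `source_dataset` column is what lets the downstream upload step route each row to the right destination.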
  • rump

    Hi @tgb417

    Imagine this scenario:

    - I have a flow that produces a series of datasets: the Results Datasets (RDs).

    - I have an app-as-recipe that uploads 1 dataset through an API: let's call it the APP. Do not worry about the details of the connection; the APP takes care of that. This APP is used across different flows, so the flow itself doesn't matter either.

    - Goal of the flow: having the RDs on a particular server. The flow produces the RDs, and the APP takes 1 dataset and uploads it to the server. The APP produces an output (because it's a recipe), but I am only interested in having the dataset on the server (not in the flow). Therefore, the output of this APP (recipe) does not matter.

    Current solution: I take 1 RD, put it as input of my APP and upload it to the server. This makes the flow have 3 objects (1 input, 1 recipe & 1 output) for each RD.

    Desired solution: I want to take N RDs, put them as inputs of my APP (all connected to the same recipe) and upload them to the server. This would make the flow have N + 2 objects (N inputs, 1 recipe & 1 output).

    More info about the APP: it takes 4 parameters for each input dataset, so I would need 4×N parameters (4 per input), with N variable. Basically, it would be like the "Add a dataset" feature the Join recipe has: you can add as many inputs as you want, and you get new parameters per input. Something like this, but in an app-as-recipe.

    Hope it's a bit clearer now

    Thanks in advance!

  • Marlan

    Hi @rump,

    Why are you using an app-as-recipe?

    Could you just use a Python recipe whose input is a dataset where each row holds the 4 arguments you need to identify an input dataset? The output would be whatever you have now.

    You may also be able to build the input arguments programmatically by using the Dataiku API to gather the output datasets in the flow.
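A minimal sketch of the config-table idea: one row per dataset to upload, four columns for the four parameters the APP takes, and the recipe loops over the rows. The column names (`dataset_name`, `target_table`, `param_3`, `param_4`) are hypothetical placeholders, and `fake_upload` stands in for the real API call; the resulting audit table would be the recipe's single output dataset.

```python
import pandas as pd

def upload_all(config_df, upload_fn):
    """Drive N uploads from a config dataset with one row per target.

    The four column names are hypothetical placeholders for the four
    parameters the APP takes; `upload_fn` does the actual API call."""
    audit_rows = []
    for _, row in config_df.iterrows():
        status = upload_fn(row["dataset_name"], row["target_table"],
                           row["param_3"], row["param_4"])
        audit_rows.append({"dataset_name": row["dataset_name"],
                           "target_table": row["target_table"],
                           "status": status})
    # the audit table becomes the recipe's single output dataset
    return pd.DataFrame(audit_rows)

# Stub upload for illustration; in DSS this would call the real API.
def fake_upload(name, table, p3, p4):
    return "ok"

config = pd.DataFrame([
    {"dataset_name": "rd_sales", "target_table": "t_sales",
     "param_3": "a", "param_4": "b"},
    {"dataset_name": "rd_costs", "target_table": "t_costs",
     "param_3": "a", "param_4": "b"},
])
audit = upload_all(config, fake_upload)
```

This keeps the flow at N + 2 objects regardless of how many datasets you add: you only edit rows in the config dataset.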

    Marlan

  • rump

    Hi @Marlan
    I thought of using an app-as-recipe because I need it to be reusable in other projects and flows. If I used a Python recipe, I'd need to copy-paste the code and adapt it to the inputs manually, right?

    Thanks

  • Marlan

    Yeah, that is the purpose of an app-as-recipe, so it's a totally reasonable thing to do. There are other options though. The most obvious is to create a recipe plugin that does the same thing as the app-as-recipe. If you haven't developed plugins before, there is definitely a learning curve, but overall I'd say you get a better and cleaner result in many situations.

    Another option would be to create a Python function that does all the work of calling the API, and call that from project-specific recipes. You can store the function in a git repo and then import it into the project library of any project you want to use it in.
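The shared-library option can be sketched like this. Everything here is hypothetical: the endpoint, the payload shape, and the helper names are placeholders to adapt to your real API. The HTTP call is passed in as an argument (e.g. `requests.post` in practice) so the library function has no hard dependency and is easy to test; each project-specific recipe imports it from the project library and collects the returned records into its audit output.

```python
import json

def upload_rows(rows, target_table, endpoint, post):
    """Reusable upload helper meant to live in a shared project library.

    `endpoint` and the payload shape are made-up placeholders; `post` is
    the HTTP call (e.g. requests.post), injected by the caller."""
    payload = json.dumps({"table": target_table, "rows": rows})
    response = post(endpoint, data=payload)
    # return an audit record the calling recipe can collect into its output
    return {"table": target_table, "row_count": len(rows),
            "response": response}

# A project-specific recipe would then just call it. Stub HTTP call
# for illustration:
def fake_post(url, data):
    return "201 Created"

audit = upload_rows([{"id": 1}, {"id": 2}], "t_sales",
                    "https://example.com/upload", fake_post)
```

Updating the function in the git repo then updates every project that imports it, which is the reuse the app-as-recipe was providing.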

    I would choose one of these options over an app-as-recipe, as I find that apps-as-recipes are generally more involved to work with and have more limitations. I think they are more useful when they represent multiple steps in a longer flow, vs. the one recipe you have in this case.

    I'm not sure any of these options solve the problem you are asking about but they make a solution easier to develop.

    Marlan
