Added on June 14, 2024 5:51AM
Hi all,
Context: I have an application that uploads a dataset to a server through an API. It is effectively the last step of the flow, since I need the data on the server and not in Dataiku, so I don't need the output of the application-as-recipe. I've been told that the recipe must have an output, so I'm using it as an "audit" report for the upload. The thing is that I have quite a few datasets to upload and the flow gets messy, so I was thinking of modifying the recipe to take multiple datasets at once and upload them together.
Problem: How can I configure an application so that it takes N datasets, where N is not known in advance? I would also need to specify the target table (on the server) for each of the N datasets. Is it possible to do something like this?
Thanks,
Ru
There may be others with better ideas, and I still don't fully understand your use case, so I will have to be a bit general in my response. I may also be missing the point altogether. However, my guess is that you will be using:
Hi @tgb417
Imagine this scenario:
- I have a flow that produces a series of datasets: the Results Datasets (RDs).
- I have an app-as-recipe that uploads one dataset through an API: let's call it the APP. Don't worry about the details of the connection; the APP takes care of that. The APP is used across different flows, so the flow itself doesn't matter either.
- Goal of the flow: having the RDs on a particular server. The flow produces the RDs, and the APP takes one dataset and uploads it to the server. The APP produces an output (because it's a recipe), but I am only interested in having the dataset on the server, not in the flow, so the output of the APP does not matter.
Current solution: I take one RD, use it as the input of my APP, and upload it to the server. This gives the flow 3 objects (1 input, 1 recipe and 1 output) for each RD, so 3N objects for N RDs.
Desired solution: I want to take N RDs, use all of them as inputs of my APP (connected to the same recipe), and upload them to the server. The flow would then have N + 2 objects (N inputs, 1 recipe and 1 output).
More info about the APP: it takes 4 parameters for each input dataset, so I would need 4×N parameters, with N variable. Basically, it would be like the "add a dataset" feature of the Join recipe: you can add as many inputs as you want, and each new input gets its own parameters. Something like that, but in an app-as-recipe.
Hope it's a bit clearer now.
Thanks in advance!
Hi @rump,
Why are you using an app-as-recipe?
Could you just use a Python recipe whose input is a dataset in which each row holds the 4 arguments needed to identify one dataset to upload? The output would be whatever you have now.
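A minimal sketch of that config-dataset idea, assuming each config row carries the 4 arguments for one upload. The column names `dataset_name`, `target_table`, `mode` and `batch_size` are hypothetical placeholders for the real 4 parameters, and the `load_rows` and `upload` callables stand in for the Dataiku read and the existing API call, so the loop runs without DSS. It returns one audit row per upload, which could serve as the recipe's single output.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical names for the 4 per-dataset arguments; swap in the real ones.
REQUIRED_FIELDS = ("dataset_name", "target_table", "mode", "batch_size")

Row = Dict[str, object]


def run_uploads(
    config_rows: Iterable[Row],
    load_rows: Callable[[str], List[Row]],
    upload: Callable[[List[Row], Row], None],
) -> List[Row]:
    """Upload every dataset listed in config_rows; return one audit row each.

    load_rows(name) returns the records of one input dataset.
    upload(records, config) sends them to the server and raises on failure.
    """
    audit: List[Row] = []
    for config in config_rows:
        entry: Row = {
            "dataset_name": config.get("dataset_name"),
            "target_table": config.get("target_table"),
            "status": "ok",
            "detail": "",
            "rows": 0,
        }
        missing = [field for field in REQUIRED_FIELDS if field not in config]
        if missing:
            # Log the bad config row and move on instead of failing the batch.
            entry.update(status="error", detail="missing fields: " + ", ".join(missing))
        else:
            try:
                records = load_rows(str(config["dataset_name"]))
                upload(records, config)
                entry["rows"] = len(records)
            except Exception as exc:  # one failed upload should not block the rest
                entry.update(status="error", detail=repr(exc))
        audit.append(entry)
    return audit
```

In a Python recipe, `config_rows` could come from `dataiku.Dataset("upload_config").get_dataframe().to_dict("records")`, and the audit list could be written with `dataiku.Dataset("upload_audit").write_with_schema(pandas.DataFrame(audit))`; both dataset names are hypothetical.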
You may also be able to build the input arguments programmatically by using the Dataiku API to gather the output datasets in the flow.
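A minimal sketch of building those rows programmatically, assuming the Results Datasets follow a naming convention such as an `RD_` prefix and that the target table can be derived from the rest of the name. The convention, the defaults and the function name are all hypothetical. Inside DSS the names could be collected with something like `[d["name"] for d in dataiku.api_client().get_default_project().list_datasets()]`; the function only takes a plain list of names so it can be checked outside DSS.

```python
from typing import Dict, Iterable, List, Optional


def build_upload_config(
    dataset_names: Iterable[str],
    prefix: str = "RD_",
    mode: str = "overwrite",
    batch_size: int = 1000,
    overrides: Optional[Dict[str, Dict[str, object]]] = None,
) -> List[Dict[str, object]]:
    """Turn dataset names like 'RD_sales' into one config row per dataset.

    Only names starting with `prefix` are kept (hypothetical convention); the
    target table is the name without the prefix, lower-cased. `overrides`
    lets specific datasets change any of the four fields.
    """
    overrides = overrides or {}
    rows = []
    for name in sorted(dataset_names):
        if not name.startswith(prefix):
            continue  # not a Results Dataset under this convention
        row: Dict[str, object] = {
            "dataset_name": name,
            "target_table": name[len(prefix):].lower(),
            "mode": mode,
            "batch_size": batch_size,
        }
        row.update(overrides.get(name, {}))
        rows.append(row)
    return rows
```

The resulting list could be written to the config dataset that drives the upload recipe, so adding a new RD to the flow would not require editing the recipe.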
Marlan
Hi @Marlan
I chose an app-as-recipe because I need it to be reusable across other projects and flows. If I used a Python recipe, I'd need to copy and paste the code and adapt it to the inputs manually, right?
Thanks
Yeah, that is the purpose of app-as-recipe, so it's a totally reasonable thing to do. There are other options, though. The most obvious is to create a recipe plugin that does the same thing as the app-as-recipe. If you haven't developed plugins before, there is definitely a learning curve, but overall I'd say you get a better and cleaner result in many situations.
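For the plugin route, custom recipe input roles are declared in the plugin's `recipe.json`, and a role with `"arity": "NARY"` can accept several datasets, much like the Join recipe's inputs (worth double-checking against the plugin docs). In `recipe.py` the selected inputs would come from `dataiku.customrecipe.get_input_names_for_role(...)` and the parameters from `get_recipe_config()`. The sketch below covers only the pairing step, under stated assumptions: the role name `datasets_to_upload` and a `targets` parameter mapping dataset name to target table (for example a `MAP`-type parameter) are hypothetical.

```python
from typing import Dict, List, Tuple


def pair_inputs_with_targets(
    input_names: List[str], targets: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Match each recipe input to its target table.

    input_names: names returned for the N-ary input role; these may be
        qualified as 'PROJECTKEY.dataset', so the project key is stripped
        before looking the dataset up in `targets`.
    targets: dataset name -> target table (hypothetical 'targets' parameter).
    Returns (pairs to upload, inputs with no target configured).
    """
    pairs: List[Tuple[str, str]] = []
    unmapped: List[str] = []
    for full_name in input_names:
        short_name = full_name.split(".", 1)[-1]
        table = targets.get(short_name)
        if table:
            pairs.append((full_name, table))
        else:
            unmapped.append(full_name)
    return pairs, unmapped


# In a plugin's recipe.py this would be fed roughly like this
# (role and parameter names are hypothetical):
#   from dataiku.customrecipe import get_input_names_for_role, get_recipe_config
#   pairs, unmapped = pair_inputs_with_targets(
#       get_input_names_for_role("datasets_to_upload"),
#       get_recipe_config().get("targets", {}))
```

Reporting unmapped inputs rather than silently skipping them means a newly connected RD without a target table shows up in the audit output instead of being ignored.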
Another option would be to create a Python function that does all the work of calling the API and call that from project-specific recipes. You can store the function in a git repo and then import it into the project library of any project you want to use it in.
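A minimal sketch of such a shared function, assuming the server accepts JSON rows via HTTP POST at an endpoint like `<base_url>/tables/<table>/rows`. The URL layout, payload shape and batching are hypothetical stand-ins for whatever the APP does today. The `send` argument defaults to a plain `urllib.request.urlopen` call but can be swapped out, which also makes the function testable without a server.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Sequence


def _post(request: urllib.request.Request) -> int:
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def upload_rows(
    rows: Sequence[Dict[str, object]],
    target_table: str,
    base_url: str,
    batch_size: int = 1000,
    send: Callable[[urllib.request.Request], object] = _post,
) -> int:
    """POST rows to <base_url>/tables/<target_table>/rows in JSON batches.

    Returns the number of requests sent. The endpoint layout is hypothetical.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    table = urllib.parse.quote(target_table, safe="")
    url = f"{base_url.rstrip('/')}/tables/{table}/rows"
    requests_sent = 0
    for start in range(0, len(rows), batch_size):
        batch = list(rows[start:start + batch_size])
        request = urllib.request.Request(
            url,
            data=json.dumps({"rows": batch}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        send(request)
        requests_sent += 1
    return requests_sent
```

Each project-specific Python recipe would then only read its dataset (for example with `dataiku.Dataset(name).get_dataframe().to_dict("records")`) and call `upload_rows(...)`, keeping the API logic in one versioned place.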
I would choose one of these options over app-as-recipe as I find that apps-as-recipe are generally more involved to work with and have more limitations. I think those are more useful when they represent multiple steps in a longer flow vs. the one recipe you have in this case.
I'm not sure any of these options solve the problem you are asking about but they make a solution easier to develop.
Marlan