Can I have a dataset both as input and as output of a recipe (a kind of “Recursive recipe”)?
UserBird
I'd like to write a recipe where one of the inputs is also an output. The loop is intentional: the dataset should get enriched every time the recipe runs, and should eventually converge.
Best Answer
-
It is not possible to have a dataset both as input and as output of a recipe (this would require the user to specify a convergence criterion, and a way to write to a dataset while reading from it).
But there is hope!
- If the goal is to enrich a dataset, use two datasets: foo as the input and foo_enriched as the output.
- If it's about iterating until convergence, the whole loop can run inside a single recipe (for instance a Python recipe), whose output is the dataset after convergence.
- If it's about updating a dataset on a regular basis (e.g. daily), then partitioning might be the solution.
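To illustrate the second option, here is a minimal sketch of looping until convergence inside one Python recipe. It is plain Python only: reading the input dataset and writing the output dataset are left out, and `enrich_once`, `iterate_until_converged` and the `id`/`parent`/`label` columns are hypothetical names made up for the example. The convergence criterion assumed here is simply "a pass changes nothing".

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def enrich_once(rows: List[Row]) -> List[Row]:
    """One enrichment pass (hypothetical rule): a row without a label
    inherits the label of its parent row, if the parent has one."""
    labels = {r["id"]: r["label"] for r in rows}
    return [
        {**r, "label": r["label"] if r["label"] is not None else labels.get(r["parent"])}
        for r in rows
    ]


def iterate_until_converged(
    rows: List[Row],
    step: Callable[[List[Row]], List[Row]],
    max_iterations: int = 100,
) -> Tuple[List[Row], int]:
    """Apply `step` repeatedly until a pass leaves the rows unchanged.

    Returns the converged rows and the number of passes that were run.
    Raises if no fixed point is reached within `max_iterations` passes.
    """
    for iteration in range(1, max_iterations + 1):
        new_rows = step(rows)
        if new_rows == rows:  # assumed convergence criterion: nothing changed
            return rows, iteration
        rows = new_rows
    raise RuntimeError(f"no convergence after {max_iterations} passes")


# In a real recipe, `rows` would come from the input dataset (foo) and
# `converged` would be written to the output dataset (foo_enriched).
rows = [
    {"id": "a", "parent": None, "label": "x"},
    {"id": "b", "parent": "a", "label": None},
    {"id": "c", "parent": "b", "label": None},
]
converged, passes = iterate_until_converged(rows, enrich_once)
```

The `max_iterations` guard is there because nothing guarantees that an arbitrary enrichment step converges; the recipe then fails loudly instead of looping forever.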
Answers
-
The answer given by jrouquie is correct, but there are also some (unofficial) workarounds:
- Notebooks
- Writing to files (I personally do this to cache API calls)
- SQL