Can I have a dataset both as input and as output of a recipe (a kind of “Recursive recipe”)?

UserBird (Dataiker, Alpha Tester)
I'd like to write a recipe where one of the inputs is also an output. The loop is intentional: the dataset should be enriched every time the recipe runs, and the process should eventually converge.

Best Answer

  • jrouquie (Dataiker Alumni)

    It is not possible to have a dataset both as input and as output of a recipe (this would require the user to specify a convergence criterion, and a way to write to a dataset while reading from it).

    But there is hope!

    • If the goal is to enrich a dataset, use two datasets: foo as the input and foo_enriched as the output.
    • If the goal is to iterate until convergence, do the iteration inside a single recipe (for instance, a Python recipe) and write the converged dataset as the recipe's output.
    • If the goal is to update a dataset on a regular basis (e.g. daily), partitioning might be the solution.
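The second option can be sketched as a fixed-point loop inside one Python recipe. This is a minimal sketch, not Dataiku's API: it assumes the rows fit in memory as a list of dicts, uses exact equality between two successive iterations as the convergence criterion, and `enrich_once` is a hypothetical enrichment step (here, propagating a `group` label from parent rows). In an actual recipe, `foo` would be read from the input dataset and `foo_enriched` written to a separate output dataset, so no dataset is both read and written.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def iterate_until_convergence(
    rows: List[Row],
    step: Callable[[List[Row]], List[Row]],
    max_iterations: int = 100,
) -> List[Row]:
    """Apply `step` repeatedly until the rows stop changing (a fixed point).

    Convergence criterion (an assumption): two successive iterations are equal.
    max_iterations guards against a loop that never converges.
    """
    current = rows
    for _ in range(max_iterations):
        updated = step(current)
        if updated == current:
            return updated
        current = updated
    raise RuntimeError(f"no convergence after {max_iterations} iterations")


def enrich_once(rows: List[Row]) -> List[Row]:
    """Hypothetical enrichment step: a row without a group inherits its parent's.

    Returns new dicts, so the input rows are never modified in place.
    """
    group_by_id = {r["id"]: r.get("group") for r in rows}
    enriched = []
    for r in rows:
        new = dict(r)
        if new.get("group") is None and new.get("parent") is not None:
            new["group"] = group_by_id.get(new["parent"])
        enriched.append(new)
    return enriched


if __name__ == "__main__":
    foo = [
        {"id": 1, "parent": None, "group": "A"},
        {"id": 2, "parent": 1, "group": None},
        {"id": 3, "parent": 2, "group": None},
    ]
    # The label needs two passes to reach row 3; a third pass confirms the fixed point.
    foo_enriched = iterate_until_convergence(foo, enrich_once)
    print([r["group"] for r in foo_enriched])  # → ['A', 'A', 'A']
```

The `max_iterations` bound pairs with the user-supplied convergence criterion mentioned in the accepted answer: a loop that never settles fails loudly instead of running forever.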

Answers

  • jereze (Alpha Tester, Dataiker Alumni)

    The answer given by jrouquie is correct. But there are also some (unofficial) hacks to work around this limitation:

    • Notebooks
    • Writing to files (I personally do this to cache API calls)
    • SQL
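The file-writing hack can be sketched as a persistent cache for API calls. This is a minimal sketch under stated assumptions: the cache is a JSON file at a path you choose (hypothetical here), and `fetch` is a hypothetical stand-in for the real API call. It is not a Dataiku feature; it is plain Python file I/O.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterable


def cached_lookup(
    keys: Iterable[str],
    fetch: Callable[[str], dict],
    cache_path: str,
) -> Dict[str, dict]:
    """Return results for `keys`, calling `fetch` only for keys not yet cached.

    The JSON cache file survives between runs, so each run only pays for
    keys it has never seen before.
    """
    keys = list(keys)
    cache: Dict[str, dict] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)

    # dict.fromkeys removes duplicate keys while preserving their order.
    missing = [k for k in dict.fromkeys(keys) if k not in cache]
    for k in missing:
        cache[k] = fetch(k)

    if missing:
        # Write to a temporary file, then replace atomically, so an interrupted
        # run cannot leave a half-written cache behind.
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)
        os.replace(tmp_path, cache_path)

    return {k: cache[k] for k in keys}


if __name__ == "__main__":
    calls = []

    def fake_fetch(key: str) -> dict:
        # Hypothetical stand-in for a slow or rate-limited API call.
        calls.append(key)
        return {"key": key, "length": len(key)}

    path = os.path.join(tempfile.mkdtemp(), "api_cache.json")
    cached_lookup(["paris", "lyon"], fake_fetch, path)
    cached_lookup(["paris", "lyon", "nice"], fake_fetch, path)
    print(calls)  # → ['paris', 'lyon', 'nice']
```

The cache file is not a declared input or output of the recipe, which is what allows it to be both read and written. It is also why this counts as a hack: the flow does not show that dependency.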