Preparation Recipes - ROWS mode documentation

florianbriand Registered Posts: 12 ✭✭✭✭

Is there any documentation about the "ROWS" mode for Preparation Recipes?

The existing documentation only mentions the CELL and ROW modes (https://doc.dataiku.com/dss/latest/plugins/reference/preparation.html).

Specifically, is there any way to control how the process is executed? For example, if I want to run the "process" function only once, how can I achieve that?

Best Answer

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Answer ✓

    Hi,

    We indeed do not have a detailed example for the ROWS mode, but it works in a plugin much as it does in the regular Python processor: https://doc.dataiku.com/dss/latest/preparation/processors/python-custom.html#rows-mode

    In other words, you can create a processor in the UI, switch it to rows mode, inspect and understand the sample code and port it to your plugin.

    You cannot get the process function to be called only once. That would require the entire dataset to be in memory, which would not scale. Instead, the process function is called once per row, and each call can return as many rows as you want. In other words, you can return zero rows while "remembering" the previous rows in a Python variable, then emit them later, for instance when some kind of "trigger" is reached. Beware that remembering all the rows of a dataset could easily lead to out-of-memory situations.
