Preparation Recipes - ROWS mode documentation

florianbriand · ‎02-05-2020

Is there any documentation about the "ROWS" mode for Preparation Recipes?

The existing documentation only mentions the CELL and ROW modes (https://doc.dataiku.com/dss/latest/plugins/reference/preparation.html).

Specifically, is there any way to control the execution of the process? For example, if I want to run the "process" function only once, how can I achieve that?

Clément_Stenac · ‎02-07-2020

Hi,

We indeed do not have a detailed example for the ROWS mode, but it works in a plugin much as it does in the normal Python processor: https://doc.dataiku.com/dss/latest/preparation/processors/python-custom.html#rows-mode

In other words, you can create a processor in the UI, switch it to rows mode, inspect and understand the sample code and port it to your plugin.

You cannot get the process function to be called only once. That would require the entire dataset to be in memory, which would not scale. Instead, the process function is called once per row, and each call can return as many rows as you want. In other words, you can return zero rows while "remembering" the previous rows in a Python variable, then emit them later, for instance when some kind of "trigger" is reached. Beware that remembering all rows of a dataset could easily lead to out-of-memory situations.
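
To illustrate the buffering idea, here is a minimal sketch rather than official sample code. It assumes that in ROWS mode the process function receives one row as a dict and returns a list of rows, and that a module-level variable persists between calls. The names `_buffer` and `BATCH_SIZE`, the "every 3 rows" trigger, and the `batch_size` column are all made up for the example. Rows still sitting in the buffer when the data ends would not be emitted by this sketch.

```python
# Hypothetical state: rows remembered between calls to process().
_buffer = []

# Hypothetical trigger: emit the remembered rows once 3 have accumulated.
# Keep this small; buffering a whole dataset risks running out of memory.
BATCH_SIZE = 3


def process(row):
    """Assumed ROWS-mode contract: one input row (dict) in, a list of rows out."""
    _buffer.append(dict(row))  # copy so later changes do not affect the input

    if len(_buffer) < BATCH_SIZE:
        # Trigger not reached yet: return zero rows for this call.
        return []

    # Trigger reached: take the remembered rows and reset the buffer.
    batch = list(_buffer)
    del _buffer[:]

    # Example of something that needs several rows at once:
    # annotate each emitted row with the size of its batch.
    for buffered_row in batch:
        buffered_row["batch_size"] = len(batch)
    return batch
```

With this sketch, the first two calls return an empty list and the third call returns all three remembered rows. The easiest way to check the real contract is still to create a processor in the UI, switch it to rows mode, and compare with the generated sample code.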
