Preparation Recipes - ROWS mode documentation

florianbriand

Is there any documentation about the "ROWS" mode for Preparation Recipes?

The existing documentation only mentions the CELL and ROW modes (https://doc.dataiku.com/dss/latest/plugins/reference/preparation.html).

Specifically, is there any way to control how the process is executed? For example, if I want the "process" function to run only once, how can I achieve that?

Clément_Stenac

Hi,

We indeed do not have a detailed example for the ROWS mode, but it works in a plugin very much like it does in the regular Python processor, which does document it: https://doc.dataiku.com/dss/latest/preparation/processors/python-custom.html#rows-mode

In other words, you can create a processor in the UI, switch it to rows mode, inspect and understand the sample code, and then port it to your plugin.

You cannot get the process function to be called only once: that would require the entire dataset to be in memory, which would not scale. Instead, the process function is called once per row, and each call can return as many rows as you want. This means you can return zero rows while "remembering" the previous rows in a Python variable, and emit them later, for instance when some kind of "trigger" is reached. Beware that remembering all the rows of a dataset can easily lead to out-of-memory errors.
