Bulk Formula

Sometimes I'd like to use the formula processor in the prepare recipe to process many columns at the same time. For example, I have a dataset with 375 columns, and I need to multiply most of those columns by each of two specific columns. To do that today, I have to create several hundred formula processors, each containing a simple formula.

This takes a long time, and it isn't easy to do through the API either.

Instead, I'd love to have a bulk formula recipe. It would let me pick a list of columns (or select all of them) and refer to the current column through a placeholder variable. In this case, my inputs would look something like this:


name: [col]_product
formula: quantity*[col]


Then I'd only need to write two formulas (one per multiplier column) instead of several hundred, and the hundreds of output columns would be generated automatically.
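To illustrate what the requested feature would do, here is a minimal sketch of the template expansion, assuming the `[col]` placeholder syntax proposed above. The function `expand_bulk_formula` and the column names `price`, `weight`, and `volume` are hypothetical; this is not a Dataiku API, just a way to show how one templated formula turns into one concrete formula per selected column.

```python
from typing import Iterable, List, Tuple


def expand_bulk_formula(
    columns: Iterable[str],
    name_template: str,
    formula_template: str,
    placeholder: str = "[col]",
) -> List[Tuple[str, str]]:
    """Hypothetical helper: substitute each column name into the templates,
    producing one (output_column_name, formula) pair per selected column."""
    return [
        (name_template.replace(placeholder, col), formula_template.replace(placeholder, col))
        for col in columns
    ]


# Hypothetical column selection; in practice this would be the dataset's own columns
# (excluding the multiplier column itself).
feature_columns = ["price", "weight", "volume"]
for name, expr in expand_bulk_formula(feature_columns, "[col]_product", "quantity*[col]"):
    print(f"{name}: {expr}")
```

For the three example columns this prints `price_product: quantity*price`, `weight_product: quantity*weight`, and `volume_product: quantity*volume`. The second multiplier column would simply be a second call with its own name and formula templates.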

This way, I can quickly transform feature-rich datasets as needed.

Currently, the alternative is to write a Python processor for this, but the formula processor has the advantage of being able to run in SQL.