
Bulk Formula

Sometimes I'd like to use the formula processor in the Prepare recipe to process many columns at once. For example, I have a dataset with 375 columns, and I need to multiply each of two columns against most of the others. To do so, I have to create several hundred formula processors, each with a simple formula.

Creating them by hand takes a long time, and scripting it through the API isn't easy either.

Instead, I'd love to have a bulk formula processor. It would let me list the columns to apply it to (or select all of them), then refer to the current column's name through a variable. In this case, my inputs would look something like this:

 

name: [col]_product
formula: quantity*[col]

 

Then I'd only need to write two formulas instead of several hundred. Each one would generate hundreds of columns automatically.
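
To make the idea concrete, here is a minimal Python sketch of the expansion such a processor would perform. It is only an illustration of the `[col]` substitution described above, not an existing Dataiku feature or API: the function name `expand_bulk_formula`, the example column names, and the assumption that column names can be dropped into a formula unquoted are all hypothetical.

```python
def expand_bulk_formula(columns, name_template, formula_template, placeholder="[col]"):
    """Return one {"name", "formula"} pair per column.

    The placeholder is replaced by each column name in both templates.
    Assumes column names are plain identifiers that need no quoting.
    """
    return [
        {
            "name": name_template.replace(placeholder, col),
            "formula": formula_template.replace(placeholder, col),
        }
        for col in columns
    ]


# Hypothetical column names, for illustration only.
steps = expand_bulk_formula(["price", "weight"], "[col]_product", "quantity*[col]")
for step in steps:
    print(step["name"], "=", step["formula"])
```

With one template per multiplier column, two calls like this would cover the several hundred output columns in my case.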

This way, I can quickly transform feature-rich datasets as needed.

The alternative is to write a Python processor to handle this, but the formula processor has the advantage of being able to run in SQL.
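
For comparison, the Python workaround would look roughly like the sketch below. It assumes each row is available as a dict of column name to value, with values possibly arriving as strings. The function name `add_products` and the sample columns are hypothetical, and the exact way a Python step receives rows should be checked against the product documentation.

```python
def add_products(row, multiplier_col, target_cols, suffix="_product"):
    """Return a copy of the row with one <col>_product entry per target column.

    Assumes the row behaves like a dict and that values can be parsed as floats.
    """
    out = dict(row)
    multiplier = float(row[multiplier_col])
    for col in target_cols:
        out[col + suffix] = multiplier * float(row[col])
    return out


# Hypothetical sample row, for illustration only.
row = {"quantity": "3", "price": "2.5", "weight": "4"}
result = add_products(row, "quantity", ["price", "weight"])
print(result)
```

This works, but it runs row by row in Python rather than in the database, which is exactly the trade-off mentioned above.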