Accessing metrics in Dataiku Formula

schang
Level 1

Hi, I would like to use the metric shown in the screenshot in a Dataiku formula.

How can I access it?

Also, what is the simplest way to get a column sum in a Prepare recipe?

 

Thanks

4 Replies
Ignacio_Toledo

Hi @schang!

Could you provide more information on which metric you are interested in? The screenshot you provided is just a log file that shows an error, and as far as I can see it doesn't give any relevant information.

About finding the sum of a column in the Prepare recipe: are you looking to create a new column with this value, or do you just want to know the sum of a column?

Cheers

schang
Level 1
Author

Can you see the screenshot below?

I basically need a way to divide a row value by the sum of another column in a Prepare recipe, which is simple and common to do in Excel. I know this is easy with Python, but this is for a user who does not code, and it would be surprising if something so simple in Excel were not easily feasible with visual recipes.

 

 

Ignacio_Toledo

Hi @schang

Yes, I can see the screenshot, and I now understand your use case.

Apparently, there is no way to access metric values from a visual recipe; at least, I was not able to find anything in the docs.

Also, and again I might be wrong, the formula language available in some visual recipes does not include functions that calculate statistics over a column (sum, mean, std, etc.) in a way that lets the result be applied to each cell. All operations are row by row, and this apparently also applies to the Python option within the Prepare recipe.
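
To illustrate what "row by row" means here, a minimal plain-Python sketch (not Dataiku code). The `process(row)` shape, the sample data, and the column names B and C are assumptions made only for illustration:

```python
# Hypothetical three-row dataset; the values are made up.
rows = [
    {"B": 10.0, "C": 2.0},
    {"B": 20.0, "C": 3.0},
    {"B": 30.0, "C": 5.0},
]

def process(row):
    # A row-wise step receives one row at a time, so it can combine
    # values from that row but cannot see the other rows of column C.
    return row["B"] / row["C"]

row_wise = [process(r) for r in rows]

# A column statistic such as a sum needs a separate pass over all rows,
# which is exactly what a purely row-wise step does not get to do.
c_sum = sum(r["C"] for r in rows)
```

So the per-row function can compute B / C for each row, but the total of C has to come from somewhere else.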

I was able to find a solution using only visual recipes, but I don't find it very elegant or practical. Given a dataset with columns A, B, C, where we want to divide each row of B by the sum of all the rows in column C:

  1. Create a new dataset using the Window recipe:
    • In the "windows definitions" section, select column A as the Order Column (this is irrelevant for this use case, but it is needed for the recipe to work), and activate the Window Frame definition. Do not set any limits, just activate it
    • In the Aggregations section, select the "Sum" aggregation for column C
    • In the Output section, you will see a new column named "C_sum"; this is where the result will be stored
    • Run the recipe
  2. Now, process the new dataset, which has the extra column, with a Prepare recipe:
    • Add a "Formula Step"
    • Name the output column
    • In the Expression block write B / C_sum
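
For anyone who wants to check the numbers, here is a rough plain-Python sketch of what the two steps compute. It is not Dataiku code; the sample values and the output column name `B_over_C_sum` are made up for illustration, while `C_sum` follows the name mentioned in step 1:

```python
# Hypothetical sample data using the A, B, C columns from the steps above.
rows = [
    {"A": 1, "B": 10.0, "C": 2.0},
    {"A": 2, "B": 20.0, "C": 3.0},
    {"A": 3, "B": 30.0, "C": 5.0},
]

# Step 1 (Window recipe, frame with no limits): the sum of C over the
# whole dataset is attached to every row as a new column "C_sum".
c_sum = sum(r["C"] for r in rows)
windowed = [{**r, "C_sum": c_sum} for r in rows]

# Step 2 (Prepare recipe, Formula step): B / C_sum, evaluated row by row.
# "B_over_C_sum" is an illustrative output column name.
result = [{**r, "B_over_C_sum": r["B"] / r["C_sum"]} for r in windowed]

for r in result:
    print(r["A"], r["B_over_C_sum"])
```

The point of the Window step is only to copy the column total onto each row, so that the row-by-row formula has it available.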

And that is it, you have what you needed. But I agree, it is not as simple as with Excel.

Maybe a Dataiker could provide more input? @CoreyS ?

Cheers!

schang
Level 1
Author

Thanks a lot, @Ignacio_Toledo .

I'm interested to know what Dataiku's advice is for this case, and whether they are willing to add functionality to access metrics from formulas in Prepare recipes.
