Accessing metrics in Dataiku Formula

schang Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 6 ✭✭✭✭

Hi, I would like to use the metric shown in the screenshot in a Dataiku formula.

How can I access this?

Also, what's the simplest way to find a column sum in a Prepare recipe?

Thanks

Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron

    Hi @schang!

    Could you provide more information on which metric you are interested in? The screenshot you provided is just a log file that shows an error, and as far as I can see it doesn't give any relevant information.

    About finding the sum of a column in the Prepare recipe: are you looking to create a new column with this value, or do you just want to know what the sum of the column is?

    Cheers

  • schang
    schang Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 6 ✭✭✭✭

    Can you see the screenshot below?

    I basically need a way to calculate a row value divided by another column's sum in a Prepare recipe, something quite simple and common to do in Excel. I know this is easy with Python, but this is for a user who does not code, and it would be surprising to find that something so simple in Excel cannot be done easily with visual recipes.

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron

    Hi @schang,

    Yes, I can see the screenshot, and I now understand your use case.

    As far as I can tell, there is no way to access metric values from a visual recipe; at least, I was not able to find anything in the docs.

    Also, and again I might be wrong, the formula language available in some visual recipes does not include functions that calculate statistics over a column (sum, mean, std, etc.) in a way that lets the result be applied to individual cells. All operations are row by row, and this apparently also applies to the Python option within the Prepare recipe.

    I was able to find a solution using only visual recipes, but I don't find it very elegant or practical. Given a dataset with columns A, B, C, where we want to divide each row of B by the sum of all the rows in column C:

    1. Create a new dataset using the Window recipe:
      • In the "windows definitions" section, select column A as the Order Column (this is irrelevant for this use case, but it is needed for the recipe to work), and activate the Window Frame definition. Do not set any limits, just activate it
      • In the Aggregations section, select the "Sum" aggregation for column C
      • In the Output section, you will see a new column named "C_sum"; this is where the result will be stored
      • run the recipe
    2. Now, process the new dataset with the extra column with a "Prepare Recipe":
      • Add a "Formula Step"
      • Name the output column
      • in the Expression block write B / C_sum

    And that is it, you have what you needed. But I agree, it is not as simple as with Excel.
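
For anyone who wants to check the logic of the two steps above outside DSS, here is a minimal plain-Python sketch. It is not Dataiku API code: the sample rows are made up, and the output column name "B_share" is a hypothetical placeholder. It only mirrors what the Window recipe (an unbounded sum of C attached to every row) and the Formula step (B / C_sum, evaluated row by row) are doing.

```python
# Toy rows standing in for a dataset with columns A, B, C (values are made up).
rows = [
    {"A": 1, "B": 10.0, "C": 2.0},
    {"A": 2, "B": 20.0, "C": 3.0},
    {"A": 3, "B": 30.0, "C": 5.0},
]

# Step 1 (Window recipe analogue): with an unbounded window frame, the sum of C
# over the whole dataset is attached to every row as C_sum.
c_sum = sum(row["C"] for row in rows)
with_sum = [{**row, "C_sum": c_sum} for row in rows]

# Step 2 (Formula step analogue): a row-by-row expression, B / C_sum.
# "B_share" is a hypothetical name for the output column.
result = [{**row, "B_share": row["B"] / row["C_sum"]} for row in with_sum]

for row in result:
    print(row)
```

The point of the intermediate C_sum column is that the second step never needs to look at other rows, which is why it fits a row-by-row formula.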

    Maybe a Dataiker could provide more input? @CoreyS?

    Cheers!

  • schang
    schang Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 6 ✭✭✭✭

    Thanks a lot, @Ignacio_Toledo.

    I'm interested to know what Dataiku advises for this case, and whether they would be willing to add functionality to access metrics from formulas in Prepare recipes.
