Group Recipe - Min / Max calculation problem

dfwmike
dfwmike Registered Posts: 5 ✭✭✭✭

Are there any tips or tricks beyond what's covered in the training for using group recipes?

I'm having problems getting the correct min and max values calculated using group recipes. I can confirm that sometimes they calculate the correct values and sometimes they don't.

This is probably user error, but the dataset is too large to work with unsampled, which makes it difficult or impossible to trace the error back to a previous recipe step.

Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hi @dfwmike,

    I'm not sure how to help you without a more detailed example of the cases where you find wrong values. Could you share a more concrete example?

    Also, are you using partitioned datasets? Just from experience, I've had some group issues when not using the partitioned datasets in the right way.

  • dfwmike

    Thanks, @Ignacio_Toledo!

    I am using a dataset imported from Snowflake by someone else who is not reachable at the moment. Is there a way I can tell if the dataset is partitioned?

    I have data that looks like this:

    ID  Value
    A   10
    A   11
    B   9
    A   15
    B   8

    When I group to get the min, I get results that look like this...

    ID  min_Value
    A   11
    B   8

    When I group to get the max, I get results like this...

    ID  max_Value
    A   11
    B   9

    For B, it works perfectly. But for A, it gives me the same value in both situations and the value it gives me is neither the min nor the max.

    The value is calculated in a formula in the prepare recipe that immediately precedes the groupings.
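Because the value comes from a Prepare-recipe formula, one thing that may be worth ruling out is that the column is stored as text rather than as a number. This is only an assumption: the thread never states the actual cause, and the values below are made up for illustration. The sketch shows how min and max change when the same numbers are compared as strings.

```python
# Hypothetical values, not taken from the real dataset.
values_as_text = ["10", "11", "15", "9"]

# The same values parsed as numbers.
values_as_numbers = [float(v) for v in values_as_text]

# Strings compare character by character, so "9" sorts after "15".
text_min, text_max = min(values_as_text), max(values_as_text)

# Numbers compare by magnitude.
num_min, num_max = min(values_as_numbers), max(values_as_numbers)

print("as text:   ", text_min, text_max)
print("as numbers:", num_min, num_max)
```

If the storage type of the computed column turns out to be a string, converting it to a numeric type before the Group recipe would be the thing to test.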

  • Ignacio_Toledo

    Hi @dfwmike,

    The easiest way to check whether a dataset is partitioned is to look at its icon in the flow:

    partitioned_dataset.png

    As you can see it looks like a "set" of superimposed icons.

    Could you share a screenshot of your group recipe, like this example?

    group_recipe.png

    I'm mostly interested in the areas within the red boxes.

    Cheers!

  • dfwmike

    @Ignacio_Toledo

    Thanks again! The dataset is not partitioned.

    Attached is a screenshot of the group recipe where I'm calculating the ADR max by CUSTOMER_VEHICLE_VRM.

  • Ignacio_Toledo

    Thanks for the screenshot. I can't find a clear indication in your recipe of what might be happening. Following your earlier example, maybe some of the IDs look similar but are actually different? For instance, instead of only "A" values you might also have "A " (an A plus a space). In that case, though, you would see apparently repeated IDs (repeated CUSTOMER_VEHICLE_VRM values) in the output.

    Just to rule out one last issue, could you share, from the same recipe, what you have in the "Input/output" and "Advanced" tabs, like in these examples:

    group_recipe_inout.png

    group_recipe_adv.png

    Hopefully this extra info will turn something up!
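The "A" versus "A " idea above can be checked outside the recipe on an exported sample of the ID column. The following is a minimal sketch in plain Python; `find_lookalike_ids` and `sample_ids` are hypothetical names, and the sample values are invented to mirror the example in this thread.

```python
from collections import defaultdict

def find_lookalike_ids(ids):
    """Group raw IDs by their whitespace-stripped form and return only
    the groups that contain more than one distinct raw value."""
    variants = defaultdict(set)
    for raw in ids:
        variants[raw.strip()].add(raw)
    return {norm: sorted(raws) for norm, raws in variants.items() if len(raws) > 1}

# Hypothetical sample: one "A" carries a trailing space.
sample_ids = ["A", "A ", "B", "A", "B"]

lookalikes = find_lookalike_ids(sample_ids)
print(lookalikes)
```

An empty result means no two IDs in the sample differ only by leading or trailing whitespace, which would make this explanation unlikely.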

  • dfwmike

    I don't think the CUSTOMER_VEHICLE_VRM values are different because at an earlier point in the flow, I do a count by CUSTOMER_VEHICLE_VRM and I get the correct number.

    Attached are screenshots from the requested tabs.

    Thanks!

  • Ignacio_Toledo

    Hi @dfwmike,

    Thanks for the screenshots. Sadly, I'm running out of ideas.

    If you select both min and max to be calculated at the same time in the grouping recipe, do you still get the same min and max values?

    Is there any subset of the data that you could share publicly, so we can try to replicate the problem?

    Hope we can find the problem!
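For a small subset, the expected min and max per ID can be computed together, independently of the Group recipe, and compared with the recipe output. This is a minimal sketch in plain Python using the example rows posted earlier in the thread; `min_max_by_id` is a hypothetical helper name.

```python
from collections import defaultdict

# Example rows copied from the earlier post: (ID, Value).
rows = [("A", 10), ("A", 11), ("B", 9), ("A", 15), ("B", 8)]

def min_max_by_id(rows):
    """Return {id: (min, max, count)} for an iterable of (id, value) pairs."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: (min(vals), max(vals), len(vals)) for key, vals in groups.items()}

expected = min_max_by_id(rows)
for key in sorted(expected):
    print(key, expected[key])
```

For these rows, A should come out with min 10 and max 15 over 3 rows, and B with min 8 and max 9 over 2 rows, so a recipe result of 11 for both min and max of A does not match.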

  • dfwmike

    Yes, I've replicated the problem doing min and max in the same recipe and in separate recipes. I started with one recipe and split them out when I encountered this problem to see if that would help (and of course it didn't).

    Unfortunately, I can't share the dataset at present. I'd have to mask the customer identifiers it contains to do that.

    Thank you for your help! I've opened a ticket with Dataiku support.

    Mike

  • Ignacio_Toledo

    I was going to suggest creating a support ticket next, so great!

    Let us know what they find; it will be interesting to learn the cause.

    Good luck!

  • EliasH
    EliasH Dataiker, Registered Posts: 34 Dataiker

    The issue for @dfwmike has been resolved. Thank you, @Ignacio_Toledo, for your assistance!

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭

    Hi @EliasH and @dfwmike, would it be possible to expand a little on the nature of the problem and its final solution? I use this functionality a lot, so I'm interested in how reliable it is (I haven't encountered any problems with it to date).
