Group by with empty value and with Null value

Max334
Max334 Registered Posts: 3 ✭✭

Hello everyone !

I have a dataset with empty values in one of the columns (col1) and I use a group by recipe on an other column (col2) without empty values with col1_distinct as aggregation. I get a volume of 21, 199 and 1608 for the 3 col2 fields.

But I wanted to add a condition on col1 with a prepare recipe with a formula :

if(isNonBlank(col1),
if(col1<= col3, col1, ''),'').

This condition finally didn't change anything for col1 but I got, with the same group by on col2, a volume of 22, 200 and 1609 for the 3 col2 fields. After testing, I found that the group by recipe counts the empty value as a distinct value which explains +1 in volumes.

But Dataiku is supposed to interpret '' as an empty value and I don't understand why the group by recipe make the difference ? (A chart on the dataset is correct)

And how transform these '' values on empty values (for a group by recipe).

Thank you :)

Best Answer

Answers

  • LucOBrien
    LucOBrien Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 23 ✭✭✭✭

    Hello,

    This could be an example of how an empty string ('') is different than just a null. The empty string is a technically data type with length 0 whereas a null (or NA, or None, or ... Depending on what language is being used) is a special marker that means literally empty and would have nothing to count.

    As an experiment, can you try this:

    if(isNonBlank(col1),
    if(col1<= col3, col1,null),null)

    I'm wondering if this will match your counts prior to the prepare recipe.

  • Max334
    Max334 Registered Posts: 3 ✭✭

    I saw on forums to write a null we have to write an empty string and the marker null don't exist in Dataiku (included None and NA). And I tried your solution and Dataiku didn't recognize the null marker and highlighted it in red.
    I didn't see anything about null in the documentation too and the only solution I found is to make a python recipe.

    But I don't know if it's a problem in my dataiku or in dataiku in general and I don't understand why a group by recipe make the difference between a null and an empty string if in Dataiku it's the same thing :/ .

  • LucOBrien
    LucOBrien Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 23 ✭✭✭✭

    That's strange, the formula I suggested worked for me.

    Have you figured it out since I last messaged?

Setup Info
    Tags
      Help me…