Group by with empty value and with Null value

Hello everyone !
I have a dataset with empty values in one of the columns (col1) and I use a group by recipe on an other column (col2) without empty values with col1_distinct as aggregation. I get a volume of 21, 199 and 1608 for the 3 col2 fields.
But I wanted to add a condition on col1 with a prepare recipe with a formula :
if(isNonBlank(col1),
if(col1<= col3, col1, ''),'').
This condition finally didn't change anything for col1 but I got, with the same group by on col2, a volume of 22, 200 and 1609 for the 3 col2 fields. After testing, I found that the group by recipe counts the empty value as a distinct value which explains +1 in volumes.
But Dataiku is supposed to interpret '' as an empty value and I don't understand why the group by recipe make the difference ? (A chart on the dataset is correct)
And how transform these '' values on empty values (for a group by recipe).
Thank you :)
Best Answer
-
Ok I found the solution : when I run
if(isNonBlank(col1) && col1> col3, '', col1)
that works beacause the '' value isn't the default value.
Answers
-
Hello,
This could be an example of how an empty string ('') is different than just a null. The empty string is a technically data type with length 0 whereas a null (or NA, or None, or ... Depending on what language is being used) is a special marker that means literally empty and would have nothing to count.
As an experiment, can you try this:
if(isNonBlank(col1),
if(col1<= col3, col1,null),null)I'm wondering if this will match your counts prior to the prepare recipe.
-
I saw on forums to write a null we have to write an empty string and the marker null don't exist in Dataiku (included None and NA). And I tried your solution and Dataiku didn't recognize the null marker and highlighted it in red.
I didn't see anything about null in the documentation too and the only solution I found is to make a python recipe.
But I don't know if it's a problem in my dataiku or in dataiku in general and I don't understand why a group by recipe make the difference between a null and an empty string if in Dataiku it's the same thing :/ . -
That's strange, the formula I suggested worked for me.
Have you figured it out since I last messaged?