
Output data size increase with Group by recipe

Ramki_2022
Level 1

Can anyone help me understand why the data size increases after a "Group by" step?

The input dataset is 140 KB and the output dataset is 398 KB, even though the output has fewer rows. Is there any way to reduce the output data size?


Operating system used: win10

3 Replies
Jurre
Neuron

Hi and welcome @Ramki_2022 ,

Do you introduce computed columns within that Group recipe? That might be a reason. It would be helpful if you could share a bit more info on what happens within that recipe.

cheers, 

Jurre

Ramki_2022
Level 1
Author

The string columns are used as group keys and the decimal columns are aggregated with "sum". In the output, the "_sum" suffix is removed from the aggregated column names.

I have not created any custom aggregations.
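
For readers who want to reproduce the setup outside Dataiku, the configuration described above can be approximated with a minimal plain-Python sketch. This is not the Group recipe's actual implementation; the function `group_and_sum` and the column names `region`, `product`, and `amount` are hypothetical and only stand in for "string columns as group keys, decimal columns summed, output columns keeping their original names".

```python
from collections import defaultdict


def group_and_sum(rows, key_columns, value_columns):
    """Group rows by key_columns and sum each of value_columns per group."""
    totals = defaultdict(lambda: [0.0] * len(value_columns))
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        for index, column in enumerate(value_columns):
            totals[key][index] += row[column]

    result = []
    for key, sums in totals.items():
        out = dict(zip(key_columns, key))
        # Summed columns keep their original names (no "_sum" suffix).
        out.update(zip(value_columns, sums))
        result.append(out)
    return result


# Hypothetical sample data: two string key columns, one decimal column.
rows = [
    {"region": "north", "product": "a", "amount": 1.5},
    {"region": "north", "product": "a", "amount": 2.5},
    {"region": "south", "product": "b", "amount": 4.0},
]
grouped = group_and_sum(rows, ["region", "product"], ["amount"])
```

With this sample, three input rows collapse into two output rows, one per distinct combination of the key columns.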

Jurre
Neuron

Remarkable. The only thing that comes to mind right now is a change of data type in one of your columns to something that takes up more space, such as bigger integers. Interesting challenge; as soon as there is some room to play I'll do some tests.
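
One way such a size increase could happen, purely as an illustration and not confirmed for this dataset, is when summed decimals are stored as text (for example in a CSV-like format): floating-point sums can have much longer text representations than the short inputs. The sketch below uses only the Python standard library; the helper `csv_size` and the sample rows are hypothetical, and how Dataiku actually stores the output is not stated in this thread.

```python
import csv
import io


def csv_size(rows):
    """Return the size in bytes of rows when written as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return len(buffer.getvalue().encode("utf-8"))


# Hypothetical input: short decimal values, four rows, two group keys.
input_rows = [("a", 0.1), ("a", 0.2), ("b", 0.1), ("b", 0.7)]

# Sum the values per key, as a "sum" aggregation would.
totals = {}
for key, value in input_rows:
    totals[key] = totals.get(key, 0.0) + value
output_rows = list(totals.items())

print("input rows:", len(input_rows), "bytes:", csv_size(input_rows))
print("output rows:", len(output_rows), "bytes:", csv_size(output_rows))
```

Here the output has fewer rows but more bytes, because a sum such as 0.1 + 0.2 is written with many more digits than either input. If something similar is happening in the recipe output, rounding the summed columns or checking their storage type would be the first things to try.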
