Output data size increase with Group by recipe
Can anyone help me understand why the data size increases after a "group by" step?
The input dataset is 140 KB and the output dataset is 398 KB, even though the number of output rows is lower. Is there any way to reduce the output data size?
Operating system used: win10
Answers
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Hi and welcome @Ramki_2022,
Do you introduce computed columns within that Group recipe? That might be a reason. It would be helpful if you could share a bit more info on what happens within that Group recipe.
cheers,
Jurre
-
String columns are used as group keys, and decimal columns are aggregated with "sum". On the output, the "_sum" suffix is removed from the aggregated column names.
I have not created any custom aggregations.
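To make the setup above concrete, here is a rough plain-Python sketch of the same logic: string columns as group keys, decimal columns summed, and the "_sum" suffix dropped from the result names. This is not the code the Group recipe runs; the function `group_and_sum` and the column names (`region`, `product`, `amount`, `cost`) are made up for illustration.

```python
from collections import defaultdict

SUFFIX = "_sum"

def group_and_sum(rows, key_cols, value_cols):
    """Group rows by key_cols and sum value_cols, mimicking the described setup."""
    totals = defaultdict(lambda: [0.0] * len(value_cols))
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        for i, col in enumerate(value_cols):
            totals[key][i] += row[col]

    result = []
    for key, sums in totals.items():
        record = dict(zip(key_cols, key))
        for col, total in zip(value_cols, sums):
            aggregated_name = col + SUFFIX  # e.g. "amount_sum"
            # Strip the suffix again so the output column is just "amount".
            record[aggregated_name[: -len(SUFFIX)]] = total
        result.append(record)
    return result

# Hypothetical sample data; the column names are assumptions.
rows = [
    {"region": "north", "product": "x", "amount": 1.5, "cost": 0.5},
    {"region": "north", "product": "x", "amount": 2.25, "cost": 0.25},
    {"region": "south", "product": "y", "amount": 4.0, "cost": 1.0},
]
grouped = group_and_sum(rows, ["region", "product"], ["amount", "cost"])
print(grouped)
```

Three input rows collapse into two output rows here, one per distinct (`region`, `product`) pair.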
-
Jurre
Remarkable. The only thing that comes to mind right now is a change of datatype in one of your columns to something that takes up more space, bigger integers or something like that. Interesting challenge; as soon as there is some room to play I'll do some tests.
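One way this kind of growth could happen, offered only as a possible illustration and not a confirmed diagnosis for this dataset: if short decimal values are summed as binary floating-point numbers and the results are written out as text, each sum can take many more characters than its inputs. The sketch below uses only the Python standard library; the helper `csv_size` and the sample rows are hypothetical, and how DSS actually stores the output depends on the dataset format and connection.

```python
import csv
import io
from collections import defaultdict

def csv_size(header, rows):
    """Return the size in bytes of the rows serialized as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))

header = ["key", "value"]

# Hypothetical input: a string key and a short decimal value per row.
input_rows = [("a", "0.1"), ("a", "0.2"), ("b", "0.1"), ("b", "0.2")]

# Group by the key and sum the values as binary floating-point numbers.
sums = defaultdict(float)
for key, value in input_rows:
    sums[key] += float(value)

# Write each sum back as text using its full float representation.
output_rows = [(key, repr(total)) for key, total in sums.items()]

print("input rows:", len(input_rows), "bytes:", csv_size(header, input_rows))
print("output rows:", len(output_rows), "bytes:", csv_size(header, output_rows))
```

In this toy case the output has fewer rows but more bytes, because every summed value is written with far more digits than the inputs. If something similar is going on in the real dataset, rounding the summed columns to a fixed number of decimals after the Group step would be one thing to try.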