Output data size increase with Group by recipe
Ramki_2022
Can anyone help me understand why the data size increases after a "group by" step?
The input dataset is 140 KB and the output dataset is 398 KB, even though the output has fewer rows. Is there any way to reduce the output data size?
Operating system used: Windows 10
Answers
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114
Hi and welcome @Ramki_2022,
Do you introduce computed columns within that group recipe? That could be a reason. It would be helpful if you could share a bit more info on what happens within that group recipe.
cheers,
Jurre
-
String columns are used as group keys and decimal columns are aggregated with "sum". In the output, the "_sum" suffix is removed from the aggregated column names.
I have not created any custom aggregations.
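For readers who want to reproduce this setup outside DSS, here is a minimal plain-Python sketch of the grouping described above. It is not how the Group recipe is implemented; the function name `group_and_sum`, the column names, and the sample rows are all made up for illustration.

```python
from collections import defaultdict

def group_and_sum(rows, key_cols, value_cols):
    """Group dict rows by key_cols and sum value_cols (illustrative only)."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = tuple(row[k] for k in key_cols)
        for col in value_cols:
            # Aggregated columns first get a "_sum" suffix, as described above.
            totals[key][col + "_sum"] += row[col]

    result = []
    for key, sums in totals.items():
        record = dict(zip(key_cols, key))
        for name, value in sums.items():
            # Strip the "_sum" suffix again so output names match input names.
            record[name[: -len("_sum")]] = value
        result.append(record)
    return result

# Hypothetical sample data: two string key columns, two decimal columns.
rows = [
    {"region": "north", "product": "a", "amount": 1.5, "cost": 0.5},
    {"region": "north", "product": "a", "amount": 2.25, "cost": 0.75},
    {"region": "south", "product": "b", "amount": 4.0, "cost": 1.0},
]
grouped = group_and_sum(rows, ["region", "product"], ["amount", "cost"])
```

With this sample, three input rows collapse into two output rows that keep the same column names as the input.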
-
Jurre
Remarkable. The only thing that comes to mind right now is a change in datatype in one of your columns to something that takes up more space, bigger integers or something similar. Interesting challenge; as soon as there is some room to play I'll do some tests.
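One way a value can take up more space after a sum is sketched below in plain Python. It is only a possible explanation, not a confirmed cause for this dataset. It assumes the datasets are stored as text (for example CSV) and that the decimals are summed as binary floating-point numbers; the sample values and the helper `csv_size` are invented for the illustration.

```python
import csv
import io

def csv_size(rows):
    """Return the size in bytes of rows written as UTF-8 CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return len(buf.getvalue().encode("utf-8"))

# Hypothetical input: short decimal values, four rows.
input_rows = [("a", 0.1), ("a", 0.2), ("b", 0.7), ("b", 0.1)]

# Group by the key and sum the values as floats.
totals = {}
for key, value in input_rows:
    totals[key] = totals.get(key, 0.0) + value
output_rows = list(totals.items())

# Rounding the sums is one possible way to shorten the stored text again.
rounded_rows = [(key, round(value, 2)) for key, value in output_rows]

print("input bytes:  ", csv_size(input_rows))
print("output bytes: ", csv_size(output_rows))
print("rounded bytes:", csv_size(rounded_rows))
```

In this sketch the grouped output has fewer rows but more bytes, because floating-point sums are written out with many more digits than the inputs. If something similar is happening in your flow, rounding the aggregated columns or checking the column types in the output schema might be worth a try.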