Output data size increase with Group by recipe

Ramki_2022

Can anyone help me understand why the data size increases after the "group by" step?

The input dataset is 140 KB and the output dataset is 398 KB, even though the output has fewer rows. Is there any way to reduce the output data size?


Operating system used: Windows 10

Answers

  • Jurre

    Hi and welcome @Ramki_2022,

    Do you introduce computed columns within that group recipe? That might be a reason. It would be helpful if you could share a bit more info on what happens within that group recipe.

    cheers,

    Jurre

  • Ramki_2022

    The string columns are used as group keys and the decimal columns are aggregated with "sum". On the output, the "_sum" suffix is removed from the aggregated column names.

    I have not created any custom aggregations.

  • Jurre

    Remarkable. The only thing that comes to mind right now is a change of datatype in one of your columns to something that takes up more space, such as bigger integers. Interesting challenge; as soon as there is some room to play, I'll do some tests.
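
One way the "takes up more space" idea could play out, offered only as a sketch: if the datasets are stored as text (for example CSV, which is an assumption here, not something confirmed in the thread), summing short decimals can produce floating-point results with many more digits than the inputs. The Python below does not use Dataiku at all. The helper names `csv_size` and `group_and_sum` and the sample rows are hypothetical, chosen just to show fewer rows ending up as more bytes.

```python
import csv
import io
from collections import defaultdict


def csv_size(rows, header):
    """Return the size in bytes of the rows serialized as UTF-8 CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def group_and_sum(rows):
    """Group by the first (string) column and sum the second (float) column."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return list(totals.items())


# Hypothetical input: short decimal values, two rows per group key.
input_rows = [("a", 0.1), ("a", 0.2), ("b", 0.1), ("b", 0.7)]
output_rows = group_and_sum(input_rows)

input_size = csv_size(input_rows, ["key", "amount"])
output_size = csv_size(output_rows, ["key", "amount"])

print(len(input_rows), "rows in,", input_size, "bytes")
print(len(output_rows), "rows out,", output_size, "bytes")
for key, total in output_rows:
    # Binary floating point cannot represent these sums exactly,
    # so the text form of each total is longer than the inputs.
    print(key, repr(total))
```

If something like this turns out to be the cause, rounding the aggregated columns to a fixed number of decimals in a later step would likely shrink the output again. Comparing the column types and a few sample values between the input and output datasets should confirm or rule it out.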
