Can we get the sum or count of a column in dataiku

Manyam
Manyam Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

For example, I have two columns

ColumnA-id ColumnB-sessioncnt

12345 20

23456 10

Grandtotal 30

Can I able to get this format. Please help me in that as I am new to dataiku.

Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hi @kusuma
    ,

    I think this older post might help you:

    Cheers.

  • Manyam
    Manyam Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

    Thanks for the reply!

  • Manyam
    Manyam Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

    Thanks! But in my case it was different like I need to get the sum for the output. It has only two columns, one is ID and another one is visits count. So, here I need the sum of 2 nd column as grand total in above or below of the column.

    Attached screenshot for reference. Kindly help me in case if it was possible. Thanks in advance!

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hi @kusuma
    ,

    I don't see a dataiku feature that would allow you to produce this kind of summary table. What we usually do in this case, when we are creating a final summary table, is to either export the data to an Excel file, and do the last steps with Excel; or use a python recipe to create an insight that is shown as a summary table, or directly writes an Excel file with that format.

    If you want a python recipe example, let me know and I can add it here.

    Cheers

  • Manyam
    Manyam Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

    Thanks a lot @Ignacio_Toledo
    . Yes, please add python recipe. Thanks in advance!

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Working on it!

  • NN
    NN Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 145 Neuron

    @kusuma
    , is the objective to show it as an insight in a dataiku dashboard?

    One thing i had tried once was to create a dummy column (like SUM_COL in example image below)

    and then go to charts and use a Pivot table and add the dummy column to the COLUMNS parameter.
    But one limitation with this is that we cannot add the words like Total / Grand Total etc.

    img1.jpgimg2.jpg

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
    edited July 17

    Hi @kusuma
    ,

    This is one option for a python code that will write the output in a Dataiku dataset:

    import dataiku
    import pandas as pd, numpy as np
    
    data_for_summary = dataiku.Dataset("data_for_summary")
    data_for_summary_df = data_for_summary.get_dataframe()
    
    summarized_df = data_for_summary_df.append(
        pd.DataFrame(
            [['Grand Total', data_for_summary_df.Sessions.sum()]]
            , columns=["User Logon", "Sessions"])
    )
    
    # Write recipe outputs
    summary = dataiku.Dataset("summary")
    summary.write_with_schema(summarized_df)

    Where the input dataset was called "data_for_summary" and the output "summary", as shown in this next screenshot:

    Screenshot from 2022-05-15 17-28-42.png

    But @NN
    solution looks even better, I think!

    Now, I think what DSS is currently missing is an option to create summary tables like this one using insights or the chart capabilities.

    Cheers!

  • NN
    NN Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 145 Neuron

    @Ignacio_Toledo
    Completely Agree with you..
    Summary tables is something all the BI tools i worked with used to have and i too do wish for something similar in dataiku.

  • Manyam
    Manyam Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

    Apologies for late response, @Ignacio_Toledo
    Thanks a lot for sharing the code!

  • Manyam
    Manyam Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

    @NN
    Yes, Thanks a lot for sharing it, it was very helpful for me.

Setup Info
    Tags
      Help me…