Can we get the sum or count of a column in dataiku

kusuma
Level 2
Can we get the sum or count of a column in dataiku

For example, I have two columns 

ColumnA-id         ColumnB-sessioncnt

12345                             20

23456                                10

Grandtotal                       30

Can I able to get this format. Please help me in that as I am new to dataiku.

0 Kudos
11 Replies
Ignacio_Toledo

Hi @kusuma,

I think this older post might help you:

How do I create sum or total data in same column on dashboard? 

Cheers.

0 Kudos
kusuma
Level 2
Author

Thanks for the reply!

 

0 Kudos
kusuma
Level 2
Author

Thanks! But in my case it was different like I need to get the sum for the output. It has only two columns, one is ID and another one is visits count. So, here I need the sum of 2 nd column as grand total in above or below of the column.

Attached screenshot for reference. Kindly help me in case if it was possible. Thanks in advance!

0 Kudos
Ignacio_Toledo

Hi @kusuma,

I don't see a dataiku feature that would allow you to produce this kind of summary table. What we usually do in this case, when we are creating a final summary table, is to either export the data to an Excel file, and do the last steps with Excel; or use a python recipe to create an insight that is shown as a summary table, or directly writes an Excel file with that format.

If you want a python recipe example, let me know and I can add it here.

Cheers

0 Kudos
kusuma
Level 2
Author

Thanks a lot @Ignacio_Toledo . Yes, please add python recipe. Thanks in advance!

Working on it!

0 Kudos

Hi @kusuma,

This is one option for a python code that will write the output in a Dataiku dataset:

import dataiku
import pandas as pd, numpy as np

data_for_summary = dataiku.Dataset("data_for_summary")
data_for_summary_df = data_for_summary.get_dataframe()

summarized_df = data_for_summary_df.append(
    pd.DataFrame(
        [['Grand Total', data_for_summary_df.Sessions.sum()]]
        , columns=["User Logon", "Sessions"])
)

# Write recipe outputs
summary = dataiku.Dataset("summary")
summary.write_with_schema(summarized_df)

Where the input dataset was called "data_for_summary" and the output "summary", as shown in this next screenshot:

Screenshot from 2022-05-15 17-28-42.png

But @NN  solution looks even better, I think!

Now, I think what DSS is currently missing is an option to create summary tables like this one using insights or the chart capabilities.

Cheers!

NN

@Ignacio_Toledo Completely Agree with you.. 
Summary tables is something all the BI tools i worked with used to have and i too do wish for something similar in dataiku.

kusuma
Level 2
Author

Apologies for late response, @Ignacio_Toledo Thanks  a lot for sharing the code!

NN

@kusuma , is the objective to show it as an insight in a dataiku dashboard?

One thing i had tried once was to create a dummy column (like SUM_COL in example image below)

and then go to charts and use a Pivot table and add the dummy column to the COLUMNS parameter.
But one limitation with this is that we cannot add the words like Total / Grand Total etc.

img1.jpgimg2.jpg

kusuma
Level 2
Author

@NN Yes, Thanks a lot for sharing it, it was very helpful for me. 

0 Kudos