Increase the number of Top N values while doing Analyze on one column in Prepare recipe

raviagrawal
Hi,

We have a dataset with around 100K records and a column named Brand. When we run Analyze on the Brand column using the sample data, it shows the distinct values. But when we compute on "Whole data", it shows only the top 10 distinct values, even though the column has around 20 distinct values. To apply a Mass Action to the data, we need the full list of values in Analyze. Is there any way to increase the number of distinct values shown so that we can apply a Mass Action to all of them?

Regards,
Ravi Agrawal

Accepted Solutions
Jediv Dataiker
Hi Ravi,

Because a categorical column can have very high cardinality, Dataiku limits you to the top 10 values when analyzing the entire dataset. However, if you would like distinct counts across all records in a dataset, I suggest using the Group By recipe on your dataset. If your data is already stored in a SQL database, you can also go to the Charts section, switch the sampling (on the left-hand side) to In-Database, and then build a bar chart or a pivot chart on the column you care about. This generates a SQL query that returns the result you expect.
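
For reference, here is a minimal sketch of the kind of aggregation involved: a count of rows per distinct value, computed over every record rather than a sample. It uses Python's built-in sqlite3 module with a small in-memory table so that it runs anywhere. The table name `products`, the sample rows, and the helper `distinct_value_counts` are made up for illustration, and the SQL is not necessarily the exact query Dataiku generates.

```python
import sqlite3


def distinct_value_counts(conn, table, column):
    """Return (value, row_count) pairs for every distinct value, most frequent first."""
    # Identifiers cannot be bound as query parameters, so they are quoted inline.
    # Only pass trusted table and column names here.
    query = (
        f'SELECT "{column}", COUNT(*) AS n '
        f'FROM "{table}" GROUP BY "{column}" ORDER BY n DESC, "{column}"'
    )
    return conn.execute(query).fetchall()


# Hypothetical sample data standing in for the real 100K-row dataset.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE products ("Brand" TEXT)')
rows = [("Acme",)] * 3 + [("Globex",)] * 2 + [("Initech",)]
conn.executemany("INSERT INTO products VALUES (?)", rows)

brand_counts = distinct_value_counts(conn, "products", "Brand")
for brand, n in brand_counts:
    print(brand, n)
```

Because the aggregation runs over the whole table, the result has one row per distinct Brand with no top-10 cutoff, which gives you the complete list of values to review.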
