Increase the number of Top N values while doing Analyze on one column in Prepare recipe
raviagrawal
Partner, Registered Posts: 18 Partner
Hi,
We have a dataset with around 100K records and a column named Brand. When we run Analyze on the Brand column, it shows the distinct values found in the sample data. But when we compute on "Whole data", it shows only the top 10 distinct values, even though the column has around 20. To apply a Mass Action to the data, we need the full list of products in Analyze. Is there any way to increase the number of distinct values shown so that we can apply a Mass Action to all of them?
Regards,
Ravi Agrawal
Best Answer
Hi Ravi,
Because a categorical column can have very high cardinality, Dataiku restricts you to the top 10 values when Analyze is computed on the entire dataset. However, if you would like the distinct values and their counts across all records, I suggest using the Group By recipe on your dataset, grouping on the column in question. If your data is already stored in a SQL database, you can also use the Charts section: switch the sampling (on the left-hand side) to In-Database, then build a bar chart or a pivot chart with the column you care about. This generates a SQL query that returns the result you expect.
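For illustration, the grouping step amounts to counting rows per distinct value. Below is a minimal sketch in plain Python, outside of Dataiku, assuming the dataset has been exported to CSV. The helper name `count_distinct_values` and the sample rows are hypothetical; only the `Brand` column name comes from the question.

```python
import csv
import io
from collections import Counter


def count_distinct_values(csv_file, column):
    """Return a Counter mapping each distinct value in `column` to its row count."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Tiny in-memory sample standing in for the exported dataset (made-up values).
sample = io.StringIO("Brand,Price\nAcme,10\nBolt,12\nAcme,9\nCrest,7\n")
brand_counts = count_distinct_values(sample, "Brand")

# List every distinct value with its count, most frequent first (no top-N cutoff).
for brand, n in brand_counts.most_common():
    print(brand, n)
```

With a real export, you would pass an open file handle instead of the in-memory sample. This is only a stand-in for what the Group By recipe or the In-Database chart query computes for you.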