Increase the number of top N values shown when running Analyze on a column in a Prepare recipe

raviagrawal (Partner)

We have a dataset with around 100K records and a column named Brand. When we run Analyze on the Brand column against the sample, it shows all distinct values. But when we compute on "Whole data", it shows only the top 10 distinct values, even though the column has around 20 distinct values. To apply a Mass Action, we need the full list of values in the Analyze dialog. Is there any way to increase the number of distinct values shown so that we can apply a Mass Action to all of them?


Ravi Agrawal

Best Answer

  • Jediv (Dataiker)
    Hi Ravi,

    Because a categorical column can have very high cardinality, Dataiku limits Analyze to the top 10 values when it is computed on the whole dataset. If you want a count for every distinct value across all records, I suggest using a Group By recipe on your dataset, grouping on the column in question. If your data is already stored in a SQL database, you can also use the Charts section: switch the sampling (in the left-hand panel) to In-Database, then build a bar chart or a pivot chart on the column you care about. This generates a SQL query that returns the result you expect.
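As a rough sketch of what a Group By with a count (or an In-Database chart) computes, the query below returns one row per distinct Brand value with its record count and no top-N cutoff. It uses Python's built-in sqlite3 module as a stand-in database; the table name `products` and the sample data are hypothetical, and the SQL that Dataiku actually generates may differ.

```python
import sqlite3


def brand_counts(rows):
    """Count records per distinct Brand value (GROUP BY Brand with COUNT(*)).

    `rows` is a list of brand strings, a hypothetical stand-in for the
    Brand column of the real dataset.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table holding only the Brand column.
        conn.execute("CREATE TABLE products (Brand TEXT)")
        conn.executemany(
            "INSERT INTO products (Brand) VALUES (?)", [(b,) for b in rows]
        )
        # One output row per distinct value; unlike Analyze on the whole
        # data, nothing is truncated to the top 10.
        cur = conn.execute(
            "SELECT Brand, COUNT(*) AS n FROM products "
            "GROUP BY Brand ORDER BY n DESC, Brand"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical data: 20 distinct brands, more than the top-10 limit.
    data = [f"brand_{i:02d}" for i in range(20) for _ in range(i + 1)]
    counts = brand_counts(data)
    print(len(counts))  # 20
    print(counts[0])    # ('brand_19', 20)
```

The full list of distinct values returned this way is what you would then use to decide on the Mass Action to apply.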