How to Automate Clustering with Anomaly Detection for Each Partition in Dataiku?

raihanhd

Hello Dataiku Community,

I’m working on a project where I’ve partitioned my dataset by category and year. For example, my partitions look like this:

  • Category A | 2021
  • Category A | 2022
  • Category A | 2023
  • Category A | 2024
  • Category B | 2021
  • Category B | 2022
  • Category B | 2023
  • Category B | 2024
  • Category C | 2021
  • Category C | 2022
  • Category C | 2023
  • Category C | 2024

Now, I want to apply anomaly detection clustering automatically for each partition (e.g., one clustering model for “Category A | 2021,” another for “Category B | 2022,” and so on).

My Questions:

  1. Is it possible to automate clustering with anomaly detection for each partition directly in Dataiku without doing it manually for each combination of category and year?
  2. If automation is possible, what’s the best approach to set this up? For example:
    • Can I leverage the Partitioning feature for anomaly detection clustering?
    • Are there specific plugins, visual recipes, or scripting options to streamline this process?

I’d appreciate any guidance or examples to help me efficiently cluster my data while handling multiple partitions.

Thank you in advance for your help!

Answers

  • Alexandru (Dataiker)

    Hi,
    Yes, a partitioned model should work here. You can train one model per partition, then score each partition later with its corresponding model. Please check the limitations listed in the documentation first:

    https://doc.dataiku.com/dss/latest/machine-learning/partitioned.html#limitations
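
If the limitations rule out the visual approach, the same "one model per partition" idea can be sketched in plain Python, for example inside a Python recipe. This is only an illustration under stated assumptions, not Dataiku's partitioned-model feature. The column names `category`, `year` and `value`, the helper functions, and the sample data are all hypothetical. The detector is a deliberately simple median/MAD rule standing in for a real clustering or anomaly detection model (such as scikit-learn's `IsolationForest`).

```python
from collections import defaultdict
from statistics import median


def fit_partition_models(rows, threshold=3.5):
    """Fit one simple detector per (category, year) partition.

    Hypothetical helper: each "model" is just the median and the median
    absolute deviation (MAD) of the partition's values.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["category"], row["year"])].append(row["value"])

    models = {}
    for key, values in groups.items():
        center = median(values)
        mad = median([abs(v - center) for v in values])
        models[key] = {"center": center, "mad": mad, "threshold": threshold}
    return models


def is_anomaly(models, row):
    """Score a row with the model that belongs to its own partition."""
    model = models[(row["category"], row["year"])]
    deviation = abs(row["value"] - model["center"])
    if model["mad"] == 0:
        # No spread in the partition: anything off the median is flagged.
        return deviation > 0
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return 0.6745 * deviation / model["mad"] > model["threshold"]


# Made-up sample data for two partitions (assumption, not from the thread).
rows = (
    [{"category": "A", "year": 2021, "value": v} for v in [10, 11, 9, 10, 12, 95]]
    + [{"category": "B", "year": 2022, "value": v} for v in [95, 96, 94, 95, 97, 10]]
)

models = fit_partition_models(rows)
flagged = [r for r in rows if is_anomaly(models, r)]
```

Because each partition gets its own model, a value of 95 is flagged in "Category A | 2021" but treated as normal in "Category B | 2022". A single global model would not make that distinction.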
