How to Automate Clustering with Anomaly Detection for Each Partition in Dataiku?
Hello Dataiku Community,
I’m working on a project where I’ve partitioned my dataset by category and year. For example, my partitions look like this:
- Category A | 2021
- Category A | 2022
- Category A | 2023
- Category A | 2024
- Category B | 2021
- Category B | 2022
- Category B | 2023
- Category B | 2024
- Category C | 2021
- Category C | 2022
- Category C | 2023
- Category C | 2024
Now, I want to apply anomaly detection clustering automatically for each partition (e.g., one clustering model for “Category A | 2021,” another for “Category B | 2022,” and so on).
My Questions:
- Is it possible to automate clustering with anomaly detection for each partition directly in Dataiku without doing it manually for each combination of category and year?
- If automation is possible, what’s the best approach to set this up? For example:
  - Can I leverage the Partitioning feature for anomaly detection clustering?
  - Are there specific plugins, visual recipes, or scripting options to streamline this process?
I’d appreciate any guidance or examples to help me efficiently cluster my data while handling multiple partitions.
Thank you in advance for your help!
Answers
Alexandru (Dataiker)
Hi,
Yes, using a partitioned model should work here: you can train a model for each partition and later score each partition with the relevant model. See the documentation, including its limitations section:
https://doc.dataiku.com/dss/latest/machine-learning/partitioned.html#limitations
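To make the "one model per partition" idea concrete, here is a minimal plain-Python sketch of the pattern outside of Dataiku's visual ML. It is not the Dataiku API: the column names (`category`, `year`, `value`), the helper names, and the sample rows are all hypothetical, and a simple robust z-score rule (median and median absolute deviation) stands in for a real anomaly detection model. In a code recipe, the same loop could wrap whatever model you actually use.

```python
from collections import defaultdict
from statistics import median


def fit_partition_models(rows):
    """Fit one (median, MAD) pair per (category, year) partition.

    Hypothetical helper: a stand-in for training a real model per partition.
    """
    values_by_partition = defaultdict(list)
    for row in rows:
        values_by_partition[(row["category"], row["year"])].append(row["value"])

    models = {}
    for key, values in values_by_partition.items():
        center = median(values)
        # Median absolute deviation: a robust measure of spread.
        mad = median(abs(v - center) for v in values)
        models[key] = (center, mad)
    return models


def is_anomaly(models, row, threshold=3.5):
    """Score a row with the model of its own partition only."""
    center, mad = models[(row["category"], row["year"])]
    if mad == 0:
        # No spread in the training data: anything off-center is unusual.
        return row["value"] != center
    robust_z = 0.6745 * abs(row["value"] - center) / mad
    return robust_z > threshold


# Made-up sample data for two partitions (assumption, for illustration only).
rows = [{"category": "A", "year": 2021, "value": v} for v in [10, 11, 9, 10, 12, 50]]
rows += [{"category": "B", "year": 2022, "value": v} for v in [100, 102, 98, 101, 99, 100]]

models = fit_partition_models(rows)
flagged = [r for r in rows if is_anomaly(models, r)]
```

Because each row is scored against its own partition's model, a value that is normal in one partition (for example, 100 in "B | 2022") can be flagged in another ("A | 2021"), which is the main reason to keep the models separate.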
