After spending some time on Dataiku's website and forums, I can't find an answer to my question.
First of all, I should mention that I use the free version of DSS.
I would like to partition a CSV file according to the value of a category column, and then run each partition through an anomaly detection algorithm (Isolation Forest). To simplify, let's say that I want to partition by client category. The flow would be launched once a day, and the number of categories can differ from day to day. There are roughly 100 categories.
I have attached a diagram illustrating what I am trying to achieve.
Could someone tell me if this is possible? If so, what are the steps to follow?
Thank you very much in advance.
You can certainly run a partitioned model based on a partitioned dataset. You should be able to do this with the free edition of DSS. As for the steps, I think our Academy can explain them better than I could in a forum post. We have a module on training a partitioned random forest model against a partitioned dataset here: https://academy.dataiku.com/partitioned-models/543579. The module explains the concepts and then walks you through a tutorial project for this use case. If you need a more in-depth tutorial on partitioning, we also have an Academy module on that here: https://academy.dataiku.com/advanced-partitioning/657681. Both of these modules should help with your use case.
Feel free to reach out if you have any questions about the tutorials!
Hope this helps!
First of all, I love using the Community edition of DSS.
As a long-time user of the Community Edition of DSS, I have found that one of its limitations for me is the lack of partitioning. When I look at this page, I see that partitioning is not listed as a feature of the Community Edition. If you know a way to get partitions and partitioned models to work with the Community Edition, I'd love to learn more and share this with others.
Another limitation I've found with the Community Edition is the lack of what Dataiku calls Scenario Support: the ability to run jobs on a schedule or in response to some type of trigger condition.
Can you tell us a bit more about the context in which you are doing your project?
I overlooked the partition limitation on the edition comparison, apologies for the mistake. Unfortunately you will need the business or enterprise edition for partitioning.
You are correct, thank you for catching my mistake. Unfortunately there isn't a way to get partitioning on the free edition of DSS. I overlooked that limitation when reviewing the edition comparisons.
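For anyone landing on this thread with the same goal: one possible workaround that does not rely on DSS partitioning is to do the per-category split inside a single Python step. The sketch below is only an illustration, not an official Dataiku approach. It assumes pandas and scikit-learn are available, and the function name, column names, and the min_rows threshold are all hypothetical. It groups the rows by the category column, fits one Isolation Forest per group, and returns the original rows with a score and a flag.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def detect_anomalies_per_category(df, category_col, feature_cols,
                                  min_rows=10, random_state=0):
    """Fit one Isolation Forest per category value and flag anomalies.

    All names here are illustrative; adapt them to your own dataset.
    """
    results = []
    # groupby picks up whatever categories are present on a given day,
    # so the number of groups does not need to be known in advance.
    for _, group in df.groupby(category_col):
        group = group.copy()
        if len(group) < min_rows:
            # Too few rows to fit a meaningful model: leave unscored.
            group["anomaly_score"] = np.nan
            group["is_anomaly"] = False
        else:
            model = IsolationForest(n_estimators=100,
                                    random_state=random_state)
            model.fit(group[feature_cols])
            # Lower decision_function values mean "more anomalous".
            group["anomaly_score"] = model.decision_function(group[feature_cols])
            # predict returns -1 for outliers and 1 for inliers.
            group["is_anomaly"] = model.predict(group[feature_cols]) == -1
        results.append(group)
    # Restore the original row order.
    return pd.concat(results).sort_index()


# Synthetic demo data (made up purely for illustration).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "client_category": np.repeat(["A", "B", "C"], [50, 50, 3]),
    "amount": rng.normal(100, 10, 103),
    "quantity": rng.normal(5, 1, 103),
})
scored = detect_anomalies_per_category(
    demo, "client_category", ["amount", "quantity"]
)
print(scored.groupby("client_category")["is_anomaly"].sum())
```

With a real file, the demo DataFrame would be replaced by the daily CSV loaded into pandas. Because the scheduling features mentioned above are also unavailable in the free edition, the daily run would presumably have to be started manually or by a scheduler outside DSS.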