
Building multiple models in dataiku

MNOP
Level 3

I have master data in the flow.
I want to subset the data for each market and, in that market, select submarkets and develop models.
There are around 60 models that need to be built.
I am wondering what would be the best approach to do this in Dataiku in terms of storage space optimization, ease of maintenance, etc.

Has anyone attempted such a large modelling exercise in Dataiku before that I can refer to?


Operating system used: Windows



1 Solution
AlexT
Dataiker

Hi @MNOP ,
You can try partitioned models if a single model with the submarket as a feature is not sufficient:
https://knowledge.dataiku.com/latest/ml-analytics/partitioned-models/tutorial-partitioned-models.htm...
This creates a model for each partition, which seems to be what you are looking for. Around 60 partitions should not be an issue, but keep in mind that each training run stores 60 models on disk, so disk usage can grow quickly if you retrain daily and don't do any clean-up.

https://doc.dataiku.com/dss/latest/operations/disk-usage.html#sessions
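
As a rough illustration of how quickly this adds up, here is a minimal sketch that only counts stored partition models. The function name and the `kept_versions` clean-up policy are hypothetical and not a Dataiku API; actual disk usage also depends on the size of each model.

```python
from typing import Optional


def stored_partition_models(partitions: int, retrains: int,
                            kept_versions: Optional[int] = None) -> int:
    """Count partition-level models left on disk after `retrains` training runs.

    kept_versions is a hypothetical clean-up policy: keep only the N most
    recent training runs. None means nothing is ever cleaned up.
    """
    versions = retrains if kept_versions is None else min(retrains, kept_versions)
    return partitions * versions


# 60 partitions retrained daily for 30 days, without clean-up
print(stored_partition_models(60, 30))      # 1800
# Same schedule, keeping only the 3 most recent runs
print(stored_partition_models(60, 30, 3))   # 180
```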

Thanks


MNOP
Level 3
Author

@AlexT Thanks for the reply. Since our markets are dissimilar, we prefer building models at a market level. 
So, the partitioned models will be suitable for our use case.
One challenge we face while developing the partitioned model is that the features selected are common to all the partitioned models. Is it possible to select market-specific features while developing partitioned models? 

For example, certain features applicable to the "California" market need not be applicable to "New York" models. Is it possible to handle these cases?

AlexT
Dataiker

Hi @MNOP ,

Switching off particular features for a particular partition is not possible.
See:
https://doc.dataiku.com/dss/latest/machine-learning/partitioned.html#parameters-and-settings

This would be a feature request. Feel free to submit an idea.
https://community.dataiku.com/t5/Community-Resources/How-to-suggest-Dataiku-ideas/ta-p/15018#:~:text....


To use different feature handling settings for each market, you currently have to create separate models.
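
For illustration, the separate-models approach could be driven by a per-market feature map. This is a minimal plain-Python sketch, not Dataiku API code: the column names, the feature lists, and the `split_by_market` helper are all hypothetical, and the training step is left as a placeholder.

```python
from collections import defaultdict

# Hypothetical column names and feature lists, for illustration only.
FEATURES_BY_MARKET = {
    "California": ["price", "wildfire_risk"],
    "New York": ["price", "snow_days"],
}
TARGET = "sales"


def split_by_market(rows, features_by_market, market_col="market", target=TARGET):
    """Group rows by market and keep only that market's features.

    Returns {market: (X, y)}, where X is a list of feature rows and y the
    list of target values. Markets without a feature list are skipped.
    """
    datasets = defaultdict(lambda: ([], []))
    for row in rows:
        market = row[market_col]
        features = features_by_market.get(market)
        if features is None:
            continue
        X, y = datasets[market]
        X.append([row[name] for name in features])
        y.append(row[target])
    return dict(datasets)


rows = [
    {"market": "California", "price": 10.0, "wildfire_risk": 0.7, "snow_days": 0, "sales": 120},
    {"market": "California", "price": 12.0, "wildfire_risk": 0.2, "snow_days": 0, "sales": 150},
    {"market": "New York", "price": 11.0, "wildfire_risk": 0.0, "snow_days": 14, "sales": 90},
]

for market, (X, y) in split_by_market(rows, FEATURES_BY_MARKET).items():
    # Placeholder: train that market's own model on (X, y) here.
    print(market, FEATURES_BY_MARKET[market], len(X), "rows")
```

Keeping the feature lists in one mapping means that adding or changing a market only touches that mapping, which may help with maintenance when there are around 60 models.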


Thanks
