We're looking at whether Dataiku is suitable for our workload, which looks something like this:
We currently have a system that builds multiple models for multiple clients, for example churn prediction and product recommendation systems. We use an automated process to produce a few hundred models each week based on new data uploaded by clients. Typically we re-train all models each week to take account of changes such as new products, new stores or changes in client data processing (e.g. they added a new custom field, or re-coded something), though some models we re-train daily and some we "freeze" for consistency. The models are used for batch scoring.
Different models have different fields (e.g. some clients have specific data such as sales office or distance to depot). For binary prediction, our larger datasets are ~1m customers per client (i.e. rows) with ~300 features. For recommendations, the larger datasets are ~1m transactions.
As a newbie I have some questions about Dataiku:
Thanks in advance for any advice,
Dataiku DSS provides many features that are useful for this type of project.
Let me try to provide some insights on your questions: