Dataiku features
Hello community! I am currently evaluating some MLOps tools in the market and I am keen to learn if Dataiku DSS offers the following features. Thanks in advance!
1. Versioning of data, code and models/pipelines for reproducibility
2. Explainability for model predictions
3. Ability to inject ground truth (for real-world inputs) back into the ML workflow (e.g. text classification)
4. Advanced deployment strategies like A/B testing, multi-armed bandits, canary deployment, etc. (any)
5. Experiment tracking and model registry (like MLflow)
Answers
-
tgb417
Welcome to the Dataiku community.
I can partially answer your questions, point by point:
1a. Code is always placed into a Git repo that underpins the entire DSS system. Most of the expected Git features are built in.
1b. Data versioning is a harder challenge because datasets are typically large. You can version the transformations you make to your data, but retaining and versioning the source data is more a matter for the data repository itself (SQL, S3, HDFS, etc.).
2. There are lots of explainability features supporting model building, and they can be put on dashboards. However, not all models are equally explainable, and those differences are reflected in what the tool provides for each built-in model type.
3. There are a number of ways to inject ground truth back into your model building.
4. There is definitely an A/B testing plugin, https://www.dataiku.com/product/plugins/ab-test-calculator/, to make that easier.
5. All models are tracked, and you can compare versions of a model in a variety of ways.
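On point 4, if you want to sanity-check an A/B comparison by hand in a notebook, here is a minimal sketch of a two-proportion z-test in plain Python. This is not the plugin's code, and I am not claiming the plugin uses this method; the function name and the sample numbers are made up for illustration.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference in success rates, B minus A."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled success rate under the null hypothesis that A and B perform the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; both variants are all-success or all-failure")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: variant A converts 100 of 1000, variant B converts 130 of 1000.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between the two variants is unlikely to be noise, but it says nothing about how to route traffic; that part depends on your deployment setup.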
All that said, DSS supports coding in both Python and R. So if you need a feature from either of these languages, you can set it up in a Jupyter notebook or code recipe and add it to your flow.
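As an example of the kind of thing you could put in a notebook or Python code recipe, here is a rough sketch for point 3: matching logged predictions with ground-truth labels that arrive later, and collecting the labeled rows for evaluation or retraining. It is plain Python with no DSS calls; the record ids, field names, and sample rows are all hypothetical.

```python
def attach_ground_truth(predictions, ground_truth):
    """Pair each logged prediction with its true label, when one is available.

    predictions: list of dicts with hypothetical keys "id", "text", "predicted"
    ground_truth: dict mapping record id -> true label
    Returns (labeled_rows, accuracy); accuracy is None if nothing is labeled yet.
    """
    labeled = []
    for row in predictions:
        if row["id"] in ground_truth:
            # Copy the row and add the true label next to the prediction.
            labeled.append({**row, "actual": ground_truth[row["id"]]})
    if not labeled:
        return labeled, None
    correct = sum(1 for r in labeled if r["predicted"] == r["actual"])
    return labeled, correct / len(labeled)


# Hypothetical text-classification log and the labels reviewers have supplied so far.
logged = [
    {"id": 1, "text": "refund please", "predicted": "billing"},
    {"id": 2, "text": "app crashes", "predicted": "billing"},
    {"id": 3, "text": "love it", "predicted": "praise"},
]
truth = {1: "billing", 2: "bug"}  # record 3 has not been reviewed yet

labeled_rows, accuracy = attach_ground_truth(logged, truth)
print(len(labeled_rows), accuracy)
```

The labeled rows could then be appended to a training dataset, and the accuracy tracked over time to decide when a retrain is worth it.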
I’m just a user of the system, so I’d suggest that you talk with some of the folks at Dataiku. For lots more detail about the features of the tool, there are a number of training materials available at the academy: https://academy.dataiku.com/ You can also install a free version of DSS and try it yourself: https://www.dataiku.com/product/get-started/ Almost all of the things you are asking about are usable, or can at least be experimented with, in the free version. If I were testing the things you are asking about, I’d do the local install. The fully SaaS version currently has some limitations due to the shared-tenant nature of that environment, and it sounds like you might want to test some things that would be restricted there. The Dataiku team can also arrange a full license key to test items that might be restricted in the free version.
Hope that helps.
-
Thanks a lot, Tom @tgb417, for the detailed explanation! I will check out the links that you have shared.