Integrate DVC for data and feature versioning into DSS
It would be useful to integrate DVC into DSS. We have been asked many times for a solution that also provides dataset and feature versioning (a feature repository would be useful as well), so that one could select the latest version of a dataset and put it into a pipeline. DSS makes it a lot easier to set up an MLOps pipeline, but in these areas it needs some improvements. My 2 cents.
Regards.
Giuseppe
Comments
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hi @gnaldi62
Thank you for your idea! For the benefit of the community, well, more for my benefit, can you define DVC? I know it’s easy to assume we all know acronyms, but it will be easier for all members of this community, again mostly me 🤪, if we stay away from acronyms in the ideas. Thank you for your support and for the education!
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
@CoreyS, I’m wondering if this https://dvc.org/ is what @gnaldi62 is talking about.
If I’m correct about what DVC is: Dataiku DSS has an extensive underpinning of git, and this tool set is based on git. Have you experimented with using both Dataiku and DVC on a single repository? I don’t know if it would work, so I’d invite you to try only in a dev environment you can afford to corrupt.
—Tom
-
gnaldi62 Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 79 Neuron
Sorry. DVC is actually not an acronym but a product. The URL is dvc.org. It's a version control system developed specifically for ML projects.
Regards,
Giuseppe
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
-
gnaldi62 Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 79 Neuron
I noticed that DSS 10 includes MLflow integration, so I suppose my suggestion could be considered out of date by now.
Rgds
Giuseppe