
Integrate DVC for data and feature versioning into DSS


It would be useful to integrate DVC into DSS. We have been asked many times for a solution that also provides dataset and feature versioning (a feature repository would be useful as well), so that one could select the latest version of a dataset and put it into a pipeline. DSS makes it a lot easier to set up an MLOps pipeline, but in these areas it needs some improvements. My 2 cents.

Regards.

Giuseppe

5 Comments
CoreyS
Community Manager

Hi @gnaldi62, thank you for your idea! For the benefit of the community (well, more for my benefit), can you define DVC? I know it’s easy to assume we all know acronyms, but it will be easier for all members of this community, again mostly me 🤪, if we stay away from acronyms in the ideas. Thank you for your support and for the education!

tgb417
Neuron

@CoreyS ,

I’m wondering if this https://dvc.org/ is what @gnaldi62 is talking about.  

@gnaldi62 ,

If I’m correct about what DVC is: Dataiku DSS has an extensive underpinning of git, and this tool set is also based on git. Have you experimented with using both Dataiku and DVC on a single repository? I don’t know if it would work, so I’d invite you to try it only in a dev environment you can afford to corrupt.

—Tom

gnaldi62
Neuron

Sorry. DVC is actually not an acronym but a product. The URL is dvc.org. It's a version control system developed specifically for ML projects.

Regards,

Giuseppe

CoreyS
Community Manager

Thanks for the clarification @gnaldi62 and @tgb417!

gnaldi62
Neuron

I noticed that DSS 10 includes MLflow integration, so I suppose my suggestion could be considered out of date by now.

Regards,

Giuseppe