Integrate DVC for data and feature versioning into DSS


It would be useful to integrate DVC into DSS. We have been asked many times about a solution that can also provide dataset and feature versioning (a feature repository would be useful as well), so one could select the latest version of a dataset and put it into a pipeline. DSS makes it a lot easier to set up an MLOps pipeline, but with regard to these areas it needs some improvements. My 2 cents.



Dataiker Alumni

Hi @gnaldi62, thank you for your idea! For the benefit of the community (well, more for my benefit), can you define DVC? I know it’s easy to assume we all know acronyms, but it will be easier for all members of this community, again mostly me 🤪, if we stay away from acronyms in the ideas. Thank you for your support and for the education!

@CoreyS ,

I’m wondering if this is what @gnaldi62 is talking about.  

@gnaldi62 ,

If I’m correct about what DVC is: given that Dataiku DSS is built extensively on git, and that DVC is also based on git, have you experimented with using both Dataiku and DVC on a single repository? I don’t know if it would work, so I’d invite you to try it only in a dev environment you can afford to corrupt.


Sorry. DVC is actually not an acronym but a product. It's a version control system specifically developed for ML projects.
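To give a rough feel for the idea, here is a minimal sketch in plain Python. It is not DVC's code or API; the function names `snapshot` and `restore` and the `.ptr.json` pointer format are made up for illustration. The assumption is the general pattern such tools follow: the large data file is stored in a cache under its content hash, and only a small pointer file is committed to git, so checking out an older pointer lets you restore the matching data version.

```python
import hashlib
import json
import shutil
from pathlib import Path


def snapshot(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into the cache under its MD5 hash and write a pointer file.

    The pointer file is small, so it is the part one would commit to git.
    (Hypothetical helper, not part of DVC.)
    """
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():
        # Identical content is stored only once.
        shutil.copyfile(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"md5": digest, "path": data_file.name}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file that the pointer file refers to.

    (Hypothetical helper, not part of DVC.)
    """
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copyfile(cache_dir / meta["md5"], target)
    return target
```

With something like this, "select a given version of a dataset" comes down to checking out the pointer file from the desired git revision and calling `restore` on it.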



Dataiker Alumni

Thanks for the clarification @gnaldi62 and @tgb417!

I noticed that DSS 10 includes MLflow integration, so I suppose my suggestion could be considered out of date by now.