Integrate DVC for data and feature versioning into DSS

It would be useful to integrate DVC into DSS. We have been asked many times for a solution that also provides dataset and feature versioning (a feature repository would be useful as well), so that one could select the latest version of a dataset and put it into a pipeline. DSS makes it a lot easier to set up an MLOps pipeline, but in these areas it needs some improvement. My 2 cents.

Regards.

Giuseppe

5 Comments
CoreyS
Dataiker Alumni

Hi @gnaldi62, thank you for your idea! For the benefit of the community, well, more for my benefit, can you define DVC? I know it's easy to assume we all know acronyms, but it will be easier for all members of this community, again mostly me 🤪, if we stay away from acronyms in the ideas. Thank you for your support and for the education!

@CoreyS,

I'm wondering if this https://dvc.org/ is what @gnaldi62 is talking about.

@gnaldi62,

If I'm correct about what DVC is: given that Dataiku DSS has an extensive underpinning of git, and this tool set is also based on git, have you experimented with using both Dataiku and DVC on a single repository? I don't know if it would work, so I'd invite you to try it only in a dev environment you can afford to corrupt.

--Tom

Sorry. DVC is actually the name of a product (Data Version Control) rather than a generic acronym. The URL is dvc.org. It's a version control system developed specifically for ML projects.

Regards,

Giuseppe

CoreyS
Dataiker Alumni

Thanks for the clarification @gnaldi62 and @tgb417!

I noticed that DSS 10 includes MLflow integration, so I suppose my suggestion could be considered out of date for now.

Rgds

Giuseppe
