Integrate DVC for data and feature versioning into DSS

gnaldi62
gnaldi62 Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 79 Neuron

It would be useful to integrate DVC into DSS. We have been asked many times for a solution that also provides dataset and feature versioning (a feature repository would be useful as well), so that one could select the latest version of a dataset and put it into a pipeline. DSS makes it a lot easier to set up an MLOps pipeline, but in these areas it needs some improvement. My 2 cents.

Regards.

Giuseppe


Comments

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hi @gnaldi62, thank you for your idea! For the benefit of the community (well, more for my benefit), can you define DVC? I know it’s easy to assume we all know acronyms, but it will be easier for all members of this community, again mostly me 🤪, if we stay away from acronyms in the ideas. Thank you for your support and for the education!

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @CoreyS,

    I’m wondering if https://dvc.org/ is what @gnaldi62 is talking about.

    @gnaldi62,

    If I’m correct about what DVC is: given that Dataiku DSS has an extensive underpinning of git, and that this tool set is also based on git, have you experimented with using both Dataiku and DVC on a single repository? I don’t know if it would work, so I’d invite you to try it only in a dev environment you can afford to corrupt.

    —Tom

  • gnaldi62

    Sorry. DVC is actually not an acronym but a product; the URL is dvc.org. It's a version control system specifically developed for ML projects.

    Regards,

    Giuseppe

  • CoreyS

    Thanks for the clarification @gnaldi62 and @tgb417!

  • gnaldi62

    I noticed that DSS 10 includes MLflow integration, so I suppose my suggestion could be considered out of date by now.

    Rgds

    Giuseppe
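
For readers wondering what "dataset versioning" means in practice, here is a minimal sketch of the general idea behind git-based data versioning tools: keep each dataset version in a cache keyed by a hash of its content, and commit only a small pointer file to git. This is plain standard-library Python, not DVC's actual code or API and not a DSS feature. The function names `snapshot` and `restore`, the JSON pointer layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def snapshot(dataset: Path, cache: Path, pointer: Path) -> str:
    """Store a copy of `dataset` under its content hash and write a pointer file.

    Hypothetical helper: names and file layout are illustrative only.
    """
    digest = hashlib.sha256(dataset.read_bytes()).hexdigest()
    cache.mkdir(parents=True, exist_ok=True)
    stored = cache / digest
    if not stored.exists():  # identical content is stored only once
        shutil.copyfile(dataset, stored)
    # The small pointer file is what would be committed to git.
    pointer.write_text(json.dumps({"sha256": digest, "path": dataset.name}))
    return digest


def restore(pointer: Path, cache: Path, dest: Path) -> None:
    """Materialise the dataset version that `pointer` refers to."""
    meta = json.loads(pointer.read_text())
    shutil.copyfile(cache / meta["sha256"], dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        cache = root / "cache"

        data.write_text("a,b\n1,2\n")
        v1 = snapshot(data, cache, root / "train.v1.json")

        data.write_text("a,b\n1,2\n3,4\n")
        v2 = snapshot(data, cache, root / "train.v2.json")

        # Go back to the first version without touching the second one.
        restore(root / "train.v1.json", cache, root / "train_v1.csv")
        print("versions differ:", v1 != v2)
```

Selecting "the latest version of a dataset" for a pipeline then comes down to picking which pointer file (that is, which git revision) to restore from.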
