Initializing a project from Git

yashpuranik Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 69 Neuron

Dataiku allows connecting a project to a Git repo and pushing updates to it.

Suppose I create a project on instance A and push it to Git, and now I want to pull the same project onto instance B. This is difficult because as soon as I create a new blank project on instance B, I have initialized a Git repository whose history may not merge easily with the remote repo. I can fix this by SSHing into instance B and resolving the merge conflicts on the command line, but that option is not readily available to non-admins.
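
For illustration, the manual command-line fix might look roughly like the sketch below. It assumes the project folder on instance B is an ordinary Git working tree with no remote configured yet, and that the remote branch is named `master`. The function names, `project_dir`, and the remote URL are hypothetical placeholders, not anything provided by DSS.

```python
import subprocess
from typing import List


def reconcile_commands(remote_url: str, branch: str = "master",
                       remote: str = "origin") -> List[List[str]]:
    """Build the git commands that join a freshly initialized local repo
    to an existing remote history (assumed workflow, not a DSS feature)."""
    return [
        # Fails if a remote with this name already exists.
        ["git", "remote", "add", remote, remote_url],
        ["git", "fetch", remote],
        # The two repos share no common ancestor, so git refuses to merge
        # them unless unrelated histories are explicitly allowed.
        ["git", "merge", "--allow-unrelated-histories", f"{remote}/{branch}"],
    ]


def reconcile(project_dir: str, remote_url: str, branch: str = "master") -> None:
    """Run the commands inside project_dir (hypothetical path on instance B).

    git merge exits non-zero when it hits conflicts, so check=True raises
    CalledProcessError at that point and the conflicts are left in the
    working tree to be resolved by hand.
    """
    for cmd in reconcile_commands(remote_url, branch):
        subprocess.run(cmd, cwd=project_dir, check=True)
```

Every step here needs shell access to the instance, which is exactly what non-admin users lack.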

Can we add a second import option that initializes a project directly from Git, in addition to importing from zip files?

3 votes


Comments

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,984 Neuron

    What is the objective of this idea? To keep two projects in sync on two different Design Nodes? If so, why can't this be automated with a scenario that deploys project X from instance A to instance B using the Dataiku APIs?

  • yashpuranik
    yashpuranik Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 69 Neuron

    One use case where something like this could be useful: some organizations prefer to keep their dev and prod environments completely separated in different subnets. In that setup, a Design Node in the dev environment can't connect to a Design Node in the prod environment from a scenario. Connecting them through the Git provider does mean that the Git provider is reachable from both subnets, but that seems to be more acceptable for some orgs.

    Of course, you could set up something like a GitHub Action that uses the Dataiku API to do this job, but adding another option within the product seems relatively easy and, in my opinion, increases usability.

    And yes, I am aware that the DSS philosophy calls for only one Design Node in general, with multiple infrastructures to mimic prod or other environments, but in practice many organizations choose to have multiple Design Nodes for all kinds of reasons (different group ownership, VPN or subnet configurations, or other internal politics).
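
The API-based route discussed in this thread could be sketched roughly as follows. This is a minimal sketch, assuming the `dataikuapi` Python client and a machine (a scenario host or CI runner) that can reach both instances, which is the very condition the separated-subnet setup rules out. `copy_project` is a hypothetical helper name; it relies on the client's `get_project`, `get_export_stream`, and `prepare_project_import` calls.

```python
import io


def copy_project(source_client, target_client, project_key):
    """Export a project from one DSS instance and import it into another.

    Both clients are expected to be dataikuapi.DSSClient objects (or
    anything exposing the same methods). Hypothetical helper, not part
    of the Dataiku API itself.
    """
    project = source_client.get_project(project_key)

    # The export stream must be closed after download; the with-block
    # takes care of that. The archive is buffered in memory here, which
    # is only reasonable for small projects.
    with project.get_export_stream() as stream:
        archive = io.BytesIO(stream.read())

    # prepare_project_import uploads the archive and returns a handle;
    # execute() performs the actual import on the target instance.
    handle = target_client.prepare_project_import(archive)
    return handle.execute()
```

With the real client, this would be called along the lines of `copy_project(dataikuapi.DSSClient(url_a, key_a), dataikuapi.DSSClient(url_b, key_b), "MYPROJ")`, where the URLs, API keys, and project key are placeholders. How the import behaves when the project key already exists on the target is not covered by this sketch.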
