Initializing a project from Git

Dataiku lets you connect a project to a Git repository and push updates to it.

Suppose I create a project on instance A and push it to Git, and now I want to pull the same project on instance B. This is difficult because as soon as I create a new blank project on instance B, I have initialized a Git repository that may not merge easily with the remote repository. I can fix this by SSHing into instance B and resolving the merge conflicts on the command line, but that option is not readily available to non-admins.

Can we add a second import option that initializes a project directly from Git, in addition to importing from zip files?

yashpuranik
2 Comments

What is the objective of this idea? Is it to keep two projects in sync across two different Design Nodes? If so, why can't this be automated with a scenario that deploys project X from instance A to instance B using the Dataiku APIs?
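For illustration, a scenario step along those lines might look roughly like the sketch below. It assumes the `dataikuapi` Python client, that the machine running it can reach both instances, and that instance B does not already hold a project with the same key. The function name `copy_project`, the URLs, the API keys, and the project key are all placeholders, not anything taken from this thread.

```python
import os
import tempfile


def copy_project(source_client, target_client, project_key):
    """Export project_key from the source instance and import it on the target.

    Both clients are expected to behave like dataikuapi.DSSClient objects.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Export the project from the source instance as a zip archive.
        archive_path = os.path.join(tmp_dir, project_key + ".zip")
        source_client.get_project(project_key).export_to_file(archive_path)

        # Upload the archive to the target instance, then run the import.
        with open(archive_path, "rb") as archive:
            import_handle = target_client.prepare_project_import(archive)
            return import_handle.execute()


# Hypothetical usage (requires the dataikuapi package; all values are placeholders):
#
#   import dataikuapi
#   instance_a = dataikuapi.DSSClient("https://instance-a.example.com", "API_KEY_A")
#   instance_b = dataikuapi.DSSClient("https://instance-b.example.com", "API_KEY_B")
#   copy_project(instance_a, instance_b, "MYPROJECT")
```

Note that this moves a zip export rather than Git history, so it is an alternative to the Git-based import being requested, not an implementation of it.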

One use case where something like this could be useful: some organizations prefer to keep their dev and prod environments completely separated in different subnets. In that setup, a scenario on a Design Node in the dev environment cannot connect to a Design Node in the prod environment. Connecting them through the Git provider does mean that the Git provider is reachable from both subnets, but that seems to be more acceptable for some orgs.

Of course, you could set up something like a GitHub Action that uses the Dataiku API to do this job, but adding another option within the product seems relatively easy and, in my opinion, increases usability.

And yes, I am aware that the DSS philosophy generally calls for only one Design Node, with multiple infrastructures to mimic prod or other environments. In practice, though, many organizations choose to have multiple Design Nodes for all kinds of reasons (different group ownerships, VPN or subnet configurations, or other internal politics).

yashpuranik
