Automated Pull of a Remote Repository Branch

Solved!
jacescheffler
Level 1

Our team currently uses a CI/CD tool that automates deployments through the Project Deployer APIs, and a remote git repository for version control. We would like to go one step further and trigger deployments automatically when a merge to the master branch occurs in the remote git repository. The immediate issue is that such a merge happens in the remote git repository, not in the Dataiku project on the local Dataiku server. The CI/CD tool is triggered to run a Dataiku deployment of that project, but there is no apparent way to first pull the master branch of the remote git repository into the project automatically, via APIs or some other method.

Currently, to deploy the newest change to master, a user has to open the project in the Dataiku UI and pull the master branch of the remote git repository into the project, and only then trigger the CI/CD tool's deployment process, manually or by some other means. Ideally there would be an API call in Dataiku that lets us pull the remote git repository on the project's branch before carrying out the deployment via the Project Deployer API. Any recommendations on how to do this in Dataiku DSS 9.0? Is there an API available for this? Should this be a feature request?

One idea we had was setting up the CI/CD tool to monitor the Dataiku server's local git repository. This is, however, a rather invasive approach: it requires modifying the Dataiku server so that it can 'serve up' its local git repo. Ideally there would be a built-in feature of Dataiku to handle this kind of automation.

Ideal Process

  1. A user merges a change into the master branch from the development branch of the Dataiku project via the UI of the remote git repository (e.g. GitHub, GitLab, etc.)
  2. The CI/CD tool is triggered by a change to the master branch in the remote git repository
  3. [The missing step] The CI/CD tool sends a command to Dataiku to pull the latest merge to the master branch of the remote git repository into the Dataiku project, which is currently on the master branch
  4. The CI/CD tool then runs the Project Deployer deployment process to bundle that Dataiku project on the master branch and deliver it to the Automation Node.
1 Solution
Clément_Stenac
Dataiker

Hi,

There is indeed currently no API for triggering a pull of the remote branch; this is a feature request that is in our backlog.

However, each project's Git repository is a normal Git repository, so you can go into "DATADIR/config/projects/YOUR_PROJECT" and run normal git commands there, including "git pull". This requires a bit more plumbing (because you need to run a shell command), but it should do what you need.
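For anyone wiring this into a pipeline, here is a minimal sketch of that "plumbing" in Python. It only shells out to plain git. The helper names, the remote name "origin", the branch default and the use of --ff-only are assumptions for illustration, not anything provided by Dataiku, and the script has to run on the DSS server itself as a user that can write to the data directory.

```python
import subprocess
from pathlib import Path


def project_repo_path(datadir: str, project_key: str) -> Path:
    """Build DATADIR/config/projects/PROJECT_KEY, the directory named in the answer."""
    return Path(datadir) / "config" / "projects" / project_key


def pull_project(datadir: str, project_key: str,
                 remote: str = "origin", branch: str = "master") -> str:
    """Run `git pull` inside the project's directory and return git's output.

    The remote and branch defaults are assumptions; adjust them to your setup.
    """
    repo = project_repo_path(datadir, project_key)
    if not repo.is_dir():
        raise FileNotFoundError(f"project directory not found: {repo}")
    # --ff-only makes the pull fail instead of creating a merge commit
    # when the local branch has diverged from the remote.
    result = subprocess.run(
        ["git", "-C", str(repo), "pull", "--ff-only", remote, branch],
        check=True,            # raise CalledProcessError if git fails
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A CI/CD job would call something like pull_project("/path/to/DATADIR", "YOUR_PROJECT") over SSH or from an agent on the server, and only start the Project Deployer step if it returns without raising.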

Hope this helps,

5 Replies
jacescheffler
Level 1
Author

Hi Clément,

Thanks for your reply! We did not consider simply pulling the repository on the backend. That may be our best option.

If this is in fact a feature request on your backlog, is there a place where I can promote it?

Thank you,

Jace

jasperpaalman
Level 1

Hi all,

How would you go about implementing this?

We're having a similar issue: we want to automatically create a bundle after the completion of a pull request (into main). Somehow you want to kick off a series of commands that perform Git actions on the Design Node. Part of this script would check out the main branch, thereby changing the referenced code on the Design Node. So I'm assuming that putting this series of commands into a scenario is a no-go (because you're switching the branch from within the scenario itself). But how, then, do I run a series of commands from an external source (say, a pipeline) on the Design Node? Maybe I'm missing something :).

Best,

Jasper

LazySloth
Level 1

Hey, did you find a solution to this problem?

krbee
Level 2

We have a scenario on a shared project called "pull_branch". It takes a few parameters:

{
"project_key": "my_project",
"git_branch": "my_git_branch"
}

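In a subsequent step we execute the following code:

# MergeConflictException is assumed to be defined next to the helper in
# utils.utils; without an import, the except clause below raises a NameError.
from utils.utils import (
    MergeConflictException,
    checkout_and_pull_with_conflict_resolution,
)
import dataiku
import dataiku.scenario

# Read the scenario variables that carry the target project and branch
sc = dataiku.scenario.Scenario()
v = sc.get_all_variables()
print(f"here are all variables: {v}")
git_branch = v["git_branch"]
current_project_key = v["project_key"]

print(f"current project: {current_project_key} on branch: {git_branch}")
# Git working directory of the target project on our Design Node
git_dir = f"/data/dataiku/design/config/projects/{current_project_key}/"
try:
    checkout_and_pull_with_conflict_resolution(git_dir, git_branch)
except MergeConflictException as e:
    # A conflict is only logged here, so the step itself still succeeds
    print(e)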

The `checkout_and_pull_with_conflict_resolution` function is based on git for python.
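The helper itself is not shown in this thread, so the following is only a hypothetical stand-in to illustrate the idea, not the actual implementation. It swaps in plain git commands through subprocess instead of a Python git library, and it does not try to resolve conflicts: it aborts the merge and raises. The function name checkout_and_pull, the remote name "origin" and the injectable run parameter are all assumptions made for this sketch.

```python
import subprocess


class MergeConflictException(Exception):
    """Raised when the merge does not complete cleanly (name reused from the post)."""


def checkout_and_pull(git_dir, branch, remote="origin", run=subprocess.run):
    """Fetch `branch`, check it out, and merge the fetched remote-tracking branch.

    `run` is injectable so the command sequence can be exercised without git.
    """
    def git(*args):
        return run(["git", "-C", git_dir, *args], capture_output=True, text=True)

    # Fetch first, then switch branches; stop at the first command that fails.
    for step in (("fetch", remote, branch), ("checkout", branch)):
        result = git(*step)
        if result.returncode != 0:
            raise RuntimeError(f"git {' '.join(step)} failed: {result.stderr.strip()}")

    merge = git("merge", f"{remote}/{branch}")
    if merge.returncode != 0:
        # Any failed merge is treated as a conflict in this sketch.
        # Put the working tree back the way it was before reporting it.
        git("merge", "--abort")
        raise MergeConflictException(
            f"could not merge {remote}/{branch} in {git_dir}: {merge.stdout.strip()}"
        )
```

Aborting on conflict keeps the Design Node project in a clean state, which seems like the safer default for an unattended pipeline; a real helper may well do something smarter.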

This scenario is executed from our CI/CD pipeline. This ensures that development done offline is synced before the pipeline runs any tests etc.
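To make the pipeline side concrete, here is a rough sketch of how a CI/CD job might start such a scenario with the dataikuapi client. The host, API key, shared project key and the scenario id "pull_branch" are placeholders and assumptions. As far as I know, run_and_wait passes params to the scenario as trigger parameters, so how they end up in get_all_variables() depends on how the scenario is set up, which the post does not show.

```python
def build_pull_params(project_key, git_branch):
    """Parameters in the shape shown in the post (the values are examples)."""
    return {"project_key": project_key, "git_branch": git_branch}


def trigger_pull_scenario(host, api_key, shared_project_key,
                          target_project_key, git_branch,
                          scenario_id="pull_branch"):
    """Run the pull scenario on the Design Node and wait for it to finish.

    All argument values are placeholders; scenario_id is assumed to match
    the scenario's name from the post.
    """
    # Imported here so the pure helper above works without dataikuapi installed.
    import dataikuapi

    client = dataikuapi.DSSClient(host, api_key)
    scenario = client.get_project(shared_project_key).get_scenario(scenario_id)
    # Blocks until the scenario run has finished.
    return scenario.run_and_wait(
        params=build_pull_params(target_project_key, git_branch)
    )
```

Because this goes through the public API, the pipeline does not need shell access to the Design Node; only the scenario itself touches the project's git directory.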
