
What is your Git workflow for having multiple developers work on the same project?

Keufran
Level 3

Hi All,

We're trying to find a sound Git workflow that lets multiple developers work simultaneously on the same project, preferably as smoothly as on a regular software project.

We tried a fairly classical approach:

- duplicating the project on the design node for each developer

- creating a feature branch; each developer points their project at the branch they want to work on

- opening a merge request from the feature branch into master (on our internal GitLab), then synchronizing our master project with the master branch

But we have a problem merging the params.json in the top-level directory. Some parameters of the duplicated project overwrite those of the main project (owner, id), and we don't want that. But some other information seems to be needed for the merge to work correctly (new datasets, for instance).

How should we handle this? Can merging the params.json be safely skipped, betting on the fact that DSS will "reconstruct" it?

Note: we haven't taken the time to fully experiment with skipping the params.json merge.
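One middle ground between merging params.json blindly and skipping it entirely would be to take the branch's version but restore the fields that must stay tied to the main project. The snippet below is only a minimal sketch of that idea: it assumes `owner` and `id` are flat top-level keys (the names come from the post above, the real params.json layout may differ), and the function name `merge_params` and the sample values are hypothetical. It has not been tested against a real DSS project.

```python
import json

# Keys that should always come from the main project. "owner" and "id" are
# the names mentioned in the post; the real params.json layout may differ.
PROTECTED_KEYS = ("owner", "id")


def merge_params(main_params: dict, branch_params: dict,
                 protected=PROTECTED_KEYS) -> dict:
    """Take the branch's params, but restore protected keys from main."""
    merged = dict(branch_params)
    for key in protected:
        if key in main_params:
            merged[key] = main_params[key]
        else:
            # Main never had this key, so do not import it from the copy.
            merged.pop(key, None)
    return merged


if __name__ == "__main__":
    # Hypothetical, simplified contents for illustration only.
    main = {"owner": "alice", "id": "MAIN_PROJECT", "tags": ["prod"]}
    branch = {"owner": "bob", "id": "MAIN_PROJECT_COPY",
              "tags": ["prod", "new-dataset"]}
    print(json.dumps(merge_params(main, branch), indent=2, sort_keys=True))
```

Everything outside the protected keys is taken from the branch as-is, so this only helps when the main project's params.json has not changed in the meantime.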

4 Replies
tgb417

@Keufran ,

Have you worked out the best practices for DSS in git with branches?

--Tom
0 Kudos
Keufran
Level 3
Author

@tgb417,

No, we finally gave up. We had some hope that v10 would give us an acceptable workflow, but it was never installed on our infrastructure (we work on-premise).

I think the most efficient process (given what DSS offers) is to replicate the part of the flow you're going to modify in a separate project (with a separate Git repo). To merge when finished, you end up reporting files manually and use merging tools outside of any forge (like Gitlab or Github).

0 Kudos

@Keufran ,

First of all, thank you for your kind reply.

Let’s see if I understand the steps:

1. Select objects from the current project. Use the standard DSS copy process with a destination in a different new project.

2. Setup a remote repo for that project say on GitHub.

3. Connect the new project with the copied sub flow to the new repo.
4. Push the content of this new temporary project to GitHub.

5. Do development on this new project as normal. Pushing changes to the GitHub repo as necessary.

That all makes sense to me. However, this part of your description lost me:

“To merge when finished, you end up reporting files manually and use merging tools outside of any forge (like Gitlab or Github).”

I can see re-importing, say, a notebook by hand, copying its text out of one project and back into the original project. What I don’t understand is how this works for other types of flow objects. How are you thinking about merging them? Going to the disk on the DSS server and directly manipulating the files that make up the flow? Or somehow using GitHub to take the new temporary project and merge it with the original repo over on GitHub, something like what is described here? https://blog.devgenius.io/how-to-merge-two-repositories-on-git-b0ed5e3b4448

Or were you thinking about something else?

Thank you for any insights you can share.

--Tom
0 Kudos
Keufran
Level 3
Author

Hi @tgb417 ,

Sorry, my reply was written in a hurry and was far too confusing.

What I wanted to say is that we finally abandoned using a central Git repo to sync and coordinate between us. With our version (9), it ultimately caused more problems than it solved.

We tried to be careful to work on different parts of the project, and when that was not possible, we "forked manually" a part of the flow into another project and then merged everything back manually (i.e. by hand and by eye...).

When needed, we use tools to help with the merge (typically for source code modified on both sides), but these tools are not related in any way to DSS, GitLab, or GitHub. If script "P" has been modified in project A and project B, copy P from A and from B to your local computer and use a tool to merge them correctly. Then copy-paste the result back to P in A and B.

With newer versions (>=10) and smart use of project libraries and DSS plug-ins, I guess you may achieve something more efficient.
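As an illustration of the "compare locally" step, here is a minimal sketch using Python's standard `difflib` module. It only produces a two-way unified diff to guide the manual merge; it does not merge anything itself, and it is not the tool we actually used. The function name `diff_versions` and the sample script contents are hypothetical.

```python
import difflib


def diff_versions(text_a: str, text_b: str, name: str = "P") -> str:
    """Return a unified diff between the copies of a script in projects A and B."""
    lines = difflib.unified_diff(
        text_a.splitlines(keepends=True),
        text_b.splitlines(keepends=True),
        fromfile=f"A/{name}",
        tofile=f"B/{name}",
    )
    return "".join(lines)


if __name__ == "__main__":
    # Hypothetical script contents pasted from each project.
    version_a = "df = load()\ndf = clean(df)\nsave(df)\n"
    version_b = "df = load()\ndf = clean(df, strict=True)\nsave(df)\n"
    print(diff_versions(version_a, version_b))
```

An empty result means the two copies are identical, so nothing needs to be carried over for that script.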