What is your git workflow for having multiple developers work on the same project?

Keufran
Keufran Registered Posts: 12 ✭✭✭✭

Hi All,

We're trying to find a sound git workflow that lets multiple developers work simultaneously on the same project, preferably as smoothly as on a regular software project.

We tried a fairly classic approach:

- duplicate the project on the design node for each developer

- create a feature branch; each developer points their project at the branch they want to work on

- open a Merge Request from the feature branch to master (on our internal GitLab) and synchronize our master project with the master branch

But we have a problem merging the params.json in the top-level directory. Some parameters of the duplicated project overwrite those of the main project (owner, id), and we don't want this. Yet some other information in that file seems to be needed for the merge to happen correctly (new datasets, for instance).

How should we handle this? Can merging the params.json be safely ignored, betting on the fact that DSS will "reconstruct" it?

Note: we haven't taken the time to fully experiment with skipping the params.json merge.
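One way to picture the behaviour asked for above is a small script that merges the two versions of params.json outside of git: everything comes from the feature copy, except a short list of keys that must keep the main project's values. This is only a sketch under assumptions. The key names `owner` and `id` are taken from the question, the helper `merge_params` is hypothetical, and whether DSS accepts the resulting file has not been verified.

```python
import json

# Assumption: these top-level keys must always keep the main project's values.
# The names come from the question ("owner, id"); adjust to the real file.
PROTECTED_KEYS = ("owner", "id")


def merge_params(main: dict, feature: dict, protected=PROTECTED_KEYS) -> dict:
    """Take everything from the feature copy, but restore protected keys from main."""
    merged = dict(feature)
    for key in protected:
        if key in main:
            merged[key] = main[key]
        else:
            # The key does not exist in main, so it should not appear after the merge.
            merged.pop(key, None)
    return merged


if __name__ == "__main__":
    # Made-up example content, not a real DSS params.json.
    main_params = {"owner": "alice", "id": "MAIN", "settings": {"a": 1}}
    feature_params = {"owner": "bob", "id": "MAIN_COPY", "settings": {"a": 1, "b": 2}}
    print(json.dumps(merge_params(main_params, feature_params), indent=2, sort_keys=True))
```

The merge is deliberately shallow: nested sections are taken wholesale from the feature copy, so it does not resolve conflicting edits inside the same section.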

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron

    @Keufran,

    Have you worked out the best practices for DSS in git with branches?

  • Keufran

    @tgb417,

    No, we finally gave up. We had some hope that v10 would give us an acceptable workflow, but it was never installed on our infrastructure (we work on-premise).

    I think the most efficient process (regarding DSS functionalities) is to replicate the part of the flow you're going to modify in a separate project (with a separate git repo). To merge when finished, you end up reporting files manually and use merging tools outside of any forge (like Gitlab or Github).

  • tgb417

    @Keufran,

    First of all, thank you for your kind reply.

    Let's see if I understand the steps:

    1. Select objects from the current project. Use the standard DSS copy process, with a different, new project as the destination.

    2. Set up a remote repo for that project, say on GitHub.

    3. Connect the new project, with the copied sub-flow, to the new repo.

    4. Push the content of this new temporary project to GitHub.

    5. Do development on this new project as normal, pushing changes to the GitHub repo as necessary.

    That all makes sense to me. However, when you got to this point in your description, you lost me:

    “To merge when finished, you end up reporting files manually and use merging tools outside of any forge (like Gitlab or Github).”

    I can see re-importing, say, a notebook by hand, copying its text out of one project and back into the original project. What I don't understand is how you are thinking about merging other types of flow objects. Do you go to the disk on the DSS instance and directly manipulate the files that make up the flow? Or do you somehow use GitHub to take the new temporary project and merge it with the original repo, something like what is described here? https://blog.devgenius.io/how-to-merge-two-repositories-on-git-b0ed5e3b4448

    Or were you thinking about something else?

    Thank you for any insights you can share.

  • Keufran

    Hi @tgb417,

    Sorry, my reply was written in a hurry and was far too confusing.

    What I wanted to say is that we finally abandoned using a central Git repo to sync and coordinate between us. With our version (9), it ended up causing bigger problems than working without one.

    We tried to be careful to work on different parts of the project. When that was not possible, we "manually forked" a part of the flow into another project and then merged everything back manually (i.e. by hand and by eye).

    When needed, we use tools to help with the merge (typically for source code modified on both sides), but these tools are not related in any way to DSS, GitLab or GitHub. If script "P" has been modified in project A and in project B, copy P from A and from B to your local computer and use a tool to merge them correctly. Then copy-paste the result back into P in both A and B.
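As an illustration of the "copy both versions locally and compare" step, here is a minimal sketch using Python's standard `difflib` module. It only lists the differences between the two copies of P; resolving them is still done by hand or in a dedicated merge tool. The function name `diff_scripts` and the sample script contents are made up for the example.

```python
import difflib


def diff_scripts(text_a: str, text_b: str, name: str = "P") -> str:
    """Return a unified diff between the copy of a script from project A and from project B."""
    lines_a = text_a.splitlines(keepends=True)
    lines_b = text_b.splitlines(keepends=True)
    diff = difflib.unified_diff(
        lines_a,
        lines_b,
        fromfile=f"A/{name}",
        tofile=f"B/{name}",
    )
    return "".join(diff)


if __name__ == "__main__":
    # Made-up contents standing in for script "P" copied out of each project.
    version_a = "x = 1\ny = 2\n"
    version_b = "x = 2\ny = 2\n"
    print(diff_scripts(version_a, version_b))
```

An empty result means the two copies are identical, so nothing needs to be merged for that script.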

    With newer versions (>=10), and with smart use of project libraries and DSS plug-ins, I guess you may achieve something more efficient.
