Dataiku development settings: CI, VSCode, Remote repository

Guglielmo
Guglielmo Registered Posts: 1

Hi all,

I've read the guide on connecting Dataiku DSS to a remote GitHub repository and managing branches by duplicating projects, but I have some doubts about it. As far as I can tell, I would still need a master project plus one copy of the project per branch. The number of projects can explode, since I work in a team and each person needs to open their own feature branch. An additional problem is that a single big project may be split into 3 or 4 smaller projects to simplify the structure, which adds another level of complexity if every one of those projects needs a copy for each branch.

I would like to have the following setting to develop my solutions on the design node:

  • maintain a single codebase on GitHub, with multiple branches
  • work on code only locally, editing Python/SQL recipes in VSCode
  • use PRs on GitHub to merge feature branches into master
  • leverage a GitHub CI pipeline to run unit tests on my library functions
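To make the last point concrete, here is a minimal sketch of the kind of test I have in mind. The helper `normalize_column_name` and its behaviour are made up for illustration (they are not part of DSS); the tests are plain `test_*` functions with bare asserts, which a runner such as pytest could collect in a CI job.

```python
def normalize_column_name(name: str) -> str:
    """Hypothetical library helper: trim, lower-case and snake_case a column name."""
    # split() with no argument also collapses repeated whitespace
    return "_".join(name.strip().lower().split())


def test_spaces_become_underscores():
    assert normalize_column_name("  Order Date ") == "order_date"


def test_already_normalized_is_unchanged():
    assert normalize_column_name("order_date") == "order_date"
```

The point is that the function lives in a plain Python module with no dependency on the DSS runtime, so the CI job only needs a Python interpreter and a test runner.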

I've seen the guide on connecting VSCode to DSS, but it looks like SQL recipes can't be edited that way.

In addition, I've seen the guide on connecting DSS to a remote repository, but as explained above, keeping multiple copies of the same project can make the number of projects explode. Is there a good workaround for this?

I'm looking for a solution that is acceptable to a client and to a team of data scientists who are not very strong on ops, so it has to be simple enough that they will accept not just "working on Dataiku directly" :)

Thanks in advance.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,914 Neuron

    For better IDE integration in Dataiku, look at Code Studios. This will allow users to edit Python recipes and libraries in a familiar Visual Studio Code interface running on the web. Code Studios is not trivial to set up, since it requires Kubernetes, which is a beast of its own. While Code Studios does provide a decent IDE experience, it doesn't really solve the Git integration issues that Dataiku has. Dataiku only recently added functionality to handle Git conflicts, and this is still an area of weakness. The overall issue is that Dataiku decided to store recipe code in JSON files, so even though these JSON files are committed automatically to a local Git repo, the JSON metadata makes it much more difficult to see the changes. Personally, I think you should use Dataiku as is and let the internal Git repo track changes; if you need to branch a project, either clone it or work in a separate flow zone. As for libraries, just include them in custom packages and manage them via code environments.
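As a rough illustration of the point about JSON making changes hard to see, the sketch below pulls a code string out of two JSON documents and prints a normal line-by-line diff. The JSON layout here (a `"code"` key next to a `"type"` key) and the function name `code_diff` are assumptions made up for the example, not Dataiku's actual on-disk format.

```python
import difflib
import json


def code_diff(old_json: str, new_json: str, key: str = "code") -> str:
    """Return a unified diff of the code strings stored under `key`.

    Assumes each JSON document is an object holding the code as one string.
    """
    old_code = json.loads(old_json)[key]
    new_code = json.loads(new_json)[key]
    diff = difflib.unified_diff(
        old_code.splitlines(),
        new_code.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    )
    return "\n".join(diff)


# Two hypothetical versions of the same recipe, code embedded as a JSON string
before = json.dumps({"type": "python", "code": "df = load()\ndf = df.dropna()\n"})
after = json.dumps({"type": "python", "code": "df = load()\ndf = df.fillna(0)\n"})

print(code_diff(before, after))
```

Diffing the raw JSON directly would show a single long changed line with escaped newlines, which is why reviewing this kind of change in a pull request tends to be awkward.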
