Create a code environment manager module

info-rchitect Registered Posts: 174 ✭✭✭✭✭✭


My group has dozens of code environments that use various Python packages. It would be great if Dataiku had a dedicated module that lets a user specify a Python package and see its current version in each of the code environments they have access to.
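Until something like this exists, the lookup can be sketched with the public Python API. This is a minimal sketch under assumptions: `client` is a `dataikuapi` `DSSClient` (for example from `dataiku.api_client()` inside DSS), `list_code_envs()` returns dicts with `envLang`/`envName` keys, and `get_definition()` exposes the requested packages under `specPackageList`. Installed (rather than requested) versions may live under a different key, possibly `actualPackageList`, so print one definition to confirm before relying on it. The helper names `find_spec` and `package_inventory` are hypothetical.

```python
import re
from typing import Dict, Optional

# A requirement line: distribution name, then its specifier, minus any trailing "# comment".
_REQ_LINE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*?)\s*(?:#.*)?$")


def normalize(name: str) -> str:
    """PEP 503 name normalization: case-insensitive, runs of '-', '_' and '.' are equivalent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_spec(requirements: str, package: str) -> Optional[str]:
    """Return the specifier for `package` ('' if listed unpinned), or None if it is not listed."""
    target = normalize(package)
    for line in requirements.splitlines():
        match = _REQ_LINE.match(line)
        if match and normalize(match.group(1)) == target:
            return match.group(2)
    return None


def package_inventory(client, package: str, field: str = "specPackageList") -> Dict[str, Optional[str]]:
    """Map each Python code env name to the specifier of `package` in that env.

    `client` is expected to behave like dataikuapi's DSSClient. The dict keys used here
    ("envLang", "envName" and `field`) are assumptions to check against your DSS version.
    """
    inventory = {}
    for env in client.list_code_envs():
        if env["envLang"] != "PYTHON":
            continue  # R code envs are skipped
        definition = client.get_code_env(env["envLang"], env["envName"]).get_definition()
        inventory[env["envName"]] = find_spec(definition.get(field, ""), package)
    return inventory
```

In the result, `None` means the env does not list the package at all and `""` means it is listed without a version pin. Writing the dict to a dataset would give low-code users a browsable view.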

Second, it would be great if I could push a package version change to all the code environments I have write access to. When we change Python versions, it can take a couple of hours to manually edit and rebuild every code environment. I would love for Dataiku to handle this as a batch job and alert me when all of them are done, or tell me which ones failed.
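The batch part can be sketched along the same lines. This is a hedged sketch, not a Dataiku feature: it assumes `DSSCodeEnv.get_definition()`, `set_definition()` and `update_packages()` behave as in recent `dataikuapi` versions and raise on failure (newer versions also offer `get_settings()`, which may be preferable), and that requested packages sit under the `specPackageList` key. The function names and the report layout are hypothetical. Envs without write access, or whose package update fails, end up in `failed` instead of stopping the batch.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Distribution name plus optional extras, e.g. "pandas" or "pandas[excel]".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?")


def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def set_pin(requirements: str, package: str, version: str) -> Tuple[str, bool]:
    """Rewrite the line for `package` as `package==version`, keeping any extras.

    Returns (new_text, changed). A package that is not listed is NOT added, and a
    trailing comment on the rewritten line is dropped.
    """
    target = normalize(package)
    lines, changed = [], False
    for line in requirements.splitlines():
        match = _NAME.match(line)
        if match and normalize(match.group(1)) == target:
            new_line = f"{match.group(1)}{match.group(2) or ''}=={version}"
            changed = changed or new_line != line.strip()
            line = new_line
        lines.append(line)
    return "\n".join(lines), changed


def bulk_pin(client, package: str, version: str,
             only: Optional[Iterable[str]] = None) -> Dict[str, List[str]]:
    """Pin `package` in every Python code env that already lists it, then update packages."""
    wanted = set(only) if only is not None else None
    report = {"updated": [], "skipped": [], "failed": []}
    for env in client.list_code_envs():
        name = env["envName"]
        if env["envLang"] != "PYTHON" or (wanted is not None and name not in wanted):
            continue
        try:
            code_env = client.get_code_env("PYTHON", name)
            definition = code_env.get_definition()
            new_spec, changed = set_pin(definition.get("specPackageList", ""), package, version)
            if not changed:
                report["skipped"].append(name)  # not listed, or already at this pin
                continue
            definition["specPackageList"] = new_spec
            code_env.set_definition(definition)
            code_env.update_packages()  # installs the new requirement set
            report["updated"].append(name)
        except Exception as exc:  # collect the error and carry on with the next env
            report["failed"].append(f"{name}: {exc}")
    return report
```

Run from a scenario's custom Python step, the step could `raise` when `report["failed"]` is non-empty; that fails the scenario, and a scenario reporter can then send the alert.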


4 votes



  • Grixis
    Grixis PartnerApplicant, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 75 ✭✭✭✭✭


    Yes, I'm definitely in favor of adding this idea to the Dataiku backlog.

    So far, I have always had to struggle to develop an interface (webapp/notebook) or a script (through a code recipe) that lets users manage a requirements update process from editable datasets.

    As for the "batch update process", I create scenarios with custom Python steps that update and rebuild code environments, backed by a somewhat convoluted ALM setup for managing the update results: regressions, CVEs, etc.


  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,925 Neuron

    While I don't work for Dataiku, nor do I claim to represent them, it's clear to me that more advanced or more specific requirements that can easily be implemented with the Dataiku API are not going to make it into the product, as they tend to be needed only by certain kinds of customers. These same customers tend to have very particular needs even for a specific feature, which makes it even harder to build one solution that fits all. I believe this is one of Dataiku's great strengths: a fully featured, rich API that lets you do pretty much anything you can do via the GUI. And reading that you already spend hours doing this work manually, I am surprised you haven't automated it already, especially since I know from previous posts that you are familiar with the Dataiku API. So I hope this is the final nudge you need…

    We do pretty much everything you mentioned, and more, using the Dataiku API. In our case we use a combination of Jenkins pipelines, ADO, MS-SQL and PowerBI. Jenkins is our pipeline definition and orchestration tool. We store all Python code environments in Git so we can track changes and deploy them via Jenkins. We use MS-SQL and PowerBI to collect stats on code environment usage, covering both packages and projects.
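    A minimal sketch of the Git and usage-stats parts in plain Python, rather than the Jenkins/MS-SQL/PowerBI stack described here. It assumes a `dataikuapi` `DSSClient` whose `list_code_envs()` returns dicts with `envLang`/`envName` keys, that `get_definition()` exposes requested packages under `specPackageList`, and that each entry returned by `DSSCodeEnv.list_usages()` carries a `projectKey`. The function names and the `<lang>__<name>.txt` file layout are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def export_specs(client, out_dir: str) -> List[str]:
    """Write each code env's requested packages to <out_dir>/<lang>__<name>.txt, ready to commit."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for env in client.list_code_envs():
        lang, name = env["envLang"], env["envName"]
        definition = client.get_code_env(lang, name).get_definition()
        spec = definition.get("specPackageList", "").strip()
        path = target / f"{lang.lower()}__{name}.txt"
        path.write_text(spec + "\n", encoding="utf-8")
        written.append(path.name)
    return sorted(written)


def project_usage(client) -> Dict[str, int]:
    """Count the distinct projects using each code env, keyed as "LANG/name"."""
    counts = {}
    for env in client.list_code_envs():
        usages = client.get_code_env(env["envLang"], env["envName"]).list_usages()
        projects = {u.get("projectKey") for u in usages if u.get("projectKey")}
        counts[f"{env['envLang']}/{env['envName']}"] = len(projects)
    return counts
```

    Exporting only the requirements text, rather than the whole definition, keeps Git diffs small and free of fields that may change on every rebuild; the usage counts could be written to a dataset for dashboarding.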

    But these additional tools are entirely optional: we use them because they fit the toolset our team already works with, not because we need to. You could easily automate all of this with plain Python code (in scenarios or recipes), datasets stored in your preferred storage layer, and Dataiku webapps to interact with your code environments.

  • info-rchitect
    info-rchitect Registered Posts: 174 ✭✭✭✭✭✭

    Not all groups are technically aligned with using the Python API to handle cross-project code environment builds. Dataiku advertises low-code every time it pitches its software. When I use software like Dataiku, I break what I am doing into two fundamental categories:

    1. Value-add
    2. Housekeeping

    When #2 starts to take a lot of time or involve a lot of complexity, I would like the software provider to help with more than an API.

    We are also a company with varied groups, none of which align to a single use case. Some use Jenkins, some use GitHub Actions, some change things manually (low-code folks). There is no one-size-fits-all for us. Dataiku is also one of many enterprise software products in our mix. To me, this sort of feature would make Dataiku easier to use, especially for low-code user groups.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,925 Neuron

    Dataiku advertises low-code ("clickers") as much as it promotes "coders". In fact, that is the key differentiator from other products: Dataiku strongly supports coders and clickers alike and gives them freedom of choice, letting the designer not only choose which approach to take but also mix and match visual and code recipes as needed.

    We don't see updating code environments as a "housekeeping" task but as a fundamental part of our software lifecycle. We provide a Jenkins pipeline that users run to deploy Python code environments to Dataiku environments; this gives them a no-code, self-service solution and gives us full traceability of Who-Is-Doing-What (WIDOW). We also have different pipelines for different use cases, and we can permission these pipelines to give the right level of access to the right people. None of these users ever need to know or understand the Python APIs.
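    A Jenkins stage in such a pipeline might call a small script like the sketch below, which replaces a code env's requested packages with a requirements file checked out from Git and then triggers a package update. It is a sketch under assumptions: the `DSS_URL` and `DSS_API_KEY` environment variable names are hypothetical (for example injected from Jenkins credentials), and `get_definition()`, `set_definition()` and `update_packages(force_rebuild_env=...)` are assumed to behave as in recent `dataikuapi` versions, with requested packages under `specPackageList`.

```python
import argparse
import os
from pathlib import Path


def deploy_requirements(client, env_name: str, requirements: str, rebuild: bool = False) -> bool:
    """Replace a Python code env's requested packages with `requirements`; return True if changed."""
    code_env = client.get_code_env("PYTHON", env_name)
    definition = code_env.get_definition()
    if definition.get("specPackageList", "").strip() == requirements.strip():
        return False  # already in sync with Git, so skip the slow package update
    definition["specPackageList"] = requirements
    code_env.set_definition(definition)
    code_env.update_packages(force_rebuild_env=rebuild)
    return True


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Deploy a requirements file from Git to a DSS code env")
    parser.add_argument("env_name")
    parser.add_argument("requirements_file")
    parser.add_argument("--rebuild", action="store_true", help="force a full rebuild of the env")
    args = parser.parse_args(argv)

    import dataikuapi  # imported lazily so deploy_requirements() can be tested without it

    # DSS_URL / DSS_API_KEY are hypothetical names, e.g. bound from Jenkins credentials.
    client = dataikuapi.DSSClient(os.environ["DSS_URL"], os.environ["DSS_API_KEY"])
    requirements = Path(args.requirements_file).read_text(encoding="utf-8")
    changed = deploy_requirements(client, args.env_name, requirements, args.rebuild)
    print(f"{args.env_name}: {'updated' if changed else 'already up to date'}")
    return 0
```

    To run it as a script, add the usual `if __name__ == "__main__": raise SystemExit(main())` entry point. Because users trigger the Jenkins job rather than calling the API themselves, the build history records who deployed what.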

    For a good example of what Dataiku considers housekeeping, look at the Macros section in any project.

    Furthermore, expecting Dataiku to build a predefined SDLC for code environments when most customers have different needs and requirements will probably end up with a feature that nearly nobody uses, because it will be either too specific or too generic. Code environments are a key component of your DevOps world and, as such, require integrated and automated handling, which clearly necessitates the use of APIs.

    Like I said, I don't have any authority here, but in my experience with Dataiku this request is extremely unlikely to make it into the product, for the reasons I stated above. So in my view your only choices are to continue doing things manually or to build some tooling to make your life easier. It's up to you which is the more productive use of your time.
