Create a code environment manager module

info-rchitect

Hi,

My group has dozens of code environments and uses various Python packages. It would be great if Dataiku had a dedicated module that let a user specify a Python package and see its current version in each of the code environments they have access to.

Second, it would be great if I could push a package version change to all code environments I have write access to. When we change Python versions, it can take a couple of hours to manually edit and rebuild every code environment. I would love for Dataiku to handle this as a batch process and alert me when everything is done, or tell me which environments failed.
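For reference, this is roughly all the module would need to do under the hood for the first ask. A minimal sketch against the public dataikuapi client; the envLang/envName keys and the specPackageList field are assumptions about the returned payload and may differ by DSS version:

    import dataikuapi

    # Placeholder host and API key for the Design node.
    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")

    TARGET = "pandas"  # package to audit across environments

    for env in client.list_code_envs():
        if env.get("envLang") != "PYTHON":
            continue
        code_env = client.get_code_env(env["envLang"], env["envName"])
        definition = code_env.get_definition()
        # Assumption: the requested packages live in specPackageList as a
        # requirements.txt-style string; installed versions may sit elsewhere.
        spec = definition.get("specPackageList") or ""
        pins = [line.strip() for line in spec.splitlines()
                if line.strip().startswith(TARGET)]
        print(env["envName"], "->", pins or "not requested")

I just don't think every group should have to write and maintain that themselves.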

thx

4 votes

Comments

  • Grixis

    Hey,

    Yes, I'm definitely for adding this idea to the Dataiku backlog.

    So far I've struggled to build an interface (webapp/notebook) or a script (through a code recipe) that lets users manage a requirements-update process driven by editable datasets.

    And for the "batch update process", I create scenarios with a custom Python step to build/rebuild code environments, plus an ALM process that's a bit far-fetched for managing the update results: regressions, CVEs, etc. (roughly the step sketched below).
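    The custom step is roughly the sketch below (simplified; it assumes dataiku.api_client() is available in a scenario Python step and that update_packages() triggers the rebuild, so worth checking against the API docs for your DSS version):

        import dataiku

        client = dataiku.api_client()  # authenticated client inside DSS
        failed = []

        for env in client.list_code_envs():
            if env.get("envLang") != "PYTHON":
                continue
            code_env = client.get_code_env(env["envLang"], env["envName"])
            try:
                # Re-resolve and reinstall the requested packages for this env.
                code_env.update_packages()
            except Exception as exc:
                failed.append((env["envName"], str(exc)))

        if failed:
            # Failing the step lets the scenario reporter send the alert.
            raise Exception("Code env updates failed: %s" % failed)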

    Best,

  • Turribeach

    While I don't work for Dataiku, nor do I claim to represent them, it's clear to me that more advanced or more specific requirements that can easily be implemented with the Dataiku API are not going to make it into the product, because they tend to be needed only by certain kinds of customers. These same customers tend to have very particular needs even within a single feature, which makes it even harder to build one solution that fits all. I believe this is one of Dataiku's great strengths: a fully featured, rich API that lets you do pretty much anything you can do via the GUI. And reading that you have already spent hours doing this work manually, I am surprised you haven't automated it already, especially knowing from previous posts that you are familiar with the Dataiku API. So I hope this is the final nudge that you need…

    We do pretty much all that you mentioned, and more, using the Dataiku API. In our case we use a combination of Jenkins pipelines, ADO, MS-SQL and PowerBI. Jenkins is our pipeline definition and orchestration tool. We store all Python code environments in Git so we can track changes and deploy them via Jenkins. We use MS-SQL and PowerBI to collect stats on code environment usage, covering both packages and project use.

    But these additional tools are merely optional; they simply fit the toolset our team already uses rather than being things we need. You could easily automate all of this with plain Python code (in scenarios or recipes), datasets stored in your preferred storage layer, and Dataiku webapps to interact with your code environments.
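    For instance, pushing a new pin for one package across every managed Python environment and logging the outcome to a dataset is only a few lines. A rough sketch with placeholder names; the specPackageList field and the codeenv_update_audit dataset are assumptions, not the exact API payload:

        import re
        import dataiku
        import pandas as pd

        client = dataiku.api_client()
        PACKAGE, NEW_PIN = "numpy", "numpy==1.26.4"  # placeholder up-rev
        rows = []

        for env in client.list_code_envs():
            if env.get("envLang") != "PYTHON":
                continue
            code_env = client.get_code_env(env["envLang"], env["envName"])
            definition = code_env.get_definition()
            spec = definition.get("specPackageList") or ""
            # Drop any existing pin for the package, then append the new one.
            kept = [l for l in spec.splitlines()
                    if not re.match(r"^\s*%s\b" % re.escape(PACKAGE), l)]
            definition["specPackageList"] = "\n".join(kept + [NEW_PIN])
            try:
                code_env.set_definition(definition)
                code_env.update_packages()
                rows.append({"env": env["envName"], "status": "updated"})
            except Exception as exc:
                rows.append({"env": env["envName"], "status": "failed: %s" % exc})

        # Persist the outcome so a webapp or dashboard can show what failed.
        dataiku.Dataset("codeenv_update_audit").write_with_schema(pd.DataFrame(rows))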

  • info-rchitect

    Not all groups are technically aligned with using the Python API to handle cross-project code environment builds. Dataiku advertises low-code every time it pitches its software. When I use software like Dataiku, I break what I am doing into two fundamental categories:

    1. Value-add
    2. Housekeeping

    When #2 starts to consume a lot of time or complexity, I would like the software provider to help with more than an API.

    We are also a company with varied groups, none of which align to a single use case. Some use Jenkins, some use GitHub Actions, some change things manually (the low-code folks). There is no one-size-fits-all for us. Dataiku is also just one of many enterprise software products in our mix. To me, this sort of feature would help make Dataiku easier to use, especially for the low-code user groups.

  • Turribeach

    Dataiku advertises low-code ("clickers") as much as it promotes "coders". In fact, that is its key differentiator from other products: Dataiku supports coders and clickers alike and gives them freedom of choice, letting the designer pick which approach to take and even mix and match visual and code recipes as needed.

    We don't see updating code environments as a "housekeeping" task but as a fundamental part of our software lifecycle. We provide a Jenkins pipeline that lets users deploy Python code environments to Dataiku instances, which gives them a no-code self-service solution and gives us full traceability of Who-Is-Doing-What (WIDOW). We also have different pipelines for different use cases, and we can permission these pipelines to give the right level of access to the right people. None of these users ever needs to know or understand the Python APIs.

    For a good example of what Dataiku considers housekeeping, look at the Macros section in any project.

    Furthermore, expecting Dataiku to build a predefined SDLC for code environments, when most customers have different needs and requirements, will probably end up with a feature that nearly nobody uses because it is either too specific or too generic. Code environments are a key component of your DevOps world and as such require integrated, automated handling, which clearly necessitates the use of APIs.

    Like I said, I don't have any authority here, but in my experience dealing with Dataiku this is a request that is extremely unlikely to make it into the product, for the reasons I stated before. So in my view your only choices are to keep doing things manually or to build some tooling that makes your life easier. It's up to you which you think is the more productive use of your time.

  • info-rchitect

    I think you are confused about the ask, or believe that making it possible with code is all that is needed. Dataiku already manages code environments through a UI. Hooking up the logic to know which code environments a user owns is simple. And asking Dataiku to let a user up-rev a package or two across all of their code environments, without using Python, is reasonable.

  • Turribeach

    I never meant to imply that your request was unreasonable or even hard to implement. I also never said that providing an API is the solution to every user need. What you do need to realise is that a vendor will have hundreds or even thousands of product enhancement requests, many of which will be reasonable, quick to implement, or both. As such, a prioritisation call has to happen, based on many factors: some commercial, some technical, some customer driven, such as how popular a request is and how many customers are likely to use the feature if it were implemented. In particular, I have been told a few times by several vendors, not just Dataiku, that a feature request which can be implemented via a vendor-provided API and which is "niche" (i.e. unlikely to be used by many customers, or needing a complex design to cover different operational models) is very unlikely to make it into the product, because it can be custom built by the customer using the API to suit their particular need. So the main aim of my post was to make you aware of that point of view and to push you towards a custom solution, since you can clearly benefit from it and gain significant operational efficiency by having one.

    You can also see that, from the vendor's point of view, providing an API that lets customers automate their operational needs is a much more efficient approach than trying to build every possible need into the product. And while I agree that much of the automation tooling we have developed ourselves over the years should have been part of the product, I cannot ignore the fact that any time Dataiku spends on automating operational needs is taken away from implementing new features, since their R&D budget is a finite resource.

    Personally I think letting customers implement their own operational solutions is the correct approach, since we often find that OOTB features in this area don't work exactly the way we want, so we end up doing our own thing anyway. And in the context of what Dataiku is (i.e. an ML platform) and the speed and amount of change happening in this space, it's clear to me that new features need to be the priority for the R&D budget.
