Allow users to upgrade the Python interpreter in their code env

Tanguy

We have hundreds of Python code environments ("code envs") on our Dataiku design instance, and our objective is to clean up and upgrade these environments. However, Dataiku users have restricted rights that prevent them from changing the Python interpreter of a code env. This poses a challenge because of the 'end of life' Python interpreters that we want to upgrade (see the following image from https://devguide.python.org/versions/).

Figure 1 - Python interpreter status in mid 2024

However, we noticed that Dataiku does allow users to upgrade code envs whose Python interpreter is labelled 'deprecated'.

Figure 2 - DSS provides a variety of Python interpreters, including those labeled as 'deprecated'. For these versions, DSS issues alerts indicating their potential removal in upcoming versions.

Figure 3 - For code environments utilizing a deprecated Python interpreter, DSS provides users with the option to perform an upgrade

Our question is therefore whether an admin can change the status of a Python version to 'deprecated', which would

  1. inform users that this Python version may be removed in an upcoming DSS version
  2. give users the ability to upgrade their Python interpreter (it appears that the 'deprecated' tag is what activates the 'change interpreter' button)

Is this possible?


Operating system used: RHL 8

Answers

  • Turribeach

    Hi, I don't think there is any way to modify the Python interpreter list or the designation of interpreters as "deprecated" or "experimental". As far as I know this is a static, hardcoded list which only gets changed by Dataiku in new releases. While I wasn't aware of this "upgrade code env" functionality, I would strongly discourage you from using it. Code env lifecycle is indeed a challenge on large DSS installations, but the last thing I would want on my system is for users to be able to update code environments by changing Python interpreters.

    There is no actual way of upgrading the Python interpreter in a Python code environment. What Dataiku does is effectively attempt to reinstall all requested packages after changing the Python interpreter. And I say "attempt to reinstall" because I did a quick test "upgrading" a Python 2.7 code env to Python 3.7 with only a single simple package installed (beautifulsoup), and it failed miserably ("You're trying to run a very old release of Beautiful Soup under Python 3. This will not work. Please use Beautiful Soup 4, available through the pip package 'beautifulsoup4'").

    Furthermore, when you move to a new Python interpreter you are most likely going to get version changes in the core packages and the Jupyter packages. These changes alone could have a massive impact on your recipes, especially those using pandas. Below is a sample Python 2.7 code env that I was able to upgrade to Python 3.7. It only had a single requested package (lxml) plus the core packages and the Jupyter packages. You can see that more than 50% of the installed packages differ. This strongly suggests you should do proper testing when you move to a code environment with a new Python interpreter. Dataiku itself warns you of this: "Updating this code env in-place might break usages".

    The traditional approach is to create a new code environment on the desired Python interpreter, let your users know about it, and encourage them to test and migrate their projects, recipes, notebooks, etc. to it ASAP. Once the old code environment is no longer in use, you delete it. In the meantime you can remove permissions from it so that users can no longer select it.
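
    To put a number on how much two code envs differ, as in the "more than 50%" comparison above, one option is to diff their installed package lists. The following is a minimal sketch, assuming you have exported both lists as `name==version` lines (for example with `pip freeze`). The function names and the sample package versions are illustrative only and are not taken from the screenshot.

```python
def parse_freeze(freeze_text):
    """Parse 'name==version' lines into a {name: version} dict."""
    packages = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, version = line.partition("==")
        packages[name.strip().lower()] = version.strip() if sep else None
    return packages


def compare_envs(old_freeze, new_freeze):
    """Summarize which packages were removed, added or changed version."""
    old = parse_freeze(old_freeze)
    new = parse_freeze(new_freeze)
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    total = len(set(old) | set(new))
    differing = len(removed) + len(added) + len(changed)
    return {
        "removed": removed,
        "added": added,
        "changed": changed,
        "ratio_different": differing / total if total else 0.0,
    }


# Illustrative data only (hypothetical versions, not the real code env).
OLD_ENV = """
lxml==4.6.0
pandas==0.23.4
six==1.16.0
futures==3.3.0
"""
NEW_ENV = """
lxml==4.6.0
pandas==1.3.5
six==1.16.0
"""

summary = compare_envs(OLD_ENV, NEW_ENV)
print(summary)
```

    A high `ratio_different` does not prove that code will break, but it is a reasonable signal that recipes relying on the changed packages need testing before they move.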

    Finally, note that the Dataiku list does not really match the end-of-life dates of the Python.org release cycle; it is based only on what Dataiku itself deprecates and supports. So if you want to stay on relevant versions of the Python interpreter, you will need to handle this yourself. Using the Dataiku Python API you can quickly build tooling to report on code envs, their Python interpreter and their usage.
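
    As a rough illustration of the kind of reporting tool mentioned above, here is a sketch that flags code envs whose interpreter is past its Python.org end-of-life date. Several things are assumptions to verify against your DSS version: that `dataikuapi` exposes `list_code_envs()`, `get_code_env()`, `get_definition()` and `list_usages()` as used here, that the interpreter is stored under `desc.pythonInterpreter`, and that interpreter ids look like `PYTHON37`. The end-of-life dates are the ones published on https://devguide.python.org/versions/; the helper names and sample records are hypothetical.

```python
from datetime import date

# End-of-life dates from https://devguide.python.org/versions/
PYTHON_EOL = {
    "2.7": date(2020, 1, 1),
    "3.6": date(2021, 12, 23),
    "3.7": date(2023, 6, 27),
    "3.8": date(2024, 10, 7),
}


def interpreter_version(dss_id):
    """Turn an id such as 'PYTHON37' into '3.7' (assumed naming scheme)."""
    digits = dss_id.upper().replace("PYTHON", "")
    if digits.isdigit() and len(digits) >= 2:
        return f"{digits[0]}.{digits[1:]}"
    return None  # e.g. a custom interpreter


def build_report(envs, today):
    """envs: dicts with 'name', 'interpreter' and 'usages' (a count)."""
    rows = []
    for env in envs:
        version = interpreter_version(env["interpreter"])
        eol = PYTHON_EOL.get(version)
        rows.append({
            "name": env["name"],
            "python": version,
            "usages": env["usages"],
            # None means "no EOL date known in the table above"
            "end_of_life": None if eol is None else eol <= today,
        })
    # End-of-life envs first, most used first within each group.
    rows.sort(key=lambda r: (r["end_of_life"] is not True, -r["usages"], r["name"]))
    return rows


def fetch_code_envs(host, api_key):
    """Assumed sketch of collecting the records from DSS (admin API key)."""
    import dataikuapi  # imported here so the report logic runs without it

    client = dataikuapi.DSSClient(host, api_key)
    envs = []
    for item in client.list_code_envs():
        if item.get("envLang") != "PYTHON":
            continue
        code_env = client.get_code_env("PYTHON", item["envName"])
        desc = code_env.get_definition().get("desc", {})
        envs.append({
            "name": item["envName"],
            "interpreter": desc.get("pythonInterpreter", "UNKNOWN"),
            "usages": len(code_env.list_usages()),
        })
    return envs


# Hypothetical records, standing in for fetch_code_envs(...) output.
SAMPLE = [
    {"name": "ml_py39", "interpreter": "PYTHON39", "usages": 40},
    {"name": "legacy_py27", "interpreter": "PYTHON27", "usages": 12},
    {"name": "etl_py37", "interpreter": "PYTHON37", "usages": 3},
]
report = build_report(SAMPLE, today=date(2024, 6, 1))
for row in report:
    print(row)
```

    Sorting end-of-life envs with the most usages first gives a natural priority list for the migration work.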

    [Screenshot: package comparison for the sample code env before (Python 2.7) and after (Python 3.7) the upgrade]

    PS: To clarify, Dataiku doesn't provide any Python interpreters; these are installed on your system by the system administrator. If a Python interpreter is not present on the system, you will not be able to select it in the Python interpreter drop-down list in Dataiku when creating new code envs.
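
    A quick way for an administrator to see which interpreters are installed is to look them up on the `PATH` of the account that runs DSS. This is only a sketch: it assumes the executables follow the usual `python3.9`-style naming, and DSS's own detection may look in other places. The function name and the version list are illustrative.

```python
import shutil


def find_interpreters(versions, which=shutil.which):
    """Map each version (e.g. '3.9') to the executable path, or None."""
    found = {}
    for version in versions:
        # shutil.which returns the full path if the command is on PATH
        found[version] = which(f"python{version}")
    return found


if __name__ == "__main__":
    for version, path in find_interpreters(["2.7", "3.7", "3.9", "3.10"]).items():
        print(f"python{version}: {path or 'not found on PATH'}")
```

    The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.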

  • Turribeach

    One thing I forgot to point out, and it’s probably the main reason why you don’t want to “upgrade” code environments to new Python interpreters, is that the safest approach is “opt-in” rather than “opt-out”. If you upgrade an existing code env, you force all code using that code env onto the new interpreter. If something fails, you can only “opt out” that piece of code by moving it to an older environment until you fix it. So you will most likely end up having to create another code env anyway, as it is highly unlikely that all code will work on the new interpreter.

    By creating a new code env for the new interpreter instead, you can use an “opt-in” approach. You move code to the new code env and test it. If the test passes, you move on to the next project; if not, you fix it or roll it back to the old code env for now. While this approach appears to require more work, it doesn’t have to. You could for instance use the Python API to switch all code / projects using the old code env to the new code env, which is similar to “upgrading” the code env with the new interpreter, but at least you are in full control of what’s happening.

    The final benefit of this approach is that the change will be visible in the project version control, which will help business users pinpoint issues should there be a silent failure (i.e. incorrect data) or a proper flow failure. With the code env upgrade, you have no idea that the code env has been changed, at least not from the project point of view.
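
    The opt-in workflow described above can be tracked with a very small planning helper. This is a sketch under stated assumptions: the usage records and test results are hypothetical data structures that you would fill in yourself (for example from a code env usage report), and actually switching a project to the new code env through the Dataiku Python API is deliberately not shown.

```python
def plan_opt_in_migration(usages, test_results):
    """Decide the next action for each project still using the old code env.

    usages: dicts with a 'project' key, one per object using the old env.
    test_results: {project: True/False}; missing means "not tested yet".
    """
    plan = {}
    for project in sorted({u["project"] for u in usages}):
        result = test_results.get(project)
        if result is True:
            plan[project] = "migrate"  # tested OK on the new code env
        elif result is False:
            plan[project] = "keep_old_env_and_fix"  # stay on old env for now
        else:
            plan[project] = "test_on_new_env"  # not opted in yet
    return plan


def old_env_is_retirable(plan):
    """The old code env can go once every remaining user can migrate."""
    return all(action == "migrate" for action in plan.values())


# Hypothetical usage records and test outcomes.
USAGES = [
    {"project": "SALES", "object": "recipe:compute_kpis"},
    {"project": "SALES", "object": "notebook:exploration"},
    {"project": "CHURN", "object": "recipe:train_model"},
    {"project": "FINANCE", "object": "recipe:load_ledger"},
]
TESTS = {"SALES": True, "CHURN": False}

plan = plan_opt_in_migration(USAGES, TESTS)
print(plan)
print("old env retirable:", old_env_is_retirable(plan))
```

    Keeping the plan per project mirrors the point above: each project changes code env explicitly, so the switch shows up in that project's own history.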
