Context: When Dataiku is installed on a system, it creates a built-in Python code environment based on the Python version supported for that purpose. Older Dataiku versions used Python v2.7; from v9 through v11, Dataiku uses Python v3.6, or Python v3.7 where available, as the built-in Python code environment, and v12 then added support for Python v3.9 as the built-in environment. It's worth noting that the built-in environment is a completely different beast from the regular code environments that you create in the Dataiku code environments Admin screen. For instance, Dataiku v11.4 added support for Python v3.11 in code environments (v11.0 had already added Python v3.8 and v3.9), yet, as noted above, Dataiku v12 still tops out at Python v3.9 for the built-in environment. You can specify the built-in Python version in the installation script as follows:
installer.sh -d /path/to/DATA_DIR -P python3.7 -p PORT
While I have not found any documentation stating this, I have been told a few times by several Dataiku employees that we should not install packages in the built-in Python code environment, as it is critical that this environment remains pristine and in working condition. This is what the documentation says:
The DSS installation phase creates an initial "builtin" Python environment, which is used to run all Python-based internal DSS operations, and is also used as a default environment to run user-provided Python code. This builtin Python environment comes with a default set of packages, suitable for this version of DSS. These are setup by the DSS installer and updated accordingly on DSS upgrades. This builtin environment is not controllable nor configurable by user. Depending on the OS used, a suitable Python version is automatically used.
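To make the "keep it pristine" advice concrete, here is a minimal bash sketch of a safety net for ad-hoc shell work on a DSS host. It assumes the usual DSS data directory layout, where the built-in environment lives under DATA_DIR/pyenv and managed code environments live under DATA_DIR/code-envs/python/<name>. Both function names (`classify_python_env`, `guard_pip_install`) are hypothetical helpers, not DSS tools, and packages for managed code environments are normally declared through the code environments Admin screen rather than with pip.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of DSS). They ASSUME the usual DSS data-dir layout:
#   DATA_DIR/pyenv                    -> built-in Python environment
#   DATA_DIR/code-envs/python/<name>  -> managed code environments

# Print "builtin", "code-env:<name>" or "other" for a Python interpreter path.
classify_python_env() {
  local data_dir="${1%/}"   # drop a single trailing slash from DATA_DIR
  local interp="$2"
  case "$interp" in
    "$data_dir"/pyenv/*)
      echo "builtin" ;;
    "$data_dir"/code-envs/python/*)
      local rest="${interp#"$data_dir"/code-envs/python/}"
      echo "code-env:${rest%%/*}" ;;   # first path component is the env name
    *)
      echo "other" ;;
  esac
}

# Run pip with the given interpreter, but refuse to touch the built-in env.
guard_pip_install() {
  local data_dir="$1" interp="$2"
  shift 2
  if [ "$(classify_python_env "$data_dir" "$interp")" = "builtin" ]; then
    echo "Refusing to pip install into the DSS built-in environment: $interp" >&2
    return 1
  fi
  "$interp" -m pip install "$@"
}
```

For example, `guard_pip_install /path/to/DATA_DIR /path/to/DATA_DIR/pyenv/bin/python requests` exits with status 1 and installs nothing, while pointing it at an interpreter outside DATA_DIR/pyenv passes the arguments straight to `pip install`.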
It is this dual use (running internal DSS operations and serving as the default environment for user code) that I think should be split:
Dataiku will argue that you can use the built-in code environment and that if you need any changes you should create your own code environment using the code environment management functionality that Dataiku provides. I disagree with the status quo for many reasons:
So what's this idea about? Simply separate the built-in code environment into two:
Hope it makes sense. Thanks for reading!