Dataiku Python library install

dkj_
Level 1

Hello Dataikers,

I have a couple of questions.

1. Is it possible to add Python libraries to Dataiku on an on-premises installation in a closed network?
If so, how should I go about it?

2. Are there any restrictions on which Python libraries can be installed in Dataiku through pip?
Or are there libraries that cannot be installed at all?
(For example, with Anaconda, installing some libraries can be restricted or impossible.)

Thank you


Operating system used: Custom Dataiku install on Linux

2 Replies
JuanE
Dataiker

Hello,


DSS leverages standard third-party tools for Python package management like pip (and conda, for conda-based code environments). You can customize code environments to instruct these tools to operate in closed environments (e.g., by retrieving packages via a proxy or an internal package repository) as you would outside DSS. There are more details in our documentation:

https://doc.dataiku.com/dss/latest/code-envs/custom-options.html
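
As a rough illustration of what "instructing pip to operate in a closed environment" can look like, here is a minimal sketch that only builds the pip command line (it does not run it). The flags `--index-url`, `--trusted-host` and `--proxy` are standard pip options; the host name `pypi.internal.example` and the helper `pip_install_command` are hypothetical placeholders. Where exactly such options are entered for a DSS code environment is covered on the documentation page linked above.

```python
import shlex


def pip_install_command(packages, index_url=None, trusted_host=None, proxy=None):
    """Build (but do not run) a pip install command for a restricted network."""
    cmd = ["pip", "install"]
    if index_url:
        # Use an internal package repository instead of the public PyPI.
        cmd += ["--index-url", index_url]
    if trusted_host:
        # Needed only if the internal host lacks a certificate pip trusts.
        cmd += ["--trusted-host", trusted_host]
    if proxy:
        # Route pip's traffic through an HTTP(S) proxy.
        cmd += ["--proxy", proxy]
    return cmd + list(packages)


# Hypothetical internal mirror; replace with your own repository URL.
cmd = pip_install_command(
    ["pandas"],
    index_url="https://pypi.internal.example/simple",
    trusted_host="pypi.internal.example",
)
print(shlex.join(cmd))
```

The same options can also be set once in a pip configuration file rather than on every command line, which is usually easier to maintain.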

 

Turribeach
  1. Been there. Done that. Got burned. Never again. It’s technically possible but realistically infeasible. pip can install from locally downloaded package files, so if you have the dedication to manually download every package you need plus all of its dependencies, and you install them from local files in the right order, you might get there with significant pain. The problem is that every package you want pulls in roughly five times as many dependencies, and those dependencies go several levels deep: if you want package a, you need b, which in turn needs c and d, and so on. On top of that, each package has many versions, and which version (and which dependencies) you need depends on the installed Python version, the operating system and the other Python packages already present in the code environment. This makes working out the exact set of files you need really hard. If you had a perfect clone of your offline system somewhere with internet access, where you could install packages and see which dependencies and versions get pulled in, that would help a lot. Without that, this is realistically not feasible. If you or your company don’t want to download packages directly from the internet, the best approach is a local PyPI mirror, using a tool like Artifactory, holding a subset of “approved” packages and their dependencies. This will slow your users down a lot, though, because every time they need a new package someone has to fetch it and add it to the repository.
  2. Of course there will always be package incompatibilities. This has nothing to do with Dataiku. There can be OS-level incompatibilities, Python version mismatches, clashes with other packages, or simply so many packages installed that pip can’t resolve the dependencies. But Dataiku uses virtual code environments, which do a great job of isolating each Python code environment and preventing clashes as much as possible. So in my experience, clashes you can’t work around are rare.
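
For anyone who does have a connected clone of the offline machine (same OS and Python version), the manual route in point 1 can be sketched roughly as below. `pip download` resolves and fetches dependencies along with the requested packages, and `pip install --no-index --find-links` then installs purely from the copied directory. The helper name, `requirements.txt` and the `wheels` directory are hypothetical, and the sketch only builds the two command lines without running them.

```python
import shlex


def offline_install_commands(requirements_file, wheel_dir):
    """Build the two pip commands for a download-then-install-offline workflow."""
    # Step 1, on the connected clone: fetch the packages and their dependencies.
    download_cmd = [
        "pip", "download",
        "--dest", wheel_dir,
        "-r", requirements_file,
    ]
    # Step 2, on the offline machine: install only from the copied directory.
    install_cmd = [
        "pip", "install",
        "--no-index",               # never contact a package index
        "--find-links", wheel_dir,  # look for package files here instead
        "-r", requirements_file,
    ]
    return download_cmd, install_cmd


# Hypothetical file and directory names.
download_cmd, install_cmd = offline_install_commands("requirements.txt", "wheels")
print(shlex.join(download_cmd))
print(shlex.join(install_cmd))
```

This only works cleanly when the clone really matches the target system; otherwise the downloaded files may be built for the wrong platform or Python version, which is exactly the pain described above.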