Use my own Python code in a DataIKU code_env
Hi
I have a library of Python packages that I want to use in DataIKU. How do I load them into a Code_Env when they are not in a pip repository (or the pip repository is private and cannot be accessed by DataIKU)?
Answers
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 321 Neuron
Hi @cbridge,
You can use project libraries to create packages that you can import in code recipes and notebooks. These libraries can be sourced from a git repo.
The following documentation describes project libraries: https://doc.dataiku.com/dss/latest/python/reusing-code.html
Note that you can create a file in the python folder and then import that as a package in your code. For example, create "mypackage.py" in the python folder and then in your code you can import it: "import mypackage". No need to create a subfolder with a __init__.py file (although you can do that too).
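As a minimal sketch of the single-file case described above: the snippet below uses a temporary directory as a stand-in for the project library's python folder (in DSS that folder should already be on the import path, so the sys.path line is only needed for this standalone demo). The function name "double" is a made-up example, not part of any Dataiku API.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the project library's "python" folder (assumption for this demo).
lib_dir = Path(tempfile.mkdtemp())

# A single-file module: no subfolder and no __init__.py required.
(lib_dir / "mypackage.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Only needed outside DSS: make the stand-in folder importable.
sys.path.insert(0, str(lib_dir))
importlib.invalidate_caches()

import mypackage  # same statement you would use in a recipe or notebook

result = mypackage.double(21)
print(result)
```

In a real project you would create mypackage.py through the library editor (or pull it from git) and keep only the import and the call in your recipe.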
This documentation details how to source project libraries from git repos: https://doc.dataiku.com/dss/latest/collaboration/import-code-from-git.html
I use this approach all of the time. Not quite as convenient as including packages in a code environment but works fine.
Marlan
-
Grixis PartnerApplicant, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 82 ✭✭✭✭✭
Hi @cbridge
You can upload and manage your own library at the project level without any dependence on your code env.
Or, if you have access to the admin side of DSS, you can try managing the resource directory of your target code env.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Install manually using this technique:
-
Thank you all for the replies. I am still a little confused.
I have a Python package with multiple sub-files and loading them up into the Library looks time consuming, one file at a time.
I would like a standard code_env that I can reuse in multiple projects, having one place to update the code. I have Admin access. I've looked at loading the files in the Resource directory as zip files, and this appears to work. However, I can't seem to add them to the Requested Packages (pip). Any line causes an error - e.g. /Resource directory/[package], or even -e /Resource directory/[package]
My mistake in the original question, I meant a private git rather than pip.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Do you have access to an Artifactory (from JFrog) repository? Artifactory can host local Python repositories, which allow you to publish your custom Python packages and install them via pip in your Dataiku code environment (after adding the Artifactory URL as trusted). This is the best solution for custom Python packages.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
So Artifactory will be the easiest solution if it is available to you. Otherwise you can create your own Python mirror web server. Most people use devpi, which is a web server that hosts a Python mirror and which you can then use directly in Dataiku by adding it to the configuration (see Using custom package repositories). This option has the benefit of working in all the other Dataiku components like the Automation Node, API Node and Containerized execution, where some of the other solutions won't work. You can even host your own basic Python mirror using nginx if you prefer a simpler solution. Dataiku even installs nginx as well, so you may be able to use the same binaries with a different configuration.
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 321 Neuron
Hi @cbridge,
Sounds like you would prefer installing your package in a code environment, which I definitely get as that would be most convenient. @Turribeach has the options there covered.
I would note that you can use Import from Git to bring all elements of your package into a project library in one step. It would not be a one file at a time operation.
Marlan
-
Hello @cbridge,
In "packages to install" in your code env, you can use any syntax that is supported by pip when reading a requirements file:
/localpath/to/tar.gz will work
git+https://github.com/myrepo/lib.git will work
https://webpath/to/tar.gz will work
So depending on how your files are packaged, hopefully one of those will work for you.
You shouldn't need to add the package specifically to the code env resources; as long as it resolves on disk, this should work out.
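For any of these forms to install, pip generally expects the archive or repo to contain a pyproject.toml (or setup.py) at its root. The sketch below only illustrates that layout: the package name "mylib", its version, and the greet function are hypothetical, and the tarball is assembled by hand with the standard library rather than with proper packaging tools, which you would normally use instead. It also assumes pip can obtain setuptools when building.

```python
import tarfile
import tempfile
from pathlib import Path

# Minimal build metadata; "mylib" and "0.1.0" are placeholder values.
PYPROJECT = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "mylib"
version = "0.1.0"
"""


def build_archive(workdir: Path) -> Path:
    """Create a tiny multi-file package and pack it into a .tar.gz."""
    root = workdir / "mylib-0.1.0"
    pkg = root / "mylib"
    pkg.mkdir(parents=True)
    (root / "pyproject.toml").write_text(PYPROJECT)
    (pkg / "__init__.py").write_text("from .core import greet\n")
    (pkg / "core.py").write_text(
        "def greet(name):\n"
        "    return f'Hello, {name}'\n"
    )
    archive = workdir / "mylib-0.1.0.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Keep a single top-level directory inside the archive.
        tar.add(root, arcname=root.name)
    return archive


workdir = Path(tempfile.mkdtemp())
archive = build_archive(workdir)
with tarfile.open(archive) as tar:
    names = sorted(tar.getnames())
print(archive)
print("\n".join(names))
```

The printed archive path is what would go on a line by itself in the code env's package list, provided that path is readable from the DSS server.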
Best regards,
Nico