Use my own Python code in a Dataiku code_env

cbridge
Level 1

Hi,

I have a library of Python packages that I want to use in Dataiku. How do I load them into a code env when they are not in a pip repository (or the pip repository is private and cannot be accessed by Dataiku)?

Marlan

Hi @cbridge,

You can use project libraries to create packages that you can import in code recipes and notebooks. These libraries can be sourced from a Git repo.

The following documentation describes project libraries: https://doc.dataiku.com/dss/latest/python/reusing-code.html

Note that you can create a file in the python folder and then import it as a module in your code. For example, create "mypackage.py" in the python folder and then import it in your code with "import mypackage". There is no need to create a subfolder with an __init__.py file (although you can do that too).
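A minimal sketch of why a single file is enough: once a folder is on Python's import path, any top-level .py file in it can be imported as a module, with no __init__.py required. The temporary directory below stands in for the project library's python folder, and the sketch assumes that DSS puts that folder on the import path for you (which is what makes "import mypackage" work in recipes). The greet function is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a project library's "python" folder.
lib_python = Path(tempfile.mkdtemp()) / "python"
lib_python.mkdir()

# A single module file: no subfolder and no __init__.py.
(lib_python / "mypackage.py").write_text(
    "def greet(name):\n"
    "    return f'Hello, {name}'\n"
)

# Assumption: DSS does the equivalent of this for project libraries.
sys.path.insert(0, str(lib_python))
importlib.invalidate_caches()  # make the newly written file visible to the import system

import mypackage

print(mypackage.greet("cbridge"))  # prints "Hello, cbridge"
```

A subfolder with an __init__.py becomes useful once the code grows into several related modules that you want to import as mypackage.submodule.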

This documentation details how to source project libraries from git repos: https://doc.dataiku.com/dss/latest/collaboration/import-code-from-git.html

I use this approach all of the time. It's not quite as convenient as including packages in a code environment, but it works fine.

Marlan

Grixis
Level 4

Hi @cbridge 

You can upload and manage your own library at the project level, independently of your code env.

 

Or, if you have access to the admin side of DSS, you can try managing the resources directory of your target code env.

 

 

cbridge
Level 1
Author

Thank you all for the replies. I am still a little confused.

I have a Python package with multiple submodules, and uploading them into the library one file at a time looks time-consuming.

I would like a standard code env that I can reuse in multiple projects, with one place to update the code. I have admin access. I've looked at uploading the files to the resources directory as zip files, and this appears to work. However, I can't seem to add them to the requested packages (pip). Every line I try causes an error, e.g. /Resource directory/[package], or even -e /Resource directory/[package].

My mistake in the original question: I meant a private Git repository rather than a private pip repository.


Do you have access to an Artifactory (from JFrog) repository? Artifactory can host local Python repositories, which let you publish your custom Python packages and install them via pip in your Dataiku code environment (after adding the Artifactory URL as trusted). This is the best solution for custom Python packages.
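Outside the Dataiku UI, installing from such a repository comes down to pointing pip at the repository's index URL and, if needed, marking its host as trusted. The sketch below only builds that pip command; the Artifactory URL, repository name, and package name are hypothetical, and in Dataiku these index settings would presumably be entered in its own configuration rather than run by hand.

```python
import sys
from urllib.parse import urlparse


def pip_install_cmd(package, index_url, trusted=False):
    """Build a pip command that installs `package` from a private index.

    --extra-index-url adds the private index while keeping PyPI available;
    --trusted-host makes pip accept that host even without valid HTTPS.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--extra-index-url", index_url]
    if trusted:
        cmd += ["--trusted-host", urlparse(index_url).hostname]
    cmd.append(package)
    return cmd


if __name__ == "__main__":
    # Hypothetical Artifactory PyPI endpoint and package name.
    cmd = pip_install_cmd(
        "my-internal-lib",
        "https://artifactory.example.com/artifactory/api/pypi/pypi-local/simple",
        trusted=True,
    )
    print(" ".join(cmd))
    # To actually install: import subprocess; subprocess.run(cmd, check=True)
```

Using --extra-index-url keeps PyPI reachable for public dependencies; if the Artifactory repository also proxies PyPI, --index-url alone would route every download through it instead.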


So Artifactory will be the easiest solution if it is available to you. Otherwise, you can create your own Python mirror web server. Most people use devpi, a web server that hosts a Python mirror, which you can then use directly in Dataiku by adding it to the configuration (see Using custom package repositories). This option has the benefit of working in all the other Dataiku components, such as the Automation Node, API Node, and containerized execution, where some of the other solutions won't work. You can even host your own basic Python mirror with nginx if you prefer a simpler solution. Dataiku even installs nginx itself, so you may be able to reuse the same binaries with a different configuration.
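For the basic-mirror option, pip only needs a static directory laid out according to the "simple" repository API (PEP 503): one page listing the projects, plus one page per project linking to its files. The sketch below generates that layout with the standard library; it is an illustration, not how devpi or Dataiku work internally, it assumes standard wheel and sdist file names, and the folder names are hypothetical.

```python
import html
import re
import shutil
from pathlib import Path

PAGE = "<!DOCTYPE html>\n<html><body>\n{}\n</body></html>\n"


def normalize(name):
    # PEP 503: runs of "-", "_" and "." collapse to a single "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def project_name(filename):
    # Wheel names put the distribution name before the first "-".
    if filename.endswith(".whl"):
        return filename.split("-")[0]
    # Simplification: assume sdists are named "<name>-<version>.tar.gz".
    if filename.endswith(".tar.gz"):
        return filename[: -len(".tar.gz")].rsplit("-", 1)[0]
    return None  # ignore anything that is not a wheel or sdist


def build_simple_index(dist_dir, out_dir):
    """Write a static PEP 503 index under out_dir/simple for files in dist_dir."""
    projects = {}
    for f in sorted(Path(dist_dir).iterdir()):
        name = project_name(f.name)
        if name:
            projects.setdefault(normalize(name), []).append(f)

    simple = Path(out_dir) / "simple"
    simple.mkdir(parents=True, exist_ok=True)
    root_links = []
    for proj, files in sorted(projects.items()):
        proj_dir = simple / proj
        proj_dir.mkdir(exist_ok=True)
        links = []
        for f in files:
            shutil.copy2(f, proj_dir / f.name)  # serve each file next to its project page
            esc = html.escape(f.name)
            links.append(f'<a href="{esc}">{esc}</a><br>')
        (proj_dir / "index.html").write_text(PAGE.format("\n".join(links)))
        root_links.append(f'<a href="{proj}/">{proj}</a><br>')
    (simple / "index.html").write_text(PAGE.format("\n".join(root_links)))
    return sorted(projects)
```

Serving the output folder with, for example, python -m http.server 8000 --directory site (where site is whatever out_dir you chose) would let pip use http://your-host:8000/simple/ as an index URL. Because that is plain HTTP, the host would also need to be marked as trusted; nginx or devpi remain the sturdier choice for a shared setup.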


Hi @cbridge,

It sounds like you would prefer installing your package in a code environment, which I definitely get, as that would be most convenient. @Turribeach has the options there covered.

I would note that you can use Import from Git to bring all elements of your package into a project library in one step. It would not be a one-file-at-a-time operation.

Marlan

NicolasD
Dataiker

Hello @cbridge ,

In "packages to install" in your code env, you can use any syntax that pip supports when reading a requirements file. For example, all of these will work:

/localpath/to/tar.gz
git+https://github.com/myrepo/lib.git
https://webpath/to/tar.gz

So depending on how your files are packaged, hopefully one of those will work for you 🙂

You shouldn't need to add the package to the code env resources specifically; as long as the path resolves on disk, this should work.
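Since the local-path form only works if the file actually resolves on the DSS host, a quick pre-flight check on the lines before pasting them into "packages to install" can save a failed build. The helper below is hypothetical (not part of Dataiku or pip) and only approximates pip's requirements-file parsing: it skips blank lines, comments and ordinary specifiers, accepts the Git and HTTP(S) URL forms listed above, and flags absolute or relative paths (resolved against the current working directory) that don't exist.

```python
import os
import re

# Remote forms: VCS URLs such as git+https:// and plain http(s) URLs.
REMOTE = re.compile(r"^(git\+[a-z]+|https?)://", re.IGNORECASE)
PATH_PREFIXES = ("/", "./", "../")


def check_requirements(lines):
    """Return a message for each local-path requirement that does not exist on disk."""
    problems = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        for flag in ("-e ", "--editable "):
            if line.startswith(flag):
                line = line[len(flag):].strip()  # editable installs still need a real target
                break
        if REMOTE.match(line):
            continue  # fetched by pip over the network at install time
        if line.startswith(PATH_PREFIXES) and not os.path.exists(line):
            problems.append(f"not found on disk: {line}")
        # anything else is treated as an ordinary specifier such as "pandas>=1.3"
    return problems
```

For instance, the line -e /Resource directory/[package] from earlier in the thread would be reported as not found unless that exact path exists on the host.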

Best regards,
Nico

 
