Use my own Python code in a Dataiku code env

cbridge Registered Posts: 2

Hi

I have a library of Python packages that I want to use in Dataiku. How do I load them into a code env when they are not in a pip repository (or when the pip repository is private and cannot be accessed by Dataiku)?

Answers

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 316 Neuron

    Hi @cbridge,

    You can use project libraries for creating packages you can import in code recipes and notebooks. These libraries can be sourced from a git repo.

    The following documentation describes project libraries: https://doc.dataiku.com/dss/latest/python/reusing-code.html

    Note that you can create a single file in the python folder and then import it as a module in your code. For example, create "mypackage.py" in the python folder and then import it in your code with "import mypackage". There's no need to create a subfolder with an __init__.py file (although you can do that too).
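A minimal sketch of why the single-file approach works, assuming (as with the default project library setup) that the library's `python` folder is placed on `sys.path` for recipes and notebooks. Here a temporary directory stands in for that folder, and `greet` is a hypothetical function used only for illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the project library's "python" folder (assumption: DSS adds
# this folder to sys.path; here we use a temporary directory instead).
lib_python = Path(tempfile.mkdtemp())

# Hypothetical single-file module: no subfolder or __init__.py required.
(lib_python / "mypackage.py").write_text(
    "def greet(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Do by hand what the library setup does for you, so the import resolves.
sys.path.insert(0, str(lib_python))
importlib.invalidate_caches()  # make the newly written file visible to the importer

import mypackage

print(mypackage.greet("Dataiku"))  # Hello, Dataiku!
```

Splitting the code into a folder with an `__init__.py` works the same way; the folder just becomes a package on the same path.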

    This documentation details how to source project libraries from git repos: https://doc.dataiku.com/dss/latest/collaboration/import-code-from-git.html

    I use this approach all the time. It's not quite as convenient as including packages in a code environment, but it works fine.

    Marlan

  • Grixis
    Grixis PartnerApplicant, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 47 ✭✭✭✭✭

    Hi @cbridge,

    You can upload and manage your own library at the project level, without any dependency on your code env.

    Or, if you have access to the admin side of DSS, you can try managing the resource directory of your target code env.

  • cbridge
    cbridge Registered Posts: 2

    Thank you all for the replies. I am still a little confused.

    I have a Python package with multiple sub-files, and uploading them into the library one file at a time looks time-consuming.

    I would like a standard code env that I can reuse in multiple projects, with one place to update the code. I have admin access. I've tried loading the files into the resource directory as zip files, and this appears to work. However, I can't seem to add them to the Requested Packages (pip): every line I try causes an error, e.g. /Resource directory/[package], or even -e /Resource directory/[package].

    My mistake in the original question: I meant a private git repository rather than pip.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,718 Neuron

    Do you have access to an Artifactory (from JFrog) repository? Artifactory can host local Python repositories, which allow you to publish your custom Python packages and install them via pip in your Dataiku code environment (after adding the Artifactory URL as trusted). This is the best solution for custom Python packages.
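In Dataiku an administrator sets this up through the custom package repository settings rather than by running pip by hand. The sketch below only shows the plain pip flags such a setup roughly corresponds to. The host and repository key are hypothetical; `/artifactory/api/pypi/<repo-key>/simple` is the URL pattern Artifactory typically uses for PyPI repositories, but check your own instance.

```python
import sys
from urllib.parse import urlparse

# Hypothetical Artifactory PyPI repository (host and repo key are placeholders).
INDEX_URL = "https://artifactory.example.com/artifactory/api/pypi/pypi-local/simple"


def pip_install_cmd(package, index_url=INDEX_URL, trust_host=True):
    """Build a pip command that resolves `package` from a private index."""
    cmd = [sys.executable, "-m", "pip", "install", "--index-url", index_url]
    if trust_host:
        # --trusted-host lets pip use this host even without valid HTTPS
        # (plain HTTP or a certificate pip cannot verify).
        cmd += ["--trusted-host", urlparse(index_url).hostname]
    cmd.append(package)
    return cmd


print(pip_install_cmd("mypackage"))  # "mypackage" is a hypothetical package name
```

`--index-url` replaces PyPI entirely, so the Artifactory repository must also serve (or proxy) any public dependencies; `--extra-index-url` consults PyPI as well.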

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,718 Neuron

    So Artifactory will be the easiest solution if it is available to you. Otherwise you can create your own Python mirror web server. Most people use devpi, a web server that hosts a Python mirror, which you can then use directly in Dataiku by adding it to the configuration (see Using custom package repositories). This option has the benefit of working in all the other Dataiku components, like the Automation Node, API Node and containerized execution, where some of the other solutions won't work. You can even host your own basic Python mirror using nginx if you prefer a simpler solution. Dataiku itself installs nginx, so you may be able to use the same binaries with a different configuration.
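For the "basic mirror with nginx" route, a static web server only needs a directory laid out as a PEP 503 "simple" index: one page listing projects, and one page per project linking to its files. A minimal sketch that builds such a directory from a folder of wheels/sdists, assuming modern file naming (the project name is everything before the first hyphen); the package name in the demo is hypothetical:

```python
import html
import re
import tempfile
from collections import defaultdict
from pathlib import Path
from urllib.parse import quote


def normalize(name):
    """PEP 503 name normalization: runs of '-', '_' or '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def build_simple_index(dist_dir, out_dir):
    """Write a minimal PEP 503 'simple' index for the files in dist_dir."""
    dist_dir, out_dir = Path(dist_dir), Path(out_dir)
    projects = defaultdict(list)
    for f in sorted(dist_dir.iterdir()):
        if f.name.endswith((".whl", ".tar.gz", ".zip")):
            # Assumption: wheels and current sdists escape '-' in names as '_'.
            projects[normalize(f.name.split("-", 1)[0])].append(f)

    out_dir.mkdir(parents=True, exist_ok=True)
    root_links = []
    for project, files in sorted(projects.items()):
        proj_dir = out_dir / project
        proj_dir.mkdir(exist_ok=True)
        links = []
        for f in files:
            # Copy each file next to its project page so relative links resolve.
            (proj_dir / f.name).write_bytes(f.read_bytes())
            links.append(f'<a href="{quote(f.name)}">{html.escape(f.name)}</a><br>')
        (proj_dir / "index.html").write_text(
            "<html><body>\n" + "\n".join(links) + "\n</body></html>\n")
        root_links.append(f'<a href="{quote(project)}/">{html.escape(project)}</a><br>')
    (out_dir / "index.html").write_text(
        "<html><body>\n" + "\n".join(root_links) + "\n</body></html>\n")
    return sorted(projects)


# Demo with empty placeholder files (hypothetical package name).
dist = Path(tempfile.mkdtemp())
for name in ("my_package-1.0.tar.gz", "my_package-1.0-py3-none-any.whl"):
    (dist / name).write_bytes(b"")
out = Path(tempfile.mkdtemp()) / "simple"
print(build_simple_index(dist, out))  # ['my-package']
```

Serving `out` as `/simple/` (with nginx, or `python -m http.server` for a quick test) lets pip install from it with `--index-url http://<host>/simple/`, plus `--trusted-host <host>` when it is plain HTTP. devpi does the same job with uploads, caching and access control on top.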

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 316 Neuron

    Hi @cbridge,

    Sounds like you would prefer installing your package in a code environment, which I definitely get, as that would be most convenient. @Turribeach has the options there covered.

    I would note that you can use Import from Git to bring all elements of your package into a project library in one step. It would not be a one-file-at-a-time operation.

    Marlan

  • NicolasD
    NicolasD Dataiker, Dataiku DSS Core Designer, Registered Posts: 12 Dataiker

    Hello @cbridge,

    In "Packages to install" in your code env, you can use any syntax that pip supports when reading a requirements file. For example:

    /localpath/to/tar.gz will work

    git+https://github.com/myrepo/lib.git will work

    https://webpath/to/tar.gz will work

    So depending on how your files are packaged, hopefully one of those will work for you.

    You shouldn't need to add the package to the code env resources specifically: as long as the path resolves on disk, this should work.

    Best regards,
    Nico
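To try those lines outside Dataiku, one option is to write them to a requirements file and hand it to `python -m pip install -r`. The sketch below does that, assuming (not confirmed in this thread beyond the "requirements file syntax" remark) that the "Packages to install" field is handled like such a file. The entries are the placeholders from the answer above, so the install defaults to a dry run that only returns the command.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The three forms from the answer above; paths and URLs are placeholders.
PACKAGES_TO_INSTALL = [
    "/localpath/to/tar.gz",                   # local archive on the DSS host
    "git+https://github.com/myrepo/lib.git",  # git repository over HTTPS
    "https://webpath/to/tar.gz",              # archive served over HTTP(S)
]


def write_requirements(lines, directory=None):
    """Write one requirement per line, as pip expects in a requirements file."""
    directory = Path(directory or tempfile.mkdtemp())
    req_file = directory / "requirements.txt"
    req_file.write_text("\n".join(lines) + "\n")
    return req_file


def install(req_file, dry_run=True):
    """Return (and, if dry_run is False, run) `python -m pip install -r <file>`."""
    cmd = [sys.executable, "-m", "pip", "install", "-r", str(req_file)]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


req = write_requirements(PACKAGES_TO_INSTALL)
print(install(req))  # dry run: prints the command without installing anything
```

If pip rejects a line here, it will most likely be rejected in the code env too, which makes this a cheap way to debug entries like the resource-directory paths tried earlier in the thread.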
