Pipeline fails with import error for dataiku

arielma2304 Registered Posts: 47 ✭✭✭✭

I'm trying to run the example below on my Jenkins node:

Building-a-Jenkins-pipeline-for-Dataiku-DSS

In one of the stages, I create a Python venv and install the requirements.txt file into it. I can see in the console log that dataiku-api-client version 8.0.0 was installed, but the script still fails later on its first statement:

import dataiku

The error is:

ModuleNotFoundError: No module named 'dataiku'

My Python version is 3.7.4.

Any idea what's wrong?

Best Answer

  • fsergot
    fsergot Dataiker, Registered, Product Ideas Manager Posts: 117 Dataiker
    Answer ✓

    In fact, there is an alternative that should work: do not use the dataiku package at all, only dataiku-api-client. That requires some changes to the code provided in the article, and I still need to test it thoroughly.

    Here is how to do it:

    1. No need to load dataiku-internal-client.tar.gz, keep your original setup without it
    2. Change the code of run_bundling.py
      1. replace import dataiku with import dataikuapi
      2. replace the 2 lines dataiku.set_remote_dss(host, apiKey) & client = dataiku.api_client() with a single line client = dataikuapi.DSSClient(host, apiKey)
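
The two replacements in step 2 can be sketched as follows. This is a minimal sketch, not the attached file: `parse_args` and `build_client` are hypothetical helper names, and the argument order (host, API key, project key, bundle id) is assumed from the `python run_bundling.py ...` call shown in this thread. It also assumes `dataiku-api-client` is installed in the venv.

```python
def parse_args(argv):
    """Split the positional arguments into named values.

    Assumed order, based on the Jenkins stage in this thread:
    host, API key, project key, bundle id.
    """
    if len(argv) != 4:
        raise ValueError("expected: <host> <api_key> <project_key> <bundle_id>")
    host, api_key, project_key, bundle_id = argv
    return {
        "host": host,
        "api_key": api_key,
        "project_key": project_key,
        "bundle_id": bundle_id,
    }


def build_client(host, api_key):
    """Create the client with dataikuapi only (the 'dataiku' package is not needed)."""
    # Imported inside the function so that only dataiku-api-client is required,
    # and only at the moment the client is actually built.
    import dataikuapi

    # Replaces the two original lines:
    #   dataiku.set_remote_dss(host, apiKey)
    #   client = dataiku.api_client()
    return dataikuapi.DSSClient(host, api_key)
```

In the script itself, this would be used as `args = parse_args(sys.argv[1:])` followed by `client = build_client(args["host"], args["api_key"])`.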

    I am attaching a new version of the run_bundling.py file and requirements.txt to this answer.

    Let me know if this works better.

Answers

  • fsergot
    fsergot Dataiker, Registered, Product Ideas Manager Posts: 117 Dataiker

    Hello,

    Could you share the code of your pipeline and the full console log of the build?

    Thanks

  • arielma2304
    arielma2304 Registered Posts: 47 ✭✭✭✭
    edited July 17

    I can't share files, so I'll share what I can.

    The stage in the Jenkinsfile:

    stage('PROJECT_VALIDATION') {
        steps {
            withCredentials([string(credentialsId: 'DESIGN_API_KEY', variable: 'TOKEN')]) {
                sh """
                python -m venv ariel_vm
                source ariel_vm/bin/activate
                pip install -i https://{private_url} -r requirements.txt
                python run_bundling.py '${DESIGN_HOST}' '${TOKEN}' '${DSS_PROJECT}' '${bundle_name}'
                """
            }
        }
    }

    The console output:

    Installing collected packages: idna, certifi, chardet, urllib3, requests, six, python-dateutil, dataiku-api-client, numpy, pytz, pandas, colorama, mando, future, radon, zipp, importlib-metadata, pluggy, py, pyparsing, packaging, toml, iniconfig, attrs, pytest
      Running setup.py install for dataiku-api-client: started
        Running setup.py install for dataiku-api-client: finished with status 'done'
      Running setup.py install for future: started
        Running setup.py install for future: finished with status 'done'
    Successfully installed attrs-20.3.0 certifi-2020.11.8 chardet-3.0.4 colorama-0.4.4 dataiku-api-client-8.0.0 future-0.18.2 idna-2.8 importlib-metadata-2.0.0 iniconfig-1.1.1 mando-0.6.4 numpy-1.19.4 packaging-20.4 pandas-0.23.4 pluggy-0.13.1 py-1.9.0 pyparsing-2.4.7 pytest-6.1.2 python-dateutil-2.8.1 pytz-2020.4 radon-4.3.2 requests-2.21.0 six-1.15.0 toml-0.10.2 urllib3-1.24.3 zipp-3.4.0
    + python run_bundling.py https://{private_url} ** DKU_TSHIRTS_3 bundle_2020-11-16_20-11-14
    Traceback (most recent call last):
      File "run_bundling.py", line 1, in <module>
        import dataiku
    ModuleNotFoundError: No module named 'dataiku'

  • fsergot
    fsergot Dataiker, Registered, Product Ideas Manager Posts: 117 Dataiker
    edited July 17

    Hello,

    Thanks for the logs. To use the dataiku Python client, you need two packages: dataiku and dataiku-api-client.

    In your case, dataiku-api-client is retrieved through pip and works, but the dataiku package is missing. That package can only be retrieved directly from your DSS instance, at http(s)://DSS_HOST:DSS_PORT/public/packages/dataiku-internal-client.tar.gz (you can find the related documentation entry here).

    Can you check the content of your requirements.txt?

    In the original file, there is a line to retrieve it:

    @DESIGN_URL@/public/packages/dataiku-internal-client.tar.gz

    A small shell trick then replaces the placeholder with the right value on the fly:

    sh "sed -i 's|@DESIGN_URL@|${DESIGN_URL}|' requirements.txt"
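
The sed call above is a plain placeholder substitution. For readers less familiar with sed, here is a minimal Python sketch of the same idea. `render_requirements` is a hypothetical helper, the host and port in the example are made up, and stripping a trailing slash is an extra that the sed version does not do.

```python
def render_requirements(text, design_url):
    """Replace every @DESIGN_URL@ placeholder with the DSS base URL.

    sed without the 'g' flag replaces only the first match on each line,
    while str.replace replaces all matches. With a single placeholder per
    line, both give the same result.
    """
    # Drop a trailing slash so the result does not contain '//public'.
    return text.replace("@DESIGN_URL@", design_url.rstrip("/"))


template = (
    "dataiku-api-client\n"
    "@DESIGN_URL@/public/packages/dataiku-internal-client.tar.gz\n"
)
rendered = render_requirements(template, "https://dss.example.com:11200/")
print(rendered)
```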

    Let me know if this helps!

    François

  • arielma2304
    arielma2304 Registered Posts: 47 ✭✭✭✭
    edited July 17

    OK, I guess this is the issue. When I try to install it manually, I get:

    + pip install dataiku-internal-client.tar.gz
    Processing ./dataiku-internal-client.tar.gz
    Collecting requests>=2 (from dataiku-internal-client==7.0.0)
      Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f15b95ec860>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /simple/requests/

  • arielma2304
    arielma2304 Registered Posts: 47 ✭✭✭✭

    Now it looks better, but it fails on a connection issue. It looks like something internal and not related to the code, so once that is solved I'll report back on your solution.
    Update 11/19: It works (I also had to add `client._session.verify = False` to the code). Note that with this solution you also need to modify the rest of the .py files.

  • fsergot
    fsergot Dataiker, Registered, Product Ideas Manager Posts: 117 Dataiker

    Note that I will update the knowledge base article soon to reflect this new (and cleaner) approach.

  • arielma2304
    arielma2304 Registered Posts: 47 ✭✭✭✭
    edited July 17

    BTW, it now fails with a new error in prod_activation.py:

      File "prod_activation.py", line 18, in <module>
        previous_bundle_id = project['activeBundleState']['bundleId']
    KeyError: 'activeBundleState'

  • fsergot
    fsergot Dataiker, Registered, Product Ideas Manager Posts: 117 Dataiker

    Regarding this issue, I suspect it is linked to the fact that your DSS instance uses HTTPS with a certificate that Python does not recognize.

    The client's _session attribute comes from the requests Python library (requests.sessions.Session - https://requests.readthedocs.io/en/master/user/advanced/ )
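
Because `_session` is a requests session, its `verify` attribute accepts either a boolean or a path to a CA bundle, so pointing it at the CA that signed the DSS certificate is an alternative to turning verification off. A minimal sketch, assuming `client` is a `dataikuapi.DSSClient`; `configure_tls` is a hypothetical helper, and `_session` is a private attribute that may change between client versions.

```python
def configure_tls(client, ca_bundle=None, insecure=False):
    """Set certificate verification on the client's underlying requests session."""
    if ca_bundle is not None:
        # Path to a PEM file containing the CA that signed the DSS certificate.
        client._session.verify = ca_bundle
    elif insecure:
        # Same effect as the workaround used in this thread: no certificate check.
        client._session.verify = False
    # With neither option set, the session keeps its current behaviour.
    return client
```

A call such as `configure_tls(client, ca_bundle="/path/to/ca.pem")` keeps verification on, where the path is a placeholder for your own CA file.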

  • fsergot
    fsergot Dataiker, Registered, Product Ideas Manager Posts: 117 Dataiker

    Hello,

    Thanks for all the feedback, and sorry for the remaining mistakes in the sample code.

    This is a missing case in prod_activation.py: when the project has just been imported for the first time, there is no active bundle, and the script does not handle this case well. The easiest way to bypass the check is to remove the sys.exit(1) at line 21. The process should then work (although not nicely).

    In any case, I am also attaching a newer version that handles this case properly.
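
The attached file is not reproduced here. As a minimal sketch of the missing case, assuming `project` is the dict-like object indexed in the traceback above, the lookup can fall back to `None` instead of raising. `previous_bundle_id` is a hypothetical helper name.

```python
def previous_bundle_id(project):
    """Return the active bundle id, or None if no bundle was ever activated."""
    # A freshly imported project has no 'activeBundleState' key at all.
    state = project.get("activeBundleState") or {}
    return state.get("bundleId")


print(previous_bundle_id({"activeBundleState": {"bundleId": "bundle_1"}}))  # bundle_1
print(previous_bundle_id({}))  # None
```

The caller can then skip any rollback logic when the result is `None`.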
