Pipeline fails with import error for dataiku

Solved!
arielma2304
Level 3

I'm trying to run the example below on my Jenkins node:

Building-a-Jenkins-pipeline-for-Dataiku-DSS

In one of the stages, I create a Python venv and install the packages from requirements.txt. The console log shows that dataiku-api-client 8.0.0 was installed, but the script still fails later, on its very first line:

import dataiku

The error is:

ModuleNotFoundError: No module named 'dataiku'

My Python version is 3.7.4.

Any idea what's wrong?

10 Replies
fsergot
Dataiker

Hello,

Is it possible to share the code of your pipeline and the full console log of the build?

Thanks

arielma2304
Level 3
Author

I can't share files, so I'll share what I can:

The stage in the Jenkinsfile:

stage('PROJECT_VALIDATION') {
    steps {
        withCredentials([string(credentialsId: 'DESIGN_API_KEY', variable: 'TOKEN')]) {
            sh """
            python -m venv ariel_vm
            source ariel_vm/bin/activate
            pip install -i https://{private_url} -r requirements.txt
            python run_bundling.py '${DESIGN_HOST}' '${TOKEN}' '${DSS_PROJECT}' '${bundle_name}'
            """
        }
    }
}

 

The console output:

Installing collected packages: idna, certifi, chardet, urllib3, requests, six, python-dateutil, dataiku-api-client, numpy, pytz, pandas, colorama, mando, future, radon, zipp, importlib-metadata, pluggy, py, pyparsing, packaging, toml, iniconfig, attrs, pytest
  Running setup.py install for dataiku-api-client: started
    Running setup.py install for dataiku-api-client: finished with status 'done'
  Running setup.py install for future: started
    Running setup.py install for future: finished with status 'done'
Successfully installed attrs-20.3.0 certifi-2020.11.8 chardet-3.0.4 colorama-0.4.4 dataiku-api-client-8.0.0 future-0.18.2 idna-2.8 importlib-metadata-2.0.0 iniconfig-1.1.1 mando-0.6.4 numpy-1.19.4 packaging-20.4 pandas-0.23.4 pluggy-0.13.1 py-1.9.0 pyparsing-2.4.7 pytest-6.1.2 python-dateutil-2.8.1 pytz-2020.4 radon-4.3.2 requests-2.21.0 six-1.15.0 toml-0.10.2 urllib3-1.24.3 zipp-3.4.0
+ python run_bundling.py https://{private_url} ** DKU_TSHIRTS_3 bundle_2020-11-16_20-11-14
Traceback (most recent call last):
  File "run_bundling.py", line 1, in <module>
    import dataiku
ModuleNotFoundError: No module named 'dataiku'

 

fsergot
Dataiker

Hello,

Thanks for the logs. To use the Dataiku Python client, you need two packages: dataiku and dataiku-api-client.

In your case, dataiku-api-client is retrieved through pip and works, but the dataiku package is missing. That package can only be retrieved directly from your DSS instance, at http(s)://DSS_HOST:DSS_PORT/public/packages/dataiku-internal-client.tar.gz (see the related entry in the DSS documentation).
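To check which of the two modules the pipeline's interpreter actually sees, a small diagnostic can be run with that same interpreter (for instance right after the `pip install` line in the `sh` step). This is a minimal standard-library sketch; the helper names are hypothetical, and it assumes, as described above, that dataiku-api-client provides the `dataikuapi` module while `dataiku` only comes from dataiku-internal-client.tar.gz.

```python
import importlib.util
import sys


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    # find_spec() locates a top-level module without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


def in_virtualenv():
    """True when running inside a venv created with `python -m venv`."""
    # In a venv, sys.prefix points at the venv and sys.base_prefix at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
    # Assumption: dataikuapi <- dataiku-api-client, dataiku <- dataiku-internal-client.tar.gz
    for name, found in check_modules(["dataikuapi", "dataiku"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

With the setup from the question, one would expect `dataikuapi` to be found and `dataiku` to be missing.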

Can you check the content of your requirements.txt?

In the original file, there is a line to retrieve it:

@DESIGN_URL@/public/packages/dataiku-internal-client.tar.gz

A small shell step then replaces the @DESIGN_URL@ placeholder with the right value on the fly:

sh "sed -i 's|@DESIGN_URL@|${DESIGN_URL}|' requirements.txt"
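The same substitution can also be done in Python, for example if the pipeline should not depend on `sed -i`. This is a minimal sketch that mirrors the sed expression above (no `g` flag, so only the first occurrence on each line is replaced); `substitute_placeholder` and `patch_requirements` are hypothetical helper names.

```python
from pathlib import Path

PLACEHOLDER = "@DESIGN_URL@"


def substitute_placeholder(text, value, placeholder=PLACEHOLDER):
    """Replace the first `placeholder` on each line, like sed's s|old|new| without /g."""
    lines = text.splitlines(keepends=True)
    return "".join(line.replace(placeholder, value, 1) for line in lines)


def patch_requirements(path, design_url):
    """Rewrite requirements.txt in place, the equivalent of `sed -i`."""
    req = Path(path)
    req.write_text(substitute_placeholder(req.read_text(), design_url))
```

Usage would be `patch_requirements("requirements.txt", os.environ["DESIGN_URL"])`, assuming DESIGN_URL is exposed to the step as an environment variable. With either approach, the value should not end with a slash, since the placeholder is directly followed by `/public/...`; otherwise the URL ends up with a double slash.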

 

Let me know if this helps!

François

arielma2304
Level 3
Author

OK, I guess this is the issue. When I try to install it manually, I get:

+ pip install dataiku-internal-client.tar.gz
Processing ./dataiku-internal-client.tar.gz
Collecting requests>=2 (from dataiku-internal-client==7.0.0)
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f15b95ec860>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /simple/requests/
fsergot
Dataiker

In fact, there is an alternative that should work: don't use the dataiku package at all, only dataiku-api-client. This requires some changes to the code provided in the article, and I still need to test it thoroughly.

Here is how to do it:

  1. No need to load dataiku-internal-client.tar.gz; keep your original setup without it.
  2. Change the code of run_bundling.py:
    1. Replace `import dataiku` with `import dataikuapi`.
    2. Replace the two lines `dataiku.set_remote_dss(host, apiKey)` and `client = dataiku.api_client()` with the single line `client = dataikuapi.DSSClient(host, apiKey)`.
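The two code changes above can be sketched as follows. This is a minimal sketch, not the attached run_bundling.py: the argument order is taken from the Jenkins stage shared earlier in this thread (host, API key, project, bundle name), the helper names `parse_args`, `make_client` and `main` are hypothetical, and `dataikuapi.DSSClient(host, api_key)` is the constructor named in step 2.2.

```python
def parse_args(argv):
    """Unpack the arguments passed by the Jenkins stage.

    The stage runs: python run_bundling.py HOST API_KEY PROJECT BUNDLE_NAME
    """
    if len(argv) != 5:
        raise ValueError("usage: run_bundling.py HOST API_KEY PROJECT BUNDLE_NAME")
    host, api_key, project_key, bundle_name = argv[1:]
    return host, api_key, project_key, bundle_name


def make_client(host, api_key):
    """Build a DSS client with dataiku-api-client only (no `dataiku` package)."""
    # Deferred import so parse_args() can be tried without the package installed.
    import dataikuapi

    # Before: dataiku.set_remote_dss(host, api_key)
    #         client = dataiku.api_client()
    # After:  one constructor call from the `dataikuapi` module.
    return dataikuapi.DSSClient(host, api_key)


def main(argv):
    host, api_key, project_key, bundle_name = parse_args(argv)
    client = make_client(host, api_key)
    # The rest of the original script would use `client` from here on.
    return client
```

In the real script, `main(sys.argv)` would be called from an `if __name__ == "__main__":` guard at the bottom; it is left out here so the sketch can be imported without arguments.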

I am attaching a new version of the run_bundling.py file and requirements.txt to this answer.

Let me know if this works better.

arielma2304
Level 3
Author

Now it looks better, but it fails on a connection issue. That looks like something internal and unrelated to the code, so I'll report back on your solution once it's solved.
Update 11/19: It works (I also had to add `client._session.verify = False` to the code). Note that you also need to apply this solution to the rest of the .py files.

fsergot
Dataiker

Regarding this issue, I suspect it is linked to your DSS instance using HTTPS with a certificate that Python does not recognize.

The client's `_session` attribute comes from the requests Python library (requests.sessions.Session - https://requests.readthedocs.io/en/master/user/advanced/).
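Since `_session` is a requests session, its `verify` setting can also be pointed at the certificate of the CA that signed the DSS certificate instead of being switched off. This is a minimal sketch; `configure_tls` and the CA path are hypothetical, and `_session` is a private attribute of the client, so relying on it is an assumption that may not hold across dataiku-api-client versions.

```python
def configure_tls(session, ca_bundle=None):
    """Set certificate checking on a requests-style session and return it.

    `session` is assumed to be a requests.Session, e.g. the private
    `client._session` attribute of a dataikuapi.DSSClient.
    """
    if ca_bundle:
        # Preferred: keep verification on, but trust the CA that signed the
        # DSS certificate (requests accepts a path to a PEM bundle here).
        session.verify = ca_bundle
    else:
        # Workaround used in this thread: disable verification entirely.
        # The connection still uses TLS, but the server identity is not
        # checked, and urllib3 emits an InsecureRequestWarning.
        session.verify = False
    return session
```

For example, `configure_tls(client._session, "/path/to/internal-ca.pem")` right after the client is created; the path is a placeholder for wherever the internal CA certificate is stored on the Jenkins node.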

fsergot
Dataiker

Note that I will update the knowledge base article soon to reflect this new (and cleaner) approach.

arielma2304
Level 3
Author

BTW, now it fails with a new error in prod_activation.py:

  File "prod_activation.py", line 18, in <module>
    previous_bundle_id = project['activeBundleState']['bundleId']
KeyError: 'activeBundleState'

 

fsergot
Dataiker

Hello,

Thanks for all the feedback, and sorry for the remaining mistakes in the sample code.

This case is missing from prod_activation.py: when the project has just been imported for the first time, there is no active bundle, and the script does not handle this case well. The easiest way to bypass this check is to remove the sys.exit(1) at line 21. The process should then work (although not cleanly).

In any case, I am also attaching a newer version that handles this case properly.
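Until the updated file is at hand, the lookup at line 18 of prod_activation.py can be made tolerant of the first-import case. This is a minimal sketch, assuming `project` at that line is a plain dict (as the `project['activeBundleState']['bundleId']` subscript suggests); `get_previous_bundle_id` is a hypothetical helper, not part of the attached script.

```python
def get_previous_bundle_id(project):
    """Return the active bundle id, or None when no bundle is active yet.

    Right after a project's first import there is no 'activeBundleState' key,
    which is what raised the KeyError at line 18 of prod_activation.py.
    """
    active_state = project.get("activeBundleState") or {}
    return active_state.get("bundleId")
```

Line 18 would then read `previous_bundle_id = get_previous_bundle_id(project)`. Any code further down that uses `previous_bundle_id` would also need to handle `None`; that part of the script is not shown in this thread.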
