Pipeline fails with import error for dataiku
I'm trying to run the example below on my Jenkins node:
Building-a-Jenkins-pipeline-for-Dataiku-DSS
In one of the stages, I create a Python venv and install the packages listed in requirements.txt. The console log shows that dataiku-api-client 8.0.0 was installed, but the script still fails on its very first statement:
import dataiku
The error is:
ModuleNotFoundError: No module named 'dataiku'
My Python version is 3.7.4.
Any idea what's wrong?
Best Answer
-
In fact, there is an alternative that should work: do not use the dataiku package at all, only dataiku-api-client. That requires some changes to the code provided in the article, and I still need to test it thoroughly.
Here is how to do it:
- There is no need to load dataiku-internal-client.tar.gz; keep your original setup without it.
- Change the code of run_bundling.py:
  - Replace import dataiku with import dataikuapi.
  - Replace the two lines dataiku.set_remote_dss(host, apiKey) and client = dataiku.api_client() with the single line client = dataikuapi.DSSClient(host, apiKey).
I am attaching a new version of the run_bundling.py file and requirements.txt to this answer.
Let me know if this works better.
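For reference, the change described above could look roughly like the sketch below. It is not the attached file: the helper names parse_args and connect are made up for illustration, and the argument order is assumed from the way the Jenkins stage in this thread calls the script (host, API key, project key, bundle name). Only dataikuapi.DSSClient(host, apiKey) comes from the steps above.

```python
def parse_args(argv):
    """Return (host, api_key, project_key, bundle_id) from the command line.

    Hypothetical helper: the argument order is assumed from the Jenkins
    stage, which calls run_bundling.py with these four values.
    """
    if len(argv) != 5:
        raise SystemExit(
            "usage: run_bundling.py HOST API_KEY PROJECT_KEY BUNDLE_ID"
        )
    return tuple(argv[1:5])


def connect(host, api_key):
    """Build a DSS client using only the dataiku-api-client package."""
    # dataikuapi is the module installed by dataiku-api-client; it is
    # imported here so that a missing package fails at connection time.
    import dataikuapi

    # Replaces dataiku.set_remote_dss(host, api_key) + dataiku.api_client()
    return dataikuapi.DSSClient(host, api_key)
```

In the script itself this would be used as host, api_key, project_key, bundle_id = parse_args(sys.argv), followed by client = connect(host, api_key).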
Answers
-
Hello,
Is it possible to share the code of your pipeline and the full console log of the build?
Thanks
-
I can't share files, so I'll share what I can.
The stage in the Jenkinsfile:
stage('PROJECT_VALIDATION') {
    steps {
        withCredentials([string(credentialsId: 'DESIGN_API_KEY', variable: 'TOKEN')]) {
            sh """
                python -m venv ariel_vm
                source ariel_vm/bin/activate
                pip install -i https://{private_url} -r requirements.txt
                python run_bundling.py '${DESIGN_HOST}' '${TOKEN}' '${DSS_PROJECT}' '${bundle_name}'
            """
        }
    }
}
The console output:
Installing collected packages: idna, certifi, chardet, urllib3, requests, six, python-dateutil, dataiku-api-client, numpy, pytz, pandas, colorama, mando, future, radon, zipp, importlib-metadata, pluggy, py, pyparsing, packaging, toml, iniconfig, attrs, pytest
  Running setup.py install for dataiku-api-client: started
  Running setup.py install for dataiku-api-client: finished with status 'done'
  Running setup.py install for future: started
  Running setup.py install for future: finished with status 'done'
Successfully installed attrs-20.3.0 certifi-2020.11.8 chardet-3.0.4 colorama-0.4.4 dataiku-api-client-8.0.0 future-0.18.2 idna-2.8 importlib-metadata-2.0.0 iniconfig-1.1.1 mando-0.6.4 numpy-1.19.4 packaging-20.4 pandas-0.23.4 pluggy-0.13.1 py-1.9.0 pyparsing-2.4.7 pytest-6.1.2 python-dateutil-2.8.1 pytz-2020.4 radon-4.3.2 requests-2.21.0 six-1.15.0 toml-0.10.2 urllib3-1.24.3 zipp-3.4.0
+ python run_bundling.py https://{private_url} ** DKU_TSHIRTS_3 bundle_2020-11-16_20-11-14
Traceback (most recent call last):
  File "run_bundling.py", line 1, in <module>
    import dataiku
ModuleNotFoundError: No module named 'dataiku'
-
Hello,
Thanks for the logs. To use the dataiku Python client, you need two packages: dataiku and dataiku-api-client.
In your case, dataiku-api-client is retrieved through pip and that works, but the dataiku package is missing. That package can only be retrieved directly from your DSS instance, at http(s)://DSS_HOST:DSS_PORT/public/packages/dataiku-internal-client.tar.gz (you can find the related documentation entry here).
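A quick way to confirm which of the two packages is missing is to ask the venv's own interpreter which modules it can find. The sketch below uses only the standard library; the function name missing_modules is hypothetical, and dataiku and dataikuapi are the import names used in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    # find_spec returns None when no importer can locate the module.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run this with the venv activated, e.g. just before run_bundling.py.
missing = missing_modules(["dataiku", "dataikuapi"])
if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("Both dataiku and dataikuapi are importable.")
```

With only dataiku-api-client installed, the expectation is that dataiku is reported as missing while dataikuapi is found.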
Can you check the content of your requirements.txt?
In the original file, there is a line to retrieve it:
@DESIGN_URL@/public/packages/dataiku-internal-client.tar.gz
A small shell trick then replaces the placeholder with the right value on the fly:
sh "sed -i 's|@DESIGN_URL@|${DESIGN_URL}|' requirements.txt"
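If sed is not convenient on the build node, the same substitution can be sketched in Python. The function names below are hypothetical, and the URL in the usage note is a placeholder. Unlike the sed command, str.replace substitutes every occurrence of the placeholder, which makes no difference when it appears once per line.

```python
from pathlib import Path


def fill_design_url(requirements_text, design_url):
    """Replace the @DESIGN_URL@ placeholder with the real DSS URL."""
    return requirements_text.replace("@DESIGN_URL@", design_url)


def rewrite_requirements(path, design_url):
    """Rewrite the requirements file in place, like sed -i does."""
    req_file = Path(path)
    req_file.write_text(fill_design_url(req_file.read_text(), design_url))
```

For example, rewrite_requirements("requirements.txt", "https://dss.example.com:11200") would turn the placeholder line into a full URL to dataiku-internal-client.tar.gz.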
Let me know if this helps!
François
-
OK, I guess this is the issue. When I try to install it manually, I get:
+ pip install dataiku-internal-client.tar.gz
Processing ./dataiku-internal-client.tar.gz
Collecting requests>=2 (from dataiku-internal-client==7.0.0)
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f15b95ec860>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /simple/requests/
-
Now it looks better, but it fails on a connection issue. It looks like something internal and unrelated to the code, so once that is solved I'll report back on your solution.
Update 11/19: It works (I also had to add `client._session.verify = False` to the code). Note that the rest of the .py files need the same modification.
-
Note that I will update the knowledge base article soon to reflect this new (and cleaner) approach.
-
BTW, it now fails with a new error in prod_activation.py:
File "prod_activation.py", line 18, in <module>
    previous_bundle_id = project['activeBundleState']['bundleId']
KeyError: 'activeBundleState'
-
Regarding this issue, I suspect it is linked to the fact that your DSS instance uses HTTPS with a certificate that Python does not recognize.
The _session attribute of the client is a requests.sessions.Session from the requests Python library (https://requests.readthedocs.io/en/master/user/advanced/).
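If the certificate is issued by an internal CA, an alternative to disabling verification is to point the session at that CA's bundle. This is only a sketch: it assumes the client's _session is a requests Session, whose verify attribute accepts either a boolean or a path to a CA bundle file. The helper name and the path are placeholders, and a stand-in object is used so the snippet runs without a DSS instance.

```python
from types import SimpleNamespace


def use_ca_bundle(client, ca_bundle_path):
    """Make the client's HTTP session trust the given CA bundle file."""
    # requests accepts a path here; verify=False would skip checks entirely.
    client._session.verify = ca_bundle_path
    return client


# Stand-in for a real dataikuapi client, only to show the effect.
fake_client = SimpleNamespace(_session=SimpleNamespace(verify=True))
use_ca_bundle(fake_client, "/etc/ssl/certs/internal-ca.pem")  # placeholder path
```

With a real client, the call would go right after the client is created in each script, in place of client._session.verify = False.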
-
Hello,
Thanks for all the feedback, and sorry for the remaining mistakes in the sample code.
This is a missing case in prod_activation.py: when the project has just been imported for the first time, there is no active bundle, and the script does not handle this case well. The easiest way to bypass this check is to remove the sys.exit(1) at line 21. The process should then work (although not nicely).
In any case, I am also attaching a newer version that handles this case properly.
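As an illustration of the missing case (not the attached file), the lookup that raised the KeyError can be made tolerant of a project with no active bundle. The sketch assumes project behaves like a plain dict, as the traceback suggests; the function name is hypothetical.

```python
def previous_bundle_id(project):
    """Return the active bundle id, or None if no bundle is active yet."""
    # A freshly imported project has no 'activeBundleState' entry at all.
    active_state = project.get("activeBundleState") or {}
    return active_state.get("bundleId")
```

The caller can then treat None as "nothing to roll back to" instead of exiting with an error.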