Install Dataiku Plugin code environment without Internet (Local packages)
Hi Dataiku Community,
I have a bunch of Dataiku store plugins like Text Preparation, Text Summarization, Named Entity Recognition, Deep Learning Image (to name a few) to set up and test in the Designer node server.
This server has no internet connection.
I understand that code environments are isolated and do not use packages at the system level (Red Hat 8.6). How does one install the required Python package dependencies manually for a plugin? Do we first create the code environment and then associate it with the plugin on the Plugins page? If so, can this be done through the Dataiku GUI? If it has to be done with Linux commands, is there documentation to follow?
Basically, for each plugin, if I have a requirements.txt file listing the packages, I have to download them manually into a folder on a computer with internet access and then move them to the Linux server (no internet). How can I then associate these packages with the particular code environment?
Many thanks in advance.
Operating system used: Linux
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,913 Neuron
Have a look at this section of the manual that shows different options on how to install Python packages on a machine with limited connectivity:
https://doc.dataiku.com/dss/latest/code-envs/custom-options.html
pip settings can be set for the whole DSS instance under Administration => Settings => Other => Misc.
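As a rough sketch of the "download elsewhere, then transfer" approach: on a machine with internet access, pip download can fetch the packages from a requirements file (plus their dependencies) into a folder without installing them, and the folder can then be archived for transfer. The function names and paths here are only illustrative, not anything DSS provides. The download machine should match the offline server's OS and Python version as closely as possible, otherwise the downloaded wheels may not be installable there.

```shell
#!/usr/bin/env bash
# Run on a machine WITH internet access.
# Function names and all paths are hypothetical examples.

# Download every package in a requirements file (and its dependencies)
# into a local folder, without installing anything.
download_packages() {
    local req_file="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    pip download -r "$req_file" -d "$dest_dir"
}

# Bundle the download folder into a single archive for transfer
# to the offline server.
bundle_packages() {
    local src_dir="$1" archive="$2"
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example usage (not executed here):
#   download_packages requirements.txt ./plugin_packages
#   bundle_packages ./plugin_packages ./plugin_packages.tar.gz
```

After copying the archive to the offline server and extracting it, pip can be pointed at the extracted folder instead of PyPI.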
With regular (non-plugin) Python code environments, DSS allows you to download a whole environment from one server and import it on another DSS instance. So you could, for instance, build a DSS machine in the cloud with internet connectivity, create your code environments there, and then download them and import them into your restricted DSS instance.
Having said that, and this is something I have been asking myself for a while, for some reason Dataiku does not allow admins to download or import plugin code environments. Perhaps someone at Dataiku can answer why this is not possible? I was able to find a workaround by following a hacky process, though. I installed the plugin on a machine with internet access and let the plugin create the plugin-managed code env. I then installed the plugin on another DSS instance without internet access (plugins can be downloaded as zip files) and skipped the code env creation. I then created a manually managed code env for the plugin and uploaded a manually zipped copy of the code env from the other machine into that unmanaged directory (under ./code-envs/python in your DATA_DIR). This did work, but it seems like a shady process to me...
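For what it's worth, the copy step of that workaround could look roughly like the sketch below. It assumes the code env lives in a directory named after the env under code-envs/python in DATA_DIR; the function names, env name and archive path are all made up for illustration. Python virtual environments hard-code absolute paths, so this is only likely to work if DATA_DIR, the OS and the Python version are the same on both machines.

```shell
#!/usr/bin/env bash
# Assumed layout: <DATA_DIR>/code-envs/python/<env_name>
# Function names, env names and archive paths are hypothetical.

# On the connected instance: archive one code env directory.
pack_code_env() {
    local data_dir="$1" env_name="$2" archive="$3"
    tar -czf "$archive" -C "$data_dir/code-envs/python" "$env_name"
}

# On the offline instance: extract the archive into the same
# relative location under that instance's DATA_DIR.
unpack_code_env() {
    local data_dir="$1" archive="$2"
    mkdir -p "$data_dir/code-envs/python"
    tar -xzf "$archive" -C "$data_dir/code-envs/python"
}

# Example usage (not executed here):
#   pack_code_env /data/dataiku my-plugin-env /tmp/my-plugin-env.tar.gz
#   unpack_code_env /data/dataiku /tmp/my-plugin-env.tar.gz
```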
-
SAURABH Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 26 Partner
Hi @victor_toh
This is a manual approach from the backend.
First, import your plugin from the front end and create a code env for it.
Then, on the backend, go to the code env folder that was created for the plugin, move into its bin folder and run the command below:
source activate
Once this is done, you can install the required packages that you have placed on the Linux box with the command:
pip3 install "path where you have kept the downloaded packages"
for example : pip3 install /data/dataiku/package_file
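If you have a whole folder of downloaded packages plus a requirements.txt, the steps above can be sketched as a small function. This is only a sketch: the function name and the paths in the usage comment are hypothetical. The --no-index option stops pip from trying to reach PyPI, and --find-links tells it to resolve packages from the local folder instead.

```shell
#!/usr/bin/env bash
# Sketch: install packages from a local folder into an existing code env.
# The function name and example paths are hypothetical.

install_into_code_env() {
    local env_dir="$1" pkg_dir="$2" req_file="$3"

    # Activate the code env so that "pip" resolves to the env's own pip.
    # shellcheck disable=SC1091
    source "$env_dir/bin/activate"

    # --no-index: never contact PyPI; --find-links: look in the local folder.
    pip install --no-index --find-links="$pkg_dir" -r "$req_file"

    # Leave the code env again.
    deactivate
}

# Example usage (not executed here):
#   install_into_code_env /data/dataiku/code-envs/python/my-plugin-env \
#       /data/dataiku/package_file requirements.txt
```

If a dependency is missing from the folder, pip fails with a "no matching distribution" error rather than trying the network, which makes gaps easy to spot.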