Support for GPU libraries for faster data preprocessing
Hi Team
We need help and advice on using Dataiku for faster data preprocessing.
We have a huge amount of data that needs to be preprocessed, and doing this with Python data frames is very slow. We have GPUs in the Dataiku cluster and want to use the RAPIDS (Rapidsai) library. However, it needs Python 3.8, whereas our Dataiku instance is on Python 3.7, and an immediate upgrade to 3.8 is not available.
[1] Is there any other GPU library for faster data processing that works with Python 3.7 and is supported in Dataiku?
[2] Can Spark be integrated with the existing Dataiku cluster so that PySpark can be used for faster processing? What would it take to onboard Spark in our Dataiku instance?
[3] Is upgrading to Python 3.8 and above available and supported in Dataiku?
Thanks
Operating system used: Redhat 7.9
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Hi, you don't say which version of Dataiku you are running, but support for Python 3.8, Python 3.9 and Python 3.10 in code environments was added in version 10.0.4 (March 7th, 2022).
-
Hi
Thanks for your reply.
We are using Dataiku version 11.3.1, and the Python version available in the instance build is 3.7. We will look into having it upgraded to 3.8.
Also, can you please advise on points 1 and 2?
Thanks
-
Turribeach
Since you are on v11.3.1, all you need to do is install the Python 3.8 and Python 3.9 packages as alternative installs on your RHEL box. You will then be able to create Python 3.8 and Python 3.9 code environments in Dataiku. For Spark (your point 2), see Setting up Spark integration: https://doc.dataiku.com/dss/latest/spark/installation.html
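After installing the alternative Python packages, a quick way to confirm they are visible is to check for them on the PATH. The snippet below is only a minimal sketch: the interpreter names `python3.8` and `python3.9` are assumptions about how the RHEL packages expose their binaries, and the helper names `find_interpreters` and `report` are made up for illustration. Run it as the same Linux user that runs DSS, since that user's PATH is what matters.

```python
import shutil
import subprocess


def find_interpreters(names=("python3.8", "python3.9")):
    """Map each interpreter name to its full path on PATH, or None if absent.

    The default names are assumptions; adjust them to match your install.
    """
    return {name: shutil.which(name) for name in names}


def report(found):
    """Build one human-readable line per interpreter."""
    lines = []
    for name, path in found.items():
        if path is None:
            lines.append(f"{name}: not found on PATH")
            continue
        # Ask the interpreter itself for its version string.
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True
        )
        version = (result.stdout or result.stderr).strip()
        lines.append(f"{name}: {path} ({version})")
    return lines


if __name__ == "__main__":
    for line in report(find_interpreters()):
        print(line)
```

If an interpreter is reported as not found, it will most likely not be selectable when creating a new code environment either, so fix the install or the PATH first.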