Support for GPU libraries for faster data preprocessing

Shubhjeet · ‎06-23-2023

Hi Team

We need help and advice on using Dataiku for faster data preprocessing.

We have a huge amount of data that needs to be preprocessed, and doing this with Python data frames is very slow. We have GPUs in the Dataiku cluster and want to use the RAPIDS (rapidsai) library. However, it needs Python 3.8, whereas our Dataiku instance is on Python 3.7 and an immediate upgrade to 3.8 is not available.

[1] Is there any other GPU library for faster data processing that works with Python 3.7 and is supported in Dataiku?

[2] Can Spark be integrated with the existing Dataiku cluster so that PySpark can be used for faster processing? What would it take to onboard Spark in our Dataiku instance?

[3] Is an upgrade to, and support for, Python 3.8 and above available in Dataiku?

Thanks

Operating system used: Redhat 7.9

Turribeach · ‎06-23-2023

Hi, you don't say what version of Dataiku you are running, but support for Python 3.8, Python 3.9 and Python 3.10 in code environments was added in version 10.0.4 (March 7th, 2022).

Shubhjeet · ‎06-23-2023

Hi

Thanks for your reply.

We are using Dataiku version 11.3.1, and the Python version available in the instance build is 3.7. We will look to have it upgraded to 3.8.

Also, could you please advise on points 1 and 2?

Thanks

Turribeach · ‎06-23-2023

If you are on v11.3.1 then all you need to do is install the Python 3.8 and Python 3.9 packages as alternative installs on your RHEL box, and then you will be able to create Python 3.8 and Python 3.9 code environments in Dataiku. For your question about Spark, see Setting up Spark integration: https://doc.dataiku.com/dss/latest/spark/installation.html
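
Once a new code environment exists, a quick sanity check can confirm that a recipe or notebook is really running on the newer interpreter before you try importing any GPU library. This is only a minimal sketch: the helper name `meets_minimum` is hypothetical, and the 3.8 minimum is taken from the requirement stated earlier in this thread, not from any Dataiku or RAPIDS API.

```python
import sys

# Assumption: minimum Python version for RAPIDS, as stated in this thread.
MIN_VERSION = (3, 8)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report which interpreter the current code environment is actually using.
status = "OK" if meets_minimum() else "too old for the GPU library"
print("Python", sys.version.split()[0], "->", status)
```

Running this in a recipe that uses the new code environment should report a 3.8 or later interpreter; if it still reports 3.7, the recipe is probably still attached to the old environment.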
