Support for GPU libraries for faster data preprocessing

Shubhjeet Registered Posts: 3

Hi Team,

I need help and advice on using Dataiku for faster data preprocessing.

We have a huge amount of data that needs to be preprocessed, and doing it with Python data frames is very slow. We have GPUs in the Dataiku cluster and want to use the RAPIDS AI library. However, it requires Python 3.8, whereas our Dataiku instance is on Python 3.7, and an immediate upgrade to 3.8 is not available.

[1] Is there any other GPU library for faster data processing that works with Python 3.7 and is supported in Dataiku?

[2] Can Spark be integrated with the existing Dataiku cluster so that PySpark can be used for faster processing? What would it take to onboard Spark onto the Dataiku instance?

[3] Is an upgrade to Python 3.8 or above available and supported in Dataiku?


Operating system used: Red Hat 7.9


