Deep Learning / GPU training on Dataiku

Turribeach

We wanted to test some models using Deep Learning / GPU training on Dataiku, so I needed to set up the GPU in Dataiku. I found this post:

https://community.dataiku.com/t5/Setup-Configuration/How-do-I-use-a-graphics-processing-unit-GPU-wit...

It provides some guidance, but it uses very old versions of CUDA and cuDNN. As we had already automated our DSS build with our cloud Linux images, I didn't want to use any of the GCP images that come ready for Deep Learning. So I worked through all the issues and put together my own series of steps to get everything working. Hopefully this will help someone else, assuming you have a similar environment.

VM: GCP n1-highmem-16

GPU: nvidia-tesla-t4

Google Image: rhel-7-v20210401

OS: RHEL 7.9

Steps

Edit /etc/default/grub (vi /etc/default/grub) and add modprobe.blacklist=nouveau to GRUB_CMDLINE_LINUX:

 

GRUB_CMDLINE_LINUX="crashkernel=auto console=ttyS0,38400n8 elevator=noop ipv6.disable=1 modprobe.blacklist=nouveau"

 

Then run:

 

grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
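If you want to script the two steps above rather than edit the file by hand, a minimal sketch could look like the following. The function names are hypothetical, and it assumes GNU sed and a single-line GRUB_CMDLINE_LINUX entry. The grub.cfg path used above applies to UEFI boots; on a BIOS boot RHEL 7 keeps the file at /boot/grub2/grub.cfg, which the second helper accounts for.

```shell
# Hypothetical helper: append modprobe.blacklist=nouveau to the
# GRUB_CMDLINE_LINUX line of the file given as $1, unless already present.
add_nouveau_blacklist() {
    if grep -qF 'modprobe.blacklist=nouveau' "$1"; then
        return 0
    fi
    # Insert the option just before the closing quote (GNU sed, in place).
    sed -i 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 modprobe.blacklist=nouveau"/' "$1"
}

# Hypothetical helper: print the grub.cfg path for this machine.
# UEFI systems expose /sys/firmware/efi; BIOS systems do not.
grub_cfg_path() {
    if [ -d /sys/firmware/efi ]; then
        echo /boot/efi/EFI/redhat/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}

# Usage (as root):
#   add_nouveau_blacklist /etc/default/grub
#   grub2-mkconfig -o "$(grub_cfg_path)"
```

Running the first helper twice leaves the file unchanged the second time, so it should be safe to keep in an automated image build.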

 

Packages needed (RHEL PDF):

 

yum install gcc make kernel-headers kernel-devel acpid libglvnd-glx libglvnd-opengl libglvnd-devel pkgconfig --enablerepo=epel --assumeyes

 

Packages needed (Dataiku community post):

 

yum install libstdc++.i686 dkms --enablerepo=epel --assumeyes

 


Needed by the driver:

 

yum install ocl-icd libva-vdpau-driver opencl-filesystem --enablerepo=epel --assumeyes
yum upgrade kernel kernel-devel --assumeyes
reboot

 

Check nouveau is not loaded (should return nothing)

 

lsmod | grep nouv

 

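Check that the running kernel and kernel-devel versions match, since the driver module is built against the kernel-devel headers. The sample output below shows a mismatch (1127.13.1 vs 1127.18.2); if you see this, upgrade the kernel and reboot again before installing the driver: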

 

uname -r
3.10.0-1127.13.1.el7.x86_64
rpm -q kernel-devel
kernel-devel-3.10.0-1127.18.2.el7.x86_64
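The comparison can also be automated. This is only a sketch with a hypothetical function name; it relies on the RHEL 7 convention visible in the output above, where the package name is "kernel-devel-" followed by the kernel release string.

```shell
# Hypothetical helper: reads a list of installed kernel-devel packages on
# stdin and succeeds only if one of them matches the kernel release in $1.
kernel_devel_matches() {
    grep -qxF "kernel-devel-$1"
}

# Usage on the host:
#   rpm -q kernel-devel | kernel_devel_matches "$(uname -r)" \
#       || echo "kernel-devel does not match the running kernel"
```

Reading the package list from stdin keeps the check usable when several kernel-devel versions are installed side by side.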

 

Install the vulkan-filesystem package and the NVIDIA drivers (download the RPMs from the web and use the driver that corresponds to your GPU)

 

yum --nogpgcheck install /dataiku/installers/vulkan-filesystem-1.1.97.0-1.el7.noarch.rpm --assumeyes
yum --nogpgcheck install /dataiku/installers/nvidia-diag-driver-local-repo-rhel7-410.129-1.0-1.x86_64.rpm --assumeyes
yum clean all
yum install cuda-drivers --assumeyes

 

Install CUDA 10

 

yum --nogpgcheck install /dataiku/installers/cuda-repo-rhel7-10-0-local-10.0.130-410.48-1.0-1.x86_64.rpm --assumeyes
yum install cuda --assumeyes
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
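The two exports above only last for the current shell session. One way to make them stick for new login shells is a profile script; the sketch below uses a hypothetical function name and assumes /etc/profile.d is sourced on login, as it is on a stock RHEL 7. Whether the DSS process itself picks this up depends on how DSS is started in your setup, so treat that part as something to verify.

```shell
# Hypothetical helper: write the CUDA exports to the profile script at $1.
# The quoted heredoc keeps $PATH and $LD_LIBRARY_PATH unexpanded in the file.
write_cuda_profile() {
    cat > "$1" <<'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF
}

# Usage (as root):
#   write_cuda_profile /etc/profile.d/cuda.sh
```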

 

Install cuDNN 7.4.2 for CUDA 10.0

 

tar -zxvf /dataiku/installers/cudnn-10.0-linux-x64-v7.4.2.24.tgz -C /dataiku/installers
cp -P /dataiku/installers/cuda/include/cudnn*.h /usr/local/cuda/include
cp -P /dataiku/installers/cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
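To confirm which cuDNN version ended up in place, you can read it back from the copied header. This is a sketch with a hypothetical function name; it assumes the cuDNN 7.x layout, where the version macros are defined in cudnn.h itself.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from the cuDNN header
# file given as $1 (cuDNN 7.x keeps these macros in cudnn.h).
cudnn_version() {
    awk '/^#define CUDNN_MAJOR /      { major = $3 }
         /^#define CUDNN_MINOR /      { minor = $3 }
         /^#define CUDNN_PATCHLEVEL / { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Usage:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```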

 

Check you can see the GPU

 

/usr/local/cuda/bin/nvcc --version
cat /proc/driver/nvidia/version
/usr/bin/nvidia-smi
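If you prefer a pass/fail check over reading the output by eye, the driver version can be compared against the minimum that CUDA 10.0 needs on Linux (410.48, the version bundled in the CUDA repo package above). A minimal sketch, with a hypothetical function name and assuming GNU sort for the -V option:

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to
# version $2, using GNU sort's version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on the GPU host:
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_at_least "$drv" 410.48 && echo "driver is recent enough for CUDA 10.0"
```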

 

Test on Dataiku. I used these packages:

 

tensorflow-gpu==1.15.0
keras==2.3.1
keras-preprocessing==1.1.0
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
statsmodels>=0.10,<0.11
jinja2>=2.10,<2.11
flask>=1.0,<1.1
h5py==2.10.0
pillow==6.2.2
cloudpickle>=1.3,<1.6
matplotlib==3.3.4
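Once the code environment with these packages is built, a quick way to check that TensorFlow actually sees the GPU is to run a few lines of Python with that environment's interpreter. The sketch below is an assumption about how you might do it from the shell; the function name and the interpreter path in the usage line are placeholders. The Python calls are the TensorFlow 1.x ones and match tensorflow-gpu 1.15.

```shell
# Hypothetical helper: feed a short TensorFlow 1.x GPU check to the Python
# interpreter given as $1 (assumed to be the code environment's python).
tf_gpu_check() {
    "$1" - <<'EOF'
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TensorFlow version:", tf.__version__)
print("GPU available:", tf.test.is_gpu_available())
# List the GPU devices TensorFlow has registered.
print([d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"])
EOF
}

# Usage (replace the placeholder with your code environment's interpreter):
#   tf_gpu_check /path/to/code-env/bin/python
```

The same Python lines can be pasted into a Dataiku notebook that uses the code environment.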

 

 

1 Reply
CoreyS
Dataiker Alumni

Thanks for sharing your knowledge @Turribeach
