Deep Learning / GPU training on Dataiku
We wanted to test some models using Deep Learning / GPU training on Dataiku, so I needed to set up the GPU in Dataiku. I found a post that provides some guidance, but it uses very old versions of CUDA and cuDNN. Since we had already automated our DSS build on our own cloud Linux images, I didn't want to use any of the GCP images that come ready for Deep Learning. So I worked through all the issues and put together my own series of steps to get everything working. Hopefully this will help someone else with a similar environment.
VM: GCP n1-highmem-16
GPU: nvidia-tesla-t4
Google Image: rhel-7-v20210401
OS: RHEL 7.9
Steps
Edit /etc/default/grub (vi /etc/default/grub) and add the nouveau blacklist, modprobe.blacklist=nouveau, to the end of GRUB_CMDLINE_LINUX:
GRUB_CMDLINE_LINUX="crashkernel=auto console=ttyS0,38400n8 elevator=noop ipv6.disable=1 modprobe.blacklist=nouveau"
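Then regenerate the GRUB config (this is the EFI path; on a BIOS-booted RHEL 7 system the target is /boot/grub2/grub.cfg):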
grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
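Once the machine has been rebooted later in these steps, it can be worth confirming that the blacklist actually made it onto the running kernel's command line. Here is a minimal sketch of that check; has_nouveau_blacklist is just a helper name I made up, and it only inspects the string you pass in.

```shell
# Sketch: does a kernel command line carry the nouveau blacklist?
# has_nouveau_blacklist is a hypothetical helper, not part of any tool.
has_nouveau_blacklist() {
    # $1: a kernel command line, e.g. "$(cat /proc/cmdline)"
    for word in $1; do  # unquoted on purpose: split on whitespace
        case "$word" in
            modprobe.blacklist=*)
                # The value may be a comma-separated list of modules.
                case ",${word#modprobe.blacklist=}," in
                    *,nouveau,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}
```

After the reboot, something like this should print the message:
has_nouveau_blacklist "$(cat /proc/cmdline)" && echo "nouveau blacklist is active"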
Packages needed (RHEL PDF):
yum install gcc make kernel-headers kernel-devel acpid libglvnd-glx libglvnd-opengl libglvnd-devel pkgconfig --enablerepo=epel --assumeyes
Packages needed (Dataiku community post):
yum install libstdc++.i686 dkms --enablerepo=epel --assumeyes
Packages needed by the driver, followed by a kernel upgrade and a reboot:
yum install ocl-icd libva-vdpau-driver opencl-filesystem --enablerepo=epel
yum upgrade kernel kernel-devel --assumeyes
reboot
After the reboot, check that nouveau is not loaded (this should return nothing):
lsmod | grep nouv
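Check that the running kernel and the kernel-devel package report the same version. The driver module is built against kernel-devel, so they should match (note that the sample output below happens to show two different builds, 1127.13.1 and 1127.18.2):
uname -r
3.10.0-1127.13.1.el7.x86_64
rpm -q kernel-devel
kernel-devel-3.10.0-1127.18.2.el7.x86_64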
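The version comparison above is easy to get wrong by eye, so here is a small sketch that does it for you. kernel_devel_matches is a made-up helper name; it assumes the usual kernel-devel-<version> naming that rpm -q prints and copes with several kernel-devel packages being installed at once.

```shell
# Sketch: is there a kernel-devel package for the running kernel?
# kernel_devel_matches is a hypothetical helper, not part of rpm or yum.
kernel_devel_matches() {
    # $1: output of `uname -r`, e.g. 3.10.0-1127.18.2.el7.x86_64
    # $2: output of `rpm -q kernel-devel` (one line per installed version)
    # -F: fixed string, -x: match the whole line, -q: quiet
    printf '%s\n' "$2" | grep -Fxq "kernel-devel-$1"
}
```

Usage on the VM would look like:
kernel_devel_matches "$(uname -r)" "$(rpm -q kernel-devel)" || echo "kernel and kernel-devel versions differ"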
Install the vulkan-filesystem package and the NVIDIA drivers (download them beforehand and use the drivers that correspond to your GPU):
yum --nogpgcheck install /dataiku/installers/vulkan-filesystem-1.1.97.0-1.el7.noarch.rpm --assumeyes
yum --nogpgcheck install /dataiku/installers/nvidia-diag-driver-local-repo-rhel7-410.129-1.0-1.x86_64.rpm --assumeyes
yum clean all
yum install cuda-drivers --assumeyes
Install CUDA 10
yum --nogpgcheck install /dataiku/installers/cuda-repo-rhel7-10-0-local-10.0.130-410.48-1.0-1.x86_64.rpm --assumeyes
yum install cuda --assumeyes
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
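The two export lines above only affect the current shell session. If you want them to survive a new login, one option is to write them to a profile script. This is a sketch under my own assumptions: write_cuda_profile is a made-up helper, /etc/profile.d/cuda.sh is just a conventional location I would pick, and whether DSS itself picks the variables up depends on how it is started.

```shell
# Sketch: write the CUDA PATH / LD_LIBRARY_PATH exports to a profile script.
# write_cuda_profile is a hypothetical helper name.
write_cuda_profile() {
    # $1: file to write, e.g. /etc/profile.d/cuda.sh (needs root)
    # $2: CUDA install prefix, defaulting to /usr/local/cuda as used above
    cuda_home=${2:-/usr/local/cuda}
    # Single quotes keep $PATH and $LD_LIBRARY_PATH literal in the output file.
    printf 'export PATH=%s/bin:$PATH\n' "$cuda_home" > "$1"
    printf 'export LD_LIBRARY_PATH=%s/lib64:$LD_LIBRARY_PATH\n' "$cuda_home" >> "$1"
}
```

As root, that would be:
write_cuda_profile /etc/profile.d/cuda.sh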
Install cuDNN 7.4.2 (the build for CUDA 10.0). The archive is extracted into /dataiku/installers so that the copy paths below line up:
tar -zxvf /dataiku/installers/cudnn-10.0-linux-x64-v7.4.2.24.tgz -C /dataiku/installers
cp -P /dataiku/installers/cuda/include/cudnn*.h /usr/local/cuda/include
cp -P /dataiku/installers/cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
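To double check which cuDNN ended up under /usr/local/cuda, you can read the version macros out of the copied header. A rough sketch follows; cudnn_version is a made-up helper, and it assumes the cuDNN 7.x layout where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h itself.

```shell
# Sketch: print the cuDNN version found in a cudnn.h header.
# cudnn_version is a hypothetical helper name.
cudnn_version() {
    # $1: path to the header, e.g. /usr/local/cuda/include/cudnn.h
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}
```

On the VM:
cudnn_version /usr/local/cuda/include/cudnn.h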
Check you can see the GPU
/usr/local/cuda/bin/nvcc --version
cat /proc/driver/nvidia/version
/usr/bin/nvidia-smi
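If you use a different driver package than I did, it may help to compare the installed driver version against the one bundled with the CUDA installer (410.48, going by the CUDA repo file name above). The following is only a sketch: driver_at_least is a made-up helper that compares dotted version strings component by component, and treating 410.48 as the minimum is my assumption.

```shell
# Sketch: succeed when a dotted version is at least a required minimum.
# driver_at_least is a hypothetical helper name.
driver_at_least() {
    # $1: installed driver version, e.g. 410.129
    # $2: minimum required version, e.g. 410.48
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(need, r, ".")
        max = (n > m) ? n : m
        # Compare numerically, one component at a time; missing parts count as 0.
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? r[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}
```

With the driver installed, the version can be pulled from nvidia-smi:
driver_at_least "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 410.48 || echo "driver is older than 410.48"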
Test on Dataiku. I used these packages:
tensorflow-gpu==1.15.0
keras==2.3.1
keras-preprocessing==1.1.0
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
statsmodels>=0.10,<0.11
jinja2>=2.10,<2.11
flask>=1.0,<1.1
h5py==2.10.0
pillow==6.2.2
cloudpickle>=1.3,<1.6
matplotlib==3.3.4
Answers
CoreyS (Dataiker Alumni)
Thanks for sharing your knowledge, @Turribeach!