We wanted to test some models using Deep Learning / GPU training on Dataiku, so I needed to set up the GPU in Dataiku. I found this post:
That post provides some guidance, but it uses very old versions of CUDA and cuDNN. Since we had already automated our DSS build on our cloud Linux images, I didn't want to use any of the GCP images that come ready for Deep Learning. So I worked through all the issues and put together my own series of steps to get everything working. Hopefully this will help someone else with a similar environment.
VM: GCP n1-highmem-16
GPU: nvidia-tesla-t4
Google Image: rhel-7-v20210401
OS: RHEL 7.9
Steps
Edit /etc/default/grub (for example with vi) and append modprobe.blacklist=nouveau to GRUB_CMDLINE_LINUX so the nouveau driver is blacklisted:
GRUB_CMDLINE_LINUX="crashkernel=auto console=ttyS0,38400n8 elevator=noop ipv6.disable=1 modprobe.blacklist=nouveau"
Then run:
grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
Packages needed (RHEL PDF):
yum install gcc make kernel-headers kernel-devel acpid libglvnd-glx libglvnd-opengl libglvnd-devel pkgconfig --enablerepo=epel --assumeyes
Packages needed (Dataiku community post):
yum install libstdc++.i686 dkms --enablerepo=epel --assumeyes
Needed by the driver:
yum install ocl-icd libva-vdpau-driver opencl-filesystem --enablerepo=epel
yum upgrade kernel kernel-devel --assumeyes
reboot
Check nouveau is not loaded (should return nothing)
lsmod | grep nouv
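The two nouveau checks (blacklist on the kernel command line, module not loaded) can be scripted. This is a minimal sketch: `has_nouveau_blacklist` and `nouveau_loaded` are hypothetical helper names, and they take the text to inspect as an argument so they can be tried without a GPU machine.

```shell
# Hypothetical helpers, not part of any package.

# Succeeds if the given kernel command line contains the nouveau blacklist
# exactly as written in /etc/default/grub above.
has_nouveau_blacklist() {
    case " $1 " in
        *" modprobe.blacklist=nouveau "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the given lsmod output lists the nouveau module.
nouveau_loaded() {
    # lsmod prints the module name in the first column.
    printf '%s\n' "$1" | awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# On the VM, after the reboot:
#   has_nouveau_blacklist "$(cat /proc/cmdline)" && echo "blacklist active"
#   nouveau_loaded "$(lsmod)" && echo "nouveau is still loaded"
```

Passing the text in as an argument keeps the parsing separate from the system calls, which makes it easier to reuse in an automated image build.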
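Check that both commands report the same kernel version (the sample output below shows the format; on a working setup the version strings must match exactly):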
uname -r
3.10.0-1127.13.1.el7.x86_64
rpm -q kernel-devel
kernel-devel-3.10.0-1127.18.2.el7.x86_64
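The kernel comparison can also be scripted rather than checked by eye. A minimal sketch, assuming `rpm -q kernel-devel` prints one `kernel-devel-<release>` line per installed version; `kernel_devel_matches` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if any listed kernel-devel package has the
# same release string as the running kernel.
#   $1 = output of `uname -r`
#   $2 = output of `rpm -q kernel-devel` (one package name per line)
kernel_devel_matches() {
    running="$1"
    for pkg in $2; do
        # Strip the package-name prefix to leave only the release string.
        if [ "${pkg#kernel-devel-}" = "$running" ]; then
            return 0
        fi
    done
    return 1
}

# On the VM:
#   kernel_devel_matches "$(uname -r)" "$(rpm -q kernel-devel)" \
#       || echo "kernel and kernel-devel differ: upgrade and reboot first"
```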
Install the vulkan-filesystem package and the Nvidia drivers (download them first, and use the driver that corresponds to your GPU)
yum --nogpgcheck install /dataiku/installers/vulkan-filesystem-1.1.97.0-1.el7.noarch.rpm --assumeyes
yum --nogpgcheck install /dataiku/installers/nvidia-diag-driver-local-repo-rhel7-410.129-1.0-1.x86_64.rpm --assumeyes
yum clean all
yum install cuda-drivers --assumeyes
Install CUDA 10
yum --nogpgcheck install /dataiku/installers/cuda-repo-rhel7-10-0-local-10.0.130-410.48-1.0-1.x86_64.rpm --assumeyes
yum install cuda --assumeyes
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
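The two exports above only affect the current shell session. One way to make them persistent, assuming your login shells source /etc/profile.d/*.sh, is to write them to a small profile script. `write_cuda_profile` and the file name cuda.sh are hypothetical, and whether the DSS process picks these variables up depends on how DSS is started on your system.

```shell
# Hypothetical helper: writes the two CUDA exports to the given file.
write_cuda_profile() {
    target="$1"
    # The quoted 'EOF' keeps $PATH and $LD_LIBRARY_PATH unexpanded in the file.
    cat > "$target" <<'EOF'
# CUDA toolkit paths (same values as the manual exports)
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF
}

# As root on the VM:
#   write_cuda_profile /etc/profile.d/cuda.sh
```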
Install cuDNN 7.4.2 for CUDA 10.0
tar -zxvf /dataiku/installers/cudnn-10.0-linux-x64-v7.4.2.24.tgz -C /dataiku/installers
cp -P /dataiku/installers/cuda/include/cudnn*.h /usr/local/cuda/include
cp -P /dataiku/installers/cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
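To confirm which cuDNN version ended up in /usr/local/cuda, the version macros in the copied header can be read back. This is a sketch, assuming a cuDNN 7.x layout where cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL; `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: prints "major.minor.patch" from a cuDNN header file.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# On the VM:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```

With the archive used above, the printed version should correspond to the 7.4.2 in the file name.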
Check you can see the GPU
/usr/local/cuda/bin/nvcc --version
cat /proc/driver/nvidia/version
/usr/bin/nvidia-smi
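If these checks are part of an automated build, the toolkit release can be extracted from the `nvcc --version` output and compared against the expected value. A minimal sketch, assuming the output contains a line of the form "Cuda compilation tools, release 10.0, V10.0.130"; `nvcc_release` is a hypothetical helper name.

```shell
# Hypothetical helper: prints the CUDA release (e.g. 10.0) found in the
# text of `nvcc --version`, passed in as the first argument.
nvcc_release() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# On the VM:
#   [ "$(nvcc_release "$(/usr/local/cuda/bin/nvcc --version)")" = "10.0" ] \
#       && echo "CUDA 10.0 toolkit found"
```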
Test on Dataiku. I used these packages:
tensorflow-gpu==1.15.0
keras==2.3.1
keras-preprocessing==1.1.0
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
statsmodels>=0.10,<0.11
jinja2>=2.10,<2.11
flask>=1.0,<1.1
h5py==2.10.0
pillow==6.2.2
cloudpickle>=1.3,<1.6
matplotlib==3.3.4
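Once the packages are installed, a quick way to confirm that TensorFlow can see the GPU is to call `tf.test.is_gpu_available()`, which exists in TensorFlow 1.x, from the Python interpreter of that environment. This is a sketch: `check_tf_gpu` is a hypothetical helper, and the interpreter path you pass in depends on where your code environment lives.

```shell
# Hypothetical helper: exits 0 if TensorFlow reports a usable GPU,
# 1 if it does not, and 2 if the interpreter path is not executable.
#   $1 = path to the Python interpreter of the environment with tensorflow-gpu
check_tf_gpu() {
    py="$1"
    if [ ! -x "$py" ]; then
        echo "interpreter not found: $py" >&2
        return 2
    fi
    # is_gpu_available() is the TensorFlow 1.x call (deprecated in 2.x).
    "$py" -c 'import sys, tensorflow as tf; sys.exit(0 if tf.test.is_gpu_available() else 1)'
}

# Example (the path is a placeholder):
#   check_tf_gpu /path/to/code-env/bin/python && echo "GPU visible to TensorFlow"
```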
Thanks for sharing your knowledge @Turribeach!