How do I use a graphics processing unit (GPU) with Dataiku DSS?

UserBird
For example, I would like to use the Deep Learning for Images plugin with a GPU.

Answers

  • Alex_Reutter
    edited July 2024

    Prerequisites

    • A Linux server with an NVIDIA GPU. These instructions were written using CentOS 7; Ubuntu 14.04 or later is known to work as well, though installation instructions may differ.

    • Dataiku DSS installed

    Library versions

    We aim to install the following Python libraries:

    • Keras 1.2.2 (the version required by fast.ai)

    • tensorflow-gpu 1.4.1

    To do so, we must install:

    • CUDA 8.0

    • cuDNN 6.0

    Install NVIDIA driver and the CUDA and cuDNN libraries

    Install NVIDIA driver

    Before installing the driver, install the required packages using yum, the package-management utility shipped with CentOS:


    sudo yum update
    sudo yum install kernel-devel kernel-headers gcc gcc-c++ make wget epel-release dkms libstdc++.i686 bzip2 python-pip
    sudo reboot

    The new kernel should now be loaded. You can verify that the running kernel matches the installed kernel-devel package by comparing the version numbers in the output of the following two commands:


    uname -r
    rpm -q kernel-devel

    If the version numbers are not the same, you can upgrade and reboot:


    sudo yum -y upgrade kernel kernel-devel
    sudo reboot
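
    The kernel comparison above can also be scripted. The following is a minimal sketch, assuming that `rpm -q kernel-devel` prints one `kernel-devel-<version>` line per installed package; the function names are hypothetical and not part of DSS or any NVIDIA tooling.

```python
import subprocess


def kernel_devel_matches(running_kernel, rpm_output):
    """Return True if an installed kernel-devel package matches the running kernel.

    running_kernel: output of `uname -r`
    rpm_output: output of `rpm -q kernel-devel` (one package per line)
    """
    prefix = "kernel-devel-"
    installed = [
        line.strip()[len(prefix):]
        for line in rpm_output.splitlines()
        if line.strip().startswith(prefix)
    ]
    return running_kernel.strip() in installed


def check_local_kernel():
    """Run both commands on this machine and compare their results.

    Raises subprocess.CalledProcessError if kernel-devel is not installed,
    because `rpm -q` then exits with a non-zero status.
    """
    running = subprocess.check_output(["uname", "-r"]).decode()
    devel = subprocess.check_output(["rpm", "-q", "kernel-devel"]).decode()
    return kernel_devel_matches(running, devel)
```

    If check_local_kernel() returns False, the upgrade and reboot shown above are needed before installing the driver.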

    By default, CentOS comes with the open source "nouveau" driver for the GPU. You can verify that it is loaded with the command:


    lsmod | grep nouv

    We need to blacklist this driver so that the GPU can use the NVIDIA driver instead. To do so, create the file /etc/modprobe.d/blacklist-nouveau.conf with the following content:


    blacklist nouveau
    options nouveau modeset=0

    To enforce the blacklist, rebuild the initramfs and reboot with the following commands:


    sudo dracut --force
    sudo reboot

    You can verify that the driver is not used anymore by re-running the command lsmod | grep nouv, which should not display anything this time.
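
    The same check can be done programmatically. This is a small sketch, assuming the standard `lsmod` layout in which the first column is the module name; the helper names are hypothetical.

```python
def loaded_modules(lsmod_output):
    """Return the set of module names found in the output of `lsmod`."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def nouveau_is_loaded(lsmod_output):
    """Return True if the nouveau module itself is loaded."""
    return "nouveau" in loaded_modules(lsmod_output)
```

    Matching on the first column only avoids a false positive when another module merely lists nouveau in its "Used by" column.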

    We can now install the NVIDIA driver with its official runfile. You need to download the appropriate runfile from the NVIDIA website. Be careful to choose a driver compatible with CUDA 8.0, which is the CUDA version that tensorflow-gpu 1.4.1 requires.

    For our instance, the matching driver was version 384.66 (the original post showed the download form selections in a screenshot). It can be downloaded with the following command:


    wget http://us.download.nvidia.com/XFree86/Linux-x86_64/384.66/NVIDIA-Linux-x86_64-384.66.run

    First make the file executable (the filename may be slightly different according to the version you have downloaded):


    sudo chmod +x NVIDIA-Linux-x86_64-384.66.run

    Then execute it as root:


    sudo ./NVIDIA-Linux-x86_64-384.66.run

    Follow the instructions in the command line installation interface, using default options.

    The NVIDIA driver is now installed! You can check that it worked by running the command nvidia-smi, which should display a table with the driver version and the detected GPU(s).
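
    To check the driver from a script rather than by eye, nvidia-smi can be asked for machine-readable output. The sketch below assumes the `--query-gpu` and `--format=csv,noheader` options are available in your driver version; the function names are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text):
    """Parse the output of
    `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`.
    """
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma only, in case a GPU name contains one
        name, driver = line.rsplit(",", 1)
        gpus.append({"name": name.strip(), "driver_version": driver.strip()})
    return gpus


def query_gpus():
    """Return one dict per detected GPU, or [] if nvidia-smi is not on the PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
    ).decode()
    return parse_gpu_query(output)
```

    An empty list from query_gpus() means the driver tools were not found; a non-empty list should report the driver version you installed.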

    Install the CUDA library

    First you need to download the CUDA toolkit from the NVIDIA website. Choose CUDA 8.0 and make the appropriate selections for your platform.

    Select the runfile installer. For our instance, the link is:


    wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run

    As before, you need to make the file executable:


    sudo chmod +x cuda_8.0.61_375.26_linux-run

    Then run this command to extract the individual installers into your home directory:


    sudo ./cuda_8.0.61_375.26_linux-run --extract=$HOME

    Next, find the toolkit installer among the extracted files in your home directory (its name starts with "cuda-linux64-rel") and run it:


    sudo ./cuda-linux64-rel-8.0.61-21551265.run

    Then follow the installation steps, using default options. Remember to say yes to the question "Would you like to add desktop menu shortcuts?".

    Finally, you need to add CUDA to the PATH and set LD_LIBRARY_PATH. To do so, add the following lines to your ~/.bashrc:


    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

    Be sure to run source ~/.bashrc afterward, or to reboot your instance so that your changes are taken into account.
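
    A quick way to confirm that the two exports took effect is to inspect the environment from Python. This is only a sketch with hypothetical helper names; it checks the two directories used in the exports above.

```python
import os


def path_contains(path_value, directory):
    """Return True if `directory` is one entry of a colon-separated path string."""
    wanted = directory.rstrip("/")
    entries = [entry.rstrip("/") for entry in path_value.split(":") if entry]
    return wanted in entries


def cuda_paths_configured(environ=os.environ):
    """Check that both CUDA directories from ~/.bashrc are visible."""
    return (
        path_contains(environ.get("PATH", ""), "/usr/local/cuda/bin")
        and path_contains(environ.get("LD_LIBRARY_PATH", ""), "/usr/local/cuda/lib64")
    )
```

    Calling cuda_paths_configured() in a fresh shell session should return True once ~/.bashrc has been sourced.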

    You can check that CUDA is installed by running the command nvcc --version.
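
    If you want to confirm the toolkit version automatically, the release number can be extracted from the nvcc output. The sketch below assumes the output contains a line of the form "Cuda compilation tools, release 8.0, V8.0.61"; the function name is hypothetical.

```python
import re


def parse_cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

    For this guide the expected result is (8, 0); any other value means a different toolkit is first on the PATH.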

    Install the cuDNN library

    First register as a member of the NVIDIA developer program.

    Then download cuDNN 6.0 for CUDA 8.0 for Linux from the NVIDIA developer website.

    Then transfer the archive to your instance, for example using:


    scp -i <MY_PRIVATE_KEY_FILE> <PATH_TO_CUDNN_ARCHIVE> centos@<MY_PUBLIC_DNS_IPV4>:~

    Decompress the file:


    tar -xzf cudnn-8.0-linux-x64-v6.0.tgz

    Then copy the library files into your CUDA installation directory. Go to the cuda folder created by the extraction in the previous step and run the following commands (sudo is needed because /usr/local/cuda is typically owned by root):


    cd cuda
    sudo cp include/cudnn.h /usr/local/cuda/include/
    sudo cp lib64/libcudnn* /usr/local/cuda/lib64/

    cuDNN is now installed!
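
    To double-check which cuDNN version was copied, you can read the version macros from the header. This is a sketch under the assumption that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 6 header does to my knowledge; the function names are hypothetical.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % macro, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(path="/usr/local/cuda/include/cudnn.h"):
    """Read the header copied in the step above and parse its version."""
    with open(path) as header:
        return parse_cudnn_version(header.read())
```

    For this guide the first two numbers should be 6 and 0.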

    Set Up A Deep Learning Code Environment in Dataiku DSS

    If you plan to use the Deep Learning for Images plugin, follow the instructions in the Howto for installing the plugin and it will set up its own code environment.

    To create a custom code environment, follow the reference documentation to create a Python environment and install the following libraries (the copy-and-paste text below assumes that you are using conda):


    # pip
    tensorflow-gpu==1.4.1
    keras==1.2.2
    matplotlib
    Pillow
    scikit-learn
    scipy
    bcolz
    h5py

    # conda
    libgcc
    mkl-service
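
    Once the environment is built, you may want to confirm from a notebook that the pinned versions were actually installed. The following is a minimal sketch with hypothetical helper names; it assumes pkg_resources (shipped with setuptools) is importable in the code environment.

```python
def parse_pins(requirements_text):
    """Return {package: version} for every `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        # Ignore blanks, comments such as "# pip", and unpinned packages
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


def installed_versions():
    """Return {package: version} for the current Python environment."""
    import pkg_resources  # assumption: setuptools is installed

    return {d.project_name.lower(): d.version for d in pkg_resources.working_set}
```

    Running find_mismatches(parse_pins(text), installed_versions()) on the package list above should return an empty dict when the environment matches.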

    Now you can use your environment from a notebook in your own DSS project.

    You can test tensorflow-gpu from your notebook:


    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())

    It should print a list of the devices that TensorFlow can see.

    If a GPU is listed, you can use it with TensorFlow. Congratulations!
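
    Rather than reading the printed list, you can filter it for GPUs. This sketch relies on the name and device_type attributes of the objects returned by device_lib.list_local_devices(); the helper names are hypothetical, and TensorFlow is imported lazily so that the filtering function can be used on its own.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is "GPU"."""
    return [device.name for device in devices if device.device_type == "GPU"]


def local_gpu_names():
    """List the GPUs that TensorFlow detects on this machine."""
    # Imported here so gpu_device_names() works without TensorFlow installed
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

    An empty list from local_gpu_names() means TensorFlow only sees the CPU, which usually points to a driver, CUDA, or cuDNN problem in the earlier steps.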

  • Turribeach

    Hi, is there an updated guide for CUDA 9.0/cuDNN 7.0, or will this guide work fine with those versions? Thanks!

    PS: I am on RHEL 7, so I presume the above should work fine, as CentOS 7 is the open source equivalent.

  • CoreyS

    @Turribeach
    Thanks for your follow-up. It depends:

    • If you want to use the plugin: the current version requires CUDA 8 because of its Python package dependencies, so it would not work with CUDA 9.
    • If you just want to install CUDA: this may work by selecting CUDA 9 instead of CUDA 8 in the various steps. We have not tested it, so you may be a bit on your own. Note that this is completely outside the scope of DSS.

    Alex's directions above are still mostly valid, with the following exceptions:

    • The runfile installer is not the recommended installation method on CentOS 7.
    • CUDA 9 or 10 can be used.
    • For the DSS dependencies, the code environment creation UI helps you pre-fill the package list.

    I hope this helps!

  • Tomas

    Hi Alex

    Thanks for the detailed guide. But how do I make sure that these steps are applied in the Docker images for containerized execution? I am using containers on EKS, and the cluster already has GPU-enabled workers running. Thanks.

    Tomas

    Update: never mind, I found it https://doc.dataiku.com/dss/latest/containers/custom-base-images.html#container-cuda-base-image
