Timeseries forecasting with GPU / cuda 11

raftersvk
raftersvk Registered Posts: 6

Hello,

I am trying to train a time series forecasting model on a GPU.
OS: Ubuntu 22.04

Installed with apt-get on OS:

  • libcudnn9-cuda-11
  • cuda-toolkit-11-8
  • libnccl2

I then created a new Python environment:

image.png

When I use that environment in the model, everything looks fine at first, since it shows my GPU card:

image.png

But when I start training, there is an error:

image.png

And this is the result when I try to execute an ML model:

image.png

Any idea how to resolve the issue?

Operating system used: Ubuntu 22.04

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,580 Neuron

    You need to install all the additional software and drivers that NVIDIA GPUs need in order to work. The exact steps depend on the OS, version, architecture, hardware and GPU of your server. It is not a trivial setup: all the software versions need to be compatible with each other and with the GPU, and this is not always clearly stated for each component. Pretty much all cloud vendors provide OS images with these components pre-installed and properly configured for GPU-enabled instances, so you may be able to leverage those if you are using a cloud VM.

    Over 3 years ago I wrote this post, which is a complete guide to all the setup steps needed to get GPU training working on a Dataiku instance running on RHEL v7.9. While that post is now outdated, it will give you a rough idea of the steps involved. It will be up to you to work out the specific steps for your environment. Feel free to post an update if and when you get it working so other people in the Community can benefit from your experience.
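As a first diagnostic for the compatibility problem described above, it can help to check whether the CUDA shared libraries are loadable from the same Python process that will run the training. The sketch below is only an illustration: the library file names are assumptions for a CUDA 11.x / cuDNN 9 / NCCL 2 install and may differ on your machine, and `check_shared_libs` is a hypothetical helper, not part of Dataiku.

```python
import ctypes

# Assumed sonames for CUDA 11.x runtime, cuDNN 9 and NCCL 2.
# Adjust them to whatever your installed packages actually ship.
CANDIDATE_LIBS = ["libcudart.so.11.0", "libcudnn.so.9", "libnccl.so.2"]


def check_shared_libs(names):
    """Return {name: True/False} depending on whether the library can be loaded."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # uses the dynamic loader's normal search path
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for lib, ok in check_shared_libs(CANDIDATE_LIBS).items():
        print(f"{lib}: {'loadable' if ok else 'NOT found'}")
```

Run it with the interpreter of the code environment itself; a library that loads from a system shell may still be invisible to the environment if the loader path differs.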

  • raftersvk
    raftersvk Registered Posts: 6
    edited October 2024

    I will look into that, but my first impression is that Time Series Forecasting hasn't been updated for a while.

    Indeed, the DL algorithms offered in the default install are based on the MXNet library, which has not been maintained since November 2023. It is also stuck at CUDA 11.7 at most.

    This makes things even more complicated, since my setup is based on CUDA 12.5 for other reasons / tools.

    Any idea whether it is possible to work with newer versions of the algorithms or libraries?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,580 Neuron

    DL is an advanced topic which is probably not suitable for testing on the free Dataiku version, especially if you want the latest frameworks. Ultimately, Dataiku has to target specific framework versions to integrate with them and make sure they work. These frameworks are a moving target, so you can't really expect Dataiku to always support the latest versions. Support for Pandas 2.x has only recently been added, for instance. Whether this will work or not is hard to say. If I had to guess, I would say CUDA 12.5 will probably not work, but it's just a guess. You will need to involve Dataiku Support or Professional Services to get a clearer answer.

  • raftersvk
    raftersvk Registered Posts: 6

    Thank you for your answer. I will give it another try with "generic" AutoML prediction to see if I can make it work.

  • cwdurand
    cwdurand Registered Posts: 1

    There's an option at the top of Requested Packages to "Add sets of packages".

    There you will find options that meet your needs. I believe that for the update to work, your dataiku-dss instance must be able to connect to the link at the bottom of these sets:
    --find-links https://download.pytorch.org/whl/torch_stable.html

    That being said, I was able (after much testing) to add some packages that worked with the free version (13.3.2) and allowed deep modeling to use my GPU. This assumes that the libraries are already available, as I did not use the link in my Package Requests. Note that these hard requirements are based on the specifically listed libraries. The list also includes some libraries that Dataiku requires for their JavaScript to work (I assume).

    Notes:

    • mxnet-cu112 was not required for GPU processing.
    • I had CUDA 12.7, verified with nvidia-smi. CUDA 11.8 was compatible.
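The second note relies on the fact that an NVIDIA driver can run software built for the same or an older CUDA release, and that the "CUDA Version" shown by nvidia-smi is the highest version the driver supports, not the installed toolkit. A minimal sketch of that check follows; the helper names are hypothetical, and the parsing assumes the usual `CUDA Version: X.Y` field in the nvidia-smi header.

```python
import re
import subprocess


def parse_driver_cuda(smi_text):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_cuda(tag):
    """Turn a wheel tag such as 'cu118' or 'cu112' into (11, 8) or (11, 2)."""
    digits = re.fullmatch(r"cu(\d{2})(\d)", tag)
    if not digits:
        raise ValueError(f"unrecognised CUDA tag: {tag}")
    return (int(digits.group(1)), int(digits.group(2)))


def driver_supports(driver_version, tag):
    """True if the driver's CUDA version is at least the wheel's CUDA version."""
    return driver_version is not None and driver_version >= wheel_cuda(tag)


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        out = ""  # no driver tools on this machine
    driver = parse_driver_cuda(out)
    for tag in ("cu118", "cu112"):
        status = "ok" if driver_supports(driver, tag) else "not supported / unknown"
        print(tag, status)
```

This only covers the driver side. Whether a given wheel also finds its runtime libraries is a separate question.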

    typing-extensions==4.5.0 # satisfies TF 2.13.x and Torch 2.0.1
    numpy==1.23.5 # keeps MXNet 1.9.1 happy

    # -------- Deep-learning libs (CUDA 11.8) ----------

    torch==2.0.1+cu118
    torchvision==0.15.2+cu118
    torchaudio==2.0.2+cu118
    tensorflow[and-cuda]==2.13.1 # GPU wheel, pulls nvidia-*-cu11 helpers

    # -------- ML / stats ------------------------------

    scikit-learn==1.5.0
    scipy>=1.11,<1.12
    statsmodels
    pmdarima
    prophet
    gluonts==0.15.1
    pydantic==1.10.15 # avoids 2.x which needs newer typing-ext
    mxnet==1.9.1 # CPU wheel (simpler), or mxnet-cu112==1.9.1
    cloudpickle
    matplotlib
    opencv-python
    flask

    For Time Series Forecasting there are other requirements, so I worked to combine the Deep Modeling and Time Series requirements on the same system. Here, mxnet-cu112==1.9.1 is required.

    numpy==1.23.5
    typing-extensions==4.5.0 # TF 2.13 & Torch 2.0 agree

    # Deep-Learning (Keras & PyTorch, CUDA 11.8)

    torch==2.0.1+cu118
    torchvision==0.15.2+cu118
    torchaudio==2.0.2+cu118
    tensorflow[and-cuda]==2.13.1 # GPU wheel, bundles cu118 libs

    # Time-Series (GluonTS + MXNet, CUDA 11.2)

    mxnet-cu112==1.9.1
    nvidia-nccl-cu11==2.19.3 # ships libnccl.so.2
    gluonts==0.15.1

    # DSS “must-have” utilities

    flask
    Jinja2 # DSS UI check for GPU
    h5py # Keras saving
    pillow # image transforms

    # Classical ML / stats

    scikit-learn==1.5.0
    scipy>=1.11,<1.12
    statsmodels
    pmdarima
    cloudpickle
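Once an environment with packages like these has been built, one way to confirm that each framework actually sees the GPU is a small script run inside that environment, for example from a notebook. This is a sketch rather than an official Dataiku check: `gpu_report` is a hypothetical helper, and a value of `None` only means the framework could not be imported.

```python
def gpu_report():
    """Ask each framework whether it sees a GPU; None means 'not importable'."""
    report = {"torch": None, "tensorflow": None, "mxnet": None}
    try:
        import torch
        report["torch"] = bool(torch.cuda.is_available())
    except Exception:
        pass  # not installed, or failed to load its native libraries
    try:
        import tensorflow as tf
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except Exception:
        pass
    try:
        import mxnet as mx
        report["mxnet"] = mx.context.num_gpus() > 0
    except Exception:
        pass
    return report


if __name__ == "__main__":
    for name, seen in gpu_report().items():
        print(f"{name}: {'GPU visible' if seen else 'no GPU' if seen is False else 'not importable'}")
```

If PyTorch and TensorFlow report a GPU but MXNet does not, that points at the MXNet-specific pieces (the mxnet-cu112 wheel and NCCL) rather than the driver.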

    Good luck. It's a real pain and some things should just be available out of the box in my honest opinion.
