Visual Time series model training on GPU fails

Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 ✭✭✭

Hello,

I'm getting this error whilst trying to train a time series model on GPU.

OSError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory

I have done the following so far:

1. Created a cuda 10.2 enabled base image on the DSS and pushed the base images

2. Created a code environment and added the additional packages for visual time series forecasting (cuda 10.2)

I've also tried to use docker append to add cuda-nvtx-10-2 to the base image.

USER root
# Install cuda-nvtx-10-2
RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo && \
yum install -y cuda-nvtx-10-2 && \
yum clean all
# Globally enable cuda-nvtx-10-2
ENV PATH=/usr/local/cuda-10.2/bin:${PATH} \
LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:${LD_LIBRARY_PATH}
USER dataiku

The files are installed and available, but they're still not found when the code runs.

I've seen online that others resolved this by adding /usr/local/cuda/lib64 to $LD_LIBRARY_PATH, but I'm unable to do so. The ENV from the docker append doesn't seem to take effect.
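For anyone debugging the same symptom, a rough way to check whether any directory on the search path actually contains the library is sketched below. The helper name find_lib_dirs is hypothetical, and the check only looks at the colon-separated path it is given. The dynamic loader also consults the ldconfig cache and default system directories, so an empty result here is a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: print each directory in a colon-separated search path
# that contains the named shared library file.
find_lib_dirs() {
    lib="$1"
    search_path="$2"
    old_ifs="$IFS"
    IFS=:
    for dir in $search_path; do
        # Skip empty entries, e.g. those left by a leading or trailing colon.
        [ -n "$dir" ] || continue
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir"
        fi
    done
    IFS="$old_ifs"
}

# Run inside the container: prints nothing if no listed directory has the file.
find_lib_dirs libnvToolsExt.so.1 "${LD_LIBRARY_PATH:-}"
```

If this prints nothing inside the running container while the file exists under /usr/local/cuda-10.2/lib64, the ENV line is most likely not reaching the process environment.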

Does anyone have any suggestions?

Thanks

Riaan


Operating system used: centos (cloud stack)

Answers

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 365 Dataiker
    edited July 2024

    Hi @RiaanB

    Since you also reported this in a support ticket, I'm replying here as well.

    You will need to update LD_LIBRARY_PATH:

    ENV LD_LIBRARY_PATH=/usr/local/cuda/compat:/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

    and rebuild images. We are going to fix this permanently in the upcoming releases.
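The `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` part of the ENV line above appends the existing value with a separating colon only when the variable is already set and non-empty, so the result never ends in a stray colon. The same expansion exists in POSIX shell, and the sketch below illustrates it there. The function name build_path and the sample value /opt/lib are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo: build the library path the same way the ENV line does.
build_path() {
    # $1 stands in for the existing LD_LIBRARY_PATH value (may be empty).
    existing="$1"
    printf '%s\n' "/usr/local/cuda/compat:/usr/local/cuda/lib64${existing:+:$existing}"
}

build_path ""          # no previous value: nothing is appended
build_path "/opt/lib"  # previous value: appended after a single colon
```

With an empty previous value the result is just the two CUDA directories. With /opt/lib it becomes the two CUDA directories followed by :/opt/lib.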
