Visual Time series model training on GPU fails

RiaanB

Hello,

I'm getting this error whilst trying to train a time series model on a GPU:

OSError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory
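The message comes from the dynamic loader. A GPU build of a deep learning library in the code environment asks for libnvToolsExt.so.1, and the loader cannot find that file on its search path. Which package triggers the load is an assumption here, since the thread does not say. The sketch below reproduces the same lookup from a notebook or Python recipe running in the same container, using only the standard-library ctypes module. The helper name try_load is hypothetical.

```python
import ctypes
import os


def try_load(libname):
    """Ask the dynamic loader to open a shared library by name.

    Returns (True, "") on success, or (False, <loader error text>) when the
    library cannot be opened, which mirrors the OSError in the question.
    """
    try:
        ctypes.CDLL(libname)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Show the library search path this process actually sees at runtime.
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    ok, err = try_load("libnvToolsExt.so.1")
    print("loaded" if ok else "failed: " + err)
```

On glibc systems the loader reads LD_LIBRARY_PATH when the process starts. Changing os.environ from inside Python therefore does not affect this lookup. The variable has to be present in the container's environment before the Python process is launched.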

I have done the following so far:

1. Built a CUDA 10.2-enabled base image on the DSS instance and pushed the base images

2. Created a code environment and added the additional packages for visual time series forecasting (CUDA 10.2)

I've also tried using a docker append to add cuda-nvtx-10-2 to the base image:

USER root
# Install cuda-nvtx-10-2
RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo && \
    yum install -y cuda-nvtx-10-2 && \
    yum clean all
# Globally enable cuda-nvtx-10-2
ENV PATH=/usr/local/cuda-10.2/bin:${PATH} \
    LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:${LD_LIBRARY_PATH}
USER dataiku

The files are installed and present in the image, but they're still not found when the code runs.

I've seen online that others resolved this by adding /usr/local/cuda/lib64 to $LD_LIBRARY_PATH, but I'm unable to do so: the ENV from the docker append doesn't seem to take effect.
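When a package is installed but its library is still not found, it helps to separate two questions. Does the file exist on disk, and is its directory on the loader's search path? The sketch below answers both for a list of candidate directories. It assumes the library lands in one of the CUDA directories mentioned in this thread. The function locate_library and the candidate list are illustrative and are not part of DSS.

```python
import os


def locate_library(libname, candidate_dirs, ld_library_path):
    """Report where libname exists on disk and whether that directory is searched.

    Returns a list of (directory, on_search_path) tuples, one for every
    candidate directory that actually contains libname.
    """
    # Textual comparison only; empty entries are skipped here even though the
    # real loader treats them as the current directory.
    searched = {d for d in (ld_library_path or "").split(":") if d}
    found = []
    for d in candidate_dirs:
        # isfile() follows symlinks, so a .so.1 symlink to the real file counts.
        if os.path.isfile(os.path.join(d, libname)):
            found.append((d, d in searched))
    return found


if __name__ == "__main__":
    candidates = [
        "/usr/local/cuda/lib64",
        "/usr/local/cuda-10.2/lib64",
        "/usr/local/cuda/compat",
    ]
    for directory, searched in locate_library(
        "libnvToolsExt.so.1", candidates, os.environ.get("LD_LIBRARY_PATH")
    ):
        print(directory, "on LD_LIBRARY_PATH" if searched else "NOT on LD_LIBRARY_PATH")
```

This check covers LD_LIBRARY_PATH only. The loader also consults its cache and default directories. Paths are also compared as strings, so a directory reached through a symlink (for example /usr/local/cuda, if it points at a versioned CUDA directory) only matches when LD_LIBRARY_PATH spells it the same way.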

Does anyone have any suggestions?

Thanks

Riaan


Operating system used: CentOS (cloud stack)


Answers

  • Sergey (Dataiker)
    edited July 17

    Hi @RiaanB

    Since you have also reported this in a support ticket, I'll reply here as well.

    You will need to update LD_LIBRARY_PATH:

    ENV LD_LIBRARY_PATH=/usr/local/cuda/compat:/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    

    and then rebuild the images. We are going to fix this permanently in an upcoming release.
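The ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} part is shell-style parameter expansion, which Dockerfile ENV also supports. ${VAR:+word} expands to word only when VAR is set and non-empty. The existing value is therefore appended, with a separating colon, only if there is one. The question's form, ending in :${LD_LIBRARY_PATH}, leaves a trailing colon when the variable is unset. The glibc loader treats that empty entry as the current directory. The small Python model below makes the two cases concrete. Both function names are hypothetical and only mimic the string each ENV line produces.

```python
def compose_ld_library_path(new_dirs, existing):
    """Model ENV LD_LIBRARY_PATH=<dirs joined by ':'>${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}."""
    value = ":".join(new_dirs)
    # ${VAR:+word}: None (unset) and "" (empty) both skip the appended part.
    if existing:
        value += ":" + existing
    return value


def naive_compose(new_dirs, existing):
    """Model the question's form <dirs>:${LD_LIBRARY_PATH}, for comparison."""
    # An unset variable expands to "", which leaves a trailing ':'.
    return ":".join(new_dirs) + ":" + (existing or "")


if __name__ == "__main__":
    dirs = ["/usr/local/cuda/compat", "/usr/local/cuda/lib64"]
    print(compose_ld_library_path(dirs, None))
    # prints /usr/local/cuda/compat:/usr/local/cuda/lib64
    print(compose_ld_library_path(dirs, "/opt/lib"))
    # prints /usr/local/cuda/compat:/usr/local/cuda/lib64:/opt/lib
    print(naive_compose(["/usr/local/cuda-10.2/lib64"], None))
    # prints /usr/local/cuda-10.2/lib64:
```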
