Visual Time series model training on GPU fails

RiaanB · October 2022

Hello,

I'm getting the following error whilst trying to train a visual time series model on GPU:

OSError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory

I have done the following so far:

1. Built a CUDA 10.2-enabled base image on DSS and pushed the base images.

2. Created a code environment and added the additional packages for visual time series forecasting (CUDA 10.2).

I've also tried using a Docker append to add cuda-nvtx-10-2 to the base image:

USER root
# Install cuda-nvtx-10-2
RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo && \
    yum install -y cuda-nvtx-10-2 && \
    yum clean all
# Globally enable cuda-nvtx-10-2
ENV PATH=/usr/local/cuda-10.2/bin:${PATH} \
    LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:${LD_LIBRARY_PATH}
USER dataiku

The files are installed and present in the image, but they're still not found when the code runs.

I've seen online that others resolved this by adding the /usr/local/cuda/lib64 path to the LD_LIBRARY_PATH environment variable, but I'm unable to do so: the ENV instruction from the Docker append doesn't seem to take effect.
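One way to narrow this down is to run a small check with the code environment's Python inside the running container. The sketch below is a hypothetical diagnostic, not a DSS tool: the candidate directories are assumptions taken from the paths mentioned in this thread, and it assumes the error comes from a shared-library load in Python. `ctypes.CDLL` goes through the system dynamic loader, so a missing library on the search path should reproduce the same kind of `OSError`. Note that changing `os.environ["LD_LIBRARY_PATH"]` from inside an already-running process generally does not help, because the loader reads that variable when the process starts; the path has to be set in the image or container environment.

```python
import ctypes
import os
from pathlib import Path

LIB_NAME = "libnvToolsExt.so.1"

# Assumption: directories where a CUDA 10.2 image might place the library.
CANDIDATE_DIRS = (
    "/usr/local/cuda/lib64",
    "/usr/local/cuda-10.2/lib64",
    "/usr/local/cuda/compat",
)


def ld_library_path_entries(env=None):
    """Return the non-empty entries of LD_LIBRARY_PATH as a list."""
    env = os.environ if env is None else env
    return [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]


def find_library_dirs(lib_name=LIB_NAME, dirs=CANDIDATE_DIRS):
    """Return the candidate directories that actually contain lib_name."""
    return [d for d in dirs if (Path(d) / lib_name).exists()]


def can_load(lib_name=LIB_NAME):
    """Try to load the library through the dynamic loader via ctypes."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError as exc:
        print(f"load failed: {exc}")
        return False


if __name__ == "__main__":
    entries = ld_library_path_entries()
    found = find_library_dirs()
    print("LD_LIBRARY_PATH entries:", entries)
    print("directories containing", LIB_NAME, ":", found)
    # Directories that hold the library but are not on the search path.
    print("missing from LD_LIBRARY_PATH:", [d for d in found if d not in entries])
    print("loadable:", can_load())
```

If the library shows up in a directory that is missing from LD_LIBRARY_PATH and the load fails, the ENV line is not reaching the process that runs the training.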

Does anyone have any suggestions?

Thanks

Riaan

Operating system used: CentOS (cloud stack)

Sergey · October 2022

Hi @RiaanB,

Since you also reported this in a support ticket, I'll reply here as well.

You will need to update LD_LIBRARY_PATH:

ENV LD_LIBRARY_PATH=/usr/local/cuda/compat:/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

and rebuild the images. We are going to fix this permanently in an upcoming release.
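The `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` part of that line uses the `${VAR:+word}` substitution, which Docker's ENV supports: it expands to `word` only when the variable is set and non-empty. The hypothetical helpers below are a small Python model of that behaviour, contrasted with the simpler `new:${LD_LIBRARY_PATH}` form from the question, which leaves a trailing colon (an empty entry the dynamic loader can interpret as the current directory) when the variable was not set before.

```python
def prepend_paths(new_dirs, current):
    """Model ENV LD_LIBRARY_PATH=<new>${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}.

    The separator colon is added only when an existing, non-empty value
    is there to append.
    """
    prefix = ":".join(new_dirs)
    return prefix + (":" + current if current else "")


def naive_prepend(new_dirs, current):
    """Model the simpler <new>:${LD_LIBRARY_PATH} form used in the question."""
    return ":".join(new_dirs) + ":" + (current or "")


if __name__ == "__main__":
    dirs = ["/usr/local/cuda/compat", "/usr/local/cuda/lib64"]
    print(prepend_paths(dirs, None))
    print(prepend_paths(dirs, "/opt/lib"))
    print(naive_prepend(["/usr/local/cuda-10.2/lib64"], None))
```

With no previous value, `prepend_paths` yields `/usr/local/cuda/compat:/usr/local/cuda/lib64`, while `naive_prepend` yields `/usr/local/cuda-10.2/lib64:` with a trailing colon.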
