Kernel dies with Convolutions on GPU

by valentino
Level 1

My notebook kernel dies each time I try to run a convolution on the GPU.

The environment seems to be set up correctly, the GPU is available, and simple transformations run on both GPU and CPU. Convolutions run on the CPU, but on the GPU they kill the kernel (see attached image).

I can't find any useful error message.

How can I troubleshoot this?

Thanks in advance for the help!

Valentino

 

torch==2.3.0
torchaudio==2.3.0
torchvision==0.18.0
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105

NicolasD
Dataiker

Hello 🙂

Out-of-memory problems are, alas, common when using GPUs.

Would you be able to monitor your GPU memory usage while the cell runs? For example, if you can run `watch nvidia-smi` on the server where the GPU is located and observe the output, it would help identify an out-of-memory problem.
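If `watch` is not convenient, the same check can be roughly scripted. The snippet below is only a minimal sketch: it assumes `nvidia-smi` is on the PATH of the machine that hosts the GPU, and the helper names `parse_gpu_memory` and `poll_gpu_memory` are made up for illustration. It polls the used and total memory of each GPU so you can see whether usage climbs to the limit just before the kernel dies.

```python
import subprocess
import time

# nvidia-smi query that prints one CSV line per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV output of QUERY into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def poll_gpu_memory(interval_s=1.0, samples=30):
    """Print GPU memory usage every `interval_s` seconds, `samples` times."""
    for _ in range(samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        for index, used, total in parse_gpu_memory(out):
            print(f"GPU {index}: {used} / {total} MiB")
        time.sleep(interval_s)
```

Start `poll_gpu_memory()` in a separate Python session on the GPU server, then run the convolution cell. If the used memory approaches the total right before the crash, an out-of-memory problem is likely.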
