Error with Tensorflow & GPU

Wave · April 2021

I have been trying to use my GPU (RTX 3090) to run some TensorFlow models. I tried different environments, including Conda, and I have installed and reinstalled CUDA 10 and cuDNN 7 a few times without much success.

I can see data loading into the GPU memory, but no computation happens, and then I get the following error:

Failed to train : <class 'tensorflow.python.framework.errors_impl.InternalError'> : 2 root error(s) found. (0) Internal: Blas GEMM launch failed : a.shape=(100, 64), b.shape=(64, 64), m=100, n=64, k=64 [[{{node dense_2/MatMul}}]] (1) Internal: Blas GEMM launch failed : a.shape=(100, 64), b.shape=(64, 64), m=100, n=64, k=64 [[{{node dense_2/MatMul}}]] [[Mean/_53]] 0 successful operations. 0 derived errors ignored.

I would appreciate some support to get the GPU running.
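For anyone hitting the same "Blas GEMM launch failed" message, a quick check of what TensorFlow can actually see may help narrow things down. The snippet below is only a minimal sketch, not an official diagnostic: the helper name `gpu_report` is made up for this thread, and it assumes a TensorFlow build that ships `tf.compat.v1` (1.14 or later, as far as I know). It repeats a matrix multiplication with the same shapes as in the error, on whatever device TensorFlow picks by default.

```python
import importlib.util


def gpu_report():
    """Summarise what TensorFlow can see. Hypothetical helper; it does not raise."""
    report = {"tensorflow": None, "built_with_cuda": None, "gpus": [], "error": None}

    # Bail out early if TensorFlow is not installed in this environment.
    if importlib.util.find_spec("tensorflow") is None:
        report["error"] = "tensorflow is not installed in this environment"
        return report

    import tensorflow as tf

    report["tensorflow"] = tf.__version__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    report["gpus"] = [d.name for d in tf.config.experimental.list_physical_devices("GPU")]

    # Re-run a MatMul with the shapes from the error message: (100, 64) x (64, 64).
    # Graph mode via tf.compat.v1 so the same code runs on TF 1.15 and TF 2.x.
    tf1 = tf.compat.v1
    try:
        with tf1.Graph().as_default():
            a = tf1.random.normal((100, 64))
            b = tf1.random.normal((64, 64))
            product = tf1.matmul(a, b)
            with tf1.Session() as sess:
                sess.run(product)
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        report["error"] = "{}: {}".format(type(exc).__name__, exc)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(key, "=", value)
```

If `gpus` is empty, TensorFlow is not seeing the card at all. If a GPU is listed but `error` contains the GEMM failure, the installed CUDA/cuDNN build is a likely suspect; as far as I know, RTX 30-series cards need a CUDA 11 based build rather than CUDA 10.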

Wave · April 2021

Hi @CoreyS, I eventually managed to fix this. It seems to be a complication with the RTX 30XX cards.

In case someone else has a similar issue, this is the guide I followed:

https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-docker-or-CUDA-install-2005/

Below is a screenshot of the packages installed using a Conda environment.

In addition, I had to manually downgrade h5py (with pip), because by default the installation pulled in a newer version that has some issues.

Screenshot 2021-04-08 at 18.22.12.png
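The h5py downgrade mentioned above can be checked with a few lines of Python. This is a hedged sketch only: the post does not say which h5py version caused trouble, so the cutoff used here (major version 3) is an assumption based on commonly reported problems with TensorFlow 1.x and h5py 3.x, and both function names are hypothetical.

```python
def installed_h5py_version():
    """Return the installed h5py version string, or None if h5py is missing."""
    try:
        import h5py
    except ImportError:
        return None
    return h5py.__version__


def needs_h5py_downgrade(version, first_bad_major=3):
    """True if the major version is at or above the assumed problematic one.

    first_bad_major=3 is an assumption, not something stated in this thread.
    """
    major = int(version.split(".")[0])
    return major >= first_bad_major


if __name__ == "__main__":
    version = installed_h5py_version()
    if version is None:
        print("h5py is not installed")
    elif needs_h5py_downgrade(version):
        # Under the assumption above, a pin such as: pip install "h5py<3"
        print("h5py", version, "may be too new for TensorFlow 1.15")
    else:
        print("h5py", version, "is below the assumed cutoff")
```

If the cutoff turns out to be different for your setup, change `first_bad_major` rather than the logic.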

CoreyS · April 2021

Hi @Wave! Can you provide any further details in the thread to help users find a solution (for example, your DSS version)? Also, can you let us know if you’ve tried any fixes already? This should lead to a quicker response from the community.

CoreyS · April 2021

Thank you for sharing this with everyone!
