Error with Tensorflow & GPU

Solved!
Wave
Level 1
I have been trying to use my GPU (RTX 3090) to run some TensorFlow models. I tried different environments, including Conda, and I have installed and reinstalled CUDA 10 and cuDNN 7 a few times without much success.

I do see data loading into the GPU memory, but no computation happens, and then I get the following error:

Failed to train : <class 'tensorflow.python.framework.errors_impl.InternalError'> : 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(100, 64), b.shape=(64, 64), m=100, n=64, k=64
     [[{{node dense_2/MatMul}}]]
  (1) Internal: Blas GEMM launch failed : a.shape=(100, 64), b.shape=(64, 64), m=100, n=64, k=64
     [[{{node dense_2/MatMul}}]]
     [[Mean/_53]]
0 successful operations.
0 derived errors ignored.

I would appreciate some support to get the GPU running.  

1 Solution
Wave
Level 1
Author

Hi @CoreyS, I eventually managed to fix this. It seems to be a complication specific to the RTX30XX cards.
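A plausible explanation, offered as an assumption rather than something confirmed in this thread: RTX 30-series cards are Ampere GPUs with compute capability 8.6, and the CUDA 10 toolkits predate that architecture. The sketch below is a small lookup of the highest compute capability a few CUDA releases can target. The table is hand-written and partial, and the names `MAX_COMPUTE_CAPABILITY` and `cuda_supports` are made up for illustration.

```python
# Highest GPU compute capability that each CUDA toolkit release can target.
# Hand-written, partial table for illustration only; extend it as needed.
MAX_COMPUTE_CAPABILITY = {
    (10, 0): (7, 5),  # Turing
    (10, 1): (7, 5),
    (10, 2): (7, 5),
    (11, 0): (8, 0),  # Ampere (A100)
    (11, 1): (8, 6),  # Ampere (RTX 30-series)
}

# The RTX 3090 reports compute capability 8.6.
RTX_3090_CAPABILITY = (8, 6)


def cuda_supports(cuda_version, capability):
    """Return True if the given CUDA release can target the given capability."""
    if cuda_version not in MAX_COMPUTE_CAPABILITY:
        raise KeyError(f"CUDA version {cuda_version} is not in the table")
    # Tuples compare element by element, so (8, 6) <= (7, 5) is False.
    return capability <= MAX_COMPUTE_CAPABILITY[cuda_version]


for version in sorted(MAX_COMPUTE_CAPABILITY):
    ok = cuda_supports(version, RTX_3090_CAPABILITY)
    print(f"CUDA {version[0]}.{version[1]}: RTX 3090 supported = {ok}")
```

If this is indeed the cause, reinstalling CUDA 10 and cuDNN 7 would not help regardless of the environment used, which matches the behaviour described in the question.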

In case someone else has similar issues, this is the guidance I followed:

https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-d...

Below is a screenshot of the packages installed using a Conda environment.

In addition, I had to manually downgrade h5py (with pip), because by default the installation pulled in a newer version that has some issues.
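The post does not say which h5py version caused trouble. A minimal sketch for checking the installed version is below; it assumes, without confirmation from this thread, that the problematic releases are h5py 3.x and later, which are often reported to clash with TensorFlow 1.x. The helper names and the `max_major=2` threshold are hypothetical.

```python
from importlib import metadata


def installed_h5py_version():
    """Return the installed h5py version string, or None if it is not installed."""
    try:
        return metadata.version("h5py")
    except metadata.PackageNotFoundError:
        return None


def h5py_needs_downgrade(version, max_major=2):
    """Return True if the major version exceeds max_major.

    max_major=2 is an assumption for illustration, not a value from the post.
    """
    major = int(version.split(".")[0])
    return major > max_major


version = installed_h5py_version()
if version is None:
    print("h5py is not installed in this environment")
elif h5py_needs_downgrade(version):
    print(f"h5py {version} is installed; a downgrade may be needed")
else:
    print(f"h5py {version} is installed; no downgrade suggested")
```

Run it inside the same Conda environment as TensorFlow so that it inspects the right set of packages.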

Screenshot 2021-04-08 at 18.22.12.png


3 Replies
CoreyS
Dataiker Alumni
Hi, @Wave! Can you provide any further details on the thread to help users find a solution (for example, your DSS version)? Also, can you let us know if you've tried any fixes already? This should lead to a quicker response from the community.
CoreyS
Dataiker Alumni
Thank you for sharing this with everyone!