Unable to connect to DSS from container

Krishna615

Hi All,

I am trying to set up AKS in Dataiku. I have successfully attached the AKS cluster to DSS from the Clusters tab in Administration.

I am facing an issue when testing it from the Containerized Execution tab in Settings. Below is the error I get when I run the test:

Unable to connect to DSS from container : HTTPConnectionPool(host='xx.xx.xx.xx', port=10001): Max retries exceeded with url: /dip/api/tintercom/containers/get-execution (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2ffae2fcf8>: Failed to establish a new connection: [Errno 110] Connection timed out',))

I created the cluster from the Azure portal UI and deployed it in the same VNet as my DSS instance, which runs on an Azure VM. I have also set the DSS IP in the env-site.sh file in the bin folder.

I have been trying to fix it but could not. Can somebody please help me with this issue?

Thank you!

 

fchataigner2
Dataiker

Hi,

The nodes of your AKS cluster need to be able to connect to the machine hosting DSS, which is not the case here. This points to a network setup issue: either routes are missing from the VNet or subnet where AKS is deployed, or firewall rules on the DSS machine are blocking inbound traffic from these nodes.
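
One way to narrow this down is to test raw TCP reachability from inside the cluster, independently of DSS. The Python sketch below is a hypothetical helper (`probe_tcp` is not part of DSS); it assumes you can start a throwaway pod with a Python image on the cluster and pass it the DSS IP and port from the error. A timeout, as in the `[Errno 110]` above, usually means packets are being silently dropped by routing or a firewall, whereas "refused" usually means the host was reached but the port was closed or actively rejected.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP handshake with host:port and classify the outcome."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but the port is closed
        # or a firewall actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer within `timeout`: packets are most likely being dropped.
        return "timeout"
    except OSError as exc:
        # Before Python 3.10, socket.timeout and TimeoutError are distinct,
        # so a kernel-level ETIMEDOUT (errno 110) can land here instead.
        if exc.errno == errno.ETIMEDOUT:
            return "timeout"
        return f"error: {exc}"
```

Calling `probe_tcp("<DSS IP>", 10001)` from a pod and then from the DSS host itself helps tell a cluster-to-VM network problem apart from DSS not listening at all.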

JanMigon

So the Dataiku test containers on the K8s cluster try to connect to the DSS server on port 10001. Is this the only port used for cluster <--> DSS server communication, or is there a range of ports I should open so that, when more containers are running on the cluster, all of them can communicate with DSS?

Thanks in advance.

Krishna615
Author

No, port 10001 alone is not enough for Dataiku to establish a connection with the containers. A range of ports has to be opened to allow communication between the containers and the DSS instance. I don't have the list of ports, but you can find them in the error traceback; that is where I observed a few of them.
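
The ports mentioned in the tracebacks can be pulled out of job logs with a short script. A minimal sketch, assuming the errors keep the urllib3 `HTTPConnectionPool(host='...', port=...)` format quoted in this thread; `ports_from_log` is a hypothetical helper, not a DSS or urllib3 API.

```python
import re
from typing import List, Tuple

# Matches urllib3's "HTTPConnectionPool(host='...', port=NNNN)" fragment
# (and the HTTPS variant) as it appears in the errors quoted in this thread.
_POOL_RE = re.compile(r"HTTPS?ConnectionPool\(host='([^']+)', port=(\d+)\)")


def ports_from_log(text: str) -> List[Tuple[str, int]]:
    """Return unique (host, port) pairs from urllib3 errors, in order of first appearance."""
    pairs: List[Tuple[str, int]] = []
    for host, port in _POOL_RE.findall(text):
        pair = (host, int(port))
        if pair not in pairs:
            pairs.append(pair)
    return pairs
```

Running it over several failed job logs shows whether the port stays the same from run to run or changes each time.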

Cumuli2024

What was the resolution to this? I am getting a similar error, but on a random TCP port (for example 34757).

 

I have DSS installed on a VM on port 11000.
I pushed the base images to Azure Container Registry (ACR).
I set up the cluster configuration in Settings.
I created the cluster from DSS and chose the "Inherit DSS Host Settings" option.
I set DKU_BACKEND_EXT_HOST, so DSS is correctly identifying its IP address.
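
To confirm which address the containers are being told to call back, it can help to check what `env-site.sh` actually sets and compare it with the `host=` value in the traceback. A minimal sketch, assuming the variable is set with a line of the form `export DKU_BACKEND_EXT_HOST=<address>` in the DSS data directory's `bin/env-site.sh`; `backend_ext_host` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed line format: export DKU_BACKEND_EXT_HOST=<address>, quotes optional.
# Commented-out lines ("# export ...") do not match because of the ^\s* anchor.
_EXT_HOST_RE = re.compile(
    r"""^\s*export\s+DKU_BACKEND_EXT_HOST=["']?([^"'\s#]+)["']?""", re.MULTILINE
)


def backend_ext_host(env_site_text: str) -> Optional[str]:
    """Return the DKU_BACKEND_EXT_HOST value from env-site.sh text, or None."""
    matches = _EXT_HOST_RE.findall(env_site_text)
    # In a shell script the last assignment wins, so report the last match.
    return matches[-1] if matches else None
```

If the value returned here differs from the host in the error message, the containers are calling back to an address other than the one you configured.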

 

ERROR (IP and port numbers obfuscated):

[13:56:19] [INFO] [dku.utils]  - [2024-02-04 13:56:19,233] [1/MainThread] [ERROR] [root] Could not reach DSS: HTTPConnectionPool(host='125.133.200.45', port=36752): Max retries exceeded with url: /kernel/tintercom/containers/get-execution (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f673caef240>: Failed to establish a new connection: [Errno 110] Connection timed out',)

 

Young-Sang_Lee
Dataiker

DSS opens a random port between 1024 and 65535 so that the DSS host and the executors on the K8s cluster can connect.

Double-check the network settings of both the DSS host and the K8s cluster.
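
If the callback port can be anything from 1024 to 65535, opening only the specific ports seen in past errors is not enough; every firewall rule between the cluster nodes and the DSS host has to allow that whole range. The sketch below checks a list of allowed ranges, which you would transcribe by hand from your own rules, against that range and against ports seen in tracebacks. The function names are hypothetical; the range is the one quoted in the reply above.

```python
from typing import Iterable, List, Tuple

# Range quoted in this thread for the port DSS opens for container callbacks.
DSS_CALLBACK_RANGE: Tuple[int, int] = (1024, 65535)


def uncovered_ports(ports: Iterable[int], allowed: List[Tuple[int, int]]) -> List[int]:
    """Return the ports not covered by any inclusive (low, high) range in `allowed`."""
    return [p for p in ports if not any(lo <= p <= hi for lo, hi in allowed)]


def covers_range(allowed: List[Tuple[int, int]],
                 target: Tuple[int, int] = DSS_CALLBACK_RANGE) -> bool:
    """True if the union of the inclusive `allowed` ranges covers all of `target`."""
    low, high = target
    next_needed = low  # smallest port in `target` not yet known to be allowed
    for lo, hi in sorted(allowed):
        if lo > next_needed:
            break  # gap: no remaining range (sorted by start) covers next_needed
        next_needed = max(next_needed, hi + 1)
        if next_needed > high:
            return True
    return next_needed > high
```

For example, `covers_range([(10000, 10010)])` is False, which matches the situation where only the port from the first traceback was opened and later runs fail on a different port.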

Cumuli2024

Thanks for the response.

Do you have any suggestions on which settings to check? The Dataiku documentation is a little vague on this.

By default, the pods should be able to make outbound connections on any port.

I had already opened the range you mentioned in iptables on the DSS host and still got the same error.

Many thanks.

 


Has anyone been able to solve this? I am getting the same issue on my EKS setup. What exactly needs to change to allow the job containers to reach the DSS server?
