Unable to connect to DSS from container
Hi All,
I am trying to set up AKS in Dataiku. I have successfully attached the AKS cluster to Dataiku from the Clusters tab in Administration.
I am facing an issue while testing it in the container execution tab in Settings. Below is the error I get when I run the test.
Unable to connect to DSS from container : HTTPConnectionPool(host='xx.xx.xx.xx', port=10001): Max retries exceeded with url: /dip/api/tintercom/containers/get-execution (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2ffae2fcf8>: Failed to establish a new connection: [Errno 110] Connection timed out',))
I created the cluster from the Azure portal UI and deployed it in the same VNet as my DSS instance, which runs on an Azure VM. I have also set the DSS IP in the env-site.sh file in the bin folder.
I have been trying to fix it but could not. Can somebody please help me with this issue?
Thank you!
Answers
-
Hi,
The nodes of your AKS cluster need to be able to connect to the machine hosting DSS, which is not the case here. This points to a network setup issue: either routes are missing from the VNet or subnet where the AKS cluster is deployed, or firewall rules on the DSS machine are blocking inbound traffic from these nodes.
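A quick way to confirm this is to run a plain TCP connect test against the DSS host from a pod or node. The sketch below is a generic Python check, not a Dataiku tool, and the function name `probe` is made up for illustration. A "timeout" result, like the Errno 110 in your error, usually means packets are being dropped by routing or a firewall, while "refused" usually means the host is reachable but nothing is listening on that port.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout' or 'error'.

    Hypothetical helper for illustration only; not part of DSS.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but no listener on this port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: often a dropped packet (firewall or missing route).
        return "timeout"
    except OSError:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error"
```

Run from inside a pod, `probe("<your DSS host IP>", 10001)` should return "open" once the network path is fixed; the host and port here are placeholders for your own values.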
-
So the test Dataiku containers on the K8s cluster are trying to connect to the Dataiku server on port 10001. Is this the only port used for cluster <--> DSS server communication, or is there a range of ports that I should open so that all containers can reach DSS when more of them are running on the cluster?
Thanks in advance.
-
Krishna615 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 7 Partner
No, port 10001 alone is not enough for Dataiku to establish a connection with the containers. A range of ports has to be opened to allow communication between the containers and the DSS instance. I don't have the list of ports, but you can find them in the error traceback, which is where I have observed a few of them.
-
What was the resolution to this? I am getting a similar error, but on a random TCP port (for example 34757).
I have DSS installed on a VM on port 11000
I pushed the base images to Azure ACR
I set up the cluster configuration in settings.
I created the cluster from DSS and chose the option to "Inherit DSS Host Settings"
I set DKU_BACKEND_EXT_HOST, so it is correctly identifying the IP address.
ERROR: (IP & port numbers obfuscated)
[13:56:19] [INFO] [dku.utils] - [2024-02-04 13:56:19,233] [1/MainThread] [ERROR] [root] Could not reach DSS: HTTPConnectionPool(host='125.133.200.45', port=36752): Max retries exceeded with url: /kernel/tintercom/containers/get-execution (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f673caef240>: Failed to establish a new connection: [Errno 110] Connection timed out',)
-
DSS will open a random port in the range 1024 to 65535 so that the DSS host and the executors on the K8s cluster can connect.
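Because the port is chosen at run time, testing one fixed port does not prove much. One rough way to exercise the same path is to start a throwaway listener on the DSS host on an OS-assigned port and then try to connect to it from a pod. This is a generic Python sketch with a hypothetical helper name (`start_listener`); it is not how DSS itself allocates its ports.

```python
import socket


def start_listener(bind_addr="0.0.0.0"):
    """Open a TCP listener on a port picked by the OS and return (socket, port).

    Hypothetical test helper; run it on the DSS host, then connect from a pod.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Port 0 asks the OS to choose a free port, similar in spirit to a
    # service that does not listen on a fixed, known port.
    srv.bind((bind_addr, 0))
    srv.listen(1)
    return srv, srv.getsockname()[1]
```

Print the returned port on the DSS host, attempt a TCP connection to it from a pod, and close the socket when done. If the connection times out, the firewall or routing between the nodes and the DSS host is still blocking that part of the range.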
Double-check your network settings for the DSS host and the K8s cluster.
-
Thanks for the response.
Do you have any suggestions on the setting to check? The Dataiku documentation is a little vague there.
By default the pods should be able to make outbound connections on any port.
I had already opened the range you mentioned in iptables on the DSS host and still had the same error.
Many thanks.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,909 Neuron
Has anyone been able to solve this? Getting the same issue on my EKS setup. What exactly needs to be changed to allow the job containers to contact the DSS server?
-
Just open connectivity in your firewall on all ports between your DSS instance and your K8s nodes. I created a firewall ingress rule that says:
Source: internal_vpc_ip_ranges, kubernetes_nodes_ip_ranges
Ports: Allow ALL
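
To sanity-check that a rule like this really covers your nodes, you can compare each node IP against the source ranges you allowed. A minimal Python sketch, assuming you know your own CIDRs; the function name and the ranges in the usage note are placeholders, not values from this thread.

```python
import ipaddress


def is_allowed(ip, allowed_cidrs):
    """Return True if ip falls inside at least one of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    # strict=False tolerates ranges written with host bits set, e.g. 10.0.0.5/16.
    return any(
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )
```

For example, with placeholder ranges `["10.0.0.0/16", "10.240.0.0/16"]`, calling `is_allowed("10.240.0.5", ranges)` tells you whether a node with that address matches the rule's source list.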