Azure AKS deployment timeout

daniel_adornes
Level 3

Hi everyone!

The API Service deployment to my Kubernetes cluster is often failing with the following error:

Additional technical details: HTTP code: 500, Code: ERR_API_DEPLOYER_K8S_DEPLOYMENT_KUBECTL_FAILED, type: com.dataiku.dip.exceptions.ProcessDiedException

The deployment runs for many minutes (there seems to be a 10-minute limit) and then fails. Stack trace:

Waiting for deployment "...." rollout to finish: 1 old replicas are pending termination...
error: deployment "..." exceeded its progress deadline.

Is there anywhere in the DSS configuration where I can increase this timeout limit?

Thanks!

fchataigner2
Dataiker

Hi,

10 minutes is the default timeout for the `kubectl rollout...` command, but DSS doesn't offer any control over it, either through the `--timeout` flag or by setting `progressDeadlineSeconds`. You can still modify the `deployment.yaml.template` file in the DSS installation directory if you need to tweak it (though that is, of course, unsupported).
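For context, `progressDeadlineSeconds` is a field of the Kubernetes Deployment `spec`; when it is unset, Kubernetes defaults it to 600 seconds, which is consistent with the "exceeded its progress deadline" message in the question. The sketch below only shows where that field lives. It assumes the rendered Deployment manifest has already been loaded into a Python dict (for example with a YAML parser, not shown), and `with_progress_deadline` and `effective_progress_deadline` are hypothetical helper names, not part of DSS.

```python
import copy
from typing import Any, Dict

# Kubernetes applies this default (in seconds) when spec.progressDeadlineSeconds is unset.
K8S_DEFAULT_PROGRESS_DEADLINE = 600


def effective_progress_deadline(manifest: Dict[str, Any]) -> int:
    """Return the progress deadline Kubernetes would apply to this Deployment."""
    return manifest.get("spec", {}).get("progressDeadlineSeconds", K8S_DEFAULT_PROGRESS_DEADLINE)


def with_progress_deadline(manifest: Dict[str, Any], seconds: int) -> Dict[str, Any]:
    """Return a copy of a Deployment manifest with spec.progressDeadlineSeconds set.

    Hypothetical helper: assumes `manifest` is an already-rendered Deployment
    loaded as a dict; it does not read or render the DSS template.
    """
    if manifest.get("kind") != "Deployment":
        raise ValueError("expected a Deployment manifest")
    min_ready = manifest.get("spec", {}).get("minReadySeconds", 0)
    if seconds <= min_ready:
        # Kubernetes requires progressDeadlineSeconds to be greater than minReadySeconds.
        raise ValueError("progressDeadlineSeconds must exceed minReadySeconds")
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    patched.setdefault("spec", {})["progressDeadlineSeconds"] = seconds
    return patched


if __name__ == "__main__":
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "my-api"},  # hypothetical deployment name
        "spec": {"replicas": 1},
    }
    patched = with_progress_deadline(deployment, 1200)
    print(effective_progress_deadline(deployment), effective_progress_deadline(patched))  # 600 1200
```

Returning a copy rather than mutating the input makes it easy to compare the original and patched manifests side by side before applying anything.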


daniel_adornes
Level 3
Author

Thanks @fchataigner2!!

I found the file under /dataiku/dataiku-dss-9.0.3/resources/api-deployer/kubernetes/deployment.yaml.template

It has an `initialDelaySeconds` attribute, which I changed from 600 to 1200 (the same timeout we use in our infrastructure).

All good!

Thank you!
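The template edit described in this reply could be scripted roughly as follows. This is a minimal sketch that assumes the template contains plain YAML lines of the form `initialDelaySeconds: 600` (the thread does not show the file's contents); `bump_initial_delay` and `patch_template` are hypothetical helper names. Because editing this template is unsupported, the script saves a `.bak` copy before writing so the change can be reverted.

```python
import re
import shutil
from pathlib import Path
from typing import Tuple


def bump_initial_delay(text: str, old: int, new: int) -> Tuple[str, int]:
    """Replace `initialDelaySeconds: <old>` with `<new>`; return (new_text, count)."""
    # [ \t]* keeps the match on a single line; \b stops 600 from also matching 6000.
    pattern = re.compile(rf"^([ \t]*initialDelaySeconds:[ \t]*){old}\b", re.MULTILINE)
    # \g<1> re-emits the indentation and key exactly as they were.
    return pattern.subn(rf"\g<1>{new}", text)


def patch_template(path: str, old: int = 600, new: int = 1200) -> int:
    """Edit the template in place after saving a .bak copy; return lines changed."""
    template = Path(path)
    patched, count = bump_initial_delay(template.read_text(), old, new)
    if count:
        # Keep the untouched original next to the template for easy rollback.
        shutil.copy2(template, template.with_name(template.name + ".bak"))
        template.write_text(patched)
    return count
```

Usage, with the path reported in this thread (adjust it to your own installation directory and DSS version): `patch_template("/dataiku/dataiku-dss-9.0.3/resources/api-deployer/kubernetes/deployment.yaml.template")`. Matching only the exact old value, rather than any number, avoids silently changing unrelated probe delays.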
