Python job fails upon execution in EKS
Hi,
We have a Python job (source: a PostgreSQL database) that fails when run in EKS with the following error:
Waiting for logs, time elapsed: 837, status changed to: Error from server (BadRequest): container "c" in pod "dataiku-exec-python-nimlxju-sr7qc" is waiting to start: ContainerCreating
From the CLI we can see the container being created: "dataiku-exec-python-nimlxju-sr7qc 0/1 ContainerCreating".
Update: we were able to get the job running on EKS, but after about 1.5 minutes it failed with this error message:
Raw error is: {"errorType":"SubProcessFailed","message":"Containerized process execution failed, return code 119","stackTrace":[]}
How can we resolve it?
We have a PySpark job with the same source, and it runs successfully on EKS.
Note: the number of records is ~25M.
Thanks,
Answers
-
Hi,
the container being in "ContainerCreating" means Kubernetes is still setting up the container (pulling the image, setting up networking and volumes, ...), so it's not yet actually running the recipe. You should attach a diagnostic of the failed job, and also check via kubectl what happened (or is happening) with the pod:
kubectl logs dataiku-exec-python-nimlxju-sr7qc
and
kubectl describe pod dataiku-exec-python-nimlxju-sr7qc
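If the pod stays stuck in ContainerCreating, the Events section at the end of the describe output usually names the cause (e.g. an image pull failure or a volume mount error). You can also list recent cluster events directly, sorted by time:
kubectl get events --sort-by=.metadata.creationTimestamp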
-
Hi,
"return code 119" means that your container ran out of memory and was killed by Kubernetes.
You need to increase your "memory request" and/or "memory limit" settings. Note that if you don't have a memory limit, you may also need to use larger nodes on your Kubernetes cluster.
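For reference, those two settings correspond to the standard memory resource fields on the pod's container spec. A minimal sketch of what they translate to (the 4Gi/8Gi values are placeholders, to be sized for your workload):

resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "8Gi"

With a limit set, Kubernetes OOM-kills the container as soon as it exceeds the limit; without one, the process can keep growing until the node itself runs out of memory, which is why larger nodes may be needed in that case.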