Python job fails upon execution in EKS

piyushk
Level 4

Hi,

We have a Python job (source: a PostgreSQL database). It fails when run on EKS with the following error:

Waiting for logs, time elapsed: 837, status changed to: Error from server (BadRequest): container "c" in pod "dataiku-exec-python-nimlxju-sr7qc" is waiting to start: ContainerCreating

From the CLI, we can see that the container is being created: "dataiku-exec-python-nimlxju-sr7qc 0/1 ContainerCreating".

Update: We were able to get the job into a running state on EKS, but after 1.5 minutes it failed with this error message:

Raw error is{"errorType":"SubProcessFailed","message":"Containerized process execution failed, return code 119","stackTrace":[]}

How can we resolve it?

We have a PySpark job with the same source, and it runs successfully on EKS.

Note: The number of records is ~25M.

Thanks,

2 Replies
fchataigner2
Dataiker

Hi,

The container being in "ContainerCreating" means Kubernetes is still setting up the container (fetching the image, attaching volumes and network, etc.), so it is not yet actually running the recipe. You should attach a diagnostic of the failed job, and also check via kubectl what happened (or is happening) with the pod:

kubectl logs dataiku-exec-python-nimlxju-sr7qc

and

kubectl describe pod dataiku-exec-python-nimlxju-sr7qc
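
If you prefer to check the pod state programmatically, here is a minimal Python sketch that summarizes the container states from the JSON that "kubectl get pod <pod-name> -o json" returns. The function name summarize_pod and the two sample dicts are hypothetical and only for illustration; the field names (status.containerStatuses, state, lastState, waiting, terminated, reason, exitCode) are the standard Kubernetes pod status fields.

```python
def summarize_pod(pod):
    """Return one line per container state found in a parsed pod JSON dict."""
    findings = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        name = cs.get("name", "?")
        # "state" is the current state, "lastState" the previous one (if any).
        for key in ("state", "lastState"):
            state = cs.get(key) or {}
            if "waiting" in state:
                reason = state["waiting"].get("reason", "unknown")
                findings.append(f"{name}: {key} waiting ({reason})")
            if "terminated" in state:
                term = state["terminated"]
                reason = term.get("reason", "unknown")
                code = term.get("exitCode")
                findings.append(f"{name}: {key} terminated ({reason}, exit code {code})")
    return findings


# Hypothetical samples, trimmed to the fields the function reads.
creating = {"status": {"containerStatuses": [
    {"name": "c", "state": {"waiting": {"reason": "ContainerCreating"}}, "lastState": {}}
]}}
killed = {"status": {"containerStatuses": [
    {"name": "c", "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}, "lastState": {}}
]}}

print(summarize_pod(creating))
print(summarize_pod(killed))
```

With a real pod, you would load the output of the kubectl command with json.load and pass the resulting dict to the function. A "terminated" state with reason "OOMKilled" would indicate that the container was killed for exceeding its memory limit.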

 

Clément_Stenac

Hi,

"return code 119" means that your container ran out of memory and was killed by Kubernetes.

You need to increase your "memory request" and/or "memory limit" settings. Note that if you don't have a memory limit, you may also need to use larger nodes on your Kubernetes cluster.
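
To get a rough idea of how much memory to request, here is a back-of-the-envelope sketch. It assumes the Python recipe reads the whole dataset into a pandas DataFrame and that the columns are numeric (8 bytes per value); the column count of 10 is a hypothetical placeholder, since the thread only gives the row count (~25M). String columns and intermediate copies made while processing typically cost noticeably more, so treat the result as a lower bound.

```python
def estimate_numeric_frame_bytes(n_rows, n_numeric_cols, bytes_per_value=8):
    """Lower-bound size of an all-numeric in-memory table, in bytes."""
    return n_rows * n_numeric_cols * bytes_per_value


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


rows = 25_000_000   # from the question (~25M records)
cols = 10           # hypothetical: replace with the real number of columns

raw = estimate_numeric_frame_bytes(rows, cols)
print(f"Raw numeric data: {to_gib(raw):.2f} GiB")
```

If the estimate is close to or above the container's memory limit, either raise the limit or read the dataset in chunks instead of all at once. This would also be consistent with the PySpark job succeeding on the same source, since Spark does not need to hold the whole dataset in a single process.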