Python job fails upon execution in EKS

piyushk (Registered) Posts: 55


We have a Python job (source: a PostgreSQL database) that fails when run in EKS with the error:

Waiting for logs, time elapsed: 837, status changed to: Error from server (BadRequest): container "c" in pod "dataiku-exec-python-nimlxju-sr7qc" is waiting to start: ContainerCreating

From the CLI we can see the container is being created: "dataiku-exec-python-nimlxju-sr7qc 0/1 ContainerCreating".

Update: the job now starts on EKS and reaches the Running state, but after about 1.5 minutes it fails with the error message:

Raw error is {"errorType":"SubProcessFailed","message":"Containerized process execution failed, return code 119","stackTrace":[]}

How can we resolve it?

A PySpark job with the same source runs successfully on EKS.

Note: the number of records is ~25M.



  • fchataigner2 (Dataiker) Posts: 355


    The container being in "ContainerCreating" means Kubernetes is still setting up the container (pulling the image, attaching volumes, ...), so the recipe is not actually running yet. You should attach a diagnostic of the failed job, and also check via kubectl what happened (or is happening) with the pod:

    kubectl logs dataiku-exec-python-nimlxju-sr7qc


    kubectl describe pod dataiku-exec-python-nimlxju-sr7qc

  • Clément_Stenac (Dataiker) Posts: 753


    "return code 119" means that your container ran out of memory and was killed by Kubernetes.

    You need to increase your "memory request" and/or "memory limit" settings. Note that if you don't have a memory limit, you may also need to use larger nodes on your Kubernetes cluster.
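    Besides raising the request/limit, a Python recipe can often avoid the memory spike entirely by streaming the ~25M rows in chunks instead of loading them into a single DataFrame. A minimal, self-contained sketch of the pattern, using an in-memory SQLite table as a stand-in for the PostgreSQL source (the table and column names here are made up for illustration):

```python
import sqlite3

import pandas as pd

# Stand-in for the PostgreSQL source: a small in-memory SQLite table
# (the real job would connect to its actual input instead).
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(1000), "value": range(1000)}).to_sql(
    "events", conn, index=False
)

# Stream the table in fixed-size chunks rather than one huge fetch,
# so peak memory stays bounded no matter how many rows the table has.
total = 0
for chunk in pd.read_sql("SELECT id, value FROM events", conn, chunksize=100):
    # aggregate (or write out) each chunk, then let it be garbage-collected
    total += int(chunk["value"].sum())

print(total)  # sum of 0..999 -> 499500
```

    In DSS itself, `dataiku.Dataset.iter_dataframes(chunksize=...)` provides the same chunked iteration over an input dataset; combined with a sensible memory request, this usually keeps the container under its limit.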
