Python job fails upon execution in EKS

piyushk Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Adv Designer, Registered Posts: 55 ✭✭✭✭✭

Hi,

We have a Python job (source: a PostgreSQL database). It fails when run on EKS with the following error:

Waiting for logs, time elapsed: 837, status changed to: Error from server (BadRequest): container "c" in pod "dataiku-exec-python-nimlxju-sr7qc" is waiting to start: ContainerCreating

From the CLI we can see that the container is being created: "dataiku-exec-python-nimlxju-sr7qc 0/1 ContainerCreating".

Update: We were able to start the job on EKS and it reached the running state. After about 1.5 minutes it failed with this error message:

Raw error is {"errorType":"SubProcessFailed","message":"Containerized process execution failed, return code 119","stackTrace":[]}

How can we resolve it?

We have a PySpark job with the same source, and it runs successfully on EKS.

Note: Number of records is ~25M

Thanks,

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    The container being in "ContainerCreating" means Kubernetes is still setting up the container (fetching the image, connecting resources, and so on), so it is not actually running the recipe yet. You should attach a diagnostic of the failed job, and also check via kubectl what happened (or is happening) with the pod:

    kubectl logs dataiku-exec-python-nimlxju-sr7qc

    and

    kubectl describe pod dataiku-exec-python-nimlxju-sr7qc
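
If you would rather inspect the pod state programmatically, the output of `kubectl get pod <pod-name> -o json` can be summarized with a few lines of Python. The sketch below is only an illustration: the function name `summarize_pod_status` and the sample JSON are made up for this example, and it reads only the standard `status.containerStatuses` fields of a pod object.

```python
import json


def summarize_pod_status(pod):
    """Return waiting/termination details for each container of a pod.

    `pod` is the parsed JSON of `kubectl get pod <name> -o json`.
    (Hypothetical helper, not part of DSS or kubectl.)
    """
    summaries = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        last_state = cs.get("lastState", {})
        waiting = state.get("waiting")
        # A finished container reports under state.terminated; a restarted
        # one keeps the previous termination under lastState.terminated.
        terminated = state.get("terminated") or last_state.get("terminated")
        summaries.append({
            "container": cs.get("name"),
            "waiting_reason": waiting.get("reason") if waiting else None,
            "terminated_reason": terminated.get("reason") if terminated else None,
            "exit_code": terminated.get("exitCode") if terminated else None,
        })
    return summaries


# Made-up sample mimicking a pod that is still being created.
SAMPLE_POD_JSON = """
{"status": {"containerStatuses": [
  {"name": "c", "state": {"waiting": {"reason": "ContainerCreating"}}}
]}}
"""

if __name__ == "__main__":
    print(summarize_pod_status(json.loads(SAMPLE_POD_JSON)))
```

A `waiting_reason` that stays at "ContainerCreating" for a long time is usually explained by the Events section of `kubectl describe pod`.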

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer Posts: 753 Dataiker

    Hi,

    "return code 119" means that your container ran out of memory and was killed by Kubernetes.

    You need to increase your "memory request" and/or "memory limit" settings. Note that if you don't have a memory limit, you may also need to use larger nodes on your Kubernetes cluster.
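
To get a rough idea of how much memory to request, a back-of-the-envelope estimate can help. The sketch below is an illustration under stated assumptions, not a DSS feature: the helper names, the 200 bytes per row, and the 2x overhead factor are all hypothetical, so measure a sample of your own data before relying on the numbers. It parses only the common Kubernetes memory suffixes (Ki, Mi, Gi, Ti, k, M, G, T).

```python
# Multipliers for common Kubernetes memory quantity suffixes.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory_quantity(quantity):
    """Convert a quantity such as '512Mi' or '2Gi' to a number of bytes."""
    quantity = quantity.strip()
    # Check the two-letter binary suffixes before the one-letter decimal ones.
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def fits_in_limit(n_rows, bytes_per_row, limit, overhead=2.0):
    """Rough check: does a fully loaded dataset fit under the memory limit?

    `bytes_per_row` and `overhead` are assumptions supplied by the caller;
    in-memory copies made during processing can easily double the footprint.
    """
    needed = n_rows * bytes_per_row * overhead
    return needed <= parse_memory_quantity(limit)


if __name__ == "__main__":
    # ~25M records at an assumed 200 bytes per row.
    for limit in ("4Gi", "16Gi"):
        print(limit, fits_in_limit(25_000_000, 200, limit))
```

If the estimate does not fit, the options are a larger memory limit (and nodes large enough to honor it) or processing the data in smaller pieces rather than loading it all at once.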
