Getting an error while using PySpark's Pandas UDFs in Dataiku

vaishnavi · Registered Posts: 40 ✭✭✭✭
edited July 16 in Using Dataiku

I am getting the error below when I try to use Pandas UDFs in Dataiku's PySpark notebook.

When I execute df.show(), df.count(), or any similar action, I get the following error.

("df" is the output of the Pandas UDF.)

Py4JJavaError: An error occurred while calling o197.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 21031) (10.211.203.157 executor 2): java.io.IOException: Cannot run program "/opt/dataiku/code-env/bin/python": error=2, No such file or directory
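The IOException in the traceback says an executor could not launch /opt/dataiku/code-env/bin/python, which usually means that interpreter path does not exist on the executor host (even if it exists on the driver). As a first check, something like the sketch below could be run on each executor node; the helper name is mine, and the path is taken from the traceback:

```shell
#!/bin/sh
# Hypothetical helper: report whether a given Python interpreter path
# exists and is executable on this node.
check_python() {
  if [ -x "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Path from the error message; run this on each Spark executor host.
check_python /opt/dataiku/code-env/bin/python
```

If the interpreter is missing on any executor, the code environment needs to be made available on that node before Pandas UDFs can run there.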

Can anyone please help me resolve this issue?

Answers

  • Vitaliy · Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer · Posts: 102

    Hi,

    When Spark hits an error in a DataFrame operation, it often surfaces only a generic error message. To investigate the root cause, we will need to analyse the job diagnostics, so it is best to continue through the support ticket you have already opened. There can be many reasons for this failure, and in most cases they are environmental, so let's take it further over the support ticket.
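    For reference, one environmental setting that commonly produces this exact IOException is the Python interpreter path that Spark executors use to launch Pandas UDF workers. A sketch of the relevant Spark properties is below; the values are assumptions taken from the path in the traceback, and the path must exist on every executor node, not just the driver:

```properties
# Python interpreter used by executors for PySpark/Pandas UDF workers
spark.pyspark.python=/opt/dataiku/code-env/bin/python
# Python interpreter used on the driver side
spark.pyspark.driver.python=/opt/dataiku/code-env/bin/python
```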

    Best,

    Vitaliy
