Getting error while using PySpark's Pandas UDFs in Dataiku

vaishnavi Registered Posts: 40 ✭✭✭✭
edited July 16 in Using Dataiku

I am getting the error below when I try to use Pandas UDFs in Dataiku's PySpark notebook.

When I execute df.count() or any other similar operation, I get the error below.

("df" is the output of the Pandas UDF.)

Py4JJavaError: An error occurred while calling o197.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 21031) ( executor 2): Cannot run program "/opt/dataiku/code-env/bin/python": error=2, No such file or directory

Can anyone please help me resolve this issue?
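For context, here is a minimal sketch of the Pandas UDF pattern being described (assuming Spark 3.x; `add_one` is a hypothetical example, not the actual UDF from the question). Because Spark evaluates lazily, an action like `df.count()` is the first point where the executors must actually launch the code environment's Python interpreter, which is why the error only appears there:

```python
import pandas as pd

# The core of a Pandas UDF is just a function from pd.Series to pd.Series,
# so it can be unit-tested with plain pandas before involving Spark at all.
def add_one(s: pd.Series) -> pd.Series:
    return s + 1

# On a working Spark setup this would be registered and applied lazily;
# the action (df.count(), df.show(), ...) is what finally triggers execution
# on the executors:
#
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import LongType
#
#   add_one_udf = pandas_udf(add_one, returnType=LongType())
#   df = spark.range(5).withColumn("plus_one", add_one_udf("id"))
#   df.count()  # executors must be able to spawn the code env's Python here
```

Testing the pandas-level function in isolation like this can help confirm the UDF logic itself is fine and the failure is environmental, as the error message suggests.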


  • VitaliyD
    VitaliyD Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker


    When Spark hits an error with data frames or other methods, it returns a generic error. To investigate the cause, we will need to analyse the job diagnostics, so it will be better to handle this through the support ticket you have already opened on this. There could be many reasons, which in most cases are environmental, so let's take it further there.
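    As a hedged aside, one common environmental cause of this particular message is that the executors cannot find the same Python interpreter the driver uses: the message shows Spark trying to launch `/opt/dataiku/code-env/bin/python` on executor 2, so if the Dataiku code environment is not present at that path on the executor nodes, the task fails exactly this way. The Spark properties below control which Python the driver and executors launch (the path shown is taken from the error message; the values are illustrative, not a confirmed fix):

```
# spark-defaults.conf fragment (illustrative) — both sides must point to a
# Python that actually exists on the machines where they run.
spark.pyspark.python         /opt/dataiku/code-env/bin/python
spark.pyspark.driver.python  /opt/dataiku/code-env/bin/python
```

    Checking that the code environment is built for, and available on, the Spark executor nodes would be a reasonable first step while the support ticket is investigated.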


