
Getting an error while using PySpark's Pandas UDFs in Dataiku


I am getting the error below when I try to use Pandas UDFs in Dataiku's PySpark notebook.

When I execute df.count() or any other similar action, I get the error below.

("df" is the output of a Pandas UDF.)

Py4JJavaError: An error occurred while calling o197.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 21031) ( executor 2): Cannot run program "/opt/dataiku/code-env/bin/python": error=2, No such file or directory

Can anyone please help me resolve this issue?




When Spark hits an error in a DataFrame operation, it often surfaces only a generic error message. To investigate the root cause, we will need to analyse the job diagnostics, so it is best to continue over the support ticket you have already opened on this. There could be many causes, and in many cases they are environmental, so let's handle it further through the support ticket.
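One common environmental cause worth mentioning (an assumption on my part, not a confirmed diagnosis from your job diag) is that the executor nodes cannot find the code environment's Python interpreter at that path. The interpreter Spark launches on executors is controlled by the PYSPARK_PYTHON environment variable or the equivalent Spark conf key:

```properties
# spark-defaults style setting (illustrative; the real fix depends on the diag)
spark.pyspark.python  /opt/dataiku/code-env/bin/python
```

Verifying that the same code environment path actually exists on every executor node is a typical first check before digging deeper.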


