Getting an error while using PySpark's Pandas UDFs in Dataiku
I am getting the error below when I try to use Pandas UDFs in Dataiku's PySpark notebook.
When I execute df.show(), df.count(), or any other similar operation, I get the following error
(df is the output of the Pandas UDF):
Py4JJavaError: An error occurred while calling o197.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 21031) (10.211.203.157 executor 2): java.io.IOException: Cannot run program "/opt/dataiku/code-env/bin/python": error=2, No such file or directory
Can anyone please help me resolve this issue?
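For context, here is a minimal sketch of the kind of setup that triggers this (the function name and column are illustrative, not from the original post). The UDF's body is plain pandas and runs fine on the driver; the failure only appears at the action (df.show()), when Spark tries to launch the code environment's Python on the executors:

```python
import pandas as pd

# The body of a typical Pandas UDF: a plain pandas Series -> Series function.
def plus_one(s: pd.Series) -> pd.Series:
    return s + 1

# In the Dataiku PySpark notebook this would be registered and applied roughly as:
# from pyspark.sql.functions import pandas_udf
# from pyspark.sql.types import LongType
# udf = pandas_udf(plus_one, LongType())
# df = spark.range(5).withColumn("id_plus_one", udf("id"))
# df.show()  # <- the Py4JJavaError above is raised here, not at definition time
```

Note that defining the UDF succeeds; only actions that ship it to the executors fail, which is why df.show() and df.count() both hit the same error.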
When Spark hits an error with DataFrames or similar operations, it returns a generic error message. To investigate the root cause we would need to analyse the job diagnostics, and there are many possible reasons, which in many cases are environmental. Since you have already opened a support ticket on this, it will be best to continue the investigation there.
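One hedged check worth running in the meantime: the stack trace says the executors cannot start "/opt/dataiku/code-env/bin/python", so it can help to confirm whether that path actually exists on the executor nodes (as opposed to only on the driver). A sketch, assuming a live SparkContext `sc` as in a Dataiku PySpark notebook:

```python
import os
import sys

def interpreter_info(_):
    # This runs on each executor partition. The path below is the one
    # from the error message in the stack trace.
    path = "/opt/dataiku/code-env/bin/python"
    yield (os.path.exists(path), sys.executable)

# With a live SparkContext `sc`, run one partition per executor slot:
# sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism) \
#   .mapPartitions(interpreter_info).collect()
# Each tuple shows whether the code-env path exists on that node and
# which interpreter the Python worker is actually using.
```

If the path is missing on the executors, the code environment likely was not built for (or pushed to) the Spark nodes, which support can confirm from the job diagnostics.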