
Getting an error while using PySpark's Pandas UDFs in Dataiku

vaishnavi
Level 2

I am getting the error below when I try to use Pandas UDFs in Dataiku's PySpark notebook.

When I execute df.show(), df.count(), or any similar action, I get the following error ("df" is the output of a Pandas UDF):

Py4JJavaError: An error occurred while calling o197.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 21031) (10.211.203.157 executor 2): java.io.IOException: Cannot run program "/opt/dataiku/code-env/bin/python": error=2, No such file or directory
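For context, a Pandas UDF is only executed on the Spark executors when an action such as show() or count() is triggered, which is why the error surfaces there rather than when the UDF is defined. A minimal sketch of the pattern being used (hypothetical names; the Spark-side lines assume an active Spark session, so the Series-level logic is also shown runnable with plain pandas):

```python
import pandas as pd

# Core logic of a Pandas UDF: a function from pandas Series to pandas Series.
def add_one(s: pd.Series) -> pd.Series:
    return s + 1

# In PySpark this would be wrapped and applied roughly like this
# (requires a running Spark session named `spark`):
#
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import LongType
#
#   add_one_udf = pandas_udf(add_one, returnType=LongType())
#   df = spark.range(3).select(add_one_udf("id").alias("id_plus_one"))
#   df.show()  # the action: this is where the executor-side error above appears
#
# Because Pandas UDFs operate on plain pandas Series, the Series-level
# logic can be checked locally without Spark:
print(add_one(pd.Series([0, 1, 2])).tolist())  # → [1, 2, 3]
```

The "Cannot run program /opt/dataiku/code-env/bin/python" message indicates the executors are trying to start a Python worker at a path that does not exist on their machines, which is an environment configuration issue rather than a problem in the UDF itself.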

Can anyone please help me resolve this issue?

VitaliyD
Dataiker

Hi,

When Spark hits an error in a DataFrame operation, it often surfaces only a generic exception. To investigate the root cause, we will need to analyse the job diagnostics, so it will be best to handle this through the support ticket you have already opened. There can be many possible causes, which in cases like this are often environmental, so let's continue over the support ticket.

Best,

Vitaliy
