
Getting an error while using PySpark's Pandas UDFs in Dataiku

vaishnavi
Level 2

I am getting the error below when I try to use Pandas UDFs in Dataiku's PySpark notebook.

When I execute df.show(), df.count(), or any similar action, I get the following error ("df" is the output of a Pandas UDF):

Py4JJavaError: An error occurred while calling o197.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 21031) (10.211.203.157 executor 2): java.io.IOException: Cannot run program "/opt/dataiku/code-env/bin/python": error=2, No such file or directory
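For context, a Pandas UDF is only executed on the Spark executors when an action such as show() or count() is triggered, which is why the error surfaces there rather than when the UDF is defined. A minimal sketch of the pattern being used (hypothetical names; the Spark-side lines assume an active Spark session, so the Series-level logic is also shown runnable with plain pandas):

```python
import pandas as pd

# Core logic of a Pandas UDF: a function from pandas Series to pandas Series.
def add_one(s: pd.Series) -> pd.Series:
    return s + 1

# In PySpark this would be wrapped and applied roughly like this
# (requires a running Spark session named `spark`):
#
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import LongType
#
#   add_one_udf = pandas_udf(add_one, returnType=LongType())
#   df = spark.range(3).select(add_one_udf("id").alias("id_plus_one"))
#   df.show()  # the action: this is where the executor-side error above appears
#
# Because Pandas UDFs operate on plain pandas Series, the Series-level
# logic can be checked locally without Spark:
print(add_one(pd.Series([0, 1, 2])).tolist())  # → [1, 2, 3]
```

The "Cannot run program /opt/dataiku/code-env/bin/python" message indicates the executors are trying to start a Python worker at a path that does not exist on their machines, which is an environment configuration issue rather than a problem in the UDF itself.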

Can anyone please help me resolve this issue?

VitaliyD
Dataiker

Hi,

When Spark hits an error in a DataFrame operation, it often surfaces only a generic exception. To investigate the root cause, we will need to analyse the job diagnostics, so it will be best to handle this through the support ticket you have already opened. There can be many possible causes, which in cases like this are often environmental, so let's continue over the support ticket.

Best,

Vitaliy
