I have a table which contains 3 million rows and 100 columns, and I want to transform the data with PySpark.
I created a PySpark recipe and imported the data without problems, but when I execute even simple operations like .show() or .count(), it takes a long time, almost 10 minutes or more.
If you have any idea why it behaves like that, I would be very grateful.
I get the error:
Py4JJavaError: An error occurred while calling o71.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 10) (10.130.12.246 executor 2): org.apache.spark.SparkException: Error communicating with MapOutputTracker