Spark executors on k8s


How can I get more executors from Spark on an attached Kubernetes cluster?

I see that an Azure Blob Storage dataset is read using one executor (as the documentation says; is there a way to get around this data source reading limitation?). After that, the data is split into 10 partitions (the default number), but still only one executor is running. The Spark driver should be able to request more executors for this stage, since it has 10 tasks to run.

 

Capture.PNG 

1 Reply
Dataiker

Hi,

Reading with a single executor means that DSS cannot use direct access to Azure from Spark. There will be a warning in the logs indicating why. Very often, it is because you are using user isolation and have not granted the end user the right to "read details" on the connection. It could also be because you did not enable "HDFS interface" in the Azure connection settings, or because you are trying to read a CSV file with header lines.

However, without logs we cannot say more about why it does not use more executors after repartitioning. Did you make sure to set the spark.executor.instances Spark configuration key?
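As a rough illustration of what setting that key amounts to, here is a minimal Python sketch that builds a set of Spark configuration keys and renders them as `--conf` flags. The key names (spark.executor.instances, spark.executor.cores, spark.executor.memory) are standard Spark configuration keys; the helper names `build_spark_conf` and `to_submit_flags` and the values (5 executors, 1 core, 2g) are hypothetical placeholders, not recommendations. In DSS you would normally enter the same key/value pairs in the Spark configuration used by the job rather than build flags by hand.

```python
def build_spark_conf(executor_instances, executor_cores=1, executor_memory="2g"):
    """Return Spark configuration keys requesting a fixed number of executors.

    All values here are illustrative assumptions; size them for your cluster.
    """
    if executor_instances < 1:
        raise ValueError("executor_instances must be at least 1")
    return {
        # Number of executors Spark asks the cluster manager for.
        "spark.executor.instances": str(executor_instances),
        # Cores per executor, i.e. how many tasks one executor runs at once.
        "spark.executor.cores": str(executor_cores),
        # Memory requested for each executor.
        "spark.executor.memory": executor_memory,
    }


def to_submit_flags(conf):
    """Render a configuration dict as a list of spark-submit style flags."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


conf = build_spark_conf(executor_instances=5)
print(" ".join(to_submit_flags(conf)))
```

If spark.executor.instances is left unset and dynamic allocation is not enabled, Spark falls back to a small default number of executors, which could be one reason for seeing fewer executors than tasks.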

If you need further assistance with that, please open a support ticket and attach a job diagnosis (Job page > Actions > Download diagnosis).
