
How to spark-submit into my kubernetes cluster

DrissiReda
Level 2

I have set up Dataiku on a Kubernetes cluster. I can submit Python recipes, and they're executed as Kubernetes pods using containerized execution.

Now I want to do the same with Spark. I have a Spark master on my Kubernetes cluster, and I'm able to run pods manually with spark-submit in client mode.

How can I do the same with spark-submit using a custom image from Dataiku's UI?

Below is my Spark configuration:

[Attached image: Screenshot from 2021-03-05 11-18-48.png]

Any help would be appreciated; I didn't find much documentation on how to set up this particular scenario.

5 Replies
fchataigner2
Dataiker

Hi,

If you're running with a custom image, the spark.kubernetes.container.image property is what you need to set. You don't seem to be using the DSS integration with K8S clusters at all, so you "just" have to pass all the needed properties, as you would for a command-line spark-submit. Note, though, that you'll only be able to use the client deploy mode (i.e. the Spark driver runs on the DSS machine).
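As a rough illustration of "passing all the needed properties", here is a minimal Python sketch that renders a set of Spark properties as spark-submit `--conf` arguments. The property names are standard Spark-on-Kubernetes settings, but every value (API server URL, image name, namespace, executor count) is a hypothetical placeholder, and the helper `to_conf_args` is invented for this example. The same key/value pairs could be entered in a Spark configuration instead of on the command line.

```python
# All values below are hypothetical placeholders; replace them with your own.
SPARK_PROPS = {
    # Kubernetes API server, using Spark's k8s:// master URL scheme
    "spark.master": "k8s://https://kube-apiserver.example.com:6443",
    # Client mode: the driver runs where spark-submit is launched
    "spark.submit.deployMode": "client",
    # Custom image used for the executor pods
    "spark.kubernetes.container.image": "registry.example.com/my-spark:latest",
    "spark.kubernetes.namespace": "spark-jobs",
    "spark.executor.instances": "2",
}


def to_conf_args(props):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(props.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be checked before being passed to spark-submit.
    print(" ".join(to_conf_args(SPARK_PROPS)))
```

In client mode the executors must also be able to connect back to the driver, so a property such as spark.driver.host may be needed depending on your network setup.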

What error do you get when you try to run with your settings?

DrissiReda
Level 2
Author

I don't have spark-submit installed in my DSS container image.

I'm trying to build a DSS container image with Spark support. Should this be sufficient?

dssadmin build-base-image --type spark --without-r

Or should I use another type?

fchataigner2
Dataiker

That command line builds the Docker image used by Spark-over-K8S, with the jars needed by DSS code included in the image. This command requires DSS to have a working Spark integration, of course.

DrissiReda
Level 2
Author

Hello, I have managed to get it working through a shell: I use a spark-submit command with jars from my S3 storage, and it works well. Is there a way to create a custom visual recipe where a user would only have to enter the jar's path in S3 and the class name, instead of all the parameters of the spark-submit command?

fchataigner2
Dataiker

If by visual recipe you mean something where the user enters a few parameters and clicks on run, then that would probably be a plugin Python recipe: make a Python recipe in which you launch spark-submit via subprocess, then convert it to a plugin recipe (it's one of the actions available on the recipe page).
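A minimal sketch of what the body of such a Python recipe could look like, assuming spark-submit is on the PATH of the machine running the recipe. The function names, the example master URL, and the image name are hypothetical; only the jar path and the class name are meant to be supplied by the user, with everything else fixed by whoever writes the recipe.

```python
import subprocess


def build_spark_submit(jar_path, main_class, master, image, app_args=()):
    """Build the spark-submit argument list for a client-mode run on Kubernetes."""
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "client",
        "--class", main_class,
        "--conf", f"spark.kubernetes.container.image={image}",
        jar_path,   # application jar, e.g. an S3 path the cluster can read
        *app_args,  # optional arguments passed to the application itself
    ]


def run_spark_submit(cmd):
    """Run the command and raise with stderr attached if spark-submit fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"spark-submit failed ({result.returncode}):\n{result.stderr}"
        )
    return result.stdout


# Example (hypothetical values), not executed here:
# cmd = build_spark_submit(
#     jar_path="s3a://my-bucket/jobs/app.jar",
#     main_class="com.example.Main",
#     master="k8s://https://kube-apiserver.example.com:6443",
#     image="registry.example.com/my-spark:latest",
# )
# print(run_spark_submit(cmd))
```

Passing the command as a list rather than a single shell string avoids quoting problems with user-supplied values. Once the recipe is converted to a plugin recipe, the jar path and class name would be read from the recipe's parameters instead of being hard-coded.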
