How to spark-submit into my Kubernetes cluster

DrissiReda Registered Posts: 57 ✭✭✭✭✭

I have set up Dataiku on a Kubernetes cluster. I can submit Python recipes, and they're executed as Kubernetes pods using containerized execution.

Now I want to do the same with Spark. I have a Spark master on my Kubernetes cluster, and I'm able to run pods with spark-submit in client mode manually.

How can I do the same with spark-submit, using a custom image, from Dataiku's UI?

Below is my Spark configuration:


[Screenshot: Spark configuration settings, 2021-03-05]

Any help would be appreciated; I didn't find much documentation on how to set this particular scenario up.

Answers

  • fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    If you're running with a custom image, setting the spark.kubernetes.container.image property is what you need to do. You don't seem to be using the DSS integration with Kubernetes clusters at all, so you "just" have to pass all the needed properties, as you would for a command-line spark-submit. Note, though, that you'll only be able to use the client deployment mode (i.e. the Spark driver runs on the DSS machine).
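
    For illustration, a minimal sketch of the kind of properties this involves in the Spark configuration (all values below are placeholders, not taken from this thread):

    spark.master                                              k8s://https://<k8s-api-host>:<port>
    spark.submit.deployMode                                   client
    spark.kubernetes.container.image                          <registry>/<custom-spark-image>:<tag>
    spark.kubernetes.namespace                                <namespace>
    spark.kubernetes.authenticate.driver.serviceAccountName   <service-account>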

    What is the error you get when you try to run with your settings?

  • DrissiReda Registered Posts: 57 ✭✭✭✭✭

    I don't have spark-submit installed in my DSS container image.

    I'm trying to build a DSS container image with Spark support. Should this be sufficient?

    dssadmin build-base-image --type spark --without-r

    Or should I use another type?

  • fchataigner2 Dataiker Posts: 355 Dataiker

    That command line builds the Docker image for use by Spark-over-Kubernetes, with the jars needed by DSS code included in the image. This command requires DSS to have a working Spark integration, of course.

  • DrissiReda Registered Posts: 57 ✭✭✭✭✭

    Hello, I have managed to get it working through a shell: I use a spark-submit command with jars from my S3 storage, and it works well. Is there a way to create a custom visual recipe where a user would only have to enter the jar's path in S3 and the class name, instead of all the parameters in the spark-submit command?
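
    For context, a sketch of the kind of spark-submit invocation described above (all values are placeholders; this assumes the s3a connector is available to Spark):

    spark-submit \
      --master k8s://https://<k8s-api-host>:<port> \
      --deploy-mode client \
      --name <job-name> \
      --class <main-class> \
      --conf spark.kubernetes.container.image=<registry>/<image>:<tag> \
      s3a://<bucket>/<path-to>/<job>.jar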

  • fchataigner2 Dataiker Posts: 355 Dataiker

    If by visual recipe you mean something where the user enters a few parameters and clicks Run, then that would probably be a plugin Python recipe: make a Python recipe in which you launch spark-submit via subprocess, then convert it to a plugin recipe (it's one of the actions available on the recipe page).
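
    As a rough sketch, assuming hypothetical parameters jar_path and class_name declared in the plugin's recipe.json (the master URL and image below are placeholders), the recipe body could look like:

    import subprocess
    from dataiku.customrecipe import get_recipe_config

    # Read the two user-facing parameters declared in the plugin's recipe.json
    config = get_recipe_config()
    jar_path = config["jar_path"]      # e.g. s3a://bucket/path/to/job.jar
    class_name = config["class_name"]  # e.g. com.example.MainClass

    # Everything except the jar and class stays fixed inside the plugin
    cmd = [
        "spark-submit",
        "--master", "k8s://https://<k8s-api-host>:<port>",
        "--deploy-mode", "client",
        "--class", class_name,
        "--conf", "spark.kubernetes.container.image=<registry>/<image>:<tag>",
        jar_path,
    ]

    # Propagate a non-zero spark-submit exit code as a recipe failure
    subprocess.run(cmd, check=True)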
