DSS setup of Spark on AKS
Hi everyone,
I’m currently trying to set up my DSS instance (running on a VM) to run Spark on AKS, and I'm feeling a bit lost about where to start.
Could you please guide me on where Spark should be installed? Should it be on the AKS cluster or the VM? I realize this might be a basic question, but any assistance or pointers would be greatly appreciated.
Thank you!
Operating system used: Linux
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @jonhli,
The recommended approach is to attach an AKS cluster to DSS:
https://doc.dataiku.com/dss/latest/containers/aks/managed.html#initial-setup
Please pay particular attention to the network requirements, as DSS and the AKS cluster will need to be able to communicate on all ports.
Once you follow the remaining steps (build the base Spark image, define the Spark configuration, and run the Spark integration), you should be able to run Spark jobs:
https://doc.dataiku.com/dss/latest/containers/setup-k8s.html#optional-setup-spark
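As a rough sketch of the steps above, the commands look roughly like this when run from the DSS data directory on the VM (the exact options and registry configuration depend on your setup, so treat this as illustrative rather than authoritative; the registry itself is configured in the DSS settings UI, not on the command line):

```shell
# From the DSS data directory on the VM:

# Build the base Spark container image used for Spark-on-K8s jobs
./bin/dssadmin build-base-image --type spark

# (Push of the image to your container registry is driven by the
# registry configured in DSS Administration > Settings)

# Run the Spark integration so DSS picks up Spark support
./bin/dssadmin install-spark-integration
```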
Thanks
Hello @AlexT,
Thanks for your prompt reply. The base Spark image was built and pushed successfully. How can I test that this is working as expected in my projects?
Alexandru Dataiker
You can create a Prepare recipe and run it on the Spark engine, or create a PySpark recipe.
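For the PySpark route, a minimal smoke test looks like the default DSS PySpark recipe skeleton below. This only runs inside a DSS recipe (it needs the `dataiku` runtime), and the dataset names `mydataset` / `mydataset_out` are placeholders for your own input and output datasets:

```python
# Minimal PySpark recipe sketch for DSS (dataset names are illustrative)
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read the input dataset as a Spark DataFrame
input_ds = dataiku.Dataset("mydataset")
df = dkuspark.get_dataframe(sqlContext, input_ds)

# Any simple transformation works as a sanity check
print("Row count:", df.count())

# Write back to the recipe's output dataset
output_ds = dataiku.Dataset("mydataset_out")
dkuspark.write_with_schema(output_ds, df)
```

While the recipe runs, `kubectl get pods` against the cluster namespace should show the Spark driver and executor pods being created, which confirms the job is actually executing on AKS rather than locally.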
I see. But is there a way to change the execution engine for all the recipes in my project so that they run on Spark on K8s by default?