
YARN and Oozie on DSS

Dataiker
Hi,

In DSS, is it possible to use YARN to view and manage how nodes and containers are allocated to the different Spark applications across the cluster?
Additionally, is it possible to run Apache Oozie workflows to automate jobs?

Thank you!
1 Reply
Dataiker
Hi,

DSS uses Spark through its standard APIs, so you can use Spark's "YARN" mode just as you would with any other Spark application. This means that you can set the number of containers, the memory allocation, whether to use dynamic allocation, the YARN queue, and so on, using standard Spark configuration keys (see https://spark.apache.org/docs/latest/running-on-yarn.html for details on the Spark configuration keys, and https://doc.dataiku.com/dss/latest/spark/configuration.html for how to set them in DSS).
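
To make the settings mentioned above concrete, here is a minimal sketch of the kind of key/value pairs involved. The keys are standard Spark configuration keys; the values and the queue name "analytics" are illustrative assumptions, not recommendations, and the helper `to_conf_args` is a hypothetical function that only renders them in `spark-submit` style. In DSS the same pairs would be entered in a Spark configuration as described in the DSS documentation linked above.

```python
# Illustrative values only; tune them for your own cluster.
yarn_spark_conf = {
    "spark.master": "yarn",                      # run Spark on YARN
    "spark.yarn.queue": "analytics",             # hypothetical YARN queue name
    "spark.executor.instances": "4",             # number of executor containers
    "spark.executor.memory": "4g",               # memory per executor
    "spark.executor.cores": "2",                 # cores per executor
    "spark.dynamicAllocation.enabled": "false",  # fixed number of executors
}


def to_conf_args(conf):
    """Render the settings as spark-submit style --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_conf_args(yarn_spark_conf)))
```

If dynamic allocation is enabled instead, `spark.executor.instances` is usually replaced by bounds such as `spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors`.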



DSS does not have a native integration with Oozie. However, Oozie has a REST API, so you can use Python code in DSS to call this API and trigger workflows (see https://oozie.apache.org/docs/4.0.0/WebServicesAPI.html).
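
As a rough sketch of what such a call could look like, the code below submits and starts a workflow through the job submission endpoint of the Oozie Web Services API (`POST /v1/jobs?action=start` with an XML configuration body). The function names, the Oozie URL, the user name, and the HDFS path are all hypothetical placeholders, and the sketch assumes an Oozie server without Kerberos authentication. It uses only the Python standard library, so it could be pasted into a DSS Python recipe or scenario step.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def build_job_config(properties):
    """Build the Hadoop-style XML configuration body that Oozie expects."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="utf-8")


def submit_workflow(oozie_url, properties, timeout=30):
    """Submit and start a workflow job, returning the job id from the response."""
    request = urllib.request.Request(
        url=f"{oozie_url.rstrip('/')}/v1/jobs?action=start",
        data=build_job_config(properties),
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["id"]


# Example call (placeholder host, user, and path; requires a reachable Oozie server):
# job_id = submit_workflow(
#     "http://oozie-host:11000/oozie",
#     {
#         "user.name": "dss_user",
#         "oozie.wf.application.path": "hdfs://namenode/user/dss_user/my-workflow",
#     },
# )
```

The returned job id can then be used to poll the job status through the same API, for example with a GET request on `/v1/job/<job id>?show=info`.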