In DSS, is it possible to use YARN to view and manage node and container allocation to the different Spark applications across different nodes? Additionally, is it possible to run Apache Oozie workflows to automate jobs?
DSS leverages Spark through its standard APIs, so you can use the YARN mode of Spark just as you would with any other Spark application. This means you can set the number of containers, the memory allocation, whether to use dynamic allocation, YARN queues, and so on, using standard Spark configuration keys (see https://spark.apache.org/docs/latest/running-on-yarn.html for details on the Spark configuration keys and https://doc.dataiku.com/dss/latest/spark/configuration.html for how to set them in DSS).
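As an illustration, these are the kinds of standard Spark-on-YARN configuration keys you would set in a DSS Spark configuration (the values shown are placeholders, not recommendations):

```properties
spark.master                      yarn
spark.executor.instances          10
spark.executor.memory             4g
spark.yarn.queue                  analytics
spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true
```

With these set, YARN's ResourceManager UI shows the application's containers and their placement across nodes, which covers the monitoring part of the question.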
DSS does not have a native integration with Oozie. However, Oozie exposes a REST API, so you can call it from Python code in DSS to trigger workflows (see https://oozie.apache.org/docs/4.0.0/WebServicesAPI.html).
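A minimal sketch of such a call, assuming a reachable Oozie server (the host, port, user, and workflow path below are placeholders you would replace with your own). Per the Oozie Web Services API, a workflow is submitted by POSTing an XML configuration to `/v1/jobs`, with `?action=start` to start it immediately:

```python
# Hedged sketch: triggering an Oozie workflow from DSS Python code via the
# Oozie REST API. Host, port, user, and workflow path are placeholders.
import urllib.request

OOZIE_URL = "http://oozie-host:11000/oozie"  # placeholder Oozie endpoint


def build_job_config(properties):
    """Build the XML configuration payload Oozie expects on job submission."""
    props = "".join(
        "<property><name>{}</name><value>{}</value></property>".format(k, v)
        for k, v in properties.items()
    )
    return "<configuration>{}</configuration>".format(props)


def submit_workflow(app_path, user):
    """Submit and start a workflow by POSTing its config to /v1/jobs?action=start."""
    payload = build_job_config({
        "oozie.wf.application.path": app_path,  # HDFS path of workflow.xml
        "user.name": user,
    }).encode("utf-8")
    req = urllib.request.Request(
        OOZIE_URL + "/v1/jobs?action=start",
        data=payload,
        headers={"Content-Type": "application/xml;charset=UTF-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # Oozie responds with JSON containing the job id
```

You could run such code from a DSS Python recipe or scenario step to chain an Oozie workflow into a DSS job sequence.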