
DSS, Livy and Spark integration


Hi,

I am trying to learn DSS, and I could use a more detailed explanation than what's available in the docs. I haven't integrated Spark with a remote client before, so please ELI5 for me. Thank you.

I have a new Cloudera CDP 7.1.1 cluster running Spark 2.4, Livy, and Hadoop 3.1. The Hadoop cluster is Kerberized, and the user 'serviceA' is already configured.

Is it possible to integrate DSS with Livy? If so, how?

If Livy is not an option: I have a DSS version with the Spark integration already pre-installed on my DSS VM. Looking at the configuration settings screen, what is the minimum I need to enter to get DSS working with Spark under the serviceA account? My goal is to run interactive sessions. Thanks

Dataiker

Hi Nish,

Please make sure to read our documentation here, specifically the part about secure clusters.

Also note that CDP 7.1.1 is not supported, as stated here, although it might work.

As you will see from the documentation linked above, DSS needs to be configured to access the cluster by running the Hadoop integration and the Spark integration.

Once done, DSS should be able to offload jobs to the cluster.
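As a rough illustration of those two steps, here is a minimal sketch that assembles the `dssadmin` calls for a Kerberized cluster. It assumes DSS lives in a data directory such as `/data/dataiku/dss_data` and that you have a keytab and principal for the DSS service account; all paths and the principal are hypothetical. The `install-hadoop-integration -keytab ... -principal ...` and `install-spark-integration -sparkHome ...` forms follow the DSS installation docs as I remember them, so check the exact flags for your DSS version. DSS is normally stopped (`./bin/dss stop`) before running them and started again afterwards. By default the script only prints the commands; pass `run=True` to execute them.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def integration_commands(
    dss_datadir: str,
    keytab: str,
    principal: str,
    spark_home: Optional[str] = None,
) -> List[List[str]]:
    """Build the dssadmin calls for Hadoop, then Spark integration (flags assumed from the docs)."""
    dssadmin = str(Path(dss_datadir) / "bin" / "dssadmin")
    # Hadoop integration on a Kerberized cluster: DSS authenticates with its own keytab/principal.
    hadoop = [dssadmin, "install-hadoop-integration",
              "-keytab", keytab, "-principal", principal]
    # Spark integration; -sparkHome optionally points DSS at a specific Spark installation.
    spark = [dssadmin, "install-spark-integration"]
    if spark_home is not None:
        spark += ["-sparkHome", spark_home]
    return [hadoop, spark]


def apply(commands: List[List[str]], run: bool = False) -> None:
    """Print each command; execute it only when run=True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths and principal: replace with your own.
    cmds = integration_commands(
        "/data/dataiku/dss_data",
        "/etc/security/keytabs/dss.keytab",
        "dss@EXAMPLE.COM",
        spark_home="/opt/cloudera/parcels/CDH/lib/spark",
    )
    apply(cmds)
```

The Hadoop step comes first because the Spark integration relies on the Hadoop client configuration being in place.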

You don't need to integrate with Livy for this. Since you want to run all your Hadoop activities as serviceA, what you need to do is map DSS users and groups to Hadoop users. This is done in Administration -> Settings -> Login (LDAP, SSO) -> User Isolation.
This feature is only available with an Enterprise license.
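To make the mapping idea concrete, here is a hypothetical sketch of ordered impersonation rules: each rule matches a DSS user or a DSS group and names the Hadoop user to run as, and the first matching rule wins. The `Rule` structure, the group name `data_team`, and the function names are illustrative assumptions; this does not reproduce DSS's internal logic or its settings format, which you fill in through the User Isolation screen.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical mapping rule: match on a DSS user or on a DSS group."""
    hadoop_user: str
    dss_user: Optional[str] = None
    dss_group: Optional[str] = None

    def matches(self, user: str, groups: Sequence[str]) -> bool:
        if self.dss_user is not None and self.dss_user == user:
            return True
        return self.dss_group is not None and self.dss_group in groups


def resolve_hadoop_user(user: str, groups: Sequence[str], rules: Sequence[Rule]) -> str:
    """Return the Hadoop user of the first matching rule, or refuse if none match."""
    for rule in rules:
        if rule.matches(user, groups):
            return rule.hadoop_user
    raise PermissionError(f"no Hadoop user mapping for DSS user {user!r}")


# Everyone in the (hypothetical) data_team group runs as serviceA.
RULES = [Rule(hadoop_user="serviceA", dss_group="data_team")]

if __name__ == "__main__":
    print(resolve_hadoop_user("nish", ["data_team"], RULES))  # serviceA
```

Keeping a broad group rule last leaves room for per-user exceptions above it, and refusing unmapped users avoids silently running jobs under the wrong Hadoop identity.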

Take care,

Omar
Architect @ Dataiku
