
Recipe run on Spark from database table

Level 2

Hi,

My name is Giuseppe Naldi; I'm from Bergamo and I work for ICTeam (a company owned by Lutech) as a Solution Architect, with extensive experience in subjects related to data analysis (from BI to data science and so on).

My question is rather general and concerns how DSS runs a recipe with Spark as the engine on a dataset backed by a database table. Namely, are the data from the table accessed by DSS and streamed to the Spark job, or does Spark access the database directly? From the documentation I understand it works the first way: is that correct?

Thanks. Regards.

Giuseppe

Dataiker

The helper code DSS provides for Spark indeed streams SQL data from the database to the (single) Spark executor via the DSS backend. If you want Spark to connect to the database directly, you need to use a Spark-Scala recipe in DSS and leverage the native interfaces, where they exist.
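For reference, here's a minimal sketch of what that direct access could look like inside a Spark-Scala recipe, using Spark's generic JDBC data source rather than the DSS streaming path. The JDBC URL, table name, credentials, and partition column below are all placeholders, not anything DSS generates for you:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder connection details -- substitute your own JDBC URL, table,
// and credentials (ideally read from a secure store, not hard-coded).
val spark = SparkSession.builder().appName("direct-jdbc-read").getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb") // hypothetical URL
  .option("dbtable", "my_schema.my_table")              // hypothetical table
  .option("user", "my_user")
  .option("password", "my_password")
  // Optional: partition the read so several executors pull rows in
  // parallel, instead of a single stream through one connection.
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()

df.show(5)
```

With this approach each executor opens its own JDBC connection, so the database is queried by Spark directly rather than through the DSS backend.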

Note that Snowflake is the only database I can think of with a native Spark integration, and DSS already supports that native path in this case (https://doc.dataiku.com/dss/latest/connecting/sql/snowflake.html#spark-native-integration).

Level 2 (Author)

OK, that's clear. Thank you.

Giuseppe
