Recipe run on Spark from database table

gnaldi62

Hi,

My name is Giuseppe Naldi; I'm from Bergamo and I work for ICTeam (a company owned by Lutech) as a Solution Architect, with extensive experience in data analysis (from BI to data science and beyond).

My question is rather general and concerns the way DSS runs a recipe with Spark as the engine on a dataset built on a database table. Namely, is the data from the table read by DSS and streamed to the Spark job, or is the access performed directly by Spark? From the documentation I understand it works the first way: is that correct?

  Thanks. Regards.

Giuseppe

fchataigner2
Dataiker

The helper code DSS provides for Spark indeed streams SQL data from the database to the (single) Spark executor via the DSS backend. If you want Spark to connect directly to the database, you need to use a spark-scala recipe in DSS and leverage the native interfaces, if they exist.

Note that Snowflake is the only database I can think of with a native Spark integration, and DSS already uses that native path in this case (https://doc.dataiku.com/dss/latest/connecting/sql/snowflake.html#spark-native-integration).
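
For illustration, here is a minimal sketch of what a direct read could look like inside a spark-scala recipe, using Spark's generic JDBC data source rather than a vendor-specific connector. The connection URL, table, credentials and partitioning values are hypothetical placeholders, and the JDBC driver must be available on the Spark classpath; in a real recipe you would still write the result back to a DSS dataset through the Dataiku Scala API.

// Sketch: Spark reads the table itself over JDBC, so the data does not
// pass through the DSS backend. All connection details are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")  // hypothetical URL
  .option("dbtable", "public.my_table")                   // hypothetical table
  .option("user", "my_user")
  .option("password", "my_password")
  // Partitioning options let several executors read ranges of the table
  // in parallel instead of a single stream.
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()

df.printSchema()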

gnaldi62
Author

OK, it's clear. Thank you. 

Giuseppe
