Recipe run on Spark from database table

gnaldi62 · June 2020

Hi,

My name is Giuseppe Naldi; I'm from Bergamo and I work for ICTeam (a company owned by Lutech) as a Solution Architect, with extensive experience in subjects related to data analysis (from BI to data science and so on).

My question is rather general and concerns how DSS runs a recipe with Spark as the engine on a dataset that comes from a database table.

Namely, is the table data read by DSS and streamed from memory to the Spark job, or does Spark access the database directly?

From the documentation I understand that it works the first way. Is that correct?

Thanks. Regards.

Giuseppe

fchataigner2 · June 2020

The helper code that DSS provides for Spark does indeed stream SQL data from the database to the (single) Spark executor via the DSS backend. If you want Spark to connect to the database directly, you need to use a spark-scala recipe in DSS to leverage the native interfaces, if they exist.

Note that Snowflake is the only database I can think of that is integrated with Spark, and DSS already uses the native path in that case (https://doc.dataiku.com/dss/latest/connecting/sql/snowflake.html#spark-native-integration).
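For readers wondering what a direct connection could look like, here is a minimal sketch. It uses Spark's generic built-in JDBC data source, which is not the vendor-native connector mentioned above, and it is written in PySpark for brevity (the same data source is reachable from Scala via `spark.read.format("jdbc")`). The helper names `jdbc_read_options` and `read_table_directly`, as well as the URL, table, and credentials in the usage note, are hypothetical. Whether the JDBC driver is available on the Spark classpath of a given DSS installation is an assumption to check.

```python
def jdbc_read_options(url, table, user, password,
                      partition_column=None, lower_bound=None,
                      upper_bound=None, num_partitions=None):
    """Build the option map for Spark's built-in JDBC data source.

    Hypothetical helper: only the option keys are Spark's, the function
    itself is illustrative.
    """
    options = {"url": url, "dbtable": table, "user": user, "password": password}

    # Spark only splits the read across executors when partitionColumn,
    # lowerBound, upperBound and numPartitions are all given together.
    partitioning = (partition_column, lower_bound, upper_bound, num_partitions)
    if any(value is not None for value in partitioning):
        if any(value is None for value in partitioning):
            raise ValueError(
                "partition_column, lower_bound, upper_bound and "
                "num_partitions must be provided together"
            )
        options.update({
            "partitionColumn": partition_column,
            "lowerBound": str(lower_bound),
            "upperBound": str(upper_bound),
            "numPartitions": str(num_partitions),
        })
    return options


def read_table_directly(spark, options):
    """Load the table through Spark's JDBC source.

    `spark` is an existing SparkSession. The executors open their own
    JDBC connections, so the rows do not pass through the DSS backend.
    """
    return spark.read.format("jdbc").options(**options).load()
```

With a running session, a call such as `read_table_directly(spark, jdbc_read_options("jdbc:postgresql://db.example.com:5432/sales", "public.orders", "reader", "secret", "order_id", 1, 1000000, 8))` would ask Spark to read the table over 8 parallel JDBC connections, split on `order_id`. Without the partitioning options, Spark reads the whole table through a single connection.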

gnaldi62 · June 2020

OK, it's clear. Thank you.

Giuseppe
