My name is Giuseppe Naldi; I'm from Bergamo and I work for ICTeam
(a company owned by Lutech) as a Solution Architect, with extensive experience
in subjects related to data analysis (from BI to data science and so on).
My question is rather general and concerns how DSS runs a recipe
that uses Spark as its engine on a dataset coming from a database table.
Namely, is the table data read by DSS and streamed from memory
to the Spark job, or does Spark access the database directly?
From the documentation I understand that it works the first way: is that correct?
The helper code DSS provides for Spark does indeed stream SQL data from the database to the (single) Spark executor via the DSS backend. If you want Spark to connect directly to the database, you need to use a spark-scala recipe in DSS and leverage the database's native Spark interface, if one exists.
Note that Snowflake is the only database I can think of that has a native Spark integration, and DSS already uses the native path in that case (https://doc.dataiku.com/dss/latest/connecting/sql/snowflake.html#spark-native-integration)
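
For illustration only, here is a rough sketch of what a direct read could look like inside a spark-scala recipe. It uses Spark's generic JDBC data source, not any DSS helper. The object name, function names, URL, table, column and credentials are all made up for the example, and it assumes the database's JDBC driver is available on the Spark classpath. Whether this fits a given DSS setup is something to check before relying on it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not part of DSS: all names here are illustrative.
object DirectJdbcRead {

  // Basic options for Spark's built-in JDBC data source.
  def jdbcOptions(url: String, table: String,
                  user: String, password: String): Map[String, String] =
    Map(
      "url"      -> url,
      "dbtable"  -> table,
      "user"     -> user,
      "password" -> password
    )

  // Adds the four options Spark needs to split the read into parallel
  // range queries on a numeric column, so several executors pull data
  // from the database at the same time.
  def partitioned(options: Map[String, String], column: String,
                  lower: Long, upper: Long, parts: Int): Map[String, String] =
    options ++ Map(
      "partitionColumn" -> column,
      "lowerBound"      -> lower.toString,
      "upperBound"      -> upper.toString,
      "numPartitions"   -> parts.toString
    )

  // Spark itself opens the JDBC connections; the data does not pass
  // through any intermediate process.
  def readTable(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(options).load()
}
```

Assuming a `SparkSession` named `spark` is in scope, usage would be along the lines of `DirectJdbcRead.readTable(spark, DirectJdbcRead.partitioned(DirectJdbcRead.jdbcOptions("jdbc:postgresql://db.example.com:5432/sales", "public.orders", "reader", "secret"), "order_id", 1L, 1000000L, 8))`. Without the partitioning options Spark reads the whole table through a single connection, so they are what actually spreads the load across executors.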