DSS engine error while joining datasets: in-database engine and Spark configuration issue
When I attempt to join datasets, I get a warning stating that the recipe cannot use the in-database engine and will fall back to the slower DSS engine instead. It also warns that the 'national_joined' dataset is not a SQL table dataset.
If I switch to the Spark engine, I receive a performance warning:
'WARN_SPARK_NON_DISTRIBUTED_READ: Performance warning: Input dataset is read in a non-distributed way. Dataset sales.base_aggregation, cause: Invalid connection configuration: This connection cannot be used directly from Spark.'
What are the necessary steps to resolve these errors and successfully join the datasets?
Answers
Turribeach
Use a Sync recipe before the Join recipe so that both input datasets are stored in the same Dataiku connection. When the datasets are stored in different Dataiku connections, Dataiku cannot run the join in-database and is forced to do the join itself, in memory.
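To make the check concrete, here is a minimal Python sketch of the reasoning, not anything that calls Dataiku itself. It takes a mapping of dataset name to connection name and lists the datasets that would need a Sync recipe to land in the target connection before the join. The function name `plan_syncs` and the connection names `filesystem_managed` and `sql_warehouse` are made up for illustration; look up the real connection of each dataset in its settings in DSS.

```python
def plan_syncs(dataset_connections, target_connection):
    """Return the datasets that are not yet stored in target_connection.

    Each returned dataset would need a Sync recipe (created in the DSS Flow)
    writing into target_connection before the Join recipe can use it.
    """
    return [
        name
        for name, connection in dataset_connections.items()
        if connection != target_connection
    ]


# Hypothetical connection names, assumed for this example only.
inputs = {
    "national_joined": "filesystem_managed",
    "base_aggregation": "sql_warehouse",
}

to_sync = plan_syncs(inputs, target_connection="sql_warehouse")
print(to_sync)  # ['national_joined']
```

Once every input reports the same SQL connection (an empty list here), the Join recipe should be able to offer the in-database engine, assuming that connection is a SQL database.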