DSS engine error: while joining Database Engine and Spark Configuration Issue

code_wizard · August 25

When I attempt to join datasets using the DSS engine, I encounter an error stating that the recipe cannot utilize the in-database engine and will default to the slower DSS engine instead. Additionally, it warns that the 'national_joined' dataset is not a SQL table dataset.

If I switch to the Spark engine, I receive a performance warning:

'WARN_SPARK_NON_DISTRIBUTED_READ: Performance warning: Input dataset is read in a non-distributed way. Dataset sales.base_aggregation, cause: Invalid connection configuration: This connection cannot be used directly from Spark.'

What are the necessary steps to resolve these errors and successfully join the datasets?

Turribeach · August 27

Use a Sync recipe before the Join recipe so that both input datasets are stored in the same Dataiku connection. When the datasets are stored in different Dataiku connections, Dataiku cannot run the join in the database and is forced to do the join in memory with the slower DSS engine.
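To make the reasoning concrete, here is a minimal sketch in plain Python of the check described above. It is only a simplified model, not Dataiku's actual engine-selection code: the `Dataset` class, the helper functions, the connection names (`filesystem_managed`, `sales_db`), the synced dataset name and the `is_sql` values are all assumptions made for illustration. Only the dataset names come from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    connection: str  # hypothetical name of the Dataiku connection holding the data
    is_sql: bool     # True if the dataset is a SQL table dataset


def can_join_in_database(inputs):
    """Simplified rule: every input is a SQL table and all share one connection."""
    connections = {d.connection for d in inputs}
    return all(d.is_sql for d in inputs) and len(connections) == 1


def plan_syncs(inputs, target_connection):
    """Names of the inputs that would need a Sync recipe into target_connection."""
    return [d.name for d in inputs if d.connection != target_connection]


# Assumed layout: one input lives outside the SQL connection.
inputs = [
    Dataset("national_joined", connection="filesystem_managed", is_sql=False),
    Dataset("base_aggregation", connection="sales_db", is_sql=True),
]

print(can_join_in_database(inputs))    # the join cannot be pushed to the database
print(plan_syncs(inputs, "sales_db"))  # datasets to sync first

# After syncing national_joined into the same SQL connection (hypothetical name):
synced_inputs = [
    Dataset("national_joined_synced", connection="sales_db", is_sql=True),
    Dataset("base_aggregation", connection="sales_db", is_sql=True),
]
print(can_join_in_database(synced_inputs))
```

In the Flow, the equivalent is to add a Sync recipe on the dataset that sits in the other connection, choose the target connection as its output, and point the Join recipe at the synced copy.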
