Submit your inspiring success story or innovative use case to the 2022 Dataiku Frontrunner Awards! ENTER YOUR SUBMISSION

Multi Table Visual Joins

0 Kudos

Description

Currently, the visual join recipe allows you to join a base table (A) to a second table (B), and then to join (A) to a third table (C).

That looks something like the Issue & Suggested Fix image (ignoring the red rectangle, for a moment).

What it will not allow you to do is define a relationship between B and C in addition to the relationship between A and C, similar to what you see in the section inside the red rectangle.

 

Note that this is fundamentally different from the "Not This Scenario" image, where the first table is joined to two separate copies of DIM_MM_TM.

This creates two copies of DIM_MM_TM with different aliases, which is not the same as what we are attempting to achieve, above.

Impact

Visual clutter

This clutters up the flow zones by forcing you to add additional joins as a workaround.

The second and third join in the "Forced Workaround" image flow zone would be unnecessary if this change were implemented.

Wasted storage

This causes us to store two extra copies of the data just in this one flow zone.

This one project contains 4 flow zones, each with 1-2 unnecessary joins.

Wasted processing

Ignoring the compute_LIVE_MM_FACT_LONG, as this suggestion would not address that time, the extra join in the "Forced Workaround" caused the job to run for 1.5x as long as an equivalent SQL statement.

Suggested / Possible Fix

Either allow for a secondary join, as in the red rectangle, or allow the join from the main table (directly above the red rectangle) to incorporate fields from other datasets in the join.