Visual Join Recipe: Join on a Constant condition not a column value in the other dataset

0 Kudos

User Story:

As an analyst who is selectively joining data through the visual prepare recipe, I'd like to be able to add a constant as one of the join conditions, so that I can make more selective joins through the visual prepare recipes without having to write SQL from scratch.

Conditions of Satisfaction (COS):

  • Well documented
  • Warning if only a constant condition is used in order to avoid Cartesian product results.

Notes:

Today you can join on a column value coming from either dataset.  But you can not pick a constant.  I have a use case where a join has to be done only if a value in a column is equal to a certain constant value and that value does not appear in the table being joined.  For example I can not model the following SQL in the visual Join recipe (without writing a minimally documented advanced join).

Select *
From t_customer as tc
left join t_address as ta
      on tc.customer_no = ta.customer_no
          and ta.address_type = 'primary_address'


In a very real case like this for me, the constant is being used because the database has other address types and we want all customer records.  We do not want to duplicate customer records by adding result records for different address types like 'home_address'. Finally even if we do not have a primary address defined.  We do want all of the customer records.  This kind of join can not be done with a where clause.

I do note that one can use a "custom SQL condition" in a visual prepare recipe.  But it is hard for the "analyst" persona to figure out how to use this more advanced setup.  For example to do the above join, one would use the following condition.  Assuming that t_customer is $left and t_address is $right:

$left."customer_no" = $right."customer_no"
and $left."address_type" = 'primary_address'

There appears to be little or no documentation on the "Advanced join" other than this comment in the knowledge base

Advanced join

Provides custom options for row selection and deduplication for when none of the other options are suitable.


And these some what cryptic notes on the actual dialog box for custom join conditions.

Write a custom SQL condition. It will be copied in the request as:

SELECT (...) FROM (...) JOIN (...) ON <your_code_here>
You should use $left and $right to refer to both datasets.
The $left and $right tokens will be replaced by the corresponding quoted
table names.

It is nice that we can do this.  This is a request for at least better documentation of this advanced feature.  But preferably and option to use a constant rather than a table for the right part of the join conditions.

--Tom
6 Comments

After further research on this topic.  I note that a "Pre Filter" might work for this use case.  In this case one would pre-filter the right table t_address to only permit joins on records for which address_type = 'primary_address'.  Still there was a bunch or research needed to come to this conclusion. 

However, if you are trying to do the constant from the left table in a left join. This approach would not work.

--Tom

After further research on this topic.  I note that a "Pre Filter" might work for this use case.  In this case one would pre-filter the right table t_address to only permit joins on records for which address_type = 'primary_address'.  Still there was a bunch or research needed to come to this conclusion. 

However, if you are trying to do the constant from the left table in a left join. This approach would not work.

I am able to pre-filter on both sides of the join in a Join recipe. 

I am able to pre-filter on both sides of the join in a Join recipe. 

@Turribeach 

The case that this would be "required" is for example when you need to do a left join and select which records to do the left join on from the left table.

So maybe this is a very small case.  But, I'll leave it here for now.

 

--Tom

@Turribeach 

The case that this would be "required" is for example when you need to do a left join and select which records to do the left join on from the left table.

So maybe this is a very small case.  But, I'll leave it here for now.

 

That's what I mean Tom, the join recipe pre-filter allows you to pre-filter any of the joined tables by any condition. So I don't how you wouldn't be able to add a predicate to any join table like address_type = 'primary_address'. Is there any issues as to why you can't do the pre-filter on the left join table?

That's what I mean Tom, the join recipe pre-filter allows you to pre-filter any of the joined tables by any condition. So I don't how you wouldn't be able to add a predicate to any join table like address_type = 'primary_address'. Is there any issues as to why you can't do the pre-filter on the left join table?

If the value is in the left table and you want to keep all of the records from the left table.

--Tom

If the value is in the left table and you want to keep all of the records from the left table.

Ok I got you now. 

Ok I got you now.