This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Left join with empty values as one of the keys

Hi,

I have difficulties to make a left join with multiple columns as keys.

One of the columns can have empty cells (null) and it seems that the join is considering that no value can be found with the join.

For example :

Key : Columns 1 , 2 and 3.

left table : A B C D

D E Null F

on the right table:

A B C G

D E Null H

result :

A B C D G

D E Null F **Null**So Dataiku seems to not recognize an empty value as a valid key and so it is removing the values.

Is there a way to avoid this problem ?

Thanks !

Solutions shown first - Read whole discussion

5 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Jack,

What recipe engine are you using?

Also

LEFT Table | |||

Col 1 | Col 2 | Col 3 | Col 4 |

A | B | C | D |

D | E | NULL | F |

RIGHT Table | |||

Col 1 | Col 2 | Col 3 | Col 4 |

A | B | C | G |

D | E | NULL | H |

Also, are these representations of the table you are trying to get results from?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi @GCase ,

11 months after my question, here i am, sorry for the quite long delay.

I have the same problem today, but i think i will be able to illustrate it more precisely, with an project example.

Left table of the join:

Right table of the join:

Join:

Selected columns:

Result:

Why is there no data in column "E_rapporté" ?

The join should be effective as there is the same data on the left side and the right side of the join.

I am using DSS engine. Dataiku says that Spark is not available.

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi @jack. This is an interest topic that I was ready to dismiss with a comment like: "one should never do joins between columns that are nullable". However, life is not perfect and because of design or because of data problems, there are cases when you need to do it.

On the subject of nulls' joins, I was able to find this article that might help to get some ideas on how to proceed: https://bertwagner.com/posts/joining-on-nulls

----

Going back to your example, the outcome you are getting is correct, or at least is the expected behavior: there were no matches between the left and right tables. Since you used "Left Join" all the rows in the left table are kept, but since there was no match with the right table, all the values of E_rapporté are NULL (empty).

Why is the expected behavior? Because in most programming languages NULL != NULL (in python you can test what happens when you do np.nan == np.nan). And this has some logic, as a missing value is not necessarily equal to another missing value. So when B is Null then left.B != right.B

I hope this helps to understand the behavior you are seeing, and that the article can help you to see how to solve the problem.

Cheers!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Thanks @Ignacio_Toledo , i will read the article, hoping i will find something.

Concerning the example, the origin of this column is caused by data. I do builds of my projects successively and sometimes the two sides of the join on column B are empty (the whole columns).

I don't understand why there is no match between the left side and right.

The key is the content of the column A/B/C/D.

For the first line of the two tables:

on left side, this key represents the values: a1/empty/c1/d1

on right side, this key represents the values: a1/empty/c1/d1

So for me there is a match.

The alternative solution i found is to make a concatenate of the key before the join of the columns: concat(A,B,C,D), and then even if a column is totally empty on both side, the join works.

But since i have more than 200 tables with joins which can potentially have this problem, i hope there is a more *elegant* solution 😁

Thanks again

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi @jack. About this part of your message:

I don't understand why there is no match between the left side and right.

The key is the content of the column A/B/C/D.

For the first line of the two tables:

on left side, this key represents the values: a1/empty/c1/d1

on right side, this key represents the values: a1/empty/c1/d1

The problem is that while a1==a1, c1==c1 and d1==d1, empty != empty, at least not when you have a NULL in place, and that is the way that SQL databases work. I hope you had time to check the article, because there are examples on to understand why this is so.

In your case, and without knowing anything about the particularities of your data, the solution could be simple: make first a preparation recipe where you fill the null or empty rows of column B with a string like 'NA' or 'NoData', and then go with the join, that should do the trick.

Cheers!