Hello, I am studying Dataiku DSS. I have 3 datasets (Customer, Computer, Printer) that I am joining by CustomerID. Each dataset has a column called CustomerID. The Computer and Printer datasets have the following columns: CustomerID and the date when the item was purchased. I would like to know if a customer bought a computer and a printer. I would also like to know if a customer bought a computer or a printer. How should I do this in the Join recipe? I would like to create one list of customers who bought a computer and a printer, and another list of customers who bought a computer or a printer.
Hello @aotearoanz and welcome to the Dataiku Community. While you wait for a more detailed response, I just wanted to make you aware of a few resources that may be able to guide you in the right direction:
Hopefully these help!
Welcome to the Dataiku Community! 😀
As you have described the challenge, with the sample dataset it is fairly easy. The resources suggested by @CoreyS will get you started.
However, in real-world data sets, it is often the case that one customer has bought more than one computer on different dates or multiple printers all in different orders.
So depending on what you want your final data set to look like you may also need to learn a bit about the following functions as well:
These visual recipes would be helpful in dealing with the one customer to multiple transaction cases you may see in your dataset.
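To make the one-customer-to-many-transactions case concrete, here is a minimal sketch in plain Python (not DSS code, and not any DSS API) of what collapsing multiple purchases into one row per customer looks like. The sample rows, the `summarize_purchases` helper, and the output column names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical purchase rows: (CustomerID, purchase date as an ISO string).
# Customer 1 bought two computers on different dates.
computer_purchases = [
    (1, "2021-01-05"),
    (1, "2021-03-12"),
    (2, "2021-02-20"),
]

def summarize_purchases(rows):
    """Collapse many purchase rows into one summary row per customer."""
    dates_by_customer = defaultdict(list)
    for customer_id, purchase_date in rows:
        dates_by_customer[customer_id].append(purchase_date)
    # ISO date strings sort chronologically, so min() gives the earliest date.
    return {
        customer_id: {"purchase_count": len(dates), "first_purchase": min(dates)}
        for customer_id, dates in dates_by_customer.items()
    }

computer_summary = summarize_purchases(computer_purchases)
```

With one summary row per customer, a later join on CustomerID no longer multiplies rows for customers who bought several items.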
Thank you for your responses, @CoreyS and @tgb417. The resources you suggested deal with simple joins. I couldn't find anything about joining multiple datasets. I need to understand the concept of complex joins. Based on the example I described, should I use the Join recipe to achieve this?
Thanks and have a great day!
Hi @aotearoanz ,
With the join recipe you can join multiple datasets; you're certainly not limited to two. The first picture I attached shows how that looks with 3 datasets, just like the challenge you present in your original post. The second picture highlights the button to press to get more datasets into your join recipe. Stitch them together with a key (customer ID), or even multiple keys, as shown in that second picture.
As I understand it, you need multiple outputs. The split recipe is a nice option for that: based on one or more filters that you specify in the recipe, the data is divided over new datasets.
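The logic behind "join, then split" can be sketched in plain Python. This is not DSS code, only an illustration of the idea: left-join the customers to both purchase datasets, keep a flag per dataset, then filter on the flags. The sample IDs and the `bought_computer` / `bought_printer` flag names are assumptions made up for the example.

```python
# Hypothetical sample data: all customers, plus the distinct CustomerIDs
# found in the Computer and Printer datasets.
customers = [1, 2, 3, 4]
computer_buyers = {1, 2}
printer_buyers = {2, 3}

# "Join" step: one row per customer, with a flag per purchase dataset
# (the equivalent of a left join where a missing match becomes False).
joined = [
    {
        "CustomerID": cid,
        "bought_computer": cid in computer_buyers,
        "bought_printer": cid in printer_buyers,
    }
    for cid in customers
]

# "Split" step: two filters over the same joined data.
bought_both = [
    r["CustomerID"] for r in joined
    if r["bought_computer"] and r["bought_printer"]
]
bought_either = [
    r["CustomerID"] for r in joined
    if r["bought_computer"] or r["bought_printer"]
]
```

Here customer 2 lands in the "and" list, customers 1, 2 and 3 land in the "or" list, and customer 4 appears in neither.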
@tgb417 made an important point concerning real-world datasets: the fact that something is not likely (somebody buying multiple computers/printers at different moments on the same day) does not mean it cannot happen. In fact, as a former employee of a big computer builder, I have seen that happen! It would be prudent to take that possibility into account.
When I started using DSS I took some time to do the courses provided in the Academy section mentioned above. They proved to be really helpful in getting up to speed with DSS!
Kind regards, Jurre
I think the issue is that with the join recipe, you're limited to two datasets per join. That is, you can't easily join one dataset with columns from two different datasets.
As far as I can tell, the join recipe lets you join several datasets to a single central dataset, provided that those datasets don't themselves need to join to more than one dataset. What it doesn't handle well is a single dataset that has to be joined to multiple datasets at once.
This can be worked around, but my habit has been to just express the join using SQL whenever I need a complex join.
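For the question in this thread, the SQL route might look like the sketch below. It uses Python's built-in `sqlite3` with an in-memory database only so the queries can be run as-is; the table names, the `PurchaseDate` column, and the sample rows are assumptions, and the SQL dialect of your own connection may differ slightly.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE computer (CustomerID INTEGER, PurchaseDate TEXT);
CREATE TABLE printer  (CustomerID INTEGER, PurchaseDate TEXT);
INSERT INTO customer VALUES (1), (2), (3), (4);
INSERT INTO computer VALUES (1, '2021-01-05'), (1, '2021-03-12'), (2, '2021-02-20');
INSERT INTO printer  VALUES (2, '2021-02-21'), (3, '2021-04-01');
""")

# EXISTS returns each customer at most once, even when the customer
# has several purchase rows in computer or printer.
both_sql = """
SELECT c.CustomerID
FROM customer c
WHERE EXISTS (SELECT 1 FROM computer k WHERE k.CustomerID = c.CustomerID)
  AND EXISTS (SELECT 1 FROM printer p WHERE p.CustomerID = c.CustomerID)
ORDER BY c.CustomerID
"""

either_sql = """
SELECT c.CustomerID
FROM customer c
WHERE EXISTS (SELECT 1 FROM computer k WHERE k.CustomerID = c.CustomerID)
   OR EXISTS (SELECT 1 FROM printer p WHERE p.CustomerID = c.CustomerID)
ORDER BY c.CustomerID
"""

bought_both = [row[0] for row in conn.execute(both_sql)]
bought_either = [row[0] for row in conn.execute(either_sql)]
conn.close()
```

Using `EXISTS` rather than plain joins is a deliberate choice here: customer 1 has two computer purchases but still shows up only once in the "or" list.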