Join recipe with Multiple datasets
Hello, I am learning Dataiku DSS. I have three datasets (Customer, Computer, Printer) that I am joining by CustomerID. Each dataset has a column called CustomerID. The Computer and Printer datasets have the following columns: CustomerID and the date when the item was purchased. I would like to know if a customer bought a computer and a printer. I would also like to know if a customer bought a computer or a printer. How should I do this in the Join recipe? I would like to create one list of customers who bought a computer and a printer, and another list of customers who bought a computer or a printer.
Thanks
Answers
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hello @aotearoanz and welcome to the Dataiku Community. While you wait for a more detailed response, I just wanted to make you aware of a few resources that may be able to guide you in the right direction:
- Join: joining datasets (Documentation)
- Concept: Join Recipe (Dataiku Academy)
- Concept: Join Recipe (Dataiku Knowledge)
Hopefully these help!
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Welcome to the Dataiku Community!
As you have described the challenge, it is fairly easy with the sample dataset. The resources suggested by @CoreyS will get you started.
However, in real-world datasets, it is often the case that one customer has bought more than one computer on different dates, or multiple printers in different orders.
So, depending on what you want your final dataset to look like, you may also need to learn a bit about the following recipes:
- Group by Recipe (academy)
- Grouping: aggregating data (documentation)
- Window Recipe (academy)
- Window: analytics functions (documentation)
These visual recipes would be helpful in dealing with the one customer to multiple transaction cases you may see in your dataset.
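To make the one-customer-to-many-purchases point concrete, here is a minimal sketch in plain Python of what a grouping step does before the join: it collapses the purchase rows to one entry per customer. The sample rows and the `summarize` helper are made up for illustration and are not part of any Dataiku API; in DSS the Group recipe would do the equivalent visually.

```python
from collections import defaultdict

# Hypothetical purchase rows: (CustomerID, purchase date as an ISO string).
computer_rows = [
    (1, "2023-01-05"),
    (1, "2023-03-10"),  # customer 1 bought a second computer later
    (2, "2023-02-01"),
]

def summarize(rows):
    """Collapse purchase rows to one entry per customer: first date and count."""
    summary = defaultdict(lambda: {"first_date": None, "purchases": 0})
    for customer_id, date in rows:
        entry = summary[customer_id]
        entry["purchases"] += 1
        # ISO-formatted dates sort correctly when compared as strings.
        if entry["first_date"] is None or date < entry["first_date"]:
            entry["first_date"] = date
    return dict(summary)

computer_summary = summarize(computer_rows)
print(computer_summary)
```

With one row per customer on each side, a later join on CustomerID cannot multiply rows.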
-
Thank you for your responses, @CoreyS and @tgb417. The resources you have suggested deal with simple joins. I couldn't find anything about joining multiple datasets. I need to understand the concept of complex joins. Based on the example I have described, should I use the Join recipe to achieve this?
Thanks and have a great day!
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Hi @aotearoanz, with the Join recipe you can join multiple datasets; you're certainly not limited to two. The first picture I attached shows how that looks with 3 datasets, just like the challenge you present in your original post. The second picture highlights the button to press to get more datasets into your Join recipe. Stitch them together with a key (customer ID), or even multiple keys, as is shown in that second picture.
As I understood it, you need multiple outputs. The Split recipe is a nice option for that: based on one or more filters which you specify in that recipe, the data is divided over new datasets.
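The join-then-filter logic described above can be sketched in plain Python. This is only an illustration of the idea, not how DSS implements its recipes; the sample data and the `computer_date` / `printer_date` names are assumptions made up for the example.

```python
# Hypothetical sample data; column names follow the original question.
customers = [1, 2, 3, 4]
computers = [(1, "2023-01-05"), (2, "2023-02-01")]  # (CustomerID, Date)
printers = [(2, "2023-02-03"), (3, "2023-04-11")]   # (CustomerID, Date)

# Lookup tables keyed by CustomerID. A repeated customer would simply
# overwrite the earlier date here, so real data should be aggregated first.
computer_date = dict(computers)
printer_date = dict(printers)

# Left join: keep every customer, fill missing purchases with None.
joined = [
    {
        "CustomerID": cid,
        "computer_date": computer_date.get(cid),
        "printer_date": printer_date.get(cid),
    }
    for cid in customers
]

# Filter step: two conditions over the joined rows.
bought_both = [r["CustomerID"] for r in joined
               if r["computer_date"] is not None and r["printer_date"] is not None]
bought_either = [r["CustomerID"] for r in joined
                 if r["computer_date"] is not None or r["printer_date"] is not None]

print(bought_both)    # [2]
print(bought_either)  # [1, 2, 3]
```

The left join from the customer list is what makes the "or" list possible: an inner join would already have dropped customers who bought only one of the two items.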
@tgb417 made an important point concerning real-world datasets: the fact that something is not likely (somebody buying multiple computers/printers at different moments on the same day) does not mean it cannot happen. In fact, as a former employee of a big computer builder, I have seen that happen! It would be prudent to take that possibility into account.
When I started using DSS I took some time to do the courses provided in the Academy section mentioned above. They proved to be really helpful in getting up to speed with DSS!
Kind regards, Jurre
-
I think the issue is that with the join recipe, you're limited to two datasets per join. That is, you can't easily join one dataset with columns from two different datasets.
The join recipe lets you join several datasets to a single dataset, provided each of those datasets joins only to that one dataset. What it does not easily allow is a single dataset whose join conditions involve more than one other dataset.
This can be worked around, but my habit has been to just express the join using SQL whenever I need a complex join.
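As a rough illustration of the SQL route, the sketch below runs two queries against an in-memory SQLite database from Python. The table names, the `PurchaseDate` column and the sample rows are assumptions invented for the example; in DSS the same SELECT statements would go into a SQL recipe against your own connection, and the exact dialect may differ.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring the three datasets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE computer (CustomerID INTEGER, PurchaseDate TEXT);
CREATE TABLE printer  (CustomerID INTEGER, PurchaseDate TEXT);
INSERT INTO customer VALUES (1), (2), (3), (4);
INSERT INTO computer VALUES (1, '2023-01-05'), (2, '2023-02-01'), (2, '2023-06-09');
INSERT INTO printer  VALUES (2, '2023-02-03'), (3, '2023-04-11');
""")

# Customers with at least one computer AND at least one printer.
# DISTINCT removes the duplicates caused by repeat purchases.
both_sql = """
SELECT DISTINCT c.CustomerID
FROM customer c
JOIN computer k ON k.CustomerID = c.CustomerID
JOIN printer  p ON p.CustomerID = c.CustomerID
ORDER BY c.CustomerID
"""

# Customers with a computer OR a printer: left joins keep everyone,
# then the WHERE clause drops customers with neither purchase.
either_sql = """
SELECT DISTINCT c.CustomerID
FROM customer c
LEFT JOIN computer k ON k.CustomerID = c.CustomerID
LEFT JOIN printer  p ON p.CustomerID = c.CustomerID
WHERE k.CustomerID IS NOT NULL OR p.CustomerID IS NOT NULL
ORDER BY c.CustomerID
"""

both_ids = [row[0] for row in conn.execute(both_sql)]
either_ids = [row[0] for row in conn.execute(either_sql)]
print(both_ids)    # [2]
print(either_ids)  # [1, 2, 3]
```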
-
Hello everybody,
I am stuck on a step with the Join recipe. Here is my case: I built a Join recipe between datasets DS1 and DS2, with join conditions between DS1 and DS2.
Now I would like to add a new dataset DS3 and put conditions between DS2 and DS3. I cannot do it because the interface proposes joins between DS3 and DS1. The input dataset (DS1) is just displayed and there is no dropdown list to change the dataset. I do not know how to change which datasets are joined, except by doing a Custom Join.
Is there a graphical way to do it? My version of Dataiku is 9.0.1.
Thank you in advance for your help.
Have a good day.
Best regards, Jean-Luc.
-
Hello everybody,
There is no ADD INPUT button in my Join recipe.
My version of Dataiku is 12.
Is there any way to overcome this?
Thanks
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,993 Neuron
I have replied to you in your own thread.