Multiple Joins to mimic VLOOKUP?
Hi everyone,
I'm new to dataiku and have a question about JOINS:
Context: I have two datasets: Activties and Goals. Activites is an uploaded excel file, with one row reflecting one activity. Each activty is linked to one or more goals. The link between activities and goals is captured in one column of the dataset. For example cell X contains "1,2,5" meaning this activity is linked to goal 1, 2 and 5. These numbers are keys in the Goals dataset, which lists the name, description and other information for each goal.
What I need: I want to merge/join the two datasets, adding information about goals to the activity dataset. To do this I can split the column in the activity dataset that indicates what goals the activity is connected to (for the example above "1,2,5" are split into three columns). Then I can use each column for a JOIN with the goals dataset. This works, but requires three separate JOINS. (I could use an OR condition in the join but that would duplicate rows)
Question: Is there more more efficient way to pull information from the goals dataset (perhaps with SQL or Python)? In excel you could use VLOOKUP.
Thanks!
Best Answers
-
I'm going to follow because I also wondered if there is any easier way than what I suggest. Maybe there is a cool join that does this in one step. :-)
Use the prepare recipe to split and fold to create the distinct values into rows.
There is the column approach, split and unfold.
Happy Dataiku'n!
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
Split and Fold is the way to go.
Answers
-
Thanks!