Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on November 5, 2024 9:42AM
Likes: 1
Replies: 3
Hi everyone,
I'm new to dataiku and have a question about JOINS:
Context: I have two datasets: Activties and Goals. Activites is an uploaded excel file, with one row reflecting one activity. Each activty is linked to one or more goals. The link between activities and goals is captured in one column of the dataset. For example cell X contains "1,2,5" meaning this activity is linked to goal 1, 2 and 5. These numbers are keys in the Goals dataset, which lists the name, description and other information for each goal.
What I need: I want to merge/join the two datasets, adding information about goals to the activity dataset. To do this I can split the column in the activity dataset that indicates what goals the activity is connected to (for the example above "1,2,5" are split into three columns). Then I can use each column for a JOIN with the goals dataset. This works, but requires three separate JOINS. (I could use an OR condition in the join but that would duplicate rows)
Question: Is there more more efficient way to pull information from the goals dataset (perhaps with SQL or Python)? In excel you could use VLOOKUP.
Thanks!
I'm going to follow because I also wondered if there is any easier way than what I suggest. Maybe there is a cool join that does this in one step. :-)
Use the prepare recipe to split and fold to create the distinct values into rows.
There is the column approach, split and unfold.
Happy Dataiku'n!
Split and Fold is the way to go.
Thanks!