Survey banner
The Dataiku Community is moving to a new home! We are temporary in read only mode: LEARN MORE

How to do a VLookUp or an Index Match in Dataiku

InesB
Dataiker
How to do a VLookUp or an Index Match in Dataiku

“I want to do a VLookUp or Index Match  in Dataiku, how do I get started?” 

It looks like you're attempting to connect two datasets that share common values. If you're transitioning from Excel, you might be familiar with terms like VLOOKUP or INDEX MATCH, but in Dataiku, we refer to this process as a 'Join'.

While writing VLOOKUP formulas in Excel can be challenging due to issues like inconsistent data and selecting the right lookup values, in Dataiku, it's as straightforward as creating a single visual recipe and defining parameters with just a few clicks.

What does it look like? 

InesB_9-1715961839791.png

Multiple Dataset Join

InesB_1-1715961537297.png  

Match with multiple columns and advanced conditions (contains, case…)

InesB_2-1715961562111.png

   Advanced parameters in matching

Let's delve deeper into this scenario:

Consider the following example: Suppose we are conducting churn analysis to understand why customers churn and what factors contribute to their decision. To follow along with these steps, refer to the attached "Zip" project.

In our dataset, we have pertinent information from our CRM data such as birth date, first item purchased, and gender, as well as insights from our web data including IP addresses and pages visited.

The objective of this project is twofold: firstly, to analyze the CRM data and weblogs, and secondly, to extract actionable insights from the combined datasets.

InesB_3-1715961592984.png

CRM Data

InesB_4-1715961605762.png

Web Data

How to do it? 

1.Creating a Join Recipe:

From the Flow, select your crm_stacked dataset, and then click Join with… in the Actions menu. The primary purpose of the Join recipe is to enrich one dataset with columns from another. The output of the matching depends on the join type. For our example here, we’ll keep the default join type (Left join) which allows us to keep all the rows from the left dataset and add information from the right dataset.

InesB_5-1715961631592.png

Under “Input datasets”, click the second dataset dropdown that currently says “No dataset detected”, and select the web_data dataset. 

On the right hand side, name the output dataset crm_web_data and click on create recipe. 

2.Choose the parameters:

InesB_6-1715961655926.png

On the Selected columns, preview the list of columns that will appear in the joined output dataset.

Once you’ve selected the columns you wanted to enrich your left dataset with, you can click on 'RUN' the recipe.

Once it’s successfully run, preview the output dataset. 

InesB_8-1715961704621.png

Congratulations, you’ve now successfully enrich our CRM data with our web data !

You’ve just performed the equivalent of a VlookUp or Index Match in Dataiku !  🎉🎉🎉

 

 

0 Replies

Labels

?
Labels (4)
A banner prompting to get Dataiku