Check out the first Dataiku 8 Deep Dive focusing on Productivity on October 29th Read More

Joining 2 datasets on 2 variables, 1 is a row, 1 is a column.

Level 1
Joining 2 datasets on 2 variables, 1 is a row, 1 is a column.

New to Dataiku and I am being asked to join 2 datasets for flight data and to capture the destination latituded and longitudes for each flight. The first data set has the variable for the airports (airport_icao) on a column, and the other has the destinations (destinations_icao) under a column in a row. How can I inner join the two datasets on those 2 variables, without them having the same name? (Using Visual Recipes, not familiar with SQL) Also how would I rename the columns? I've provided the two datasets below. Any input is appreciated, thank you! 

0 Kudos
10 Replies
Level 1
Author

*My mistake not the same name, but how can i do it with one being a row and another a column*

0 Kudos
Level 5

Could you provide an actual sample of both datasets? I don't get the full picture of your data from the snapshots. 

Cheers

0 Kudos
Level 1
Author

The datasets are provided to me within a module already present in the catalogue, I'm not too sure how to send them. 

0 Kudos
Level 5

From the Dataset Explore view you can find the action "Export" to export a sample in csv or excel format. But you need to be sure that you can share the data in a public forum... If you can't share it,  a screenshot  with 3 or 4 rows of data (not just the column or rows names) could be enough

0 Kudos
Level 1
Author

The first dataset is stacked with 3 others and those other 3 have the data within them. I have provided that stack below.

0 Kudos
Level 5

Thanks, that screenshot show perfectly one of the datasets. Now I would need a kind of simar screenshot of the other dataset. What you provided before, the table with 3 rows and 5 columns, and airport_icao plus coordinates, looks like a description of a table, but is not a screenshot of the data.

0 Kudos
Level 1
Author

That screenshot would show all of the data I have stacked into 1. I guess i'm asking then how would I join the two descriptions of the tables (from the first two screenshots).

 

0 Kudos
Level 5

Sorry, I don't think I understand the exact problem. I was expecting that you had 2 datasets:

1) Dataset 1: is the one shown in the screenshot named "Screen Shot 2020-08-27 at 5.38.18 PM.png"

2) Dataset 2: another dataset where you have for each "airport_icao" the corresponding longitudes and the latitudes

By joining the Dataset 1 with the 2, you are enriching the Dataset 1 by adding two more columns with the latitude and longitude of the destination airport.

Is this correct? If so, how does the Dataset 2 actually looks?

If wrong, I'm sorry I can't help, because I wouldn't have the understanding of what you are trying to achieve.

Cheers!

 

0 Kudos
Level 1
Author

My apologies for the confusion, I am still learning this myself. I believe I am being asked to inner join the table of the third screenshot with the second screenshot. "Join on destination_icao and airport_icao" are the exact instructions, which I assume is the column and row respectively. However I do not know how to inner join them since one lies in a row. If that helps. If not, I appreciate your efforts! 

0 Kudos
Level 5

Don't worry, no apologies needed!

For what they are asking you to do (join), the dataset providing the coordinates for each airport_icao, should have a schema similar to the following table:

entries_by_row.png

 

 

 

 

or in another case, like this table:

entries_by_column.png

 

But what you share as your "second snapshot" or dataset, doesn't have any data to actually do a join. Is that the only other dataset you have to do the work they asked you to do?

0 Kudos
Labels (2)