Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
New to Dataiku and I am being asked to join 2 datasets for flight data and to capture the destination latituded and longitudes for each flight. The first data set has the variable for the airports (airport_icao) on a column, and the other has the destinations (destinations_icao) under a column in a row. How can I inner join the two datasets on those 2 variables, without them having the same name? (Using Visual Recipes, not familiar with SQL) Also how would I rename the columns? I've provided the two datasets below. Any input is appreciated, thank you!
*My mistake not the same name, but how can i do it with one being a row and another a column*
Could you provide an actual sample of both datasets? I don't get the full picture of your data from the snapshots.
Cheers
The datasets are provided to me within a module already present in the catalogue, I'm not too sure how to send them.
From the Dataset Explore view you can find the action "Export" to export a sample in csv or excel format. But you need to be sure that you can share the data in a public forum... If you can't share it, a screenshot with 3 or 4 rows of data (not just the column or rows names) could be enough
Thanks, that screenshot show perfectly one of the datasets. Now I would need a kind of simar screenshot of the other dataset. What you provided before, the table with 3 rows and 5 columns, and airport_icao plus coordinates, looks like a description of a table, but is not a screenshot of the data.
That screenshot would show all of the data I have stacked into 1. I guess i'm asking then how would I join the two descriptions of the tables (from the first two screenshots).
Sorry, I don't think I understand the exact problem. I was expecting that you had 2 datasets:
1) Dataset 1: is the one shown in the screenshot named "Screen Shot 2020-08-27 at 5.38.18 PM.png"
2) Dataset 2: another dataset where you have for each "airport_icao" the corresponding longitudes and the latitudes
By joining the Dataset 1 with the 2, you are enriching the Dataset 1 by adding two more columns with the latitude and longitude of the destination airport.
Is this correct? If so, how does the Dataset 2 actually looks?
If wrong, I'm sorry I can't help, because I wouldn't have the understanding of what you are trying to achieve.
Cheers!
My apologies for the confusion, I am still learning this myself. I believe I am being asked to inner join the table of the third screenshot with the second screenshot. "Join on destination_icao and airport_icao" are the exact instructions, which I assume is the column and row respectively. However I do not know how to inner join them since one lies in a row. If that helps. If not, I appreciate your efforts!
Don't worry, no apologies needed!
For what they are asking you to do (join), the dataset providing the coordinates for each airport_icao, should have a schema similar to the following table:
or in another case, like this table:
But what you share as your "second snapshot" or dataset, doesn't have any data to actually do a join. Is that the only other dataset you have to do the work they asked you to do?