Joining 2 datasets on 2 variables, 1 is a row, 1 is a column.
New to Dataiku and I am being asked to join 2 datasets for flight data and to capture the destination latituded and longitudes for each flight. The first data set has the variable for the airports (airport_icao) on a column, and the other has the destinations (destinations_icao) under a column in a row. How can I inner join the two datasets on those 2 variables, without them having the same name? (Using Visual Recipes, not familiar with SQL) Also how would I rename the columns? I've provided the two datasets below. Any input is appreciated, thank you!
Answers
-
*My mistake not the same name, but how can i do it with one being a row and another a column*
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Could you provide an actual sample of both datasets? I don't get the full picture of your data from the snapshots.
Cheers
-
The datasets are provided to me within a module already present in the catalogue, I'm not too sure how to send them.
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
From the Dataset Explore view you can find the action "Export" to export a sample in csv or excel format. But you need to be sure that you can share the data in a public forum... If you can't share it, a screenshot with 3 or 4 rows of data (not just the column or rows names) could be enough
-
The first dataset is stacked with 3 others and those other 3 have the data within them. I have provided that stack below.
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Thanks, that screenshot show perfectly one of the datasets. Now I would need a kind of simar screenshot of the other dataset. What you provided before, the table with 3 rows and 5 columns, and airport_icao plus coordinates, looks like a description of a table, but is not a screenshot of the data.
-
That screenshot would show all of the data I have stacked into 1. I guess i'm asking then how would I join the two descriptions of the tables (from the first two screenshots).
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Sorry, I don't think I understand the exact problem. I was expecting that you had 2 datasets:
1) Dataset 1: is the one shown in the screenshot named "Screen Shot 2020-08-27 at 5.38.18 PM.png"
2) Dataset 2: another dataset where you have for each "airport_icao" the corresponding longitudes and the latitudes
By joining the Dataset 1 with the 2, you are enriching the Dataset 1 by adding two more columns with the latitude and longitude of the destination airport.
Is this correct? If so, how does the Dataset 2 actually looks?
If wrong, I'm sorry I can't help, because I wouldn't have the understanding of what you are trying to achieve.
Cheers!
-
My apologies for the confusion, I am still learning this myself. I believe I am being asked to inner join the table of the third screenshot with the second screenshot. "Join on destination_icao and airport_icao" are the exact instructions, which I assume is the column and row respectively. However I do not know how to inner join them since one lies in a row. If that helps. If not, I appreciate your efforts!
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Don't worry, no apologies needed!
For what they are asking you to do (join), the dataset providing the coordinates for each airport_icao, should have a schema similar to the following table:
or in another case, like this table:
But what you share as your "second snapshot" or dataset, doesn't have any data to actually do a join. Is that the only other dataset you have to do the work they asked you to do?