Joining 2 datasets on 2 variables, 1 is a row, 1 is a column.

Options
Andrew_Hua
Andrew_Hua Registered Posts: 6 ✭✭✭✭

New to Dataiku and I am being asked to join 2 datasets for flight data and to capture the destination latituded and longitudes for each flight. The first data set has the variable for the airports (airport_icao) on a column, and the other has the destinations (destinations_icao) under a column in a row. How can I inner join the two datasets on those 2 variables, without them having the same name? (Using Visual Recipes, not familiar with SQL) Also how would I rename the columns? I've provided the two datasets below. Any input is appreciated, thank you!

Answers

  • Andrew_Hua
    Andrew_Hua Registered Posts: 6 ✭✭✭✭
    Options

    *My mistake not the same name, but how can i do it with one being a row and another a column*

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    Options

    Could you provide an actual sample of both datasets? I don't get the full picture of your data from the snapshots.

    Cheers

  • Andrew_Hua
    Andrew_Hua Registered Posts: 6 ✭✭✭✭
    Options

    The datasets are provided to me within a module already present in the catalogue, I'm not too sure how to send them.

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    Options

    From the Dataset Explore view you can find the action "Export" to export a sample in csv or excel format. But you need to be sure that you can share the data in a public forum... If you can't share it, a screenshot with 3 or 4 rows of data (not just the column or rows names) could be enough

  • Andrew_Hua
    Andrew_Hua Registered Posts: 6 ✭✭✭✭
    Options

    The first dataset is stacked with 3 others and those other 3 have the data within them. I have provided that stack below.

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    Options

    Thanks, that screenshot show perfectly one of the datasets. Now I would need a kind of simar screenshot of the other dataset. What you provided before, the table with 3 rows and 5 columns, and airport_icao plus coordinates, looks like a description of a table, but is not a screenshot of the data.

  • Andrew_Hua
    Andrew_Hua Registered Posts: 6 ✭✭✭✭
    Options

    That screenshot would show all of the data I have stacked into 1. I guess i'm asking then how would I join the two descriptions of the tables (from the first two screenshots).

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    Options

    Sorry, I don't think I understand the exact problem. I was expecting that you had 2 datasets:

    1) Dataset 1: is the one shown in the screenshot named "Screen Shot 2020-08-27 at 5.38.18 PM.png"

    2) Dataset 2: another dataset where you have for each "airport_icao" the corresponding longitudes and the latitudes

    By joining the Dataset 1 with the 2, you are enriching the Dataset 1 by adding two more columns with the latitude and longitude of the destination airport.

    Is this correct? If so, how does the Dataset 2 actually looks?

    If wrong, I'm sorry I can't help, because I wouldn't have the understanding of what you are trying to achieve.

    Cheers!

  • Andrew_Hua
    Andrew_Hua Registered Posts: 6 ✭✭✭✭
    Options

    My apologies for the confusion, I am still learning this myself. I believe I am being asked to inner join the table of the third screenshot with the second screenshot. "Join on destination_icao and airport_icao" are the exact instructions, which I assume is the column and row respectively. However I do not know how to inner join them since one lies in a row. If that helps. If not, I appreciate your efforts!

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    Options

    Don't worry, no apologies needed!

    For what they are asking you to do (join), the dataset providing the coordinates for each airport_icao, should have a schema similar to the following table:

    entries_by_row.png

    or in another case, like this table:

    entries_by_column.png

    But what you share as your "second snapshot" or dataset, doesn't have any data to actually do a join. Is that the only other dataset you have to do the work they asked you to do?

Setup Info
    Tags
      Help me…