Joining Dataset

Torito
Torito Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 3

Good Day

How can I join two datasets on a primary key, matching each row to the related row with the closest date?

For example, in one dataset I have information about equipment maintenance, such as filter replacements.

In the second one I have readings of hours, strokes, etc., but this data is not updated daily; it is updated at irregular intervals. So for the joined data I want to match each row to the row with the closest date in the other dataset.

Best Answer

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
    Answer ✓

    @Torito

    Welcome to the Dataiku community.

    One idea that came to mind as I read your post is to use Dataiku's interpolation features, which are usually applied to time series data to regularize values that arrive sporadically. This may be far more than your use case requires, but it is something DSS can do, and it might help you create a dataset that gives a better picture of what is going on in your data. The idea is to calculate a daily value for the one dataset and then join it to the other. You would then not need to pick the nearest date; you would have an estimate of the values on any given date.


    https://doc.dataiku.com/dss/latest/time-series/time-series-preparation/resampling.html

    Just an idea.
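
A minimal pandas sketch of the resample-then-join idea, outside of DSS. The column names (`equipment_id`, `reading_date`, `maintenance_date`, `hours`, `task`) and the sample values are hypothetical, and linear interpolation is only one possible assumption about how readings evolve between dates.

```python
import pandas as pd

# Hypothetical sporadic readings for one piece of equipment.
readings = pd.DataFrame({
    "equipment_id": ["A", "A", "A"],
    "reading_date": pd.to_datetime(["2023-01-01", "2023-01-05", "2023-01-09"]),
    "hours": [100.0, 140.0, 180.0],
})

# Hypothetical maintenance events.
maintenance = pd.DataFrame({
    "equipment_id": ["A", "A"],
    "maintenance_date": pd.to_datetime(["2023-01-03", "2023-01-08"]),
    "task": ["filter replacement", "inspection"],
})


def to_daily(group: pd.DataFrame) -> pd.DataFrame:
    """Resample one equipment's readings to daily rows, filling gaps linearly."""
    daily = (
        group.set_index("reading_date")[["hours"]]
        .resample("D")
        .mean()                        # one row per day, NaN where no reading
        .interpolate(method="linear")  # estimate the missing days
    )
    return daily.reset_index()


# Build the daily series separately for each equipment key.
daily_parts = []
for equipment_id, group in readings.groupby("equipment_id"):
    part = to_daily(group)
    part["equipment_id"] = equipment_id
    daily_parts.append(part)
daily_readings = pd.concat(daily_parts, ignore_index=True)

# With a value for every day, an exact join on key + date is enough.
joined = maintenance.merge(
    daily_readings,
    left_on=["equipment_id", "maintenance_date"],
    right_on=["equipment_id", "reading_date"],
    how="left",
)
print(joined[["equipment_id", "maintenance_date", "task", "hours"]])
```

Note that the joined `hours` values are estimates, not recorded readings, so this fits best when the measured quantity changes smoothly between readings.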

Answers

  • Torito
    Torito Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 3

    So you can add a nearest date option in the recipe; it seems to be working fine right now.
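
For comparison outside of DSS, a nearest-date join can be sketched in pandas with `pandas.merge_asof` and `direction="nearest"`. This is not the DSS recipe itself; the column names and sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical maintenance events.
maintenance = pd.DataFrame({
    "equipment_id": ["A", "A", "B"],
    "maintenance_date": pd.to_datetime(["2023-01-04", "2023-01-08", "2023-01-06"]),
    "task": ["filter replacement", "inspection", "filter replacement"],
})

# Hypothetical readings taken at irregular intervals.
readings = pd.DataFrame({
    "equipment_id": ["A", "A", "A", "B"],
    "reading_date": pd.to_datetime(
        ["2023-01-01", "2023-01-05", "2023-01-09", "2023-01-10"]
    ),
    "hours": [100.0, 140.0, 180.0, 55.0],
})

# merge_asof requires both frames to be sorted by their date key.
nearest = pd.merge_asof(
    maintenance.sort_values("maintenance_date"),
    readings.sort_values("reading_date"),
    left_on="maintenance_date",
    right_on="reading_date",
    by="equipment_id",     # match only within the same equipment
    direction="nearest",   # closest reading, earlier or later
)
print(nearest)
```

If a match that is too far away in time should be rejected, `merge_asof` also accepts a `tolerance` argument such as `pd.Timedelta("30D")`; unmatched rows then get missing values.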

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,926 Neuron

    Use a Window recipe to partition and filter the data. In Window definitions, set the partition columns to your primary key, and set the order columns to the primary key and your date/time column (the one recording when the row was last updated), sorted descending. Then, in Aggregations, enable Row number. Finally, in Post-filter, set the condition rownumber == 1 so that only the latest row for each key is kept.
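
The same partition, order, row number, and filter logic can be sketched in pandas. This is only an illustration of the technique, not what the Window recipe runs internally; the column names and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical readings, several rows per equipment key.
readings = pd.DataFrame({
    "equipment_id": ["A", "A", "B", "B"],
    "reading_date": pd.to_datetime(
        ["2023-01-01", "2023-01-09", "2023-01-02", "2023-01-10"]
    ),
    "hours": [100.0, 180.0, 40.0, 55.0],
})

# Order by key, then by date descending, so the newest row comes first per key.
ordered = readings.sort_values(
    ["equipment_id", "reading_date"], ascending=[True, False]
)

# Row number within each partition (1 = most recent row for that key).
ordered["rownumber"] = ordered.groupby("equipment_id").cumcount() + 1

# Equivalent of the post-filter: keep only the first row of each partition.
latest = ordered[ordered["rownumber"] == 1].reset_index(drop=True)
print(latest)
```

This keeps the most recent reading per key, which can then be joined to the other dataset on the key alone. It does not by itself pick the reading closest to each individual maintenance date.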

  • Torito
    Torito Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 3

    Thanks, I will give both options a try to learn more for future projects.
