Join 2 datasets matching a date into an interval

UserBird Dataiker, Alpha Tester Posts: 535 Dataiker

I have to datasets that I want to join. The first dataset as a DATE column, but the second dataset as a BEGIN_DATE and END_DATE column.

I want to join rows from dataset2 into dataset1 analysis.

Is there a way to join the 2 datasets directly or a way to enrich dataset2 and add rows matching BEGIN_DATE en END_DATE?

Best regards



  • UserBird
    UserBird Dataiker, Alpha Tester Posts: 535 Dataiker

    I succeeded in doing the "join" (actually it is not, but for my data it is). The goal is to tag each record from DATASET1 for being/not being in a specific time period. Time periods are defined in DATASET2.

    I did it using Python.

    For those interested, here is the script (As i'm not a Python expert, more elegant code can be proposed):

    # -*- coding: utf-8 -*-<BR />import dataiku<BR />import pandas as pd, numpy as np<BR />from dataiku import pandasutils as pdu<BR /><BR /># Input datasets<BR />DATASET1 = dataiku.Dataset("DATASET1")<BR />DATASET1_df = histo_MAG_JF_ZV.get_dataframe()<BR />DATASET2 = dataiku.Dataset("DATASET2")<BR />DATASET2_df = soldes_prepared.get_dataframe()<BR /><BR /># Output datasets<BR />DATASET1_outuput = dataiku.Dataset("DATASET1_outuput")<BR /><BR /># Pour chaque ligne du tableau de Soldes<BR />for d in DATASET2.iter_rows():<BR /> # FOR DEBUG<BR /> #print d<BR /> <BR /> begin_date = d['begin']<BR /> end_date = d['end']<BR /> <BR /> cond1 = (DATASET1_df['Date'] >= begin_date)<BR /> cond2 = (DATASET1_df['Date'] <= end_date)<BR /> <BR /> matching_records = DATASET1_df[cond1][cond2]<BR /> <BR /> # Add begin/end date to output dataset<BR /> matching_records['begin_date'] = begin_date<BR /> matching_records['end_date'] = end_date<BR /> <BR /> DATASET1_outuput.write_with_schema(matching_records)

    Next I left join DATASET1 with DATASET1_output to recover non mathing records, as the following analysis needs both matching and non matching records.


Setup Info
      Help me…