Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on July 16, 2015 7:18PM
Likes: 0
Replies: 1
I succeeded in doing the "join" (actually it is not, but for my data it is). The goal is to tag each record from DATASET1 for being/not being in a specific time period. Time periods are defined in DATASET2.
I did it using Python.
For those interested, here is the script (As i'm not a Python expert, more elegant code can be proposed):
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Input datasets
DATASET1 = dataiku.Dataset("DATASET1")
DATASET1_df = histo_MAG_JF_ZV.get_dataframe()
DATASET2 = dataiku.Dataset("DATASET2")
DATASET2_df = soldes_prepared.get_dataframe()
# Output datasets
DATASET1_outuput = dataiku.Dataset("DATASET1_outuput")
# Pour chaque ligne du tableau de Soldes
for d in DATASET2.iter_rows():
# FOR DEBUG
#print d
begin_date = d['begin']
end_date = d['end']
cond1 = (DATASET1_df['Date'] >= begin_date)
cond2 = (DATASET1_df['Date'] <= end_date)
matching_records = DATASET1_df[cond1][cond2]
# Add begin/end date to output dataset
matching_records['begin_date'] = begin_date
matching_records['end_date'] = end_date
DATASET1_outuput.write_with_schema(matching_records)
Next I left join DATASET1 with DATASET1_output to recover non mathing records, as the following analysis needs both matching and non matching records.
Geoff