Keep only rows that has timestamp between start and stop columns from another dataset
Hi, im kinda new to DSS.
I have a huge Dataset (with 20M+ rows), this dataset has a timestamp column, and 20 other columns.
I have another Dataset with like 15 rows, this one has 3 columns start, stop and flag.
Foreach row in my first Dataset i want to check if the timestamp is between start and stop of a row in the second dataset, if it's i want to enrich the first dataset with the flag from the second. And if a row in my first Dataset isn't between any of the start and stop, i don't want to keep it.
I've done this with a Python recipe using Pandas, but im trying to do it only with DSS recipes.
I tried to do it with a Join recipe with this SQL code :
$left.timestamp>=$right.start AND $left.timestamp<=$right.stop;
but it's not working..
I found one way but it takes too much disk space and time: Do a cross join between my two Datasets and then do a filter with the value of start and stop
Can you guys suggest me a good way to do this ?
Best Answer
-
Hi Fragan,
Please use the following syntax on your join clause
"Test_dataset"."timestamp">="testa"."start" AND "Test_dataset"."timestamp"<="testa"."stop"mind the quotes around the field names.