Hi,
I am reading a dataset from our data lake. This dataset is critical to a lot of DSS projects, and recently failures in the upstream pipelines have caused the dataset to build successfully but without any new records in it, apparently without our data team noticing. So I have set up a TOP N recipe on the dataset that takes the latest date in its "created_date" column. This column should contain new records every day with today's date, which would tell me that new data is indeed coming in. My TOP N dataset has no columns other than "created_date" and contains only one row, so I just need to check whether that one value has changed. Only the TOP N recipe's output is in DSS; the dataset being read is not produced by DSS, only read from the data lake.
Is there an easy way to set that up that I just can't seem to find?
An alternative might be to compute a column todays_date, take the date diff between created_date and todays_date, and set a check that fails if that value goes above 0. But I do not seem to be able to set that check up either (I know how to create the logic mentioned, just not the actual check).
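To make the alternative concrete, here is a rough sketch of the logic in plain Python. It is not tied to any DSS API: the function name `freshness_status`, the `max_age_days` parameter, and the "OK"/"ERROR" return values are all made up for illustration, and I am assuming the single created_date value can be read as a date, datetime, or ISO-formatted string.

```python
from datetime import date, datetime


def freshness_status(latest_created, today=None, max_age_days=0):
    """Compare the latest created_date with today's date.

    Hypothetical helper: returns ("OK", message) when the data is fresh
    and ("ERROR", message) when the date diff exceeds max_age_days.
    """
    if today is None:
        today = date.today()

    # Normalise the input to a plain date (assumed input types, see lead-in).
    if isinstance(latest_created, str):
        latest_created = datetime.fromisoformat(latest_created).date()
    elif isinstance(latest_created, datetime):
        latest_created = latest_created.date()

    # The "date diff" between todays_date and created_date, in whole days.
    age_days = (today - latest_created).days

    if age_days > max_age_days:
        return "ERROR", f"Latest created_date is {age_days} day(s) old"
    return "OK", f"Latest created_date is {age_days} day(s) old"


if __name__ == "__main__":
    # Example call with a fixed "today" so the result is reproducible.
    print(freshness_status("2024-04-30", today=date(2024, 5, 1)))
```

What I am missing is where to plug something like this in, so that it runs after the TOP N dataset is built and flags the dataset when the result is not "OK".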