Hi Team,
I have created a scenario that runs the pipeline once a day. The scenario has 4 steps, as below.
1)step 1: check whether the latest data is available in the table; if it is, pick the date_time of the latest date.
2)step 2: set the date_time as a project variable.
3)step 3: filter the data from the table based on date_time and store it in the test dataset (for ML model prediction).
4)step 4: run the ML model on the test data.
When I run the scenario, the latest data sometimes does not exist in the table. In that case the date_time project variable is populated with a null value and the scenario fails at step 3.
I want the scenario to stop running whenever date_time is null, instead of failing at step 3 (step 1 and step 2 should still run).
This is easy to do by adding conditional logic that controls whether your remaining scenario steps execute.
See this post: https://community.dataiku.com/t5/Using-Dataiku/Conditional-execute-of-scenario-step-without-steps-fa...
More info about this technique here: https://community.dataiku.com/t5/What-s-New/Want-to-Control-the-Execution-of-Scenario-Steps-With-Con...
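To make the idea concrete, here is a minimal Python sketch of the decision the conditional logic needs to make for steps 3 and 4. It is plain Python, not Dataiku API code; the function name should_run_remaining_steps is hypothetical, and treating an empty string or the text "null" as missing is an assumption about how the variable might be stored.

```python
from typing import Optional


def should_run_remaining_steps(date_time: Optional[str]) -> bool:
    """Return True only when a usable date_time value was found.

    Hypothetical helper: it mirrors the condition that would guard
    step 3 and step 4 of the scenario described in the question.
    """
    if date_time is None:
        return False
    # Assumption: an empty string or the literal text "null" also
    # means that no latest data was found in the table.
    return date_time.strip().lower() not in ("", "null")
```

With a guard like this, steps 1 and 2 always run, and steps 3 and 4 are skipped rather than failing when no date_time was found.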
Personally, I would use the row count of your dataset, as it is a built-in metric. Here is a sample formula that gets the row count from a dataset metric:
toNumber(filter(parseJson(stepOutput_Compute_Metrics)['Project_Key.Dataset_Name_NP']['computed'], x, x["metricId"]=="records:COUNT_RECORDS")[0].value)
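For readers who find the formula above hard to parse, the same lookup can be sketched in plain Python. This only mirrors the structure the formula implies (a JSON object keyed by 'Project_Key.Dataset_Name_NP', holding a 'computed' list of metrics); the function name get_row_count and the sample payload are hypothetical, and a real step output may contain more fields.

```python
import json


def get_row_count(step_output_json: str, dataset_key: str) -> int:
    """Extract the records:COUNT_RECORDS value from a metrics step output.

    Assumption: the payload has the shape implied by the formula,
    i.e. payload[dataset_key]["computed"] is a list of metric objects
    with "metricId" and "value" fields.
    """
    payload = json.loads(step_output_json)       # parseJson(...)
    computed = payload[dataset_key]["computed"]  # [...]['computed']
    # filter(..., x, x["metricId"] == "records:COUNT_RECORDS")
    matches = [m for m in computed if m["metricId"] == "records:COUNT_RECORDS"]
    if not matches:
        raise KeyError("records:COUNT_RECORDS metric not found")
    # toNumber(...[0].value): the value may arrive as a string
    return int(float(matches[0]["value"]))


# Hypothetical sample payload, for illustration only
sample_output = json.dumps({
    "Project_Key.Dataset_Name_NP": {
        "computed": [
            {"metricId": "records:COUNT_RECORDS", "value": "0"},
        ]
    }
})

row_count = get_row_count(sample_output, "Project_Key.Dataset_Name_NP")
run_remaining_steps = row_count > 0  # skip steps 3 and 4 when the table is empty
```

A row count of 0 then plays the same role as a null date_time: the remaining steps are skipped instead of failing.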