Time series issue - Time column 'SHIP_DT' has missing values with frequency

Hello,

I keep getting this error, along with the suggestion "You can use the Time Series Preparation plugin to resample your time column." I have already done that, of course.

As far as I can see, the dates are continuous and every cell is filled.

[2022/03/10-16:04:31.045] [null-out-105] [INFO] [dku.utils]  - *************** Recipe code failed **************
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  - Begin Python stack
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  - Traceback (most recent call last):
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/container/exec_py_recipe.py", line 19, in <module>
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  -     exec(fd.read())
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "<string>", line 39, in <module>
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 61, in prepare_timeseries_dataframe
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -     self._check_regular_frequency(dataframe_prepared)
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 129, in _check_regular_frequency
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -     assert_time_column_valid(identifiers_df, self.time_column_name, self.frequency)
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 245, in assert_time_column_valid
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  -     raise ValueError(error_message)
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  - ValueError: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  - End Python stack
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  - Installing debugging signal handler
[2022/03/10-16:04:31.406] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,404] [1/MainThread] [ERROR] [root] Containerized process terminated by signal 1
[2022/03/10-16:04:31.406] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,404] [1/MainThread] [INFO] [root] Sending error.json to backend/JEK
[2022/03/10-16:04:31.409] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,408] [1/MainThread] [DEBUG] [urllib3.connectionpool] Starting new HTTP connection (1): 172.19.86.10:37489
[2022/03/10-16:04:31.442] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,440] [1/MainThread] [DEBUG] [urllib3.connectionpool] http://172.19.86.10:37489 "POST /kernel/tintercom/containers/put-file?fileKind=EXECUTION_DIR&path=error.json&executionId=c-timeseries-forecast-1-train-evaluate-l2op4hr HTTP/1.1" 200 0
[2022/03/10-16:04:31.513] [null-out-105] [INFO] [dku.utils]  - Installing debugging signal handler
[2022/03/10-16:04:32.534] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Log streaming terminated with return code 0
[2022/03/10-16:04:32.534] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Waiting for kubernetes job to finish
[2022/03/10-16:04:32.535] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - wait for job to complete, getting job status
[2022/03/10-16:04:32.535] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - Querying job status on job dataiku-exec-c-timeseries-forecast-1-train-evaluate-l2op4hr
[2022/03/10-16:04:32.650] [Thread-79] [DEBUG] [dku.utils]  - Process kubectl-get-job done (return code 0)
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [DEBUG] [dku.containers.kubernetes] - kubectl get job OK, out=Complete
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - Kubernetes Job is complete
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Error file found, trying to throw it: /home/dataiku/data_dir/jobs/QP1/Build_model__NP__2022-03-10T16-03-57.239/compute_M8oAPCj8_NP/custom-python-recipe/pyoutGPQCM1a6Jh6L/error.json
[2022/03/10-16:04:32.652] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Raw error is{"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","detailedMessage":"At line 39: \u003cclass \u0027ValueError\u0027\u003e: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","stackTrace":[]}
[2022/03/10-16:04:32.652] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Now err: {"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Error in python process: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","detailedMessage":"Error in python process: At line 39: \u003cclass \u0027ValueError\u0027\u003e: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","stackTrace":[]}
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [DEBUG] [dku.remoterun.registry] - Finished with container execution c-timeseries-forecast-1-train-evaluate-l2op4hr
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [DEBUG] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"guillaume-blondel","projectKey":"QP1","jobId":"Build_model__NP__2022-03-10T16-03-57.239","activityId":"compute_M8oAPCj8_NP","activityType":"recipe","recipeType":"CustomCode_timeseries-forecast-1-train-evaluate","recipeName":"compute_M8oAPCj8"},"type":"SINGLE_K8S_JOB","id":"wL6RcMlkssv2HgpO","startTime":1646928238355,"singleK8SJob":{"k8sClusterId":"aks-design-tier-useast2","executionId":"c-timeseries-forecast-1-train-evaluate-l2op4hr"}}
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Cleaning Kubernetes job
[2022/03/10-16:04:32.654] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Run command securely, as user dataiku
[2022/03/10-16:04:32.654] [FRT-58-FlowRunnable] [INFO] [dku.security.process] - Starting process (regular)
[2022/03/10-16:04:32.656] [FRT-58-FlowRunnable] [INFO] [dku.security.process] - Process started with pid=2927158
[2022/03/10-16:04:32.657] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Process reads from nothing
[2022/03/10-16:04:32.754] [null-out-113] [INFO] [dku.utils]  - secret "dkusecret-c-timeseries-forecast-1-train-evaluate-l2op4hr" deleted
[2022/03/10-16:04:32.768] [null-out-113] [INFO] [dku.utils]  - job.batch "dataiku-exec-c-timeseries-forecast-1-train-evaluate-l2op4hr" deleted
[2022/03/10-16:04:32.782] [FRT-58-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_M8oAPCj8_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 39: <class 'ValueError'>: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
 at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
 at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/03/10-16:04:32.892] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - activity is finished
[2022/03/10-16:04:32.892] [ActivityExecutor-46] [ERROR] [dku.flow.activity] running compute_M8oAPCj8_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 39: <class 'ValueError'>: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
 at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
 at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/03/10-16:04:32.893] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Executing default post-activity lifecycle hook
[2022/03/10-16:04:32.897] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Removing samples for QP1.performance
[2022/03/10-16:04:32.899] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Removing samples for QP1.evaluation
[2022/03/10-16:04:32.899] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Done post-activity tasks

Has anyone faced this? Do you know what the problem could be?

Thank you,


Operating system used: Windows

Answers

  • Registered Posts: 2 ✭✭✭

    OK, so the problem is with the resample recipe. It is not populating all the empty dates correctly: I found 3 dates that were not populated. Running the recipe a second time doesn't change anything.

    Has anyone run into this already?
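
One way to pin down which dates are still missing after resampling is to compare the time column against a regular business-day grid in pandas. This is only a minimal sketch, not the plugin's own check: the `missing_dates` helper and the sample data are hypothetical, and it assumes the column parses cleanly with `pd.to_datetime`. The traceback passes an `identifiers_df` to the check, which suggests the plugin may validate each series separately, so for a dataset with several series it may be worth running this once per identifier group.

```python
import pandas as pd


def missing_dates(dates, freq="B"):
    """Return the timestamps a regular `freq` grid expects but `dates` lacks.

    Hypothetical helper: an approximation of a regularity check,
    not the code used by the forecast plugin.
    """
    # Parse to datetimes and drop the time-of-day part.
    observed = pd.DatetimeIndex(pd.to_datetime(dates)).normalize()
    # Build the full grid between the first and last observed date.
    expected = pd.date_range(observed.min(), observed.max(), freq=freq)
    # Anything on the grid that was never observed is a gap.
    return expected.difference(observed)


# Made-up sample: Monday 2022-03-07 to Friday 2022-03-11, Wednesday absent.
sample = pd.DataFrame(
    {
        "SHIP_DT": ["2022-03-07", "2022-03-08", "2022-03-10", "2022-03-11"],
        "QTY": [5, 3, 4, 6],
    }
)

gaps = missing_dates(sample["SHIP_DT"])
print(gaps)
```

If the list of gaps comes back empty for the whole column but the recipe still fails, the gaps may sit inside an individual series or come from time-of-day or time zone differences, which this sketch deliberately ignores by normalizing to midnight.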
