Time series issue - Time column 'SHIP_DT' has missing values with frequency

Guillaume_fdx

Hello,

I keep getting this error, along with the suggestion "You can use the Time Series Preparation plugin to resample your time column.", which I had of course already done.

I can see that I have continuous dates and all the cells are filled.

[2022/03/10-16:04:31.045] [null-out-105] [INFO] [dku.utils]  - *************** Recipe code failed **************
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  - Begin Python stack
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  - Traceback (most recent call last):
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/container/exec_py_recipe.py", line 19, in <module>
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils]  -     exec(fd.read())
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "<string>", line 39, in <module>
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 61, in prepare_timeseries_dataframe
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -     self._check_regular_frequency(dataframe_prepared)
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 129, in _check_regular_frequency
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -     assert_time_column_valid(identifiers_df, self.time_column_name, self.frequency)
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 245, in assert_time_column_valid
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  -     raise ValueError(error_message)
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  - ValueError: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  - End Python stack
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils]  - Installing debugging signal handler
[2022/03/10-16:04:31.406] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,404] [1/MainThread] [ERROR] [root] Containerized process terminated by signal 1
[2022/03/10-16:04:31.406] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,404] [1/MainThread] [INFO] [root] Sending error.json to backend/JEK
[2022/03/10-16:04:31.409] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,408] [1/MainThread] [DEBUG] [urllib3.connectionpool] Starting new HTTP connection (1): 172.19.86.10:37489
[2022/03/10-16:04:31.442] [null-out-105] [INFO] [dku.utils]  - [2022-03-10 16:04:31,440] [1/MainThread] [DEBUG] [urllib3.connectionpool] http://172.19.86.10:37489 "POST /kernel/tintercom/containers/put-file?fileKind=EXECUTION_DIR&path=error.json&executionId=c-timeseries-forecast-1-train-evaluate-l2op4hr HTTP/1.1" 200 0
[2022/03/10-16:04:31.513] [null-out-105] [INFO] [dku.utils]  - Installing debugging signal handler
[2022/03/10-16:04:32.534] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Log streaming terminated with return code 0
[2022/03/10-16:04:32.534] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Waiting for kubernetes job to finish
[2022/03/10-16:04:32.535] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - wait for job to complete, getting job status
[2022/03/10-16:04:32.535] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - Querying job status on job dataiku-exec-c-timeseries-forecast-1-train-evaluate-l2op4hr
[2022/03/10-16:04:32.650] [Thread-79] [DEBUG] [dku.utils]  - Process kubectl-get-job done (return code 0)
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [DEBUG] [dku.containers.kubernetes] - kubectl get job OK, out=Complete
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - Kubernetes Job is complete
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Error file found, trying to throw it: /home/dataiku/data_dir/jobs/QP1/Build_model__NP__2022-03-10T16-03-57.239/compute_M8oAPCj8_NP/custom-python-recipe/pyoutGPQCM1a6Jh6L/error.json
[2022/03/10-16:04:32.652] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Raw error is{"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","detailedMessage":"At line 39: \u003cclass \u0027ValueError\u0027\u003e: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","stackTrace":[]}
[2022/03/10-16:04:32.652] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Now err: {"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Error in python process: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","detailedMessage":"Error in python process: At line 39: \u003cclass \u0027ValueError\u0027\u003e: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","stackTrace":[]}
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [DEBUG] [dku.remoterun.registry] - Finished with container execution c-timeseries-forecast-1-train-evaluate-l2op4hr
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [DEBUG] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"guillaume-blondel","projectKey":"QP1","jobId":"Build_model__NP__2022-03-10T16-03-57.239","activityId":"compute_M8oAPCj8_NP","activityType":"recipe","recipeType":"CustomCode_timeseries-forecast-1-train-evaluate","recipeName":"compute_M8oAPCj8"},"type":"SINGLE_K8S_JOB","id":"wL6RcMlkssv2HgpO","startTime":1646928238355,"singleK8SJob":{"k8sClusterId":"aks-design-tier-useast2","executionId":"c-timeseries-forecast-1-train-evaluate-l2op4hr"}}
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Cleaning Kubernetes job
[2022/03/10-16:04:32.654] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Run command securely, as user dataiku
[2022/03/10-16:04:32.654] [FRT-58-FlowRunnable] [INFO] [dku.security.process] - Starting process (regular)
[2022/03/10-16:04:32.656] [FRT-58-FlowRunnable] [INFO] [dku.security.process] - Process started with pid=2927158
[2022/03/10-16:04:32.657] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Process reads from nothing
[2022/03/10-16:04:32.754] [null-out-113] [INFO] [dku.utils]  - secret "dkusecret-c-timeseries-forecast-1-train-evaluate-l2op4hr" deleted
[2022/03/10-16:04:32.768] [null-out-113] [INFO] [dku.utils]  - job.batch "dataiku-exec-c-timeseries-forecast-1-train-evaluate-l2op4hr" deleted
[2022/03/10-16:04:32.782] [FRT-58-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_M8oAPCj8_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 39: <class 'ValueError'>: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
	at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
	at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
	at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/03/10-16:04:32.892] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - activity is finished
[2022/03/10-16:04:32.892] [ActivityExecutor-46] [ERROR] [dku.flow.activity] running compute_M8oAPCj8_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 39: <class 'ValueError'>: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
	at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
	at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
	at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/03/10-16:04:32.893] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Executing default post-activity lifecycle hook
[2022/03/10-16:04:32.897] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Removing samples for QP1.performance
[2022/03/10-16:04:32.899] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Removing samples for QP1.evaluation
[2022/03/10-16:04:32.899] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Done post-activity tasks
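A quick way to double-check the "continuous dates" observation outside the plugin is sketched here. It assumes, without having read the plugin source, that the check behind `assert_time_column_valid` amounts to comparing each series against a regular grid at the requested frequency. In this case that frequency is 'B', the pandas alias for business days (Monday to Friday). Because the traceback passes `identifiers_df`, the check may run per time series identifier, so a gap in a single series could trigger the error even when the dataset as a whole looks complete. `find_missing_dates` and its parameters are hypothetical names.

```python
import pandas as pd


def find_missing_dates(df, time_col, freq="B", id_cols=None):
    """Return the timestamps a regular `freq` grid expects but `df` lacks.

    With id_cols, each time series (one per identifier value combination)
    is checked on its own, because a gap may exist in a single series only.
    """
    groups = df.groupby(id_cols) if id_cols else [(None, df)]
    missing = {}
    for key, group in groups:
        # Ignore empty timestamps, then sort the dates of this series.
        times = pd.DatetimeIndex(pd.to_datetime(group[time_col])).dropna().sort_values()
        if times.empty:
            continue
        # Full grid between the first and last date; with freq="B"
        # weekends are not expected, so they never count as missing.
        expected = pd.date_range(times[0], times[-1], freq=freq)
        gaps = expected.difference(times)
        if len(gaps) > 0:
            missing[key] = list(gaps)
    return missing
```

For example, a `SHIP_DT` column containing Monday 2022-03-07, Tuesday 2022-03-08, Thursday 2022-03-10 and Friday 2022-03-11 would report Wednesday 2022-03-09 as missing, while a Friday followed directly by the next Monday would not be flagged.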

Has anyone faced this? Any idea what could be causing it?

Thank you,


Operating system used: Windows

Guillaume_fdx (author)

OK, so the problem is with the resampling recipe: it does not fill in all of the missing dates. I found 3 dates that were not populated. Running the recipe a second time doesn't change anything.
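Until the resampling issue is sorted out, one possible workaround is to fill the remaining gaps yourself in a Python recipe placed before the forecast recipe. This is a minimal sketch for a single series (split by identifier first if there are several). It assumes that carrying the previous value forward is acceptable for your data, which is an illustrative choice and not necessarily what the plugin's resampling does. `fill_business_days` is a hypothetical name.

```python
import pandas as pd


def fill_business_days(df, time_col, freq="B"):
    """Reindex one time series onto a complete `freq` grid and forward-fill gaps.

    Forward-filling is an illustrative choice, not necessarily what the
    Time Series Preparation plugin does. Rows whose timestamp is not on the
    grid (e.g. weekend dates with freq="B") are dropped, and duplicate
    timestamps make reindex raise a ValueError.
    """
    series = (
        df.assign(**{time_col: pd.to_datetime(df[time_col])})
        .set_index(time_col)
        .sort_index()
    )
    full_index = pd.date_range(
        series.index.min(), series.index.max(), freq=freq, name=time_col
    )
    # New rows for missing dates start as NaN, then take the previous value.
    filled = series.reindex(full_index).ffill()
    return filled.reset_index()
```

Running the output back through a gap check such as the one above should then return no missing dates before the forecast recipe is launched.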

Has anyone run into this already?
