Time series issue - Time column 'SHIP_DT' has missing values with frequency
Guillaume_fdx
Registered Posts: 2 ✭✭✭
Hello,
I keep getting this error, along with the suggestion "You can use the Time Series Preparation plugin to resample your time column." I have of course already done that.
As far as I can see, the dates are continuous and every cell is filled.
[2022/03/10-16:04:31.045] [null-out-105] [INFO] [dku.utils] - *************** Recipe code failed **************
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils] - Begin Python stack
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils] - Traceback (most recent call last):
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils] - File "/opt/dataiku/python/dataiku/container/exec_py_recipe.py", line 19, in <module>
[2022/03/10-16:04:31.046] [null-out-105] [INFO] [dku.utils] - exec(fd.read())
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils] - File "<string>", line 39, in <module>
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils] - File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 61, in prepare_timeseries_dataframe
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils] - self._check_regular_frequency(dataframe_prepared)
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils] - File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 129, in _check_regular_frequency
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils] - assert_time_column_valid(identifiers_df, self.time_column_name, self.frequency)
[2022/03/10-16:04:31.047] [null-out-105] [INFO] [dku.utils] - File "/home/dataiku/plugin/python-lib/timeseries_preparation/preparation.py", line 245, in assert_time_column_valid
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils] - raise ValueError(error_message)
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils] - ValueError: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils] - End Python stack
[2022/03/10-16:04:31.048] [null-out-105] [INFO] [dku.utils] - Installing debugging signal handler
[2022/03/10-16:04:31.406] [null-out-105] [INFO] [dku.utils] - [2022-03-10 16:04:31,404] [1/MainThread] [ERROR] [root] Containerized process terminated by signal 1
[2022/03/10-16:04:31.406] [null-out-105] [INFO] [dku.utils] - [2022-03-10 16:04:31,404] [1/MainThread] [INFO] [root] Sending error.json to backend/JEK
[2022/03/10-16:04:31.409] [null-out-105] [INFO] [dku.utils] - [2022-03-10 16:04:31,408] [1/MainThread] [DEBUG] [urllib3.connectionpool] Starting new HTTP connection (1): 172.19.86.10:37489
[2022/03/10-16:04:31.442] [null-out-105] [INFO] [dku.utils] - [2022-03-10 16:04:31,440] [1/MainThread] [DEBUG] [urllib3.connectionpool] http://172.19.86.10:37489 "POST /kernel/tintercom/containers/put-file?fileKind=EXECUTION_DIR&path=error.json&executionId=c-timeseries-forecast-1-train-evaluate-l2op4hr HTTP/1.1" 200 0
[2022/03/10-16:04:31.513] [null-out-105] [INFO] [dku.utils] - Installing debugging signal handler
[2022/03/10-16:04:32.534] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Log streaming terminated with return code 0
[2022/03/10-16:04:32.534] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Waiting for kubernetes job to finish
[2022/03/10-16:04:32.535] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - wait for job to complete, getting job status
[2022/03/10-16:04:32.535] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - Querying job status on job dataiku-exec-c-timeseries-forecast-1-train-evaluate-l2op4hr
[2022/03/10-16:04:32.650] [Thread-79] [DEBUG] [dku.utils] - Process kubectl-get-job done (return code 0)
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [DEBUG] [dku.containers.kubernetes] - kubectl get job OK, out=Complete
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [INFO] [dku.containers.kubernetes] - Kubernetes Job is complete
[2022/03/10-16:04:32.651] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Error file found, trying to throw it: /home/dataiku/data_dir/jobs/QP1/Build_model__NP__2022-03-10T16-03-57.239/compute_M8oAPCj8_NP/custom-python-recipe/pyoutGPQCM1a6Jh6L/error.json
[2022/03/10-16:04:32.652] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Raw error is{"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","detailedMessage":"At line 39: \u003cclass \u0027ValueError\u0027\u003e: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","stackTrace":[]}
[2022/03/10-16:04:32.652] [FRT-58-FlowRunnable] [INFO] [dip.exec.resultHandler] - Now err: {"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Error in python process: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","detailedMessage":"Error in python process: At line 39: \u003cclass \u0027ValueError\u0027\u003e: Time column \u0027SHIP_DT\u0027 has missing values with frequency \u0027B\u0027. You can use the Time Series Preparation plugin to resample your time column.","stackTrace":[]}
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [DEBUG] [dku.remoterun.registry] - Finished with container execution c-timeseries-forecast-1-train-evaluate-l2op4hr
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [DEBUG] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"guillaume-blondel","projectKey":"QP1","jobId":"Build_model__NP__2022-03-10T16-03-57.239","activityId":"compute_M8oAPCj8_NP","activityType":"recipe","recipeType":"CustomCode_timeseries-forecast-1-train-evaluate","recipeName":"compute_M8oAPCj8"},"type":"SINGLE_K8S_JOB","id":"wL6RcMlkssv2HgpO","startTime":1646928238355,"singleK8SJob":{"k8sClusterId":"aks-design-tier-useast2","executionId":"c-timeseries-forecast-1-train-evaluate-l2op4hr"}}
[2022/03/10-16:04:32.653] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Cleaning Kubernetes job
[2022/03/10-16:04:32.654] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Run command securely, as user dataiku
[2022/03/10-16:04:32.654] [FRT-58-FlowRunnable] [INFO] [dku.security.process] - Starting process (regular)
[2022/03/10-16:04:32.656] [FRT-58-FlowRunnable] [INFO] [dku.security.process] - Process started with pid=2927158
[2022/03/10-16:04:32.657] [FRT-58-FlowRunnable] [INFO] [dku.recipes.code.base] - Process reads from nothing
[2022/03/10-16:04:32.754] [null-out-113] [INFO] [dku.utils] - secret "dkusecret-c-timeseries-forecast-1-train-evaluate-l2op4hr" deleted
[2022/03/10-16:04:32.768] [null-out-113] [INFO] [dku.utils] - job.batch "dataiku-exec-c-timeseries-forecast-1-train-evaluate-l2op4hr" deleted
[2022/03/10-16:04:32.782] [FRT-58-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_M8oAPCj8_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 39: <class 'ValueError'>: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
    at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/03/10-16:04:32.892] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - activity is finished
[2022/03/10-16:04:32.892] [ActivityExecutor-46] [ERROR] [dku.flow.activity] running compute_M8oAPCj8_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 39: <class 'ValueError'>: Time column 'SHIP_DT' has missing values with frequency 'B'. You can use the Time Series Preparation plugin to resample your time column.
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
    at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/03/10-16:04:32.893] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Executing default post-activity lifecycle hook
[2022/03/10-16:04:32.897] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Removing samples for QP1.performance
[2022/03/10-16:04:32.899] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Removing samples for QP1.evaluation
[2022/03/10-16:04:32.899] [ActivityExecutor-46] [INFO] [dku.flow.activity] running compute_M8oAPCj8_NP - Done post-activity tasks
Has anyone faced this? Do you know what the problem could be?
Thank you,
Operating system used: Windows
Answers
OK, so the problem is with the resampling recipe: it is not populating all of the empty dates correctly. I found 3 dates that were not populated. Running the recipe a second time doesn't change anything.
Has anyone run into this already?
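For anyone hitting the same error, here is a minimal pandas sketch of how the unpopulated dates can be located outside of DSS. It is not the plugin's own validation code; it only assumes that frequency 'B' in the error message means the pandas business-day alias (Monday to Friday), and the helper names `missing_dates` and `off_frequency_dates` are made up for illustration. The first helper lists business days that are absent from the column, the second lists dates that are present but do not fall on a business day (for example a weekend date).

```python
import pandas as pd


def missing_dates(dates, freq="B"):
    """Dates expected at `freq` between the first and last observation
    that do not appear in `dates`."""
    # Drop the time-of-day part so 2022-03-04 08:00 matches 2022-03-04.
    observed = pd.DatetimeIndex(pd.to_datetime(dates)).normalize()
    expected = pd.date_range(observed.min(), observed.max(), freq=freq)
    return expected.difference(observed)


def off_frequency_dates(dates, freq="B"):
    """Dates present in `dates` that are not on the `freq` grid,
    e.g. a Saturday when the frequency is business days."""
    observed = pd.DatetimeIndex(pd.to_datetime(dates)).normalize()
    expected = pd.date_range(observed.min(), observed.max(), freq=freq)
    return observed.difference(expected)


if __name__ == "__main__":
    # Hypothetical sample standing in for the SHIP_DT column.
    ship_dt = pd.Series(
        ["2022-03-01", "2022-03-02", "2022-03-03", "2022-03-07", "2022-03-09"]
    )
    print(missing_dates(ship_dt))
    print(off_frequency_dates(ship_dt))
```

If the dataset contains several series (the traceback mentions `identifiers_df`), the same check would need to be run per identifier, for instance inside a `groupby`, since a date can be present for one series and missing for another.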