Time series must have at least 3 values error with data in long format.

michaelm9689
michaelm9689 Partner, Dataiku DSS Core Designer, Registered Posts: 1 Partner

Hi,

I am having some problems with the time series plugin when running the Train and evaluate forecasting models recipe.

We are building a use case to predict Fantasy Football players' points from historical data. I have split the players into categories (mid, def, gk, etc.) so that I can predict the best-performing player in each category to make up a team.

We put the data through the time series resampling recipe and then ran the Train and evaluate forecasting models recipe, which failed with: "Job failed: Error in python process: At line 59: <class 'ValueError'>: Time series must have at least 3 values"

I have attached a screenshot of the dataset, which I believe is in long format with the player name as the time series identifier; there should be 32 rows for each player, one for each of the 32 games in the season. I have also attached the parameters we set for the recipe. Has anyone had this issue before, and how was it resolved?

Error code:

[13:53:41] [INFO] [dku.utils]  - *************** Recipe code failed **************
[13:53:41] [INFO] [dku.utils]  - Begin Python stack
[13:53:41] [INFO] [dku.utils]  - Traceback (most recent call last):
[13:53:41] [INFO] [dku.utils]  -   File "/opt/dataiku/python/dataiku/container/exec_py_recipe.py", line 19, in <module>
[13:53:41] [INFO] [dku.utils]  -     exec(fd.read())
[13:53:41] [INFO] [dku.utils]  - Installing debugging signal handler
[13:53:41] [INFO] [dku.utils]  -   File "<string>", line 59, in <module>
[13:53:41] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/gluonts_forecasts/training_session.py", line 114, in create_gluon_datasets
[13:53:41] [INFO] [dku.utils]  -     gluon_list_datasets = gluon_dataset.create_list_datasets(cut_lengths=[self.prediction_length, 0])
[13:53:41] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/gluonts_forecasts/gluon_dataset.py", line 52, in create_list_datasets
[13:53:41] [INFO] [dku.utils]  -     identifiers_df, cut_length, identifiers_values=identifiers_values
[13:53:41] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/gluonts_forecasts/gluon_dataset.py", line 73, in _create_gluon_multivariate_timeseries
[13:53:41] [INFO] [dku.utils]  -     self._check_minimum_length(dataframe, cut_length)
[13:53:41] [INFO] [dku.utils]  -   File "/home/dataiku/plugin/python-lib/gluonts_forecasts/gluon_dataset.py", line 130, in _check_minimum_length
[13:53:41] [INFO] [dku.utils]  -     raise ValueError(f"Time series must have at least {min_length} values")
[13:53:41] [INFO] [dku.utils]  - ValueError: Time series must have at least 3 values
[13:53:41] [INFO] [dku.utils]  - End Python stack
[13:53:41] [INFO] [dku.utils]  - [2021-10-25 13:53:41,627] [1/MainThread] [ERROR] [root] Containerized process terminated by signal 1
[13:53:41] [INFO] [dku.utils]  - [2021-10-25 13:53:41,628] [1/MainThread] [INFO] [root] Sending error.json to backend/JEK
[13:53:41] [INFO] [dku.utils]  - [2021-10-25 13:53:41,631] [1/MainThread] [DEBUG] [urllib3.connectionpool] Starting new HTTP connection (1): 10.0.178.91:36589
[13:53:41] [INFO] [dku.utils]  - [2021-10-25 13:53:41,656] [1/MainThread] [DEBUG] [urllib3.connectionpool] http://10.0.178.91:36589 "POST /kernel/tintercom/containers/put-file?executionId=c-timeseries-forecast-1-train-evaluate-io1ynbw&fileKind=EXECUTION_DIR&path=error.json HTTP/1.1" 200 0
[13:53:41] [INFO] [dku.utils]  - Installing debugging signal handler
[13:53:42] [INFO] [dku.recipes.code.base] - Log streaming terminated with return code 0
[13:53:42] [INFO] [dku.recipes.code.base] - Waiting for kubernetes job to finish
[13:53:42] [INFO] [dku.containers.kubernetes] - wait for job to complete, getting job status
[13:53:42] [INFO] [dku.containers.kubernetes] - Querying job status on job dataiku-exec-c-timeseries-forecast-1-train-evaluate-io1ynbw
[13:53:42] [DEBUG] [dku.utils]  - Process kubectl-get-job done (return code 0)
[13:53:42] [DEBUG] [dku.containers.kubernetes] - kubectl get job OK, out=Complete
[13:53:42] [INFO] [dku.containers.kubernetes] - Kubernetes Job is complete
[13:53:42] [INFO] [dip.exec.resultHandler] - Error file found, trying to throw it: /data/dataiku/datadir/jobs/MICHAELFPL/Build_train__NP__2021-10-25T13-53-21.210/compute_4yvkY8ym_NP/custom-python-recipe/pyoutxCY6Z0anRh06/error.json
[13:53:42] [INFO] [dip.exec.resultHandler] - Raw error is{"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Time series must have at least 3 values","detailedMessage":"At line 59: \u003cclass \u0027ValueError\u0027\u003e: Time series must have at least 3 values","stackTrace":[]}
[13:53:42] [INFO] [dip.exec.resultHandler] - Now err: {"errorType":"\u003cclass \u0027ValueError\u0027\u003e","message":"Error in python process: Time series must have at least 3 values","detailedMessage":"Error in python process: At line 59: \u003cclass \u0027ValueError\u0027\u003e: Time series must have at least 3 values","stackTrace":[]}
[13:53:42] [DEBUG] [dku.remoterun.registry] - Finished with container execution c-timeseries-forecast-1-train-evaluate-io1ynbw
[13:53:42] [DEBUG] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"michael.mcdonnell@dtsquared.co.uk","projectKey":"MICHAELFPL","jobId":"Build_train__NP__2021-10-25T13-53-21.210","activityId":"compute_4yvkY8ym_NP","activityType":"recipe","recipeType":"CustomCode_timeseries-forecast-1-train-evaluate","recipeName":"compute_4yvkY8ym"},"type":"SINGLE_K8S_JOB","id":"gxDvfMNZIEwFHu17","startTime":1635170009831,"singleK8SJob":{"k8sClusterId":"__builtin__","executionId":"c-timeseries-forecast-1-train-evaluate-io1ynbw"}}
[13:53:42] [INFO] [dku.recipes.code.base] - Cleaning Kubernetes job
[13:53:42] [INFO] [dku.recipes.code.base] - Run command securely, as user dataiku
[13:53:42] [INFO] [dku.security.process] - Starting process (regular)
[13:53:42] [INFO] [dku.security.process] - Process started with pid=6685
[13:53:42] [INFO] [dku.recipes.code.base] - Process reads from nothing
[13:53:42] [INFO] [dku.utils]  - secret "dkusecret-c-timeseries-forecast-1-train-evaluate-io1ynbw" deleted
[13:53:42] [INFO] [dku.utils]  - job.batch "dataiku-exec-c-timeseries-forecast-1-train-evaluate-io1ynbw" deleted
[13:53:42] [INFO] [dku.flow.activity] - Run thread failed for activity compute_4yvkY8ym_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 59: <class 'ValueError'>: Time series must have at least 3 values
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
 at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
 at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[13:53:42] [INFO] [dku.flow.activity] running compute_4yvkY8ym_NP - activity is finished
[13:53:42] [ERROR] [dku.flow.activity] running compute_4yvkY8ym_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in python process: At line 59: <class 'ValueError'>: Time series must have at least 3 values
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleErrorFile(JobExecutionResultHandler.java:65)
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResultNoProcessDiedException(JobExecutionResultHandler.java:32)
 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.executeKubernetesCodeRecipe(AbstractCodeBasedRecipeRunner.java:260)
 at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:80)
 at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[13:53:42] [INFO] [dku.flow.activity] running compute_4yvkY8ym_NP - Executing default post-activity lifecycle hook
[13:53:42] [INFO] [dku.flow.activity] running compute_4yvkY8ym_NP - Removing samples for MICHAELFPL.preformance
[13:53:42] [INFO] [dku.flow.activity] running compute_4yvkY8ym_NP - Removing samples for MICHAELFPL.eval
[13:53:42] [INFO] [dku.flow.activity] running compute_4yvkY8ym_NP - Done post-activity tasks

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker

    Hi @michaelm9689,

    Were you able to resolve this? As mentioned directly in your support ticket, you will need to double-check that every player in your dataset has at least 3 rows.

    You can click on the Name column -> Analyze and check the count of each value. If needed, you can filter out the names that don't have at least 3 rows.

    Thanks,
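
The per-player row count suggested in the answer above can also be checked in code, for example in a Python notebook on the dataset that feeds the training recipe. The following is only a minimal pandas sketch under stated assumptions: the column names `name`, `date` and `points`, the toy data, and the helper `split_short_series` are all hypothetical, and the threshold of 3 is simply the number quoted in the error message.

```python
import pandas as pd

MIN_LENGTH = 3  # threshold quoted in the error message


def split_short_series(df, id_column, min_length=MIN_LENGTH):
    """Hypothetical helper: return (rows to keep, row counts of the series that are too short)."""
    row_counts = df[id_column].value_counts()
    # Identifiers with fewer rows than the minimum, sorted by identifier for readability
    too_short = row_counts[row_counts < min_length].sort_index()
    # Keep only rows whose identifier is not in the too-short set
    kept = df[~df[id_column].isin(too_short.index)]
    return kept, too_short


# Toy long-format data (assumed layout): one row per player per game
df = pd.DataFrame(
    {
        "name": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
        "date": pd.to_datetime(
            [
                "2021-08-14", "2021-08-21", "2021-08-28", "2021-09-11",
                "2021-08-14", "2021-08-21",
                "2021-08-14", "2021-08-21", "2021-08-28",
            ]
        ),
        "points": [2, 6, 1, 8, 3, 0, 5, 2, 9],
    }
)

kept, too_short = split_short_series(df, "name")
print(too_short)            # players that would trigger the error, with their row counts
print(kept["name"].unique())  # players that remain after filtering
```

In this toy data only player B (2 rows) falls below the threshold and is dropped. It is worth running such a check on the output of the resampling recipe rather than on the raw data, since that is the dataset the training recipe actually reads.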
