Time series must have at least 3 values
I am trying to forecast some data, but the error below keeps appearing, and searching for it online turns up nothing. I have time sampled the data successfully. To do this, I combined 4 different identifying columns to build an ID for each row, and then used this new ID column as the "Column with identifier" while time sampling. I then filtered out any rows with null values, leaving me with what should be filtered, sampled data. To forecast the data, I use the AutoML - Quick Prototypes option. When I try to train the model, I get the following error:
```
Job failed: Error in Python process: At line 59: <class 'ValueError'>: Time series must have at least 3 values
```
I am not sure what the "3 values" is referring to.
I am fairly new to Dataiku, so I am not sure what other information is relevant. If I need to clarify anything, please let me know. Thanks
Answers
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hi @reeda19! Can you provide any further details on the thread to help other users find a solution (for example, your DSS version)? Also, can you let us know if you've tried any fixes already? This should lead to a quicker response from the community.
Krishna Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Product Ideas Manager Posts: 18 Dataiker
Hi @reeda19,
It's possible that your data has not been prepared in the "Long format" https://doc.dataiku.com/dss/latest/time-series/data-formatting.html#long-format, which is the format that requires the "Column with Identifier".
The error message, together with your mention of "I used 4 different identifying columns to build an ID for each row", makes it sound like you've created a unique identifier for each record and passed that through as the "Column with Identifier". This would lead DSS to believe you have many time series, each with a single record, whereas it expects at least 3 records per series.
If your input data does not contain multiple series 'stacked' together, then it is in wide format and you should not use the 'column with identifier' parameter. If it does, then as per the documentation example, you ought to use a column that identifies each series, such as 'carriergroup', as the 'column with identifier'.
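One way to check this outside the forecasting job is to count the rows per identifier. The snippet below is only a minimal sketch using pandas: the column names (`date`, `carriergroup`, `value`, `row_id`), the toy data, and the helper `short_series` are made up for illustration, and the threshold of 3 is simply taken from the error message.

```python
import pandas as pd

MIN_POINTS = 3  # threshold taken from the error message, not from DSS internals


def short_series(df, id_col, min_points=MIN_POINTS):
    """Return the row count of every identifier with fewer than min_points rows."""
    counts = df.groupby(id_col).size()
    return counts[counts < min_points]


# Hypothetical long-format data: two series stacked together, three dates each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"] * 2),
    "carriergroup": ["A"] * 3 + ["B"] * 3,
    "value": [10, 12, 11, 20, 19, 23],
})

# An ID that is unique per row, similar to what the question describes.
df["row_id"] = df["carriergroup"] + "_" + df["date"].dt.strftime("%Y-%m-%d")

too_short_by_row_id = short_series(df, "row_id")         # every "series" has 1 row
too_short_by_carrier = short_series(df, "carriergroup")  # empty: each series has 3 rows

print(too_short_by_row_id)
print(too_short_by_carrier)
```

If the first check lists every identifier with a count of 1, the identifier is per record rather than per series, which matches the situation described above. On your own data you would replace the toy DataFrame with your sampled dataset and pass your actual identifier column.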
Hope that helps.