New Forecast plugin - select the best model to predict for each identifier

Solved!
AngelsF
Level 2

Hello!

I have been testing the new Forecast plugin. I must say that it is really fast and offers several good models for prediction, with no need to partition the data. Congratulations on that!

I have two questions about it. I want to predict the monthly sales for many stores, and in the first recipe (Train and evaluate forecasting models) I configure all the requested information. I select "Expert - Choose Algorithms" as the Forecasting mode and all the available Statistical and Deep Learning models. It seems that for some specific stores the AutoARIMA model fails:

Error in Python process: At line 63: <class 'gluonts_forecasts.model.ModelTrainingError'>: GluonTS 'autoarima' model crashed during training. Full error: Seasonality of AutoARIMA can't be set to 12. Error when testing seasonality with nsdiffs: shapes (4,2) and (1,) not aligned: 2 (dim 1) != 1 (dim 0)

So the whole execution fails.
I'm wondering whether there is any way to avoid the failure for the rest of the stores, or whether, since it is a single process, the only solution is to deselect the AutoARIMA model.
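(Side note: to figure out which stores were tripping the seasonality test, I imagine something along these lines could be run outside the plugin. This is only a rough sketch: it assumes the data sits in a pandas DataFrame with "store", "date" and "sales" columns, and that the plugin's AutoARIMA applies a pmdarima-style nsdiffs check, which I haven't verified.)

import pandas as pd
from pmdarima.arima import nsdiffs

# hypothetical layout: one row per store and month
df = pd.read_csv("monthly_sales.csv", parse_dates=["date"])

failing_stores = []
for store, grp in df.sort_values("date").groupby("store"):
    y = grp["sales"].to_numpy()
    try:
        # the same kind of seasonal-differencing test mentioned in the error
        nsdiffs(y, m=12)
    except Exception as exc:
        failing_stores.append((store, str(exc)))

print(failing_stores)  # stores whose series fail the m=12 seasonality test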

On the other hand, in the second recipe (Forecast future values) I select the "Automatic" selection mode and MAPE as the performance metric (in the first recipe I had chosen Expert - Choose Algorithms and all the models except AutoARIMA). When I look at the results I realise that all the stores have been predicted with the FeedForward model. I imagine this is because, in the aggregated performance metrics, the winner is indeed FeedForward. Am I right?

 

[Screenshot: 1.png]

But in fact, I would like the behaviour to be to select the best model for each store, rather than overall, since each store is predicted independently. Is that possible?

 

Let me know if you need any further information.

Thank you very much in advance.

Regards.

1 Solution
Alex_Combessie
Dataiker Alumni

Hi,

Thanks for the feedback!

I confirm that when using "Long format" to train models on multiple time series, each model must be able to converge on every time series. From the error message, it does look like AutoARIMA is not able to do so on some specific series (see documentation). You may be able to overcome this issue by activating the "Expert - Customize Algorithms" option in Forecasting mode and trying a different Season length parameter.

[Attachment: Screenshot 2021-03-10 at 17.20.20.png]

If that doesn't work, then indeed, you will need to deactivate the AutoARIMA model.
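(For intuition on what that parameter controls: in a standalone library such as pmdarima, the analogous knob is the seasonal period m. The snippet below is only an illustration on a toy monthly series; the plugin's AutoARIMA backend may behave differently.)

import numpy as np
import pmdarima as pm

# toy monthly series with a yearly cycle (illustrative only)
rng = np.random.default_rng(0)
t = np.arange(60)
y = 100 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, size=t.size)

# season length 12: look for yearly seasonality in monthly data
seasonal_model = pm.auto_arima(y, seasonal=True, m=12, suppress_warnings=True)

# season length 1 effectively turns the seasonal component off
non_seasonal_model = pm.auto_arima(y, seasonal=False, suppress_warnings=True)

print(seasonal_model.order, seasonal_model.seasonal_order)
print(non_seasonal_model.order, non_seasonal_model.seasonal_order)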

I also confirm that the "Automatic" option of the Selection mode parameter picks a model based on the best aggregated performance metric. We will log your request to add a more advanced strategy where we predict future values of each time series by ensembling the results of the best model for each time series. In the meantime, you can achieve a similar behavior by partitioning your data (see documentation). However, you will lose the benefit of learning patterns across multiple time series.
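(Another manual workaround, if you don't want to partition: export the per-time-series performance metrics produced by the evaluation step and pick each store's winner yourself. The sketch below assumes a metrics dataset with "store", "model" and "mape" columns; adapt the names to whatever the recipe actually outputs.)

import pandas as pd

# hypothetical export of the evaluation recipe's metrics dataset
metrics = pd.read_csv("forecasting_metrics.csv")

# for each store, keep the row with the lowest MAPE
best_per_store = metrics.loc[metrics.groupby("store")["mape"].idxmin()]
print(best_per_store[["store", "model", "mape"]])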

Hope it helps,

Alex
