Forecast Plugin - error in partitioned dataset - MAE

AngelsF (Registered Posts: 6)

Hello!

I'm using the Forecast plugin to predict monthly sales for many stores, so the dataset is partitioned by store and the whole flow runs in a scenario.
In the second recipe of the plugin (Train models and evaluate errors on historical data), I select all the models except Seasonal Trend, in Expert Mode. For the Exponential Smoothing model, I choose the Additive type for error, trend and seasonality.
When the scenario runs and builds the evaluation result dataset and the trained model folder, that model fails for some of the stores (Error in R process: simpleError: No model able to be fitted), because the "Seasonal component could not be estimated" for those stores. The other four models run without error.
The scenario finishes because the "Run this step" option is set to "Always", but those stores appear neither in the evaluation dataset nor in the forecast dataset.

Can I get the evaluation and the forecast for those stores using the 4 models that run correctly, while keeping all 5 models for the rest of the stores?

I also have a question about the calculation of the models' MAE and MAPE. The evaluation result dataset gives the MAE and MAPE for each store and each model, and the model is selected based on the MAE for each store. But when I calculate the MAE and MAPE from the forecast dataset, I don't get the same result as in the evaluation result dataset.
From the documentation, I gather that they are calculated on the last H values of the time series (the test set in the error evaluation), 3 in my case.
Is that correct? How does the plugin calculate MAE and MAPE?

Let me know if you need more information.

Thank you very much in advance.
Regards.

Best Answer

  • Alex_Combessie (Alpha Tester, Dataiker Alumni · Posts: 539)
    Answer ✓

    Hi,

    The steps you are taking are correct. The two numbers are actually different by design.

    In the output of the "Forecast future values and get historical residuals" recipe, the "forecast" and "forecast_residuals" columns are in-sample one-step forecasts (when you choose to include the history). More specifically, the plugin uses the $fitted values from the R forecast package (https://cran.r-project.org/web/packages/forecast/forecast.pdf)

    In contrast, the "Train models and evaluate errors" recipe computes the errors using out-of-sample multi-step forecasts. Taking the case of the temporal Train/Test Split, it trains the model on everything except the last Horizon steps, and evaluates the error on these last steps. It follows the steps described in: https://otexts.com/fpp2/accuracy.html
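    The distinction can be illustrated with a toy numeric sketch. This is plain Python with a hypothetical naive last-value model, purely for illustration: the plugin itself relies on the R forecast package, so neither the model nor the numbers here reflect its actual code.

```python
# Illustrative sketch: why in-sample one-step residuals differ from
# out-of-sample multi-step errors. Toy data, naive last-value model.

series = [100, 120, 115, 130, 140, 135, 150, 160, 155, 170]
horizon = 3

train, test = series[:-horizon], series[-horizon:]

# Out-of-sample multi-step: fit on the training part only, then
# forecast the whole horizon. With a naive model, every step of the
# horizon repeats the last training value.
naive_forecast = [train[-1]] * horizon
oos_mae = sum(abs(a - f) for a, f in zip(test, naive_forecast)) / horizon

# In-sample one-step: "forecast" each point from its true predecessor,
# mimicking $fitted values of a model trained on the full history.
fitted = series[:-1]
actual = series[1:]
ins_mae = sum(abs(a - f) for a, f in zip(actual, fitted)) / len(fitted)

print(oos_mae, ins_mae)  # the two numbers generally differ
```

    The out-of-sample number reflects multi-step errors on unseen data, while the in-sample number averages one-step errors over the whole history, so there is no reason for them to match.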

    Hope that clarifies,

    Alex

Answers

  • Alex_Combessie (Alpha Tester, Dataiker Alumni · Posts: 539)

    Hi,

Thanks for your interest in this Dataiku plugin. To answer your two questions:

    1. Can I get the evaluation and the forecast for those stores for the 4 models that run correctly and the 5 models for the rest of the stores?

Right now, this is not possible. If a model cannot converge for some store partitions, model training and evaluation stop for those stores. I understand your request and will pass it on for consideration in our backlog. For now, the workaround is to deactivate that model in the recipe screen to avoid such convergence problems across all partitions.

2. How does the plugin calculate MAE and MAPE?

If you have chosen "Train/Test split" as error evaluation strategy, the MAE and MAPE are computed on the last "Horizon" steps of the training set. If you have chosen "Time Series Cross-Validation", then it uses a more advanced computation from the Prophet forecast library. You can find more information on this page: https://www.dataiku.com/product/plugins/forecast/
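    A minimal sketch of that temporal split evaluation, in plain Python with H = 3 and made-up sales figures (the plugin's actual implementation is in R, so this only illustrates the computation described above):

```python
# Illustrative sketch of a temporal train/test split evaluation:
# the last `horizon` actual values are held out and compared against
# the model's multi-step forecasts for those dates.

def evaluate_last_h(series, forecasts, horizon):
    """MAE and MAPE (in %) over the last `horizon` actual values."""
    actual = series[-horizon:]
    mae = sum(abs(a - f) for a, f in zip(actual, forecasts)) / horizon
    mape = 100 * sum(abs(a - f) / a for a, f in zip(actual, forecasts)) / horizon
    return mae, mape

sales = [200, 220, 210, 240, 230, 250]
preds = [235, 245, 255]  # hypothetical 3-step-ahead forecasts
mae, mape = evaluate_last_h(sales, preds, horizon=3)
```
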

Also, as this is an open-source project, feel free to read the code, fork it, or contribute. It's all at https://github.com/dataiku/dss-plugin-time-series-forecast.

    Hope it helps,

    Alex

  • AngelsF (Registered Posts: 6)

    Hi again and thanks for your prompt reply.

That's fine regarding the first answer, thanks.

About the second one, you said: "If you have chosen "Train/Test split" as error evaluation strategy, the MAE and MAPE are computed on the last "Horizon" steps of the training set". Is it the training set or the test set?

    Maybe I’m misunderstanding something, but I would like to clarify it with an example for just one store:

    This is the flow:

    1_flow.png

    And this is part of the cleaned dataset which will be trained:

    2_dataset.png

As this is an example, we can choose "Automated mode", selecting all the models except Seasonal Trend. For the error evaluation, I choose "Train/Test split" with the last 3 steps (the last 3 rows of the dataset):

    3_second_recipe.png

This is the evaluation result dataset with the MAE and MAPE, among other metrics, for each model. In this case, Baseline will be chosen, as it has the lowest MAE:

    4_dataset.png

    This is the third recipe of the plugin:

    5_third_recipe.png

Since I get the same results on every run, I have assumed that I can calculate the MAE from the last three rows of the dataset where I have real values (which I believe are the 3 Horizon steps used in the error evaluation).

    This is the forecast dataset including the absolute error between sales and forecast (column forecast_residuals_abs):

    6_dataset.png

So, this is my calculation of the MAE: the mean of the last three values of the "forecast_residuals_abs" column:

    (54172.65 + 57309.02 + 4984.59) / 3 = 38822.09

Which is not equal to the value provided in the evaluation result dataset (21809.99).

    The same happens with the MAPE.
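    For readers following along, the arithmetic above can be checked with a quick sketch (the residual values are the ones from the screenshot):

```python
# Mean of the last three in-sample absolute residuals, reproducing
# the hand calculation above.
residuals_abs = [54172.65, 57309.02, 4984.59]
mae_from_forecast = sum(residuals_abs) / len(residuals_abs)
print(round(mae_from_forecast, 2))  # 38822.09
```
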

Am I going wrong in any of these steps?

    Thanks for your answer in advance.

  • AngelsF (Registered Posts: 6)

    Wow! Yeah, great, I understand. Thank you very much!

  • Alex_Combessie (Alpha Tester, Dataiker Alumni · Posts: 539)

Hi @AngelsF!

We are proud to announce that we just released a new Forecast plugin. Among other features, it supports multivariate forecasting natively, with no need to partition your data.

    Forecasting sales across 1000s of stores and departments is now as simple as this:

    Screenshot 2021-02-11 at 11.40.33.png

    On top of this, you will benefit from the latest Deep Learning models from GluonTS such as DeepAR and Transformer.

Give it a try, let us know what you think, and share it if you like it!

    Cheers,

    Alex
