Forecast Plugin - partition dataset

Rik_Veenboer
Rik_Veenboer Registered Posts: 7 ✭✭✭✭

Your forecast plugin looks great, but the flow takes all values as a single timeseries. Is it possible to specify a column to partition the data on? It would be nice to train and predict forecast models for multiple entities in one go.

Answers

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hey,

    Thanks for your interest in this recent plugin release.

    If you want to run the recipes to get multiple forecasting models, one per category (e.g. per product or store), you will need partitioning. This requires all datasets to be partitioned by one dimension for the category, using the discrete dimension feature in Dataiku. If the input data is not partitioned, you can use a Sync recipe to repartition it, as explained in this article.

    Hope it helps,

    Alex

  • Rik_Veenboer
    Rik_Veenboer Registered Posts: 7 ✭✭✭✭
    Thanks for your quick reply. I've tried rebuilding the flow with partitioning and that indeed works.

    Maybe it's more of a general question: can you run a recipe for all partitions in the dataset? I could not find instructions on how to do so in the manual or in this Q&A section. Manually specifying tens or hundreds of partitions is not practical, if technically possible at all (?).

    I was hoping to find a feature that lets me do this in the plugin itself. Any help in scaling this up to more than a couple of entities would be much appreciated.
  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Indeed, there is no visual way in the partitioning menu of a recipe (of this plugin or any other recipe) to select all partitions. To do so, you would need an additional step to compute the complete list of partitions and store it as a project variable. Here is a piece of boilerplate Python code to do so:

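    import dataiku
    import numpy as np

    # "sales" is a placeholder: use the name of your own input dataset
    df = dataiku.Dataset("sales").get_dataframe()
    # Distinct partition values, cast to strings and joined with "/"
    combinations = np.unique(df["store_department"].astype(str))
    combinations_str = "/".join(combinations)

    # Store the list as a standard project variable
    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())
    variables = project.get_variables()
    variables["standard"]["store_department_combinations"] = combinations_str
    project.set_variables(variables)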

    Then you can copy and paste the "/"-separated list of partitions into the partition menu of the plugin recipe.
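
As a rough illustration of the list-building step only, here is a minimal sketch that runs outside Dataiku. The helper name `build_partition_list` and the sample values are hypothetical and not part of the plugin or of the Dataiku API. It casts values to strings, removes duplicates, and joins them with "/".

```python
from typing import Iterable


def build_partition_list(values: Iterable, separator: str = "/") -> str:
    """Join the distinct, non-null values into one separator-delimited string."""
    # Cast to str so that numeric IDs can be joined; sort for a stable order.
    distinct = sorted({str(value) for value in values if value is not None})
    return separator.join(distinct)


# Hypothetical sample standing in for the values of df["store_department"]
sample = ["1_A", "2_B", "1_A", "1_B"]
partition_list = build_partition_list(sample)
print(partition_list)  # 1_A/1_B/2_B
```

In a Dataiku notebook or recipe, the same helper could presumably be called on the column itself, for example `build_partition_list(df["store_department"])`, before storing the result as a project variable. Note that the helper does not escape values that already contain the separator.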
  • lisa811
    lisa811 Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 14 Partner
    Hello Alex,
    I'm facing the same issue as Rik. I've partitioned a dataset into 10 partitions according to an ID column. Now I would like to apply the forecast recipe to each of them.
    Could you tell me where exactly I have to enter this Python code and where the list of partitions is stored?
  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Hi, I have an example project to demonstrate the use of partitions with the plugin. Is there an email address I could send it to you?
  • alobrano
    alobrano Registered Posts: 1 ✭✭✭✭

    Thanks for this plugin. I am also interested in seeing an example of how to work with partitions and your plugin.

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    We are working on a public example of a forecasting project with partitions. It will be published this month on https://gallery.dataiku.com/home/.

    In the meantime, this video offers a good introduction to partitioning in DSS: https://www.youtube.com/watch?v=yULLxeqx3gI

    Cheers,

    Alex Combessie

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi @Rik_Veenboer, @lisa811, @alobrano,

    We are proud to announce that we just released a new Forecast plugin. Among other features, it supports multivariate forecasting natively, with no need to partition your data.

    Forecasting sales across 1000s of stores and departments is now as simple as this:

    [Image: Screenshot 2021-02-11 at 11.40.33.png]

    On top of this, you will benefit from the latest Deep Learning models from GluonTS such as DeepAR and Transformer.

    Give it a try, let us know what you think, and reshare if you like it.

    Cheers,

    Alex

  • Ouma
    Ouma Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 12 ✭✭✭✭

    Hi @Alex_Combessie!

    The new Forecast plugin is just amazing. It makes multivariate time series forecasting much easier.

    I'm wondering why classical algorithms that were included in the old (legacy) version of the plugin, such as Exponential Smoothing, are not part of the new one.

    For instance, if I want to use Exponential Smoothing, I have to partition my multivariate dataset and use the legacy forecast plugin. It would be good if I could use the new one directly.

    Thanks!

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi @Ouma,

    Thanks for the kind words.

    When developing the new Forecast plugin, we ran benchmarks on performance and runtime to help choose which algorithms to include by default. We are very much open to including new ones in future updates.

    In the specific case of Exponential Smoothing (ETS) it is available as a modeling option in the "Seasonal Trend" model, as explained here: https://www.dataiku.com/product/plugins/timeseries-forecast/#stat-models. It removes seasonality, and then applies an ETS model from statsmodels.

    Hope it helps,

    Alex
