How can I compute an exponential moving average in a preparation script?

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker

Best Answer

  • jrouquie
    jrouquie Dataiker Alumni Posts: 87 ✭✭✭✭✭✭✭
    Answer ✓
    Visual preparation recipes currently process data line by line. This makes it possible to work on very large datasets by streaming, potentially in parallel on Hadoop. The downside is that each line is processed independently, so an exponential moving average, which depends on the preceding rows, cannot reliably be computed in this type of recipe.

    If the dataset fits in memory, I would go with a Python recipe and use the functions from Pandas.
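To illustrate why this needs more than one row at a time, here is a minimal sketch of the usual recursive definition of an exponential moving average, checked against pandas' `Series.ewm`. The sample values, the smoothing factor `alpha`, and the helper name `ema_by_recursion` are made up for illustration and are not from the original question.

```python
import pandas as pd


def ema_by_recursion(values, alpha):
    """EMA from its recursive definition: y[t] = alpha * x[t] + (1 - alpha) * y[t-1].

    Each output needs the previous output, which is why a strictly
    row-by-row processor cannot compute it.
    """
    out = []
    prev = None
    for x in values:
        # The first value seeds the average; later values blend with the previous EMA.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


# Hypothetical sample data and smoothing factor
values = [10.0, 12.0, 11.0, 15.0, 14.0]
alpha = 0.5

manual = ema_by_recursion(values, alpha)

# adjust=False makes pandas use the same recursive form as above
with_pandas = pd.Series(values).ewm(alpha=alpha, adjust=False).mean().tolist()

print(manual)
print(with_pandas)
```

With `adjust=False`, pandas follows the same recursion, so both lists should match. The default `adjust=True` weights the first few rows differently, so pick whichever convention fits your use case.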

Answers

  • 7TonRobot
    7TonRobot Registered Posts: 2 ✭✭✭

    Thank you Jrouquie.

    I struggled with this as a new user, so I wanted to share the Python/Pandas recipe. I am struggling to post all of the code, even in a code block.

    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    autoSleep_20190303_20200606_prepared = dataiku.Dataset("AutoSleep_20190303_20200606_prepared")
    autoSleep_20190303_20200606_prepared_df = autoSleep_20190303_20200606_prepared.get_dataframe()

    # Apply a 7-row rolling average. Note this is a simple moving average, not an
    # exponential one, and the first 6 rows of the new column will be NaN.
    # Assumes 'bedtime_formatted' is numeric and the rows are already in date order.
    autoSleep_20190303_20200606_prepared_df['bedtime_fortmattedAvg7'] = autoSleep_20190303_20200606_prepared_df['bedtime_formatted'].rolling(window=7).mean()

    # The output dataframe is the input plus the new moving-average column
    autoSleep_RollingAverage_df = autoSleep_20190303_20200606_prepared_df

    # Write recipe outputs
    autoSleep_RollingAverage = dataiku.Dataset("AutoSleep_RollingAverage")
    autoSleep_RollingAverage.write_with_schema(autoSleep_RollingAverage_df)
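The recipe above computes a simple rolling mean. For the exponential moving average asked about in the question, one option is to swap `rolling(...)` for pandas' `ewm(...)`. The following is only a sketch: the helper name `add_ema`, the `_ema` column suffix, the `span` values, and the demo numbers are hypothetical, and it assumes the column is numeric and the rows are sorted chronologically.

```python
import pandas as pd


def add_ema(df, column, span=7, suffix="_ema"):
    """Return a copy of df with an exponential moving average of `column` added.

    Hypothetical helper: the new column is named f"{column}{suffix}{span}".
    """
    out = df.copy()
    # span=7 corresponds to a smoothing factor alpha = 2 / (span + 1) = 0.25
    out[f"{column}{suffix}{span}"] = out[column].ewm(span=span, adjust=False).mean()
    return out


# Inside a Dataiku Python recipe, this would slot in between the read and the write:
#   df = dataiku.Dataset("AutoSleep_20190303_20200606_prepared").get_dataframe()
#   dataiku.Dataset("AutoSleep_RollingAverage").write_with_schema(add_ema(df, "bedtime_formatted"))

# Small stand-alone demo with made-up values
demo = pd.DataFrame({"bedtime_formatted": [22.0, 23.0, 22.5, 23.5]})
result = add_ema(demo, "bedtime_formatted", span=3)
print(result)
```

Unlike `rolling(window=7)`, `ewm` produces a value for every row, including the first ones, so there are no leading NaN values to handle.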
