How can I compute an exponential moving average in a preparation script?
Best Answer
-
Currently, the visual preparation recipes work line by line. This makes it possible to process very large datasets by streaming, potentially in parallel on Hadoop. The downside is that each line is processed independently, so an exponential moving average, which depends on the preceding rows, cannot reliably be computed in this type of recipe.
If the dataset fits in memory, I would go with a Python recipe and use the functions from Pandas.
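As a rough illustration of the Pandas route, here is a minimal sketch using `ewm`. The helper name `add_ema`, the column names and the `span` value are made up for the example and are not from the original post. In a Python recipe, the dataframe would come from `get_dataframe()` rather than being built by hand.

```python
import pandas as pd

def add_ema(df, value_col, span=7, sort_col=None):
    """Return a copy of df with an exponential moving average column added.

    Hypothetical helper for illustration only.
    """
    # An EMA depends on row order, so sort first if a time/order column is given.
    out = df.sort_values(sort_col) if sort_col else df.copy()
    # adjust=False uses the recursive form:
    #   ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}, with alpha = 2 / (span + 1)
    out[f"{value_col}_ema{span}"] = out[value_col].ewm(span=span, adjust=False).mean()
    return out

# Toy data, deliberately out of order (assumed column names).
df = pd.DataFrame({"day": [3, 1, 2, 4], "value": [30.0, 10.0, 20.0, 40.0]})
result = add_ema(df, "value", span=3, sort_col="day")
print(result)
```

With `span=3` the smoothing factor is 0.5, so each new value counts for half of the average. A larger `span` gives a smoother, slower-reacting average.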
Answers
-
Thank you Jrouquie.
I struggled with this as a new user, so I wanted to share the Python/Pandas recipe. I am struggling to post all of the code, even in a code block.
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
autoSleep_20190303_20200606_prepared = dataiku.Dataset("AutoSleep_20190303_20200606_prepared")
autoSleep_20190303_20200606_prepared_df = autoSleep_20190303_20200606_prepared.get_dataframe()

# Apply rolling average here (7-row simple moving average)
autoSleep_20190303_20200606_prepared_df['bedtime_fortmattedAvg7'] = autoSleep_20190303_20200606_prepared_df['bedtime_formatted'].rolling(window=7).mean()

# Output to Dataiku
autoSleep_RollingAverage_df = autoSleep_20190303_20200606_prepared_df  # Update column with MA
autoSleep_RollingAverage = dataiku.Dataset("AutoSleep_RollingAverage")
autoSleep_RollingAverage.write_with_schema(autoSleep_RollingAverage_df)
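One thing worth noting: `rolling(window=7).mean()` in the recipe above is a simple moving average, not an exponential one. For anyone who needs the exponential version the original question asked about, swapping in `ewm` should be enough. The sketch below compares the two on a made-up series. `span=7` and `min_periods=7` are assumptions chosen to mirror the 7-row window, not values from the post.

```python
import pandas as pd

# Made-up values standing in for a numeric column such as 'bedtime_formatted'.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# Simple moving average: equal weight on the last 7 rows, NaN until 7 rows exist.
sma = s.rolling(window=7).mean()

# Exponential moving average: recent rows weigh more.
# min_periods=7 only mimics the NaN warm-up of the rolling version; drop it
# to get a value from the first row onward.
ema = s.ewm(span=7, min_periods=7, adjust=False).mean()

print(pd.DataFrame({"value": s, "sma7": sma, "ema7": ema}))
```

In the recipe, this would mean replacing `.rolling(window=7).mean()` with `.ewm(span=7, adjust=False).mean()` on the same column.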