How to resample a time series dataset

Dataiku

Dataiku DSS has a Time Series Preparation plugin that includes a resampling recipe. You can use this recipe to resample time series data recorded at irregular intervals, or to convert a series from one regular time step to another (for example, from hourly to daily).
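To make the idea of resampling concrete outside of DSS, here is a minimal pandas sketch. It is only an illustration of the concept, not the plugin's implementation: the data, the column names, and the helper `resample_regular` are all hypothetical, and time-based linear interpolation is just one possible choice of method.

```python
import pandas as pd

# Hypothetical irregularly spaced readings (illustrative data only).
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 12:00", "2024-01-03 00:00"]
        ),
        "value": [10.0, 12.0, 18.0],
    }
).set_index("timestamp")


def resample_regular(df: pd.DataFrame, rule: str = "D") -> pd.DataFrame:
    """Put a series on a regular grid, filling gaps by time-weighted linear interpolation."""
    # Build a regular grid that stays inside the observed range,
    # so this sketch never has to extrapolate.
    grid = pd.date_range(df.index.min().ceil(rule), df.index.max().floor(rule), freq=rule)
    # Add the grid points to the original timestamps so the interpolation
    # can use the real observations, then keep only the grid points.
    combined = df.reindex(df.index.union(grid)).interpolate(method="time")
    return combined.loc[grid]


daily = resample_regular(raw, "D")
print(daily)
```

With this sample data, the value for 2024-01-02 is interpolated between the readings at 2024-01-01 12:00 and 2024-01-03 00:00, which gives roughly 14.0.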

To use the resampling recipe, first install the Time Series Preparation plugin on your Dataiku instance (see the Instructions for installing plugins). Once the plugin is installed, you should see its summary page, shown below:

Installed_prep_plugin.png

Before using the plugin with your dataset, make sure that the storage type for your date column is “date”. If it isn’t, click the column’s storage type and select “date”.

Also note that the plugin requires a parsed date column: the column meaning detected by DSS must be “Date” (highlighted by the blue oval in the following figure). The red box highlights the storage type.

resample_DSSmeaning.png
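If you prepare data in Python before it reaches DSS, the same prerequisite applies: the date column should hold real datetimes rather than text. The sketch below shows one way to check this in pandas. It is an analogy only and says nothing about how DSS detects meanings internally; the column names and the date format are assumptions.

```python
import pandas as pd

# Hypothetical dataset where the date column arrived as plain text.
df = pd.DataFrame(
    {
        "order_date": ["2024-01-01", "2024-01-02", "not a date"],
        "amount": [5, 7, 9],
    }
)

# Parse the text column; errors="coerce" turns unparseable entries
# into NaT instead of raising an exception.
df["order_date"] = pd.to_datetime(df["order_date"], format="%Y-%m-%d", errors="coerce")

# Confirm the column now has a datetime dtype and count the rows that failed to parse.
is_parsed = pd.api.types.is_datetime64_any_dtype(df["order_date"])
n_unparsed = int(df["order_date"].isna().sum())
print(is_parsed, n_unparsed)
```

Rows that fail to parse are worth inspecting before resampling, since a missing timestamp cannot be placed on the time axis.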

To resample the dataset, do the following:

  • Click the dataset in the Flow, and then click the Time Series Preparation plugin in the right panel.

resample_select_plugin.png

  • Select the Time series resampling recipe from the window that appears.
  • Keep the default value for the “Input time series”, and name the output dataset. Then create the output dataset.

A “Time series resampling” window opens up. In this window, you can specify values for the parameters to resample your dataset. For example, if you want to resample your dataset to daily time steps:

  • Set the “Timestamp column” to the name of your date column.
  • For the resampling parameters, specify “Time step”: 1 and “Unit”: Days.
  • Select methods to use for interpolation or extrapolation from the dropdown menus.
  • Edit your time series by clipping the start or end, or by shifting the timestamps.
  • Specify if your data is in the long format. For data in the long format, specify the name of the column that contains the identifiers.

resampling_recipe.png
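For readers who want to see what these settings amount to, the following pandas sketch resamples long-format data to a 1-day step, handling each identifier separately. It is a rough equivalent under stated assumptions, not the recipe's actual code: the column names (`sensor_id`, `date`, `value`), the helper `resample_long`, and the choice of linear interpolation are all hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per (identifier, timestamp).
long_df = pd.DataFrame(
    {
        "sensor_id": ["a", "a", "b", "b"],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-03", "2024-01-01", "2024-01-04"]
        ),
        "value": [1.0, 3.0, 10.0, 40.0],
    }
)


def resample_long(df, id_col, ts_col, value_col, rule="D"):
    """Resample each identifier's series separately to a regular step."""
    parts = []
    for key, group in df.groupby(id_col):
        series = (
            group.set_index(ts_col)[[value_col]]
            .resample(rule)                 # regular grid, e.g. 1 day
            .mean()                         # empty steps become NaN
            .interpolate(method="linear")   # fill the gaps between observations
        )
        series[id_col] = key
        parts.append(series.reset_index())
    return pd.concat(parts, ignore_index=True)


daily_long = resample_long(long_df, "sensor_id", "date", "value")
print(daily_long)
```

Grouping by the identifier column first matters: interpolating across the whole table would blend values from unrelated series.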

Save and run the recipe to build the resampled dataset.

What’s next?

  • For more details about the resampling recipe and its parameters, see the time series Resampling page in the reference documentation.