How to resample a time series dataset

Dataiku DSS has a Time Series Preparation plugin that includes a resampling recipe. You can use this recipe to resample time series data recorded at irregular intervals, or to convert data from one time step to another.
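To make "irregular intervals" concrete, here is a minimal pandas sketch that checks whether a series is evenly spaced. It is only an illustration outside DSS, not part of the plugin; the column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical example data: timestamps spaced 1, 2, and 4 days apart.
df = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-08"]
    ),
    "amount": [10.0, 12.0, 9.0, 15.0],
})

# Gaps between consecutive timestamps. More than one distinct gap
# means the series is irregularly spaced and is a candidate for resampling.
gaps = df["order_date"].sort_values().diff().dropna()
is_regular = gaps.nunique() == 1

print(gaps.tolist())
print("regular" if is_regular else "irregular")
```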

To use the resampling recipe, first install the Time Series Preparation plugin on your Dataiku instance (see the instructions for installing plugins). Once the plugin is installed, its summary page appears as shown below:

Summary page of the installed plugin

Before using the plugin with your dataset, make sure that the storage type of your date column is “date”. If it isn’t, click the column’s storage type and select “date”.

Also, note that the plugin works with a parsed date column, that is, a column whose meaning (as detected by DSS) is “Date”. In the following figure, the blue oval highlights the column meaning and the red box highlights the storage type.

Column meaning (in blue oval) and storage type (in red box) of the "order_date" column
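If you prepare the data in Python before loading it into DSS, the same idea (a real date type rather than date-like strings) can be checked with pandas. This is a rough analogy under assumed data, not a description of how DSS detects meanings or storage types; the "order_date" values below are invented.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical data: dates stored as strings, with one invalid entry.
df = pd.DataFrame({
    "order_date": ["2021-01-01", "2021-01-03", "not a date"],
    "amount": [10.0, 12.0, 9.0],
})

# Before parsing, the column is plain text, not a date type.
was_parsed = is_datetime64_any_dtype(df["order_date"])

# Parse with an explicit format; errors="coerce" turns values that
# do not match into NaT instead of raising an exception.
df["order_date"] = pd.to_datetime(
    df["order_date"], format="%Y-%m-%d", errors="coerce"
)

# Count the rows that failed to parse so they can be reviewed.
n_bad = int(df["order_date"].isna().sum())
print(was_parsed, df["order_date"].dtype, n_bad)
```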

To resample the dataset, do the following:

  • Click the dataset in the Flow, and then click the Time Series Preparation plugin in the right panel.

Time Series Preparation plugin to apply on dataset

  • Select the Time series resampling recipe from the window that appears.
  • Keep the default value for the “Input time series”, and name the output dataset. Then create the output dataset.

A “Time series resampling” window opens up. In this window, you can specify values for the parameters to resample your dataset. For example, if you want to resample your dataset to daily time steps:

  • Set the “Timestamp column” to the name of your date column.
  • For the resampling parameters, specify “Time step”: 1 and “Unit”: Days.
  • Select methods to use for interpolation or extrapolation from the dropdown menus.
  • Optionally, edit your time series by clipping the start or end, or by shifting the timestamps.
  • Specify if your data is in the long format. For data in the long format, specify the name of the column that contains the identifiers.

“Time series resampling” window

Save and run the recipe to build the resampled dataset.
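For readers who want to see what daily resampling with interpolation does to the data, the following pandas sketch reproduces the idea outside DSS. It is not the plugin's implementation: the helper `resample_daily`, the column names, and the choice of linear interpolation are assumptions made for illustration, and unlike the recipe it does not extrapolate, clip, or shift. For long-format data it resamples each identifier independently over that series' own date range.

```python
import pandas as pd


def resample_daily(df, timestamp_col, value_cols, id_col=None):
    """Resample to 1-day steps with linear interpolation (illustrative helper)."""

    def _one_series(group):
        return (
            group.set_index(timestamp_col)[value_cols]
            .sort_index()
            .resample("1D")      # one row per calendar day
            .mean()              # days without data become NaN
            .interpolate(method="linear")  # fill the gaps between known points
        )

    if id_col is None:
        return _one_series(df).reset_index()

    # Long format: handle each identifier separately, then stack the results.
    parts = []
    for key, group in df.groupby(id_col):
        part = _one_series(group).reset_index()
        part.insert(0, id_col, key)
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


# Hypothetical wide-format data with irregular spacing.
wide = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-06"]),
    "amount": [10.0, 14.0, 20.0],
})
daily = resample_daily(wide, "order_date", ["amount"])

# Hypothetical long-format data: "customer" holds the series identifiers.
long = pd.DataFrame({
    "customer": ["a", "a", "b", "b"],
    "order_date": pd.to_datetime(
        ["2021-01-01", "2021-01-03", "2021-01-02", "2021-01-04"]
    ),
    "amount": [1.0, 3.0, 10.0, 30.0],
})
long_daily = resample_daily(long, "order_date", ["amount"], id_col="customer")

print(daily)
print(long_daily)
```

In the wide-format example, the three original rows become six daily rows, with the missing days filled on a straight line between the known values.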

What’s next?

  • For more details about the resampling recipe and its parameters, see the time series Resampling page in the reference documentation.
Version history
Publication date:
01-01-2021 07:00 PM
Last update:
‎01-03-2020 06:12 PM