featurize_time_series

Extracts features from time series data for machine learning or analysis. Supports three feature sets:

catch22: 22 time series features, plus optional mean and standard deviation (24 total). See details about each feature here.
tsfeatures: Statistical features including trend, seasonality, autocorrelation, etc. See details about each feature here.
growth: Simple, average, compound, and linear growth metrics.

The step takes a dataset with time series data in “tall” format (one row per time point) or “wide” format (time points in columns), and produces a dataset with the calculated features.

Usage

The following example shows how the step can be used in a recipe.

Examples

To calculate all “growth” metrics:

featurize_time_series(ds, {
  "id": "product_id",
  "time": "time_added",
  "value": "item_total",
  "sets": ["growth"]
}) -> (features)
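
As a further sketch, assuming the same illustrative columns as above (product_id, time_added and item_total), the sets parameter also accepts an array, so several feature sets can be requested in one call. Here the catch22 and growth sets are combined:

```
featurize_time_series(ds, {
  "id": "product_id",
  "time": "time_added",
  "value": "item_total",
  "sets": ["catch22", "growth"]
}) -> (features)
```

Since no individual features are configured, all features from both sets would be computed.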

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

id

string (ds.column)

required

Column name containing the time series identifier.

time

string (ds.column)

required

Column name containing the timestamps.

value

string (ds.column)

required

Column name containing the values to featurize.

sets

[string, array[string]]

default:"catch22"

Feature sets to include, e.g. “catch22” or “tsfeatures”. If no individual features are configured using the features parameter, all features from the selected set are computed. If multiple sets are selected, all features from each set are computed. If “all” is selected, all features from all sets are computed. Values must be one of the following:

catch22
tsfeatures
growth
all

features

object

Custom features to compute from each feature set.

freq

[null, string, integer]

Frequency used by features in the tsfeatures set, i.e. the number of observations in a single cycle. It is used by features that are based on seasonality (for now only in the tsfeatures set). When a string is provided, it is interpreted as the natural frequency of the time series and translated to the number of observations per cycle using the following mapping:

‘H’: 24 (hourly)
‘D’: 1 (daily)
‘M’: 12 (monthly)
‘Q’: 4 (quarterly)
‘W’: 1 (weekly)
‘Y’: 1 (yearly)

E.g. if the natural frequency of the time series is monthly (‘M’), the step analyzes seasonality with a period of 12 observations (months in a year). If a number is provided, it is interpreted directly as the number of observations per cycle. If null, the step attempts to infer the frequency automatically. Also see this post by the author of the original tsfeatures package for more details on seasonality and the frequency parameter.
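
A minimal sketch of setting the frequency explicitly, assuming a dataset with one observation per month; the column names product_id, month and item_total are illustrative only:

```
featurize_time_series(ds, {
  "id": "product_id",
  "time": "month",
  "value": "item_total",
  "sets": ["tsfeatures"],
  "freq": "M"
}) -> (features)
```

With "freq": "M" the seasonal features would use a cycle of 12 observations. Passing an integer instead, e.g. "freq": 7 for daily data with a weekly pattern, sets the number of observations per cycle directly.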

unit

string

default:"D"

Temporal unit to use. Only required for converting the time column to timestamps when it is numeric. Y=years, M=months, W=weeks, D=days, h=hours, m=minutes, s=seconds, ms=milliseconds, us=microseconds, ns=nanoseconds. Values must be one of the following: Y, M, W, D, h, m, s, ms, us, ns
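
As an illustration, assuming the time column holds numeric values counted in seconds, the unit could be set as follows; the column names sensor_id, epoch_seconds and reading are hypothetical:

```
featurize_time_series(ds, {
  "id": "sensor_id",
  "time": "epoch_seconds",
  "value": "reading",
  "sets": ["catch22"],
  "unit": "s"
}) -> (features)
```

If the time column already contains timestamps rather than numbers, unit should not be needed.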

output

string

default:"wide"

Format of the output dataset. The following options are supported:

“wide”: One row per time series, with features as multivalue (list) columns.
“tall”: Features joined to the original data, preserving all rows.

Values must be one of the following:

wide
tall
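
A sketch of requesting the “tall” format, reusing the columns from the Usage example; the output name ds_with_features is only a placeholder:

```
featurize_time_series(ds, {
  "id": "product_id",
  "time": "time_added",
  "value": "item_total",
  "sets": ["growth"],
  "output": "tall"
}) -> (ds_with_features)
```

In this configuration the result should keep one row per original time point, with the computed features joined on, rather than one row per product_id.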

n_jobs

integer

default:"-1"

Number of parallel jobs. If -1, all processors are used. If 1, no parallel computing code is used at all, which is useful for debugging. Using multiple processes with a large dataset may cause memory issues. Values must be in the following range:

-1 ≤ n_jobs < inf
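
For debugging, or when a large dataset runs into memory issues, parallelism can be switched off. A minimal sketch, again with the illustrative columns from the Usage example:

```
featurize_time_series(ds, {
  "id": "product_id",
  "time": "time_added",
  "value": "item_total",
  "sets": ["growth"],
  "n_jobs": 1
}) -> (features)
```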
