featurize_time_series
Summarizes time series data into aggregate metrics.
Extracts features from time series data for machine learning or analysis. Supports three feature sets:
- catch22: 22 time series features, plus optional mean and standard deviation (24 total). See details about each feature here.
- tsfeatures: Statistical features including trend, seasonality, autocorrelation, etc. See details about each feature here.
- growth: Simple, average, compound, and linear growth metrics.
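The page does not define the four growth metrics. As a rough illustration only, the sketch below uses common textbook definitions; the function name `growth_metrics` and the formulas are assumptions and may differ from what the step actually computes.

```python
from statistics import mean


def growth_metrics(values):
    """Common growth definitions (assumed; the step's own formulas may differ)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    first, last = values[0], values[-1]
    n_periods = len(values) - 1

    # Simple growth: total relative change from the first to the last value.
    # Note: undefined when the first value is 0.
    simple = (last - first) / first

    # Average growth: mean of the period-over-period relative changes.
    average = mean((b - a) / a for a, b in zip(values, values[1:]))

    # Compound growth: the constant per-period rate that maps first to last.
    compound = (last / first) ** (1 / n_periods) - 1

    # Linear growth: least-squares slope of value against time index 0..n-1.
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )

    return {"simple": simple, "average": average, "compound": compound, "linear": slope}


metrics = growth_metrics([100.0, 110.0, 121.0])
```

For a series that grows by 10% each period, the average and compound rates coincide, while the simple rate reports the total change over the whole series.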
The step takes a dataset with time series data in “tall” format (one row per time point) or “wide” format (time points in columns), and produces a dataset with the calculated features.
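To make the two input layouts concrete, the following sketch reshapes a small "tall" dataset into a "wide" one using plain Python. The column names `series`, `t` and `value` and the helper `tall_to_wide` are hypothetical and not part of the step.

```python
from collections import defaultdict

# "Tall" format: one row per (series, time point).
tall_rows = [
    {"series": "a", "t": 1, "value": 10.0},
    {"series": "a", "t": 2, "value": 12.0},
    {"series": "b", "t": 1, "value": 5.0},
    {"series": "b", "t": 2, "value": 4.0},
]


def tall_to_wide(rows, id_key, time_key, value_key):
    """Group tall rows so each series becomes one record keyed by time point."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[id_key]][row[time_key]] = row[value_key]
    return {series_id: dict(points) for series_id, points in wide.items()}


# "Wide" format: one record per series, time points as columns.
wide_rows = tall_to_wide(tall_rows, "series", "t", "value")
```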
Usage
The following example shows how the step can be used in a recipe.
To calculate all “growth” metrics:
General syntax for using the step in a recipe, showing the inputs the step expects and the outputs it produces. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Column name containing the time series identifier.
Column name containing the timestamps.
Column name containing the values to featurize.
Feature sets to include.
E.g. “catch22” or “tsfeatures”. If no individual features are configured using the features parameter, all features from the selected set will be computed. If multiple sets are selected, all features from each set will be computed. If all is selected, all features from all sets will be computed.
Values must be one of the following:
catch22
tsfeatures
growth
all
Custom features to compute from each feature set.
Frequency used by features in the tsfeatures set, where applicable, expressed as the number of cycles in a single period. If null, the step attempts to infer the frequency. The following default cycles are used when the corresponding frequency is auto-detected:
- ‘H’: 24 (hourly)
- ‘D’: 1 (daily)
- ‘M’: 12 (monthly)
- ‘Q’: 4 (quarterly)
- ‘W’: 1 (weekly)
- ‘Y’: 1 (yearly).
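The fallback described above can be pictured as a simple lookup. This is a minimal sketch, assuming an explicitly configured value takes precedence over the defaults; the names `DEFAULT_CYCLES` and `cycles_for` are hypothetical and do not come from the step itself.

```python
# Default cycles per auto-detected frequency, as listed above.
DEFAULT_CYCLES = {"H": 24, "D": 1, "M": 12, "Q": 4, "W": 1, "Y": 1}


def cycles_for(detected_freq, configured=None):
    """Return the configured cycle count, or the default for the detected frequency."""
    if configured is not None:
        # Assumption: an explicit setting overrides auto-detection.
        return configured
    return DEFAULT_CYCLES[detected_freq]


monthly_default = cycles_for("M")
weekly_override = cycles_for("W", configured=52)
```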
unit. Unit to use for converting the time column to timestamps when it is numeric. Y=years, M=months, W=weeks, D=days, h=hours, m=minutes, s=seconds, ms=milliseconds, us=microseconds, ns=nanoseconds.
Values must be one of the following:
Y
M
W
D
h
m
s
ms
us
ns
output. Output format:
- “wide”: One row per time series with features as columns
- “tall”: Features joined to the original data, preserving all rows.
Values must be one of the following:
wide
tall
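As an illustration of the enumerated values, the sketch below checks a configuration object before it is serialized to JSON. Only the parameter names `unit` and `output` and their allowed values are taken from this page; the validator `validate_config` is a hypothetical helper, not part of the product.

```python
import json

# Allowed values for the two enumerated parameters documented above.
ALLOWED_VALUES = {
    "unit": {"Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns"},
    "output": {"wide", "tall"},
}


def validate_config(config):
    """Raise ValueError if an enumerated parameter has an unsupported value."""
    for key, allowed in ALLOWED_VALUES.items():
        if key in config and config[key] not in allowed:
            raise ValueError(f"{key!r} must be one of {sorted(allowed)}, got {config[key]!r}")
    return config


config = validate_config({"unit": "D", "output": "wide"})
config_json = json.dumps(config)  # JSON object passed as the last "input" to the step
```

Note that the unit values are case-sensitive: `M` means months while `m` means minutes.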