- catch22: 22 time series features, plus optional mean and standard deviation (24 total). See details about each feature here.
- tsfeatures: Statistical features including trend, seasonality, autocorrelation, etc. See details about each feature here.
- growth: Simple, average, compound, and linear growth metrics.
Usage
The following example shows how the step can be used in a recipe.
Examples
To calculate all “growth” metrics:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
A dataset containing time series.
Outputs
A dataset containing time series features.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last "input" to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Column name containing the time series identifier.
Column name containing the timestamps.
Column name containing the values to featurize.
Feature sets to include.
E.g. "catch22" or "tsfeatures". If no individual features are configured using the features parameter, all features from the selected set will be computed. If multiple sets are selected, all features from each set will be computed. If "all" is selected, all features from all sets will be computed.
Values must be one of the following:
catch22 tsfeatures growth all
Array items
Each item in the array must be one of the following values:
catch22 tsfeatures growth all
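The selection rules above can be sketched as a small resolver. `FEATURE_SETS` and `resolve_features` are hypothetical names, not part of the step; the sketch assumes that custom features are given per set as lists of names (as in the per-set properties described below) and that any selected set without a custom list contributes all of its features.

```python
# Hypothetical registry of the features in each set (names copied from the lists in this section).
FEATURE_SETS = {
    "catch22": [
        "mode_5", "mode_10", "acf_timescale", "acf_first_min", "ami2", "trev",
        "high_fluctuation", "stretch_high", "transition_matrix", "periodicity",
        "embedding_dist", "ami_timescale", "whiten_timescale", "outlier_timing_pos",
        "outlier_timing_neg", "centroid_freq", "stretch_decreasing", "entropy_pairs",
        "rs_range", "dfa", "low_freq_power", "forecast_error", "mean", "SD",
    ],
    "tsfeatures": [
        "acf_features", "arch_stat", "crossing_points", "entropy", "flat_spots",
        "heterogeneity", "holt_parameters", "lumpiness", "nonlinearity",
        "pacf_features", "stl_features", "stability", "hw_parameters",
        "unitroot_kpss", "unitroot_pp", "series_length", "hurst",
    ],
    "growth": ["simple", "average", "compound", "linear"],
}


def resolve_features(sets, custom=None):
    """Return {set_name: [feature, ...]} for the requested feature sets.

    Assumption: a set with a custom list uses only those features;
    every other selected set contributes all of its features.
    """
    custom = custom or {}
    if isinstance(sets, str):
        sets = [sets]
    if "all" in sets:
        sets = list(FEATURE_SETS)  # "all" expands to every known set
    resolved = {}
    for name in sets:
        available = FEATURE_SETS[name]
        chosen = custom.get(name) or available  # fall back to the full set
        unknown = set(chosen) - set(available)
        if unknown:
            raise ValueError(f"Unknown {name} features: {sorted(unknown)}")
        resolved[name] = list(chosen)
    return resolved


# Example: two growth features only, plus every catch22 feature.
print(resolve_features(["growth", "catch22"], {"growth": ["simple", "linear"]}))
```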
Custom features to compute from each feature set.
Properties
Catch22 features to compute. See here for detailed information about each possible feature.
Array items
Each item should be the name of a Catch22 feature. Values must be one of the following:
mode_5 mode_10 acf_timescale acf_first_min ami2 trev high_fluctuation stretch_high transition_matrix periodicity embedding_dist ami_timescale whiten_timescale outlier_timing_pos outlier_timing_neg centroid_freq stretch_decreasing entropy_pairs rs_range dfa low_freq_power forecast_error mean SD
TSFeatures features to compute. See here for detailed information about each possible feature.
Array items
Each item should be the name of a TSFeatures feature. Values must be one of the following:
acf_features arch_stat crossing_points entropy flat_spots heterogeneity holt_parameters lumpiness nonlinearity pacf_features stl_features stability hw_parameters unitroot_kpss unitroot_pp series_length hurst
Growth features to compute.
The different growth features are calculated as follows, where $x_n$ is the final value, $x_0$ is the initial value, and $n$ is the number of periods in a time series.
- "simple": Fractional change between the first and last value. Maintains the direction of growth by dividing the change by the absolute value of the initial value: $\frac{x_n - x_0}{|x_0|}$
- "average": The average fractional change between consecutive values. Also maintains direction, unlike e.g. the pandas pct_change function: $\frac{1}{n}\sum_{t=1}^{n}\frac{x_t - x_{t-1}}{|x_{t-1}|}$
- "compound": Analogous to CAGR (Compound Annual Growth Rate). The average growth rate over the entire period, assuming the growth is compounded: $\left(\frac{x_n}{x_0}\right)^{1/n} - 1$
- "linear": Fits a linear regression to the time series and returns the slope of the line.
Array items
Each item in the array must be one of the following values:
simple average compound linear
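The four growth definitions above can be sketched in plain Python. This is a minimal, hypothetical implementation (the function names are illustrative, not the step's API). It assumes the values are ordered by time, that n is the number of periods between the first and last observation (one less than the number of values), that the initial and intermediate values are non-zero, and, for "compound", that the first and last values are positive. The "linear" slope is an ordinary least-squares fit against the period index.

```python
from statistics import fmean


def growth_simple(values):
    """Fractional change from first to last value; dividing by |x_0| keeps the sign of the change."""
    first, last = values[0], values[-1]
    return (last - first) / abs(first)


def growth_average(values):
    """Mean of the signed fractional changes between consecutive values."""
    changes = [(b - a) / abs(a) for a, b in zip(values, values[1:])]
    return fmean(changes)


def growth_compound(values):
    """CAGR-style rate: the constant per-period growth turning x_0 into x_n.

    Assumes x_0 and x_n are positive (a negative ratio would give a complex result).
    """
    n = len(values) - 1  # number of periods between first and last observation
    return (values[-1] / values[0]) ** (1 / n) - 1


def growth_linear(values):
    """Slope of an ordinary least-squares line fitted against the period index 0..n."""
    xs = range(len(values))
    x_mean, y_mean = fmean(xs), fmean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


series = [100, 110, 121]
for fn in (growth_simple, growth_average, growth_compound, growth_linear):
    print(fn.__name__, fn(series))
```

For the series 100, 110, 121, simple growth is 0.21, average and compound growth are both about 0.1, and the fitted slope is 10.5 per period, which shows that "linear" reports growth in the units of the series rather than as a fraction.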
Examples
- E.g. deriving two features from catch22 and growth sets each:
Frequency used by features in the TSFeatures set.
The number of observations in a single cycle. Used by certain features that are based on seasonality (for now only those in the tsfeatures set). When a string is provided, it is interpreted as the natural frequency
of the time series and translated to the number of observations per cycle using the following mapping:
- ‘H’: 24 (hourly)
- ‘D’: 1 (daily)
- ‘M’: 12 (monthly)
- ‘Q’: 4 (quarterly)
- ‘W’: 1 (weekly)
- ‘Y’: 1 (yearly)
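The string-to-period translation above can be sketched as follows. `FREQ_TO_PERIOD` and `observations_per_cycle` are hypothetical names. The sketch assumes the parameter accepts either a positive integer (used as-is) or one of the listed strings, and it returns None where the step would instead infer the frequency, since automatic inference is not reproduced here.

```python
# Mapping from natural-frequency strings to observations per cycle, as listed above.
FREQ_TO_PERIOD = {"H": 24, "D": 1, "M": 12, "Q": 4, "W": 1, "Y": 1}


def observations_per_cycle(freq):
    """Translate a frequency setting into a number of observations per cycle.

    Integers pass through unchanged; strings are looked up in the mapping;
    None is returned unchanged as a stand-in for "infer automatically".
    """
    if freq is None:
        return None
    if isinstance(freq, int):
        if freq < 1:
            raise ValueError("frequency must be a positive number of observations")
        return freq
    try:
        return FREQ_TO_PERIOD[freq]
    except KeyError:
        raise ValueError(f"Unsupported frequency string: {freq!r}") from None


print(observations_per_cycle("M"))  # monthly data, yearly cycle
print(observations_per_cycle(7))    # explicit cycle length
```

Because "D" and "W" map to 1, a cycle contains a single observation and there is no seasonal pattern to detect; for daily data with a weekly pattern one would presumably pass an explicit number such as 7 instead.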
If null, the step attempts to infer the frequency automatically. Also see this post by the author of the original tsfeatures package for more details on seasonality and the frequency parameter.
Temporal unit to use.
Only required for converting the time column to timestamps when it is numeric.
Y=years, M=months, W=weeks, D=days, h=hours, m=minutes, s=seconds,
ms=milliseconds, us=microseconds, ns=nanoseconds. Values must be one of the following:
Y M W D h m s ms us ns
Output format.
The format of the output dataset. The following options are supported:
- "wide": One row per time series, with features as multivalued (list) columns.
- "tall": Features joined to the original data, preserving all rows.
Values must be one of the following:
wide tall
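A rough sketch of the difference between the two layouts, using plain Python lists of dicts. The helper `to_wide_and_tall` and the `featurize` callback are hypothetical. The sketch assumes rows are already sorted by time within each series and keeps each feature as a single scalar, so it does not reproduce how the step lays out multivalued (list) columns in the "wide" format.

```python
from collections import defaultdict


def to_wide_and_tall(rows, id_col, value_col, featurize):
    """Build both output layouts from per-observation rows.

    rows: list of dicts, one per observation, sorted by time within each series.
    featurize: callable mapping a list of values to {feature_name: value}.
    """
    series = defaultdict(list)
    for row in rows:
        series[row[id_col]].append(row[value_col])

    # "wide": one row per time series, holding that series' features.
    wide = [{id_col: sid, **featurize(values)} for sid, values in series.items()]

    # "tall": each original row keeps its columns and gains its series' features (left join on id).
    features_by_id = {r[id_col]: {k: v for k, v in r.items() if k != id_col} for r in wide}
    tall = [{**row, **features_by_id[row[id_col]]} for row in rows]
    return wide, tall


rows = [{"id": "a", "v": 1}, {"id": "a", "v": 3}, {"id": "b", "v": 5}]
wide, tall = to_wide_and_tall(rows, "id", "v", lambda xs: {"n": len(xs), "last": xs[-1]})
print(wide)  # two rows, one per series
print(tall)  # three rows, one per original observation
```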
Number of parallel jobs.
If -1, all processors are used. If 1, no parallel computing code is used at all,
which is useful for debugging. Using multiple processes with a large dataset may
cause memory issues. Values must be in the following range: