Converts raw timestamped events, or contiguous time series data, from their original frequency to a daily, weekly, monthly, quarterly, yearly or other frequency. The step groups each time or event series by the specified time period and applies the desired aggregations to each group (count of events, total spend etc.).

The step accepts input data, and can generate output data, in both a tall and a wide format:

  • Tall format
    Each row represents a single event or observation, and the dataset contains scalar columns for the event’s timestamp as well as for the series, customer or entity the event belongs to. This is the most common format for event data and the one most likely to have been imported into Graphext.

  • Wide format
    Each row represents a single entity (customer), and the dataset contains columns of lists containing the event timestamps and values for each series or observation. In this case, all lists in the same row must have the same length. This format is the most convenient for analysis in Graphext, as it allows for easy exploration of the time series data. You maintain one row per customer (entity), instead of duplicating the customer’s information for each event, yet you can still access, plot and generally work with all of the customer’s time series.

Note that both formats have the same number of columns. The difference is that in the “tall” format, each row represents a single event, while in the “wide” format, each row represents a single entity and contains all its events. You can think of the wide format as the result of aggregating by the time series identifier, and collecting the timestamps and values in parallel columns of lists.
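The relationship between the two formats can be sketched outside of Graphext with pandas. This is only an illustration of the data layout, not the step's implementation; the column names (customer, timestamp, spend) and the values are made up for the example, and multi-column explode assumes pandas 1.3 or later.

```python
import pandas as pd

# Tall format: one row per event (hypothetical columns and values).
tall = pd.DataFrame(
    {
        "customer": ["a", "a", "b", "a", "b"],
        "timestamp": pd.to_datetime(
            ["2023-01-01", "2023-01-03", "2023-01-02", "2023-01-09", "2023-01-10"]
        ),
        "spend": [10.0, 5.0, 7.5, 2.5, 1.0],
    }
)

# Wide format: one row per customer, with timestamps and values
# collected into parallel columns of lists.
wide = (
    tall.sort_values("timestamp")
    .groupby("customer", as_index=False)
    .agg({"timestamp": list, "spend": list})
)

# Back to tall: explode the list columns in parallel.
tall_again = wide.explode(["timestamp", "spend"], ignore_index=True)
```

Both frames have the same three columns; only the number of rows and the cell types (scalars vs lists) differ.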

You can use the parameters below to configure the frequency to resample the data to, the format of the output dataset (tall or wide), whether to fill gaps in the resampled data, and so on.
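To make the idea of resampling and gap filling concrete, here is a minimal pandas sketch of the same kind of operation: weekly event counts and total spend per customer. It is an analogy under assumed column names and data, not the step's actual behaviour or parameters. Here, pandas' resample emits empty weeks inside each customer's date range with a count of 0, which plays the role of "filling gaps"; dropping those rows corresponds to leaving the gaps unfilled.

```python
import pandas as pd

# Hypothetical tall event data.
events = pd.DataFrame(
    {
        "customer": ["a", "a", "a", "b", "b"],
        "timestamp": pd.to_datetime(
            ["2023-01-02", "2023-01-03", "2023-01-17", "2023-01-04", "2023-01-11"]
        ),
        "spend": [10.0, 5.0, 2.5, 7.5, 1.0],
    }
)

# Group by customer, then bin each customer's events into weeks.
grouped = events.set_index("timestamp").groupby("customer")["spend"]
weekly = pd.DataFrame(
    {
        "n_events": grouped.resample("W").count(),
        "total_spend": grouped.resample("W").sum(),
    }
)

# Without gap filling: keep only the weeks that contain at least one event.
no_gaps = weekly[weekly["n_events"] > 0]
```

Customer "a" has no events in the week between its second and third purchase, so weekly contains a zero-count row for that week while no_gaps does not.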

Usage

The following example shows how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).