Usage
The following examples show how the step can be used in a recipe.
Examples
This example groups the dataset by an exact match on the category column and by a date component (month level) on the date column, and then aggregates the count of sales and the sum of revenue.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
The input dataset containing the columns to group by and apply aggregations on.
Outputs
A dataset containing the aggregated results based on the grouping operations.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last "input" to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Columns to group by.
An array specifying the columns used for grouping. The by parameter can be either:
- An array of column names (e.g., ["column1", "column2"]), which defaults to EXACT grouping.
- An array of objects with by, groupingType, optional name and optional param properties.
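The relationship between the two forms of by can be sketched in plain Python. This is only an illustration of the rule stated above (a bare column name is treated as EXACT grouping); the helper name normalize_by is hypothetical and is not part of the step or of any Graphext API, and accepting both forms within one array is an assumption.

```python
def normalize_by(by):
    """Expand bare column names into grouping-config objects.

    Hypothetical helper for illustration only; not a Graphext function.
    """
    normalized = []
    for item in by:
        if isinstance(item, str):
            # A bare column name defaults to EXACT grouping.
            normalized.append({"by": item, "groupingType": "EXACT"})
        elif isinstance(item, dict):
            # Object form: by and groupingType are expected,
            # name and param are optional.
            missing = {"by", "groupingType"} - item.keys()
            if missing:
                raise ValueError(f"grouping config is missing {sorted(missing)}")
            normalized.append(dict(item))
        else:
            raise TypeError(f"unsupported grouping entry: {item!r}")
    return normalized


short_form = ["column1", "column2"]
long_form = normalize_by(short_form)
```

Under this reading, the short form ["column1", "column2"] and the expanded list of objects describe the same grouping.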
Array items
- Column name
- Grouping config
Column name to group by.
Aggregation functions to apply.
An array specifying the aggregation functions to apply on each group.
The array can be empty, in which case no aggregations are performed, but the dataset is still grouped by the specified columns.
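To make the semantics concrete, here is a minimal plain-Python sketch of grouping followed by aggregation, using the scenario described under Usage (category matched exactly, date reduced to its month, then a count of sales and a sum of revenue). It is not the step's implementation: the sample rows, the function group_and_aggregate and the output column names are all made up for illustration, and reading "month level" as the calendar month number is an assumption.

```python
from collections import defaultdict
from datetime import date

# Made-up sample data for illustration.
rows = [
    {"category": "books", "date": date(2024, 1, 5), "revenue": 10.0},
    {"category": "books", "date": date(2024, 1, 20), "revenue": 5.0},
    {"category": "books", "date": date(2024, 2, 3), "revenue": 7.5},
    {"category": "toys", "date": date(2024, 1, 9), "revenue": 20.0},
]


def group_and_aggregate(rows, key_fn, aggregations):
    """Group rows by key_fn, then apply each (name, function) pair per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)

    result = []
    for key, members in groups.items():
        out = {"key": key}
        for name, fn in aggregations:
            out[name] = fn(members)
        result.append(out)
    return result


def key_fn(row):
    # Exact match on category, month component of the date (assumed reading).
    return (row["category"], row["date"].month)


aggregations = [
    ("sales_count", len),
    ("total_revenue", lambda members: sum(r["revenue"] for r in members)),
]

with_aggs = group_and_aggregate(rows, key_fn, aggregations)

# An empty aggregation list still yields one row per group.
only_groups = group_and_aggregate(rows, key_fn, [])
```

With an empty aggregation list, the result contains one entry per distinct group and no aggregated columns, which mirrors the behaviour described above.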
Array items
Name of the output column.
Column on which the aggregation is applied.
If null, the aggregation applies to the entire group.
Type of aggregation function.
The type of aggregation function to perform on the specified column.
Includes support for standard aggregations (e.g., SUM, COUNT) as well as element-wise aggregations.
Notes:
- PERCENT_OF_ROWS_WHERE: Computes the percentage of rows within each group for which a condition is true.
- PERCENT_OF_ROWS: Computes the percentage of rows in each group relative to the total number of rows across all groups.
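The difference between the two percentage aggregations can be illustrated with a small plain-Python sketch. The data and function names are invented for the example, and expressing results on a 0 to 100 scale is an assumption; the step itself may scale or round differently.

```python
from collections import Counter

# Made-up sample data for illustration.
rows = [
    {"category": "books", "revenue": 10.0},
    {"category": "books", "revenue": 5.0},
    {"category": "books", "revenue": 7.5},
    {"category": "books", "revenue": 2.0},
    {"category": "toys", "revenue": 20.0},
]


def percent_of_rows(rows, key):
    """Share of all rows that fall into each group (0 to 100 scale assumed)."""
    counts = Counter(row[key] for row in rows)
    total = len(rows)
    return {group: 100.0 * n / total for group, n in counts.items()}


def percent_of_rows_where(rows, key, condition):
    """Share of rows inside each group that satisfy the condition."""
    totals = Counter(row[key] for row in rows)
    matches = Counter(row[key] for row in rows if condition(row))
    # Counter returns 0 for groups with no matching rows.
    return {group: 100.0 * matches[group] / totals[group] for group in totals}


share_of_total = percent_of_rows(rows, "category")
share_high_revenue = percent_of_rows_where(
    rows, "category", lambda row: row["revenue"] >= 7.5
)
```

In this sketch the first result uses the whole dataset as the denominator, while the second uses each group's own row count.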
COUNT MIN MAX SUM AVG VARIANCE STDEV FIRST LAST P25 P50 P75 COUNT_WHERE NUMBER_OF_ROWS NUMBER_OF_ROWS_WHERE PERCENT_OF_ROWS PERCENT_OF_ROWS_WHERE METRIC MODE UNIQUE_VALUES LIST_UNIQUE LIST CONCATENATE ELEMENT_COUNT ELEMENT_MIN ELEMENT_MAX ELEMENT_SUM ELEMENT_AVG ELEMENT_VARIANCE ELEMENT_STDEV ELEMENT_FIRST ELEMENT_LAST
The Graphext advanced query used to identify the rows to select prior to the grouping.