Usage
The following examples show how the step can be used in a recipe.
Examples
This example groups the dataset by an exact match on the category column and by a date component (month level) on the date column. It then aggregates the count of sales and the sum of revenue.
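The recipe for this example is not reproduced here. As a rough illustration of what the grouping computes, the following plain-Python sketch groups rows on the exact category value and on the calendar month of the date, then counts sales and sums revenue per group. It does not use the Graphext step. The sample rows and the output column names (date_month, count_sales, sum_revenue) are hypothetical. The sketch also assumes that "month level" means truncating each date to its year and month, and that COUNT and SUM skip missing values.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the column names mirror the example above.
rows = [
    {"category": "books", "date": date(2024, 1, 5), "sales": 3, "revenue": 30.0},
    {"category": "books", "date": date(2024, 1, 20), "sales": 1, "revenue": 12.5},
    {"category": "books", "date": date(2024, 2, 2), "sales": 2, "revenue": 20.0},
    {"category": "toys", "date": date(2024, 1, 9), "sales": None, "revenue": 8.0},
]


def aggregate_by_category_and_month(rows):
    """Group on exact category and calendar month of date; count sales, sum revenue."""
    groups = defaultdict(lambda: {"count_sales": 0, "sum_revenue": 0.0})
    for row in rows:
        # EXACT grouping on category; month-level grouping truncates date to "YYYY-MM".
        key = (row["category"], row["date"].strftime("%Y-%m"))
        acc = groups[key]
        if row["sales"] is not None:  # assumption: COUNT ignores missing values
            acc["count_sales"] += 1
        if row["revenue"] is not None:  # assumption: SUM ignores missing values
            acc["sum_revenue"] += row["revenue"]
    # One output row per group, sorted by group key for a stable order.
    return [
        {"category": category, "date_month": month, **acc}
        for (category, month), acc in sorted(groups.items())
    ]


for out in aggregate_by_category_and_month(rows):
    print(out)
```

Rows from the same category but different months land in separate groups. This is why the January and February "books" rows produce two output rows.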
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
The input dataset containing the columns to group by and apply aggregations on.
Outputs
A dataset containing the aggregated results based on the grouping operations.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Columns to group by.
An array specifying the columns used for grouping. The by parameter can be either:
- An array of column names (e.g., ["column1", "column2"]), in which case each column uses EXACT grouping.
- An array of objects with by and groupingType properties, plus optional name and param properties.
Array items
Column name to group by.
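The two accepted forms of by relate in a simple way: a bare column name is shorthand for an object with EXACT grouping. The sketch below expands the shorthand into the object form to make this explicit. The helper normalize_by is hypothetical and is not part of Graphext. It only uses the property names and the EXACT grouping type stated above, and it does not check whether the step accepts a mix of both forms in one array.

```python
def normalize_by(by):
    """Hypothetical helper: expand the shorthand form of `by` into grouping objects."""
    normalized = []
    for entry in by:
        if isinstance(entry, str):
            # A bare column name is shorthand for EXACT grouping on that column.
            normalized.append({"by": entry, "groupingType": "EXACT"})
        elif isinstance(entry, dict) and "by" in entry and "groupingType" in entry:
            # Object form: copied as given; "name" and "param" are optional.
            normalized.append(dict(entry))
        else:
            raise ValueError(f"invalid grouping entry: {entry!r}")
    return normalized


print(normalize_by(["column1", "column2"]))
```

Each entry is validated on its own. An object without a groupingType, or a value that is neither a string nor an object, is rejected.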
Aggregation functions to apply.
An array specifying the aggregation functions to apply on each group.
The array can be empty, in which case no aggregations are performed, but the dataset is still grouped by the specified columns.
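With an empty aggregations array, grouping on its own reduces the dataset to one row per distinct combination of the grouping columns. The sketch below illustrates this, assuming exact grouping throughout. The function name and sample data are hypothetical. Keeping groups in first-seen order is a choice made by this sketch; the documentation does not specify the step's output order.

```python
def group_without_aggregations(rows, by):
    """Return one row per distinct combination of the grouping columns."""
    groups = {}
    for row in rows:
        key = tuple(row[column] for column in by)
        # Keep the first occurrence of each key; dicts preserve insertion order.
        if key not in groups:
            groups[key] = dict(zip(by, key))
    return list(groups.values())


rows = [
    {"category": "books", "region": "EU", "revenue": 30.0},
    {"category": "books", "region": "EU", "revenue": 12.5},
    {"category": "toys", "region": "US", "revenue": 8.0},
]
print(group_without_aggregations(rows, ["category", "region"]))
```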
Array items
Name of the output column.
Column on which the aggregation is applied.
If null, the aggregation applies to the entire group.
Type of aggregation function.
The type of aggregation function to perform on the specified column. Both standard aggregations (e.g., SUM, COUNT) and element-wise aggregations are supported.
Notes:
- PERCENT_OF_ROWS_WHERE: computes, within each group, the percentage of rows for which a condition is true.
- PERCENT_OF_ROWS: computes each group's number of rows as a percentage of the total number of rows across all groups.
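The two percentage aggregations differ in their denominator: the group's own row count versus the total row count across all groups. The sketch below computes both over a single grouping column. It uses a Python lambda as a stand-in for the step's condition, whose actual syntax is not described here. Function names and data are hypothetical. Results are on a 0 to 100 scale, which is an assumption, since the documentation does not say whether the step returns percentages or 0 to 1 fractions.

```python
from collections import Counter


def percent_of_rows(rows, by):
    """PERCENT_OF_ROWS-style: each group's row count relative to all rows, in percent."""
    counts = Counter(row[by] for row in rows)
    total = len(rows)
    return {group: 100.0 * n / total for group, n in counts.items()}


def percent_of_rows_where(rows, by, condition):
    """PERCENT_OF_ROWS_WHERE-style: share of rows inside each group matching `condition`."""
    counts = Counter()
    hits = Counter()
    for row in rows:
        counts[row[by]] += 1
        if condition(row):
            hits[row[by]] += 1
    return {group: 100.0 * hits[group] / n for group, n in counts.items()}


rows = [
    {"category": "a", "revenue": 5.0},
    {"category": "a", "revenue": 15.0},
    {"category": "a", "revenue": 20.0},
    {"category": "b", "revenue": 30.0},
]
print(percent_of_rows(rows, "category"))
print(percent_of_rows_where(rows, "category", lambda r: r["revenue"] > 10))
```

In this data, group "a" holds 75% of all rows, while 2 of its own 3 rows satisfy the condition. The percentages across groups sum to 100 only for PERCENT_OF_ROWS.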
COUNT
MIN
MAX
SUM
AVG
VARIANCE
STDEV
FIRST
LAST
P25
P50
P75
COUNT_WHERE
NUMBER_OF_ROWS
NUMBER_OF_ROWS_WHERE
PERCENT_OF_ROWS
PERCENT_OF_ROWS_WHERE
METRIC
MODE
UNIQUE_VALUES
LIST_UNIQUE
LIST
CONCATENATE
ELEMENT_COUNT
ELEMENT_MIN
ELEMENT_MAX
ELEMENT_SUM
ELEMENT_AVG
ELEMENT_VARIANCE
ELEMENT_STDEV
ELEMENT_FIRST
ELEMENT_LAST
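The ELEMENT_* functions are not defined further here. One plausible reading, assumed in the sketch below and not confirmed by the documentation, is that they apply to list-valued columns (for example, vectors) and aggregate position by position across the rows of a group. Under that assumption, ELEMENT_SUM of [1, 2, 3] and [3, 4, 5] would be [4, 6, 8]. The helper element_wise is hypothetical.

```python
from statistics import mean


def element_wise(vectors, func):
    """Apply func position by position across equal-length list values."""
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all list values in a group must have the same length")
    # zip(*vectors) yields a tuple holding the i-th element of every list.
    return [func(position) for position in zip(*vectors)]


group_values = [[1, 2, 3], [3, 4, 5]]
print(element_wise(group_values, sum))   # ELEMENT_SUM-style
print(element_wise(group_values, mean))  # ELEMENT_AVG-style
print(element_wise(group_values, max))   # ELEMENT_MAX-style
```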
The Graphext advanced query used to select the rows to include before grouping.
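Because the query is applied before grouping, rows it excludes never reach any group. A group whose rows are all filtered out therefore does not appear in the output. The sketch below illustrates this ordering with a row count per group. A Python predicate stands in for the Graphext advanced query, whose syntax is not covered here. The function name and data are hypothetical.

```python
from collections import Counter


def filter_then_count(rows, by, predicate):
    """Drop rows failing `predicate`, then count the remaining rows per group."""
    kept = [row for row in rows if predicate(row)]
    return dict(Counter(row[by] for row in kept))


rows = [
    {"category": "a", "revenue": 5.0},
    {"category": "a", "revenue": 20.0},
    {"category": "b", "revenue": 3.0},
]
# Stand-in for an advanced query keeping rows with revenue >= 10.
print(filter_then_count(rows, "category", lambda r: r["revenue"] >= 10))
```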