group_by
Group data by specified columns and apply aggregation functions to each group.
Usage
The following examples show how the step can be used in a recipe.
This example groups the dataset by an exact match on the category column and a date component (month level) on the date column, and then aggregates the count of sales and the sum of revenue:
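As a rough analogy only, and not the step's own recipe syntax, the grouping logic described here can be sketched in plain Python. The sample rows, the `group_sales` helper and the output names `sales_count` and `revenue_sum` are hypothetical, and the sketch assumes that "month level" means truncating each date to its year and month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows; the column names follow the example description.
rows = [
    {"category": "a", "date": date(2024, 1, 5), "sales": 1, "revenue": 10.0},
    {"category": "a", "date": date(2024, 1, 20), "sales": 2, "revenue": 20.0},
    {"category": "b", "date": date(2024, 1, 7), "sales": 3, "revenue": 30.0},
    {"category": "b", "date": date(2024, 2, 11), "sales": 4, "revenue": 40.0},
]

def group_sales(rows):
    """Group by exact category and by calendar month, then aggregate."""
    groups = defaultdict(lambda: {"sales_count": 0, "revenue_sum": 0.0})
    for row in rows:
        # Assumption: "month level" truncates the date to year and month.
        key = (row["category"], row["date"].strftime("%Y-%m"))
        groups[key]["sales_count"] += 1                # count of sales rows
        groups[key]["revenue_sum"] += row["revenue"]   # sum of revenue
    return dict(groups)

result = group_sales(rows)
```

Each key of `result` is one group (a category paired with a month), and each value holds that group's two aggregates.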
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Columns to group by.
An array specifying the columns used for grouping. The by parameter can be either:
- An array of column names (e.g., ["column1", "column2"]), which defaults to EXACT grouping.
- An array of objects with by, groupingType, optional name and optional param properties.
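The relationship between the two forms can be sketched in Python. The `normalize_by` helper below is hypothetical and not part of the step; it only illustrates the stated default, in which a plain column name behaves like an object whose groupingType is EXACT. The value given for name is likewise made up for illustration.

```python
def normalize_by(by):
    """Expand plain column names into the object form with EXACT grouping."""
    normalized = []
    for entry in by:
        if isinstance(entry, str):
            # A bare column name defaults to EXACT grouping.
            normalized.append({"by": entry, "groupingType": "EXACT"})
        else:
            # Object entries are copied through unchanged.
            normalized.append(dict(entry))
    return normalized

short_form = ["column1", "column2"]
object_form = [
    {"by": "column1", "groupingType": "EXACT", "name": "first_group"},
    "column2",  # hypothetical mix of forms, used only to exercise the helper
]

expanded_short = normalize_by(short_form)
expanded_mixed = normalize_by(object_form)
```

Under this reading, `short_form` and its expanded version describe the same grouping; the object form is only needed when a different grouping type, a name or a param is required.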
Aggregation functions to apply.
An array specifying the aggregation functions to apply on each group. The array can be empty, in which case no aggregations are performed, but the dataset is still grouped by the specified columns.
The Graphext advanced query used to select rows before grouping.
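The order of operations implied here, selecting rows first and grouping afterwards, can be sketched in Python. This is an illustration under assumptions: the `filter_then_count` helper and the sample rows are hypothetical, and the predicate function merely stands in for a query, whose actual syntax is not shown here.

```python
from collections import Counter

# Hypothetical rows for illustration.
rows = [
    {"category": "a", "revenue": 10.0},
    {"category": "a", "revenue": 0.0},
    {"category": "b", "revenue": 30.0},
]

def filter_then_count(rows, predicate, column):
    """Keep rows matching the predicate, then count rows per group."""
    selected = [row for row in rows if predicate(row)]
    return Counter(row[column] for row in selected)

# The lambda plays the role of the query: only rows with revenue above zero
# take part in the grouping.
counts = filter_then_count(rows, lambda row: row["revenue"] > 0, "category")
```

Rows excluded by the selection never reach the grouping stage, so they do not contribute to any group's aggregates.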