
Usage

The following examples show how the step can be used in a recipe.

Examples

This example groups the dataset by an exact match on the category column and a date component (month level) on the date column, and then aggregates the count of sales and the sum of revenue:
group_by(ds, {
  "by": [
    { "by": "category", "groupingType": "EXACT" },
    { "by": "date", "groupingType": "DATE_COMPONENT", "param": { "component": "MONTH", "timezone": "UTC" } }
  ],
  "aggregations": [
    { "name": "total_sales", "on": "sales", "type": "COUNT" },
    { "name": "total_revenue", "on": "revenue", "type": "SUM" }
  ]
}) -> (ds_grouped)
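To make the grouping semantics concrete, the recipe above can be approximated in plain Python. This is a minimal sketch and not the step's actual implementation. The sample rows are hypothetical, and two behaviours are assumptions: that the MONTH component is the month number (1 to 12) after conversion to UTC, and that COUNT counts non-null values of the column.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample rows; column names match the recipe example above.
rows = [
    {"category": "books", "date": "2024-01-05T10:00:00+00:00", "sales": 3, "revenue": 30.0},
    {"category": "books", "date": "2024-01-20T08:30:00+00:00", "sales": 1, "revenue": 12.5},
    {"category": "books", "date": "2024-02-02T17:45:00+00:00", "sales": 2, "revenue": 20.0},
    {"category": "toys", "date": "2024-01-11T12:00:00+00:00", "sales": None, "revenue": 7.5},
]


def month_in_utc(iso_timestamp):
    """Return the month number of an ISO timestamp after converting it to UTC."""
    # Assumption: the MONTH date component is the calendar month number (1-12).
    return datetime.fromisoformat(iso_timestamp).astimezone(timezone.utc).month


# Collect the rows of each (category, month) group.
groups = defaultdict(list)
for row in rows:
    groups[(row["category"], month_in_utc(row["date"]))].append(row)

# Produce one output row per group, with one column per aggregation.
ds_grouped = [
    {
        "category": category,
        "date": month,
        # Assumption: COUNT counts the non-null values of "sales".
        "total_sales": sum(1 for r in members if r["sales"] is not None),
        "total_revenue": sum(r["revenue"] for r in members),
    }
    for (category, month), members in sorted(groups.items())
]

for out_row in ds_grouped:
    print(out_row)
```

Each distinct combination of category and month yields one output row, with one column per entry in aggregations.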

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
ds_in
dataset
required
The input dataset containing the columns to group by and apply aggregations on.
ds_out
dataset
required
A dataset containing the aggregated results based on the grouping operations.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

by
array
required
Columns to group by. The by parameter can be either:
  • An array of column names (e.g., ["column1", "column2"]), each of which defaults to EXACT grouping.
  • An array of objects, each with by and groupingType properties and optional name and param properties.
string (ds_in.column)
Column name to group by.
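The relationship between the two forms of by can be sketched in Python as follows. The helper name normalize_by is hypothetical and not part of the step. It only illustrates the documented default, namely that a bare column name is treated as EXACT grouping on that column.

```python
def normalize_by(by):
    """Expand bare column names into the object form with EXACT grouping."""
    normalized = []
    for item in by:
        if isinstance(item, str):
            # A bare column name defaults to EXACT grouping.
            normalized.append({"by": item, "groupingType": "EXACT"})
        else:
            # Object form: copy it unchanged.
            normalized.append(dict(item))
    return normalized


shorthand = ["category", "region"]
print(normalize_by(shorthand))
```

Under this reading, ["category", "region"] and the two corresponding objects with "groupingType": "EXACT" describe the same grouping.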
aggregations
array[object]
required
Aggregation functions to apply to each group. The array can be empty, in which case no aggregations are performed, but the dataset is still grouped by the specified columns.
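As a rough illustration of the empty-array case, the Python sketch below groups hypothetical rows without aggregating anything. It assumes the result holds one row per distinct combination of the grouping columns, which the description above implies but does not state explicitly.

```python
# Hypothetical sample rows and grouping columns.
rows = [
    {"category": "books", "region": "EU", "revenue": 30.0},
    {"category": "books", "region": "EU", "revenue": 12.5},
    {"category": "toys", "region": "US", "revenue": 7.5},
]
by = ["category", "region"]

# Keep the first occurrence of each distinct key combination.
seen = {}
for row in rows:
    key = tuple(row[column] for column in by)
    seen.setdefault(key, dict(zip(by, key)))

# Assumption: with no aggregations, only the grouping columns remain.
ds_grouped = list(seen.values())
print(ds_grouped)
```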
name
string
Name of the output column.
on
[string, null]
Column on which the aggregation is applied. If null, the aggregation applies to the entire group.
type
string
Type of aggregation function to perform on the specified column. Supports standard aggregations (e.g., SUM, COUNT) as well as element-wise aggregations. Notes:
  • PERCENT_OF_ROWS_WHERE: Computes the percentage of rows within each group for which a condition is true.
  • PERCENT_OF_ROWS: Computes the percentage of rows in each group relative to the total number of rows across all groups.
Values must be one of the following: COUNT, MIN, MAX, SUM, AVG, VARIANCE, STDEV, FIRST, LAST, P25, P50, P75, COUNT_WHERE, NUMBER_OF_ROWS, NUMBER_OF_ROWS_WHERE, PERCENT_OF_ROWS, PERCENT_OF_ROWS_WHERE, METRIC, MODE, UNIQUE_VALUES, LIST_UNIQUE, LIST, CONCATENATE, ELEMENT_COUNT, ELEMENT_MIN, ELEMENT_MAX, ELEMENT_SUM, ELEMENT_AVG, ELEMENT_VARIANCE, ELEMENT_STDEV, ELEMENT_FIRST, ELEMENT_LAST
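The difference between the two percentage aggregations can be sketched in Python as follows. The function names, the sample rows and the 0 to 100 scale are assumptions made for illustration. The step may report fractions instead, and it expresses the condition through its own configuration rather than a Python callable.

```python
from collections import Counter

# Hypothetical sample rows: four "free" users, one "pro" user.
rows = [
    {"plan": "free", "churned": True},
    {"plan": "free", "churned": False},
    {"plan": "free", "churned": False},
    {"plan": "free", "churned": False},
    {"plan": "pro", "churned": True},
]


def percent_of_rows(rows, key):
    """Share of all rows that falls into each group (PERCENT_OF_ROWS)."""
    total = len(rows)
    sizes = Counter(row[key] for row in rows)
    return {group: 100 * size / total for group, size in sizes.items()}


def percent_of_rows_where(rows, key, condition):
    """Share of each group's rows that satisfy the condition (PERCENT_OF_ROWS_WHERE)."""
    sizes = Counter(row[key] for row in rows)
    matches = Counter(row[key] for row in rows if condition(row))
    return {group: 100 * matches[group] / size for group, size in sizes.items()}


print(percent_of_rows(rows, "plan"))
print(percent_of_rows_where(rows, "plan", lambda row: row["churned"]))
```

In the first function the denominator is the size of the whole dataset. In the second it is the size of the individual group, so the values across groups need not sum to 100.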
params
object
Additional parameters for specific aggregations.
value
[number, string, null]
Additional value used by certain aggregation types (e.g., COUNT_WHERE, METRIC, or LIST for presorting).
query
string
The Graphext advanced query used to select rows prior to grouping.