
Filter with formula

Filter rows using a (pandas-compatible) formula.

Allowed elements in the formula are column names, common operators, and values for comparison. String values must be enclosed in single quotes, as in the example below.

For more details about valid formulas, see the pandas documentation for `DataFrame.query`.
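The quoting rule can be illustrated with pandas itself. This is a minimal sketch that assumes the step evaluates formulas the way pandas' `DataFrame.query` does. The toy data is invented, and only the column name `salary` comes from this page. An unquoted word is read as a column name, not as a string value:

```python
import pandas as pd

# Invented toy data; only the "salary" column name is taken from this page.
df = pd.DataFrame({"salary": ["low", "medium", "high"]})

# Single quotes mark 'low' as a string value, so each row is compared against it.
print(df.query("salary == 'low'"))

# Without quotes, low is looked up as a column name, and no such column exists.
try:
    df.query("salary == low")
    unquoted_failed = False
except NameError:  # pandas raises UndefinedVariableError, a NameError subclass
    unquoted_failed = True
print(unquoted_failed)
```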

Example

The first example keeps those rows where the "salary" column is either "low" or "high":

filter_with_formula(ds, {
  "formula": "salary == 'low' or salary == 'high'"
}) -> (ds_filtered)
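For comparison, here is a rough pandas equivalent of this first example. It assumes the formula follows `DataFrame.query` semantics, and the rows are invented. The `in` form in the second query is plain pandas syntax. Whether the step accepts it is an assumption.

```python
import pandas as pd

# Invented sample rows; only the column name "salary" comes from the example.
ds = pd.DataFrame({"salary": ["low", "medium", "high", "low"]})

# Same formula string as in the step's parameters.
ds_filtered = ds.query("salary == 'low' or salary == 'high'")
print(ds_filtered)

# pandas also accepts list membership, which stays compact as the number of values grows.
# Support for this form in the step itself is an assumption.
same = ds.query("salary in ['low', 'high']")
print(same.equals(ds_filtered))  # True
```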

More examples

The next example drops those rows where the column "number_project" is less than 3 or greater than 4, i.e. it keeps only rows whose value lies in the range [3, 4].

filter_with_formula(ds, {
  "formula": "number_project >= 3 and number_project <= 4"
}) -> (ds_filtered)
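Under the same pandas `DataFrame.query` assumption, the range filter can be sketched as follows. The values are invented. pandas also parses a chained comparison such as `3 <= number_project <= 4` as the same condition. Whether this step accepts the chained form is an assumption.

```python
import pandas as pd

# Invented values for the "number_project" column used in the example.
ds = pd.DataFrame({"number_project": [2, 3, 4, 5, 7]})

ds_filtered = ds.query("number_project >= 3 and number_project <= 4")
print(ds_filtered["number_project"].tolist())  # [3, 4]

# Chained comparison: pandas expands it to the same "and" of two comparisons.
chained = ds.query("3 <= number_project <= 4")
print(chained.equals(ds_filtered))  # True
```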

Usage

The following are the step's expected inputs and outputs and their specific types.

filter_with_formula(ds_in: dataset, {"param": value}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.

Inputs


ds_in: dataset

An input dataset to filter.

Outputs


ds_out: dataset

A new dataset containing the same columns as the input dataset but only those rows passing the filter query.
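The output contract can be made concrete with a small pandas sketch. `filter_dataset` is a hypothetical helper, not part of the product, and the pandas backing is an assumption. The sketch checks three properties: the result keeps every input column, it keeps only the matching rows, and the input is left unchanged.

```python
import pandas as pd


def filter_dataset(ds_in: pd.DataFrame, formula: str) -> pd.DataFrame:
    """Hypothetical stand-in for the step: return a new frame with matching rows only."""
    # query() builds a new DataFrame and does not modify ds_in;
    # .copy() makes the result fully independent of the input.
    return ds_in.query(formula).copy()


ds_in = pd.DataFrame({"salary": ["low", "high", "medium"], "number_project": [3, 5, 4]})
ds_out = filter_dataset(ds_in, "number_project >= 3 and number_project <= 4")

print(list(ds_out.columns) == list(ds_in.columns))  # True: same columns
print(len(ds_in), len(ds_out))  # 3 2: the input still has all its rows
```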

Parameters


formula: string

A formula evaluated against each row. Only rows for which it evaluates to true are kept in the output.

Example parameter values:

  • "salary == 'low' or salary == 'high'"
  • "number_project >= 3 and number_project <= 4"
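Per-row evaluation can be shown in pandas terms, again assuming pandas-style semantics. Each formula produces one boolean per row, and the rows marked true are kept. Concretely, `DataFrame.query(formula)` selects the same rows as indexing with the boolean mask returned by `DataFrame.eval(formula)`. The sketch below applies both example parameter values above to invented data.

```python
import pandas as pd

# Invented rows using the two column names from the example parameter values.
ds = pd.DataFrame({
    "salary": ["low", "medium", "high", "low"],
    "number_project": [2, 3, 4, 6],
})

formulas = [
    "salary == 'low' or salary == 'high'",
    "number_project >= 3 and number_project <= 4",
]

masks = {}
for formula in formulas:
    mask = ds.eval(formula)        # one boolean per row
    via_mask = ds[mask]            # keep rows where the mask is True
    via_query = ds.query(formula)  # same selection in a single call
    masks[formula] = mask.tolist()
    print(formula, "->", masks[formula], via_mask.equals(via_query))
```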