Filter with formula

Filter rows using a (pandas-compatible) formula.

Allowed elements in the formula are column names, common operators, and values for comparison. Strings must be enclosed in single quotes (see the example below).

For more details about valid formulas see here.


The following are the step's expected inputs and outputs and their specific types.

Step signature
filter_with_formula(ds_in: dataset, {"param": value}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.


The first example keeps those rows where the "salary" column is either "low" or "high":

Example call (in recipe editor)
filter_with_formula(ds, {
  "formula": "salary == 'low' or salary == 'high'"
}) -> (ds_filtered)

More examples

The next example drops those rows where the column "number_project" is less than 3 or greater than 4, i.e. it keeps values in the range [3, 4] only.

Example call (in recipe editor)
filter_with_formula(ds, {
  "formula": "number_project >= 3 and number_project <= 4"
}) -> (ds_filtered)
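Since the formula is described as pandas-compatible, the two examples above can be tried out locally before putting them in a recipe. The following is a minimal sketch that assumes the step evaluates the formula much like pandas' DataFrame.query does; the sample data is hypothetical and only the column names are taken from the examples.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the examples above.
df = pd.DataFrame({
    "salary": ["low", "medium", "high", "low"],
    "number_project": [2, 3, 4, 5],
})

# First example: keep rows whose salary is either "low" or "high".
low_or_high = df.query("salary == 'low' or salary == 'high'")

# Second example: keep rows whose number_project lies in the range [3, 4].
mid_projects = df.query("number_project >= 3 and number_project <= 4")

print(low_or_high)
print(mid_projects)
```

Rows that do not satisfy the formula are dropped, and all columns are kept, which matches the description of ds_out.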


ds_in: dataset

An input dataset to filter.


ds_out: dataset

A new dataset containing the same columns as the input dataset but only those rows passing the filter query.


formula: string

A formula describing the matching operation to perform on each row.

Example parameter values:

  • "salary == 'low' or salary == 'high'"
  • "number_project >= 3 and number_project <= 4"
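Because the formula sits inside a double-quoted parameter value, string literals within it use single quotes. The sketch below builds such a parameter object programmatically. The helper any_of is hypothetical (it is not part of the step) and assumes the values contain no single quotes themselves.

```python
import json

def any_of(column, values):
    """Hypothetical helper: build a formula matching any of the given string values."""
    # Each value is wrapped in single quotes so that the whole formula can be
    # placed inside a double-quoted parameter string without escaping.
    return " or ".join(f"{column} == '{value}'" for value in values)

params = {"formula": any_of("salary", ["low", "high"])}

# Print the parameter object as it would appear in a call to the step.
print(json.dumps(params, indent=2))
```

The resulting formula is identical to the first example parameter value listed above.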