
Filter with formula

Filter rows using a (pandas-compatible) formula.

Allowed elements in the formula are column names, common comparison and logical operators, and literal values to compare against (strings must be enclosed in single quotes, as in the examples below).

For more details about valid formulas see here.
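Because the formula is described as pandas-compatible, pandas' own `DataFrame.query` is a reasonable way to try a formula outside the recipe editor. The sketch below assumes the step evaluates formulas the way `query` does. The sample data is made up, and only the column names come from this page. It combines a string comparison, a numeric comparison, parentheses and `not`.

```python
import pandas as pd

# Hypothetical sample data using column names from the examples on this page.
df = pd.DataFrame({
    "salary": ["low", "medium", "high", "low"],
    "number_project": [2, 3, 6, 4],
})

# Column names are written bare, string literals use single quotes, and
# comparisons can be combined with and / or / not and parentheses.
formula = "(salary == 'low' or salary == 'high') and not (number_project > 5)"
result = df.query(formula)
print(result)
```

Only the first and last rows remain. The "high" row is excluded by the `not (number_project > 5)` clause.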

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
filter_with_formula(ds_in: dataset, {
    "param": value
}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
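To make the signature concrete, here is a hypothetical Python stand-in that mirrors its shape: a dataset in, a parameter object holding the `formula`, and a dataset out. It is an illustration built on pandas' `DataFrame.query`, not the step's actual implementation. The function name is reused only for readability.

```python
import pandas as pd

def filter_with_formula(ds_in: pd.DataFrame, params: dict) -> pd.DataFrame:
    """Hypothetical pandas stand-in for the recipe step (not its real implementation)."""
    formula = params["formula"]
    # Keep only the rows for which the formula evaluates to True.
    return ds_in.query(formula)

ds = pd.DataFrame({"salary": ["low", "medium", "high"]})
ds_filtered = filter_with_formula(ds, {"formula": "salary == 'low' or salary == 'high'"})
print(ds_filtered)
```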

Example

The first example keeps those rows where the "salary" column is either "low" or "high":

Example call (in recipe editor)
filter_with_formula(ds, {
  "formula": "salary == 'low' or salary == 'high'"
}) -> (ds_filtered)
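For comparison, the same formula can be evaluated directly with pandas' `DataFrame.query`, assuming pandas semantics apply. pandas also accepts list membership with `in`, which is equivalent here. This page does not confirm whether the recipe step accepts the `in` form, so treat it as an assumption. The sample values are invented.

```python
import pandas as pd

# Hypothetical sample data; only the "salary" column matters here.
df = pd.DataFrame({"salary": ["low", "medium", "high", "medium", "low"]})

# The formula from the example call above, evaluated with pandas.
kept = df.query("salary == 'low' or salary == 'high'")

# pandas also accepts list membership, which gives the same rows here.
kept_in = df.query("salary in ['low', 'high']")

print(kept["salary"].tolist())  # ['low', 'high', 'low']
```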
More examples

The next example drops those rows where the column "number_project" is less than 3 or greater than 4, i.e., it keeps only values in the range [3, 4].

Example call (in recipe editor)
filter_with_formula(ds, {
  "formula": "number_project >= 3 and number_project <= 4"
}) -> (ds_filtered)
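The range check can be reproduced in pandas as well. The sketch below assumes pandas semantics and uses invented values. pandas' `query` also understands Python-style chained comparisons, and `Series.between` is inclusive on both ends by default, so all three forms keep the same rows.

```python
import pandas as pd

# Hypothetical sample data for the "number_project" column.
df = pd.DataFrame({"number_project": [2, 3, 4, 5, 7]})

# The formula from the example call above: keep values in [3, 4].
kept = df.query("number_project >= 3 and number_project <= 4")

# pandas query also accepts a chained comparison with the same meaning.
kept_chained = df.query("3 <= number_project <= 4")

# Equivalent boolean mask using Series.between (inclusive on both ends by default).
kept_between = df[df["number_project"].between(3, 4)]

print(kept["number_project"].tolist())  # [3, 4]
```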

Inputs


ds_in: dataset

An input dataset to filter.

Outputs


ds_out: dataset

A new dataset containing the same columns as the input dataset but only those rows passing the filter query.
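A quick way to see this input/output relationship is with pandas, assuming the step behaves like `DataFrame.query`. The column set and column order are unchanged, and only rows are removed. Note that pandas keeps the original index labels of the surviving rows (call `reset_index(drop=True)` to renumber them). This page does not say how the recipe step handles row identifiers.

```python
import pandas as pd

# Hypothetical input dataset.
df = pd.DataFrame({
    "salary": ["low", "high", "medium"],
    "number_project": [3, 5, 4],
})

ds_out = df.query("number_project >= 3 and number_project <= 4")

# Columns (and their order) are unchanged; only rows are removed.
assert list(ds_out.columns) == list(df.columns)
print(len(df), "->", len(ds_out))  # 3 -> 2
```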

Parameters


formula: string

A boolean formula evaluated for each row; only the rows for which it evaluates to true are kept.

Example parameter values:

  • "salary == 'low' or salary == 'high'"
  • "number_project >= 3 and number_project <= 4"