Filter with formula¶
Filter rows using a (pandas-compatible) formula.
Allowed elements in the formula are column names, common operators, and values for comparison. Strings must be enclosed in single quotes (see the example below).
For more details about valid formulas see here.
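To make the quoting rule concrete, here is a minimal Python sketch. It assumes that "pandas-compatible" means the formula behaves like an expression passed to pandas' `DataFrame.query`; this page does not confirm how the step evaluates formulas internally, and the data below is invented for illustration.

```python
import pandas as pd

# Hypothetical data; the column names are borrowed from the examples on this page.
df = pd.DataFrame({
    "salary": ["low", "medium", "high", "low"],
    "number_project": [2, 3, 4, 5],
})

# String values use single quotes so the whole formula can sit inside double quotes.
formula = "salary == 'low' or salary == 'high'"

# Assumption: DataFrame.query stands in for the step's own formula evaluation.
filtered = df.query(formula)

print(filtered)
```

Only the rows whose "salary" is "low" or "high" remain; the "medium" row is dropped.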
Usage¶
The following are the step's expected inputs and outputs and their specific types.
filter_with_formula(ds_in: dataset, {"param": value}) -> (ds_out: dataset)
where the object {"param": value}
is optional in most cases and, if present, may contain any of the parameters described in the
corresponding section below.
Example¶
The first example keeps those rows where the "salary" column is either "low" or "high":
filter_with_formula(ds, {
"formula": "salary == 'low' or salary == 'high'"
}) -> (ds_filtered)
More examples
The next example drops those rows where the column "number_project" is less than 3 or greater than 4, i.e. it keeps only rows whose value lies in the closed range [3, 4].
filter_with_formula(ds, {
"formula": "number_project >= 3 and number_project <= 4"
}) -> (ds_filtered)
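The range condition above can be checked in plain pandas. This is a sketch under the assumption that the formula follows `DataFrame.query` semantics; the sample values are made up. It also shows that keeping rows inside [3, 4] matches an inclusive boolean mask built with `Series.between`.

```python
import pandas as pd

# Hypothetical data covering values below, inside, and above the range.
df = pd.DataFrame({"number_project": [1, 2, 3, 4, 5, 6]})

# Assumption: DataFrame.query stands in for the step's formula evaluation.
kept = df.query("number_project >= 3 and number_project <= 4")

# Equivalent boolean mask; between() includes both endpoints by default.
mask = df["number_project"].between(3, 4)
same_result = kept.equals(df[mask])

print(kept)
print(same_result)
```

Both bounds are inclusive because the formula uses `>=` and `<=`; switching to `>` and `<` would exclude the endpoints.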
Inputs¶
ds_in: dataset
An input dataset to filter.
Outputs¶
ds_out: dataset
A new dataset containing the same columns as the input dataset but only those rows passing the filter query.
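As a rough illustration of this output contract, the sketch below uses pandas' `DataFrame.query` as a stand-in for the step (an assumption, not the step's confirmed implementation) and checks that the result has the same columns, fewer rows, and is a separate object from the input. The data is invented.

```python
import pandas as pd

# Hypothetical input dataset.
df = pd.DataFrame({
    "salary": ["low", "medium", "high"],
    "number_project": [2, 3, 4],
})

# Assumption: DataFrame.query approximates what the step does with its formula.
out = df.query("salary == 'low' or salary == 'high'")

same_columns = list(out.columns) == list(df.columns)  # columns are unchanged
rows_in, rows_out = len(df), len(out)                 # only matching rows remain

print(same_columns, rows_in, rows_out)
```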
Parameters¶
formula: string
A formula describing the condition to evaluate on each row. Rows for which the formula is true are kept.
Example parameter values:
"salary == 'low' or salary == 'high'"
"number_project >= 3 and number_project <= 4"