filter_containing
Filter rows containing any or all of a number of specified values.
Includes or excludes rows of the input dataset based on the values of a selected text or list column. Depending on the configuration, rows whose column value contains any or all of the specified values are either kept in or dropped from the output dataset.
“Containment” here means texts in a text column containing one or more specified substrings (words), or lists in a list column containing one or more elements matching the specified values. See below for illustrative examples.
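The two kinds of containment can be sketched in plain Python. This is only an illustration of the idea, not the step's implementation: the helper name `contains` is hypothetical, and it assumes that text is matched by simple substring search and that list elements are matched by equality.

```python
def contains(cell, value):
    """Return True if `cell` contains `value` (hypothetical helper)."""
    if isinstance(cell, str):
        # Text column: assume the value must occur as a substring of the text.
        return value in cell
    # List column: assume the value must equal one of the list's elements.
    return any(item == value for item in cell)


print(contains("Calle Mayor 1, Madrid", "Madrid"))  # True
print(contains(["red", "green"], "green"))          # True
print(contains(["red", "green"], "gree"))           # False
```

Note that under these assumptions a partial string matches inside a text but not against a list element.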
Usage
The following examples show how the step can be used in a recipe.
For example, the step can be used to keep only those rows whose values in the “address” column contain the text string “Madrid”.
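The effect of the “Madrid” filter can be illustrated with a rough Python analogue. This is not recipe syntax, and the sample rows are made up for the example; it only shows which rows such a filter would be expected to keep.

```python
# Made-up sample data standing in for the input dataset.
rows = [
    {"name": "Ana", "address": "Calle de Alcalá 12, Madrid"},
    {"name": "Ben", "address": "10 Downing Street, London"},
    {"name": "Carla", "address": "Gran Vía 3, Madrid"},
]

# Keep only rows whose "address" text contains the substring "Madrid".
kept = [row for row in rows if "Madrid" in row["address"]]

print([row["name"] for row in kept])  # ['Ana', 'Carla']
```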
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Name of column to be matched against the specified values.
Values to be matched in each row to decide its inclusion or exclusion. May be a single value or a list of values to be matched.
If true, matching rows will be excluded from the output dataset. I.e., only rows not containing the specified values will be returned.
Rows must contain all specified values to pass the filter, rather than any.
Text values must match case to pass filter.
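How these options interact can be sketched in plain Python. The function below is an illustrative model only, not the step's implementation: the keyword names `exclude`, `match_all` and `match_case` and their defaults are assumptions made for this sketch, as is the choice of substring matching for text and equality for list elements (which are assumed to be strings).

```python
def filter_containing(rows, column, values,
                      exclude=False, match_all=False, match_case=False):
    """Illustrative model of the filter; all parameter names are hypothetical."""
    # A single value is treated the same as a one-element list.
    if isinstance(values, str):
        values = [values]

    def normalise(text):
        # Without case matching, compare lower-cased strings.
        return text if match_case else text.lower()

    def cell_contains(cell, value):
        if isinstance(cell, str):
            # Text column: substring match (assumption).
            return normalise(value) in normalise(cell)
        # List column: element equality (assumption).
        return any(normalise(item) == normalise(value) for item in cell)

    # "all" requires every value to be present, "any" at least one.
    combine = all if match_all else any

    result = []
    for row in rows:
        matched = combine(cell_contains(row[column], v) for v in values)
        # Keep matching rows normally, non-matching rows when excluding.
        if matched != exclude:
            result.append(row)
    return result


rows = [
    {"id": 1, "tags": ["red", "green"]},
    {"id": 2, "tags": ["red"]},
    {"id": 3, "tags": ["Blue"]},
]

def ids(selected):
    return [row["id"] for row in selected]

print(ids(filter_containing(rows, "tags", ["red", "green"])))                  # [1, 2]
print(ids(filter_containing(rows, "tags", ["red", "green"], match_all=True)))  # [1]
print(ids(filter_containing(rows, "tags", ["red", "green"], exclude=True)))    # [3]
print(ids(filter_containing(rows, "tags", "blue", match_case=True)))           # []
```

In this model, excluding simply inverts the keep/drop decision after the any/all test has been evaluated, so combining the "all" and "exclude" options drops only rows that contain every specified value.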