Includes or excludes rows of the input dataset based on the values of a selected text or list column. Depending on the configuration, rows whose column value contains any or all of the specified values are either kept in or dropped from the output dataset. “Containment” here means that a text in a text column contains one or more of the specified substrings (words), or that a list in a list column contains one or more elements matching the specified values. See below for illustrative examples.

Usage

The following examples show how the step can be used in a recipe.

Examples

E.g., to keep only those rows whose values in the “address” column contain the text string “Madrid”:
filter_containing(ds, {"column": "address", "values": ["Madrid"]}) -> (ds_filtered)
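
As a rough illustration of what this call does (not the step's actual implementation), the same filter can be sketched in plain Python. The sketch assumes the dataset is a list of row dictionaries and that matching is plain substring containment; the helper name keep_rows_containing and the sample rows are hypothetical.

```python
# Hypothetical stand-in for a dataset: a list of row dictionaries.
rows = [
    {"name": "Ana", "address": "Calle Mayor 1, Madrid"},
    {"name": "Ben", "address": "10 Downing Street, London"},
    {"name": "Eva", "address": "Gran Via 20, Madrid"},
]


def keep_rows_containing(rows, column, values):
    """Keep rows whose text in `column` contains any of `values` as a substring."""
    return [row for row in rows if any(v in row[column] for v in values)]


ds_filtered = keep_rows_containing(rows, "address", ["Madrid"])
print([row["name"] for row in ds_filtered])  # ['Ana', 'Eva']
```

All columns of the matching rows are preserved; only the set of rows changes.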

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
ds_in
dataset
required
An input dataset to filter.
ds_out
dataset
required
A dataset containing the same columns as the input dataset, but including or excluding the matched rows.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object passed as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

column
string (ds_in.column:text|list)
required
Name of column to be matched against the specified values.
values
number | string | array[number | string]
required
Values to be matched in each row to decide its inclusion or exclusion. May be a single value or a list of values to be matched.
Item
[number, string]
Each item in array.
  • "the"
  • ["the", "cat"]
  • 2
  • [2, 3]
exclude
boolean
default:"false"
If true, matching rows will be excluded from the output dataset. I.e., only rows not containing the specified values will be returned.
contains_all
boolean
default:"false"
If true, rows must contain all of the specified values to pass the filter, rather than any one of them.
case_sensitive
boolean
default:"true"
If true, text values must match the case of the specified values to pass the filter.
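
The way these parameters interact can be sketched in plain Python. This is an illustration under stated assumptions rather than the step's real implementation: the names matches and filter_rows and the sample rows are hypothetical, text columns are assumed to use substring matching, list columns are assumed to use element equality, and numbers matched against text are assumed to be compared via their string form.

```python
def matches(cell, values, contains_all=False, case_sensitive=True):
    """Return True if a text or list cell contains the given value(s)."""
    if not isinstance(values, list):
        values = [values]  # a single value is treated as a one-item list

    if isinstance(cell, str):
        # Text column: substring match (numbers compared as strings, an assumption).
        targets = [str(v) for v in values]
        if not case_sensitive:
            cell, targets = cell.lower(), [t.lower() for t in targets]
        hits = [t in cell for t in targets]
    else:
        # List column: an element must equal the value.
        def norm(x):
            return x.lower() if isinstance(x, str) and not case_sensitive else x

        elements = [norm(e) for e in cell]
        hits = [norm(v) in elements for v in values]

    return all(hits) if contains_all else any(hits)


def filter_rows(rows, column, values, exclude=False,
                contains_all=False, case_sensitive=True):
    """Keep matching rows, or drop them instead when `exclude` is true."""
    return [
        row for row in rows
        if matches(row[column], values, contains_all, case_sensitive) != exclude
    ]


# Hypothetical sample data with one text column and one list column.
rows = [
    {"id": 1, "text": "The cat sat", "tags": [2, 3]},
    {"id": 2, "text": "the dog ran", "tags": [2]},
    {"id": 3, "text": "A bird flew", "tags": [5]},
]


def ids(result):
    return [row["id"] for row in result]


print(ids(filter_rows(rows, "text", "the")))                        # [2]
print(ids(filter_rows(rows, "text", "the", case_sensitive=False)))  # [1, 2]
print(ids(filter_rows(rows, "tags", [2, 3])))                       # [1, 2]
print(ids(filter_rows(rows, "tags", [2, 3], contains_all=True)))    # [1]
print(ids(filter_rows(rows, "tags", [2, 3], exclude=True)))         # [3]
```

With the defaults (exclude false, contains_all false, case_sensitive true), a row passes as soon as any one value is found with exact case; each flag changes one aspect of that rule independently.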