filter_missing
Filter rows based on missing values in one or more columns.
By default, keeps only those rows where the values in the selected columns are not missing (non-NaN). Setting the exclude
parameter inverts the row selection, so that only rows with missing values in the selected columns
are returned.
Usage
The following example shows how the step can be used in a recipe.
Examples
To keep only those rows where neither “address” nor “name” is missing
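As a rough illustration of the row selection this example describes, here is a minimal plain-Python sketch. It is an analogy rather than the step's implementation: the sample rows, the `is_missing` helper, and the choice to treat `None` and NaN as missing are all assumptions made for the illustration.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

# Hypothetical rows; the column names follow the example above.
rows = [
    {"name": "Ann", "address": "1 Main St"},
    {"name": None, "address": "2 Oak Ave"},
    {"name": "Cy", "address": float("nan")},
    {"name": "Dee", "address": "4 Elm Rd"},
]

selected = ["address", "name"]

# Keep a row only if none of the selected columns is missing.
kept = [row for row in rows if not any(is_missing(row[c]) for c in selected)]
```

Here only the rows for "Ann" and "Dee" remain, since each of the other rows is missing one of the two selected columns.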
General syntax for using the step in a recipe. It shows the inputs the step expects and the outputs it produces. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally
columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced
by name, e.g. "churn-clf").
Inputs
An input dataset to filter.
Outputs
A dataset containing the same columns as the input dataset but including or excluding the matched rows.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Names of columns used to detect and filter rows containing missing values.
Array items
Each item in array.
If true, rows with non-missing values will be excluded,
i.e. only rows containing missing values in the selected columns will be included in the resulting dataset.
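The effect of inverting the selection can be sketched in plain Python as follows. The `select_rows` helper is hypothetical and only mirrors the behaviour described on this page; in particular, it assumes that a row "contains missing values" when at least one of the selected columns is missing, and that `None` and NaN both count as missing.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def select_rows(rows, columns, exclude=False):
    """Hypothetical helper mirroring the documented behaviour, not the step's code."""
    selected = []
    for row in rows:
        # Assumption: one missing value in any selected column marks the row.
        has_missing = any(is_missing(row[c]) for c in columns)
        # Default keeps complete rows; exclude=True keeps rows with a missing value.
        if has_missing == exclude:
            selected.append(row)
    return selected

# Hypothetical sample data.
rows = [
    {"name": "Ann", "address": "1 Main St"},
    {"name": None, "address": "2 Oak Ave"},
    {"name": "Cy", "address": float("nan")},
]

complete = select_rows(rows, ["address", "name"])
incomplete = select_rows(rows, ["address", "name"], exclude=True)
```

With this data, `complete` holds only the "Ann" row, while `incomplete` holds the two rows that are missing a name or an address. Every input row ends up in exactly one of the two results.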