Filter missing¶
missing data • NaN
Filter rows based on missing values in one or more columns.
By default, the step keeps only those rows where the values in the selected columns are not missing (non-NaN). With the exclude
parameter, the row selection can be inverted, so that only rows with missing values in the selected columns
are returned.
Usage¶
The following are the step's expected inputs and outputs and their specific types.
filter_missing(ds_in: dataset, {"param": value}) -> (ds_out: dataset)
where the object {"param": value}
is optional in most cases and if present may contain any of the parameters described in the
corresponding section below.
Example¶
To keep only those rows where neither "address" nor "name" is missing:
filter_missing(ds, {"columns": ["address", "name"]}) -> (ds_filtered)
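For readers who think in pandas terms, the default behaviour of this example can be approximated as shown here. This is only a sketch of the semantics, not the step's actual implementation. The sample data is made up for illustration, and dropping a row as soon as any selected column is missing is inferred from the "neither ... nor" wording above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
ds = pd.DataFrame({
    "name": ["Ana", None, "Cy", "Dee"],
    "address": ["1 Main St", "2 Oak Ave", np.nan, "4 Elm Rd"],
    "age": [31, 45, 52, np.nan],
})

# Keep rows where neither "address" nor "name" is missing.
# Missing values in columns that were not selected (here "age")
# have no effect on which rows are kept.
ds_filtered = ds.dropna(subset=["address", "name"])
```

The last row is kept even though its "age" is missing, because "age" is not among the selected columns.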
Inputs¶
ds_in: dataset
An input dataset to filter.
Outputs¶
ds_out: dataset
A dataset with the same columns as the input dataset, containing only the rows that pass the filter (rows without missing values by default, or rows with missing values if exclude is set).
Parameters¶
columns: array[string]
Names of columns used to detect and filter rows containing missing values.
Items in columns
item: string
exclude: boolean = False
If true, rows with non-missing values will be excluded, i.e. only rows containing missing values in the selected columns will be included in the resulting dataset.
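The effect of exclude can be illustrated with a small pandas sketch. The helper name filter_missing_sketch and the sample data are hypothetical, and the sketch assumes that exclude returns the exact complement of the default selection, i.e. rows with a missing value in at least one selected column. The description above does not state whether "any" or "all" selected columns must be missing, so treat that choice as an assumption.

```python
import pandas as pd


def filter_missing_sketch(df, columns, exclude=False):
    """Hypothetical pandas analogue of the step, not its real implementation."""
    # True for rows with at least one missing value in the selected columns
    # (assumption: "any" rather than "all").
    has_missing = df[columns].isna().any(axis=1)
    # exclude=False keeps complete rows; exclude=True keeps the remaining rows.
    return df[has_missing] if exclude else df[~has_missing]


# Made-up sample data, for illustration only.
ds = pd.DataFrame({
    "name": ["Ana", None, "Cy"],
    "address": ["1 Main St", "2 Oak Ave", None],
})

ds_complete = filter_missing_sketch(ds, ["address", "name"])
ds_missing = filter_missing_sketch(ds, ["address", "name"], exclude=True)
```

Under this assumption the two results partition the input: every row ends up in exactly one of them.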