is_missing
Check for missing values in a given column.
This step checks each row of the input column to determine if the value is missing (null or NaN). The result is a new boolean column, where each row indicates whether the corresponding element in the input column is missing.
The step works with both single-valued and multivalued columns, and the output can be configured to be boolean (true/false), numeric (0/1) or categorical (custom labels).
- For single-valued columns: Each row in the output column will be true if the corresponding value in the input column is missing, and false otherwise.
- For multivalued columns: Each row in the output column will be true if the corresponding sub-list in the input column is empty, and false otherwise.
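The two rules above can be sketched in plain Python. This is only an illustration of the described behaviour, not the step's actual implementation: the function names `is_missing_scalar` and `is_missing_column` and the `multivalued` flag are hypothetical, and treating a null row in a multivalued column as missing is an assumption the description above does not confirm.

```python
import math


def is_missing_scalar(value):
    """Return True if value is null (None) or a float NaN."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def is_missing_column(values, multivalued=False):
    """Return one boolean per row of the input column.

    Single-valued: a row is flagged when its value is None or NaN.
    Multivalued: a row is flagged when its sub-list is empty.
    """
    if multivalued:
        # Assumption: a null row is also treated as missing.
        return [row is None or len(row) == 0 for row in values]
    return [is_missing_scalar(v) for v in values]


# Hypothetical sample columns.
ages = [31.0, None, float("nan"), 47.0]
tags = [["a", "b"], [], ["c"]]

single_flags = is_missing_column(ages)
multi_flags = is_missing_column(tags, multivalued=True)
```

In this sketch an empty string counts as present, since the description only names null and NaN as missing.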
Usage
The following examples show how the step can be used in a recipe.
Examples
Check for missing values in a numeric column.
Check for missing values in a text column and set output type to numeric.
General syntax for using the step in a recipe. Shows the inputs the step expects and the outputs it produces. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
The input column to check for missing values.
Outputs
The output column indicating the presence of missing values in the input column.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Output type. The data type of the output column.
- ‘boolean’: Output is true/false indicating missing or not.
- ‘number’: Output is 0/1 indicating missing or not.
- ‘category’: Output is specified by params["labels"]["true"] and params["labels"]["false"].
Values must be one of the following: boolean, number, category.
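As a rough illustration of the three output types, the following Python sketch converts a list of missing/not-missing flags into each form. It is not the step's implementation: `format_flags` and the `out_type` argument name are hypothetical (the parameter's key is not given here), and it assumes that 1 marks a missing value in the numeric output. The "true" and "false" label keys follow the description above.

```python
def format_flags(flags, out_type="boolean", labels=None):
    """Convert boolean missing-flags to the requested output type."""
    if out_type == "boolean":
        return list(flags)
    if out_type == "number":
        # Assumption: 1 means missing, 0 means present.
        return [1 if flag else 0 for flag in flags]
    if out_type == "category":
        if not labels or "true" not in labels or "false" not in labels:
            raise ValueError('category output needs labels["true"] and labels["false"]')
        return [labels["true"] if flag else labels["false"] for flag in flags]
    raise ValueError(f"unsupported output type: {out_type!r}")


# Hypothetical flags: the second and third rows are missing.
flags = [False, True, True]

as_boolean = format_flags(flags)
as_number = format_flags(flags, "number")
as_category = format_flags(
    flags, "category", {"true": "missing", "false": "present"}
)
```

The label strings "missing" and "present" are placeholders; any custom labels could be supplied.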