Is missing
Fast step · Keywords: missing values, NaN
Check for missing values in a given column.
This step checks each row of the input column to determine whether its value is missing (null or NaN). The result is a new column in which each row indicates whether the corresponding element in the input column is missing.
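As a rough illustration of the "null or NaN" rule for single-valued columns, the sketch below uses pandas Series.isna(), which flags both None and NaN. It assumes pandas is installed and only mimics the documented behaviour; it is not Graphext's implementation.

```python
import math

import pandas as pd

# A numeric column with one None and one NaN; pandas stores both as NaN.
numeric = pd.Series([1.5, None, math.nan, 4.0])
# A text column (object dtype) where None stays None and NaN stays NaN.
text = pd.Series(["a", None, "c", math.nan], dtype="object")

# isna() marks a row True when its value is null (None) or NaN.
numeric_missing = numeric.isna()
text_missing = text.isna()

print(numeric_missing.tolist())  # [False, True, True, False]
print(text_missing.tolist())     # [False, True, False, True]
```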
The step can work with single-valued and multi-valued columns, and the output can be configured to be either boolean (true/false), numeric (0/1) or categorical (custom labels).
- For single-valued columns: each row in the output column is true if the corresponding value in the input column is missing, and false otherwise.
- For multi-valued columns: each row in the output column is true if the corresponding sub-list in the input column is empty, and false otherwise.
Usage
The following are the step's expected inputs and outputs and their specific types.
is_missing(column: column, {
"param": value
}) -> (result: column)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Example
Check for missing values in a numeric column.
is_missing(ds.numeric_col) -> (ds.numeric_col_missing)
More examples
Check for missing values in a text column and set output type to numeric.
is_missing(ds.string_col, {"out_type": "number"}) -> (ds.string_col_missing)
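The two examples above can be mirrored in pandas to see the expected shape of the result. In this sketch the DataFrame named ds and its columns are hypothetical stand-ins for a Graphext dataset, and encoding "number" as 0/1 via astype(int) is an assumption about how the step represents numeric output.

```python
import math

import pandas as pd

# Hypothetical stand-in for the Graphext dataset "ds".
ds = pd.DataFrame({
    "numeric_col": [3.0, math.nan, 7.5],
    "string_col": ["x", None, "z"],
})

# Default out_type "boolean": True where the value is missing.
ds["numeric_col_missing"] = ds["numeric_col"].isna()

# out_type "number": same check, encoded as 0/1 instead of False/True.
ds["string_col_missing"] = ds["string_col"].isna().astype(int)

print(ds[["numeric_col_missing", "string_col_missing"]])
```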
Inputs
column: column
The input column to check for missing values.
Outputs
result: column
The output column indicating the presence of missing values in the input column.
Parameters
out_type: string = "boolean"
Output type. The data type of the output column.
- "boolean": output is true/false, indicating whether each value is missing.
- "number": output is 0/1, indicating whether each value is missing.
- "category": output uses the labels given in params["labels"]["true"] and params["labels"]["false"].
Must be one of: "boolean", "number", "category".
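A minimal sketch of how the out_type parameter could be resolved and validated, assuming the optional parameter object behaves like a plain dictionary whose missing keys fall back to the documented default. The function name resolve_out_type is hypothetical.

```python
from typing import Any, Mapping, Optional

ALLOWED_OUT_TYPES = ("boolean", "number", "category")


def resolve_out_type(params: Optional[Mapping[str, Any]] = None) -> str:
    """Hypothetical helper: return out_type, defaulting to "boolean"."""
    out_type = (params or {}).get("out_type", "boolean")
    # Reject anything outside the three documented values.
    if out_type not in ALLOWED_OUT_TYPES:
        raise ValueError(
            f"out_type must be one of {ALLOWED_OUT_TYPES}, got {out_type!r}"
        )
    return out_type


print(resolve_out_type())                        # boolean
print(resolve_out_type({"out_type": "number"}))  # number
```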
labels: object
Labels for the true and false categories. An object mapping the "true" and "false" categories to custom labels.
Items in labels
True: string = True
Label for the "true" category.
False: string = False
Label for the "false" category.
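When out_type is "category", the boolean result is replaced by the configured labels. The sketch below assumes the labels object is a dictionary with lowercase "true" and "false" keys, following the params["labels"]["true"] notation used for out_type, and that the defaults are the strings "True" and "False". The key casing and the helper name to_category are assumptions.

```python
from typing import Mapping, Optional

import pandas as pd


def to_category(missing: pd.Series, labels: Optional[Mapping[str, str]] = None) -> pd.Series:
    """Hypothetical helper: map a boolean missing-mask to custom labels.

    Assumption: labels uses lowercase "true"/"false" keys; unset keys
    fall back to the strings "True"/"False".
    """
    labels = labels or {}
    true_label = labels.get("true", "True")
    false_label = labels.get("false", "False")
    # Map each boolean to its label and store the result as a pandas categorical.
    return missing.map({True: true_label, False: false_label}).astype("category")


mask = pd.Series([1.0, None, 3.0]).isna()
result = to_category(mask, {"true": "missing", "false": "present"})
print(result.tolist())  # ['present', 'missing', 'present']
```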