Check the row-wise equality of all input columns.
For each row, checks whether all values in that row are equal. The result is a boolean column
containing `true` or `false` for each row.
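The sketch below illustrates only the basic semantics in plain Python, not the step's actual implementation. It assumes each input column is given as a list of equal length, and it ignores NaNs and tolerances. The helper name `rows_all_equal` is hypothetical.

```python
def rows_all_equal(columns):
    """Return one boolean per row: True if all values in that row are equal.

    `columns` is a list of equally long lists, one per input column
    (a hypothetical stand-in for the step's input columns).
    """
    result = []
    for row in zip(*columns):
        first = row[0]
        # Exact equality is transitive, so comparing every value
        # with the first one in the row is sufficient.
        result.append(all(value == first for value in row[1:]))
    return result


num1 = [1, 2, 3]
num2 = [1, 5, 3]
num3 = [1, 2, 3]
print(rows_all_equal([num1, num2, num3]))  # [True, False, True]
```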
Note that if the types of the input columns are not compatible, the result will be `false` for all
rows. Compatibility here means that input columns must be
By default, missing values (NaNs) in the same row are considered equal to each other in this step.
Use the parameter `keep_nans` below to control how the presence of NaNs affects the result.
When comparing numeric values, the parameters `rel_tol` and `abs_tol` can be used to check for
approximate equality. The desired tolerance (precision) can be expressed as a proportion of a
reference value (`rel_tol`), as an absolute maximum difference (`abs_tol`), or both. More
specifically, the equation used to check for numeric equality between values `a` and `b` is:

absolute(a - b) <= (rel_tol * absolute(b) + abs_tol)
See also the parameter descriptions below, or the numpy documentation of `numpy.isclose`, which uses the same formula, for further details.
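A direct transcription of the formula above into Python may help when choosing tolerances. The helper name `values_close` is hypothetical. Because only `b` is scaled by `rel_tol`, the test is not symmetric in `a` and `b`, as the two calls below show. Python's built-in `math.isclose` uses a different, symmetric rule, so it is not a drop-in substitute. Which input column plays the role of `b` when more than two columns are compared is not stated here.

```python
def values_close(a, b, rel_tol=0.0, abs_tol=0.0):
    """Approximate equality using the formula from the text:
    absolute(a - b) <= (rel_tol * absolute(b) + abs_tol)
    """
    return abs(a - b) <= rel_tol * abs(b) + abs_tol


# b is the reference value, so swapping the arguments can change the result:
print(values_close(10.0, 11.0, rel_tol=0.095))  # True:  1 <= 0.095 * 11 = 1.045
print(values_close(11.0, 10.0, rel_tol=0.095))  # False: 1 >  0.095 * 10 = 0.95
```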
The following examples show how the step can be used in a recipe.
Examples
To check exact equality of numeric columns `num1` and `num2`:
The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name, e.g. `"churn-clf"`).
Inputs
One or more columns to check for equality.
Outputs
Output column indicating row-wise equality of the input columns.
The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.
Parameters
`abs_tol`: Absolute tolerance. The absolute (positive) difference of two numbers must be smaller than or equal to this value for them to be considered equal. When `rel_tol` is also set, the two tolerances are added, as in the formula above.
`rel_tol`: Relative tolerance. The absolute (positive) difference of two numbers `a` and `b` must be
smaller than or equal to `rel_tol * absolute(b)` (plus `abs_tol`, if set) for them to be considered equal.
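One consequence of the two definitions above: when the reference value `b` is 0, `rel_tol * absolute(b)` is also 0, so a relative tolerance alone only accepts an exact match, and `abs_tol` is needed to accept small non-zero differences. The sketch repeats the hypothetical `values_close` helper (a transcription of the formula in the text) so that it runs on its own.

```python
def values_close(a, b, rel_tol=0.0, abs_tol=0.0):
    # absolute(a - b) <= (rel_tol * absolute(b) + abs_tol)
    return abs(a - b) <= rel_tol * abs(b) + abs_tol


# Relative tolerance alone: the allowed difference scales with |b| ...
print(values_close(1000.5, 1000.0, rel_tol=1e-3))  # True:  0.5 <= 1.0
# ... and vanishes when b is 0, so even a tiny difference fails.
print(values_close(1e-9, 0.0, rel_tol=1e-3))  # False: 1e-9 > 0.0
# An absolute tolerance covers values close to zero.
print(values_close(1e-9, 0.0, rel_tol=1e-3, abs_tol=1e-8))  # True
```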
`keep_nans`: Whether to maintain missing values (NaNs) in the result.
The possible values are {`true`, `false`, `"any"`, `"all"`}:
If `false`: use the default NaN comparison, i.e. `NaN == value` => `false` but `NaN == NaN` => `true`.
Note that this means the result will never contain any NaNs.
If `true` or `"any"`: the result will be NaN if any value in a row is NaN.
If `"all"`: the result will be NaN if all values in a row are NaN.
Values must be one of the following: `"any"`, `"all"`, `true`, `false`.
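The `keep_nans` modes described above can be sketched for a single row of numeric values as follows. This is a hypothetical helper, not the step's implementation: missing values are represented as `float("nan")`, the JSON values `true`/`false` appear as Python `True`/`False`, and tolerances are ignored. It is also an assumption that a row which does not trigger a NaN result (e.g. only some NaNs under `"all"`) falls back to the default comparison.

```python
import math


def row_equal(row, keep_nans=False):
    """Equality of one row of numeric values under the keep_nans modes.

    Returns True, False or float("nan"). Hypothetical helper; missing
    values are represented as float("nan").
    """
    is_nan = [isinstance(v, float) and math.isnan(v) for v in row]

    # true / "any": any NaN in the row makes the result NaN.
    if (keep_nans is True or keep_nans == "any") and any(is_nan):
        return float("nan")
    # "all": only a row made entirely of NaNs yields NaN.
    if keep_nans == "all" and all(is_nan):
        return float("nan")

    # Default comparison: NaN == NaN is true, NaN == value is false.
    if all(is_nan):
        return True
    if any(is_nan):
        return False
    first = row[0]
    return all(v == first for v in row[1:])


nan = float("nan")
print(row_equal([nan, nan]))                   # True (default: NaN == NaN)
print(row_equal([nan, 1.0]))                   # False
print(row_equal([nan, 1.0], keep_nans="any"))  # nan
print(row_equal([nan, 1.0], keep_nans="all"))  # False
print(row_equal([nan, nan], keep_nans="all"))  # nan
```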