
Equal

math · compare

Check the row-wise equality of all input columns.

For each row, the step checks whether all values in that row are equal. The result is a boolean column that is True for rows whose values are all equal and False otherwise.

Note that if the types of the input columns are not compatible, the result will be False for all rows. To be compatible, the input columns must be:

  • all numeric or boolean (the latter being interpreted as 0.0/1.0), OR
  • all string-like (categorical or text), OR
  • all list-like
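The compatibility rule can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the column type names ("number", "boolean", "category", "text", "list") and the helper names `columns_compatible` and `equal_or_all_false` are hypothetical, columns are modeled as Python lists, and missing values are not handled here.

```python
# Hypothetical mapping from column type names to the compatibility groups above.
GROUPS = {
    "number": "numeric",
    "boolean": "numeric",   # booleans are compared as 0.0/1.0
    "category": "string",
    "text": "string",
    "list": "list",
}


def columns_compatible(column_types):
    """True if every column type belongs to one and the same group."""
    groups = {GROUPS.get(t) for t in column_types}
    return len(groups) == 1 and None not in groups


def equal_or_all_false(columns, column_types):
    """Row-wise exact equality; all False if the column types are incompatible.

    Missing values (NaNs) are not treated specially in this sketch.
    """
    n_rows = len(columns[0])
    if not columns_compatible(column_types):
        return [False] * n_rows
    # zip(*columns) yields one tuple per row; compare each value to the first.
    return [all(v == row[0] for v in row) for row in zip(*columns)]


# In Python, True == 1.0 and False == 0.0, mirroring the 0.0/1.0 interpretation.
print(equal_or_all_false([[1.0, 0.0], [True, False]], ["number", "boolean"]))
print(equal_or_all_false([[1, 2], ["1", "2"]], ["number", "text"]))
```

The first call compares a numeric and a boolean column and yields True for both rows; the second mixes numeric and text columns, so every row is False regardless of the values.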

By default, missing values (NaNs) in the same location are considered equal. Use the parameter keep_nans below to control how the presence of NaNs affects the result.

When performing numeric comparisons, the parameters rel_tol and abs_tol can be used to check for approximate equality. The desired tolerance (precision) can be expressed as a proportion of a reference value (b in the equation below), as an absolute maximum difference, or both. More specifically, the equation used to check for numeric equality between values a and b is:

absolute(a - b) <= (rel_tol * absolute(b) + abs_tol).

Also see the parameter descriptions below, or the corresponding numpy documentation for further details.
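As a minimal sketch of the equation above, written in plain Python (the function name `is_close` is hypothetical and not part of this step), the check for a single pair of values might look like this. Because the relative term uses only absolute(b), the test is not symmetric in a and b. Python's built-in `math.isclose` uses a different, symmetric rule, so it is not a drop-in substitute for this formula.

```python
def is_close(a, b, rel_tol=0.0, abs_tol=0.0):
    """absolute(a - b) <= rel_tol * absolute(b) + abs_tol"""
    return abs(a - b) <= rel_tol * abs(b) + abs_tol


# With both tolerances at their default of 0, this reduces to exact equality.
print(is_close(0.5, 0.5))                  # True

# The relative term scales with b only, so swapping the arguments matters.
print(is_close(100.0, 90.0, rel_tol=0.1))  # 10 <= 9.0  -> False
print(is_close(90.0, 100.0, rel_tol=0.1))  # 10 <= 10.0 -> True
```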

Example

To check exact equality of numeric columns num1 and num2:

equal(ds.num1, ds.num2) -> (ds.num1_num2_eq)
More examples

To check approximate equality of numeric columns num1 and num2, treating differences of at most 0.001 as "equal", use the abs_tol parameter. Note that because numbers are stored with limited precision, it is safer to use a slightly larger tolerance, e.g. 0.0011 or even 0.002, to approximate equality to three decimals:

equal(ds.num1, ds.num2, {"abs_tol": 0.001}) -> (ds.aprox_eq)
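The precision caveat above can be illustrated in plain Python with the same formula. The sketch below uses one decimal instead of three because the effect is easy to reproduce there: 1.1 cannot be stored exactly in binary floating point, so the difference 1.1 - 1.0 comes out slightly larger than the stored value of 0.1. The helper `is_close` is hypothetical and only restates the equation from this page.

```python
def is_close(a, b, rel_tol=0.0, abs_tol=0.0):
    # Same check as the step's formula: |a - b| <= rel_tol * |b| + abs_tol
    return abs(a - b) <= rel_tol * abs(b) + abs_tol


print(1.1 - 1.0)                        # 0.10000000000000009
print(is_close(1.1, 1.0, abs_tol=0.1))  # False: the tolerance is just too tight
print(is_close(1.1, 1.0, abs_tol=0.11)) # True: a slightly larger tolerance works
```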

Usage

The following are the step's expected inputs and outputs and their specific types.

equal(*columns: column, {"param": value}) -> (result: boolean)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.

Inputs


*columns: column

One or more columns to check for equality.

Outputs


result: column:boolean

Output column indicating row-wise equality of the input columns.

Parameters


abs_tol: number = 0

Absolute tolerance. The absolute (positive) difference of two numbers must be smaller than or equal to this value for them to be considered equal.


rel_tol: number = 0

Relative tolerance. The absolute (positive) difference of two numbers a and b must be smaller than or equal to rel_tol * absolute(b) for them to be considered equal. If abs_tol is also set, the two tolerances are added, as in the equation above.


keep_nans: boolean | string

Whether to maintain missing values (NaNs) in the result. The possible values are {true, false, "any", "all"}:

  • If false: use the default NaN comparison, i.e. NaN == value => False but NaN == NaN => True. This means the result will never contain any NaNs.

  • If true or "any": the result will be NaN if any value in a row is NaN.

  • If "all": the result will be NaN if all values in a row are NaN.

Must be one of: "any", "all", true, false
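The three keep_nans modes can be sketched for a single row in plain Python as below. This is an illustration under assumptions, not the step's implementation: the helper name `row_equal` is hypothetical, a row is modeled as a tuple of floats, only exact equality is checked, and `None` stands in for a NaN (missing) result.

```python
import math


def row_equal(row, keep_nans=False):
    """Exact row-wise equality following the keep_nans modes described above.

    Returns True, False, or None (None stands in for a NaN result).
    """
    is_nan = [isinstance(v, float) and math.isnan(v) for v in row]
    # true / "any": any NaN in the row makes the result missing.
    if keep_nans in (True, "any") and any(is_nan):
        return None
    # "all": the result is missing only if every value in the row is NaN.
    if keep_nans == "all" and all(is_nan):
        return None
    if any(is_nan):
        # Default comparison: NaN equals NaN but no other value.
        return all(is_nan)
    first = row[0]
    return all(v == first for v in row[1:])


nan = float("nan")
rows = [(1.0, 1.0), (1.0, nan), (nan, nan)]
for mode in (False, True, "any", "all"):
    print(mode, [row_equal(r, mode) for r in rows])
```

With these rows, false gives [True, False, True], true and "any" both give [True, None, None], and "all" gives [True, False, None], matching the descriptions of each mode.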