Note that this is a somewhat advanced step. Because it is so general, its parameters are not validated before execution, so it is possible to call the step with parameters that cause it to fail.

The function to be applied must be accessible as a method of a pandas Series. For further detail, see the corresponding pandas documentation. Only functions compatible with the column’s type should be used (e.g. not the function sum when the input column contains text). To ensure the column has the type a given function requires, you may cast the input column to a different type before applying the function (see the in_type parameter below).
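As a rough illustration of the idea in plain pandas (not the step's own implementation), the sketch below casts a text column to integers and then looks up a Series method by name. The variable names and sample values are made up for the example.

```python
import pandas as pd

# A column holding numbers stored as text (sample data, assumed for illustration)
prices = pd.Series(["3", "5", "10"])

# Cast to a numeric type first, analogous to what in_type is described as doing
as_numbers = prices.astype("int64")

# Look up the Series method by its name and call it
total = getattr(as_numbers, "sum")()
print(total)
```

Without the cast, sum would operate on text values rather than numbers, which is the kind of type mismatch the paragraph above warns about.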

Some additional functions specific to datetime, text and categorical columns are available under pandas’ dt, str, and cat accessors. See the acc parameter below.
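For orientation, here is how the three accessors look when used directly in pandas. This is a minimal sketch with invented sample data; it shows the pandas side only and makes no claim about how the step passes the acc parameter internally.

```python
import pandas as pd

# Sample columns (assumed for illustration)
dates = pd.Series(pd.to_datetime(["2021-01-15", "2022-06-30"]))
names = pd.Series(["ada", "grace"])
sizes = pd.Series(["S", "M", "S"], dtype="category")

years = dates.dt.year      # datetime accessor: extract the year
upper = names.str.upper()  # text accessor: upper-case each value
codes = sizes.cat.codes    # categorical accessor: integer code per category

print(years.tolist(), upper.tolist(), codes.tolist())
```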

Also, any function available in numpy’s or pandas’ global namespace (i.e. as np.func or pd.func) that transforms a single element (rather than a whole column) may be applied to each element of the input by passing apply as the func parameter and the name of the specific function as the elem_func parameter.
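The element-wise case corresponds roughly to the following plain pandas call. This is a sketch under the assumption that the function name is resolved in numpy's namespace; the sample data and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 4.0, 9.0])  # sample data, assumed for illustration

# Resolve the element-wise function by name, here np.sqrt
elem_func = getattr(np, "sqrt")

# Series.apply maps the function over the column's values
roots = values.apply(elem_func)
print(roots.tolist())
```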

Finally, the result of applying the desired function can be forced to a specific output type using the out_type parameter.
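In pandas terms, forcing an output type amounts to a final cast of the result. The sketch below rounds a float column, which still yields floats, and then casts to integers; the data and names are assumptions made for the example.

```python
import pandas as pd

scores = pd.Series([1.4, 2.6])  # sample data, assumed for illustration

rounded = scores.round()          # result is still a float column
as_int = rounded.astype("int64")  # final cast, analogous to out_type

print(rounded.dtype, as_int.dtype)
```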

See the examples below for usage in the different scenarios.

Usage

The following examples show how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
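To make the interplay of the parameters mentioned on this page (func, acc, elem_func, in_type, out_type) more concrete, here is a hypothetical helper that mimics them in plain pandas. It is only a sketch: apply_config is an invented name, the way it interprets each parameter is an assumption, and the step's actual behaviour may differ.

```python
import numpy as np
import pandas as pd

def apply_config(series, config):
    """Hypothetical helper mimicking the step's parameters in plain pandas."""
    # Optional cast of the input column (assumed meaning of in_type)
    if "in_type" in config:
        series = series.astype(config["in_type"])

    # Optionally go through an accessor such as dt, str or cat
    target = getattr(series, config["acc"]) if "acc" in config else series

    if config["func"] == "apply":
        # Element-wise case: resolve elem_func in numpy's or pandas' namespace
        name = config["elem_func"]
        elem = getattr(np, name, None) or getattr(pd, name)
        result = target.apply(elem)
    else:
        # Column-wise case: methods are called, properties are returned as-is
        attr = getattr(target, config["func"])
        result = attr() if callable(attr) else attr

    # Optional cast of the result (assumed meaning of out_type)
    if "out_type" in config and isinstance(result, pd.Series):
        result = result.astype(config["out_type"])
    return result

config = {"func": "upper", "acc": "str"}
shouted = apply_config(pd.Series(["ada", "grace"]), config)
print(shouted.tolist())
```

Note that the helper, like the step itself, performs no validation: a config naming a function that does not exist for the column's type simply raises an error.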