pandas_func
Applies an arbitrary pandas-supported function to the values of an input column.
Note that this is a somewhat advanced step. Because of its generality, its parameters are not validated before execution, so it is possible to call this step with parameters that lead to failure.
The function to be applied must be accessible as a method of a pandas Series.
For further detail see the corresponding pandas documentation.
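As a rough illustration of what "accessible as a method of a pandas Series" means, the sketch below looks a function up by name on a Series and calls it. The helper call_series_method is hypothetical and is not part of the step; it only mirrors the idea in plain pandas.

```python
import pandas as pd


def call_series_method(series: pd.Series, func: str, **kwargs):
    """Hypothetical helper: look up `func` as a Series method and call it."""
    method = getattr(series, func, None)
    if not callable(method):
        raise ValueError(f"{func!r} is not a callable pandas Series method")
    return method(**kwargs)


values = pd.Series([3.0, 1.0, 2.0])
ranked = call_series_method(values, "rank")  # one result per input value
total = call_series_method(values, "sum")    # a single aggregated value
```

A name that is not a Series method (a typo, for instance) fails at lookup time, which is the kind of failure the step itself does not guard against.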
However, only functions compatible with the column's type should be used (for example, not the function sum when the input column contains text). To ensure the correct type for a desired function, you may cast the input column to a different type before applying the function (see the in_type parameter below).
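The effect of casting before applying a function can be sketched in plain pandas. Here pd.to_numeric stands in for casting a text column to a numeric type; how the step performs the cast internally is not documented here, so treat this as an assumption.

```python
import pandas as pd

# A column that holds numbers stored as text.
raw = pd.Series(["1", "2", "3"])

# Cast to a numeric type first, so that numeric functions behave as intended.
numeric = pd.to_numeric(raw)
total = numeric.sum()
```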
Some additional functions specific to datetime, text and categorical columns are available under pandas' dt, str, and cat accessors. See the acc parameter below.
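In plain pandas, going through an accessor amounts to one extra attribute lookup before the function call. The helper call_accessor_method below is hypothetical and only illustrates that lookup chain.

```python
import pandas as pd


def call_accessor_method(series: pd.Series, acc: str, func: str, **kwargs):
    """Hypothetical helper: resolve series.<acc>.<func> and call it."""
    accessor = getattr(series, acc)  # e.g. series.str or series.dt
    return getattr(accessor, func)(**kwargs)


names = pd.Series(["ada", "grace"])
upper = call_accessor_method(names, "str", "upper")

dates = pd.Series(pd.to_datetime(["2021-01-15", "2022-06-30"]))
day_names = call_accessor_method(dates, "dt", "day_name")
```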
Also, any function available in numpy's or pandas' global namespace (i.e. as np.func or pd.func) that transforms a single element (rather than a whole column) may be applied to the elements of the input by using apply as the func parameter and the name of the specific function as the elem_func parameter.
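A minimal sketch of the element-wise case, assuming the function name is resolved first in numpy and then in pandas (the actual lookup order used by the step is not stated, and resolve_elem_func is a hypothetical helper):

```python
import numpy as np
import pandas as pd


def resolve_elem_func(name: str):
    """Hypothetical helper: find `name` in numpy's or pandas' namespace."""
    for namespace in (np, pd):
        candidate = getattr(namespace, name, None)
        if callable(candidate):
            return candidate
    raise ValueError(f"{name!r} not found as np.{name} or pd.{name}")


values = pd.Series([1.0, 4.0, 9.0])

# Equivalent to values.apply(np.sqrt): the function receives the elements.
roots = values.apply(resolve_elem_func("sqrt"))
```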
Finally, the result of applying the desired function can be forced to a specific output type using the out_type parameter.
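Forcing an output type can be pictured as a final cast on the function's result. In this sketch astype("category") is used as a stand-in for casting to a categorical type; the step's own type conversion may differ.

```python
import pandas as pd

names = pd.Series(["ada", "grace"])

# The function's natural result is numeric (string lengths) ...
lengths = names.str.len()

# ... which is then forced to a categorical type.
as_category = lengths.astype("category")
```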
See the examples below for usage in the different scenarios.
Usage
The following examples show how the step can be used in a recipe.
To create a column indicating whether a value in the input is missing or not
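In plain pandas, the missing-value indicator corresponds to calling isna on the column. The column names age and age_missing are made up for illustration, and the recipe call in the comment is only an assumed form based on the general step syntax.

```python
import pandas as pd

ds = pd.DataFrame({"age": [31.0, None, 45.0]})

# Assumed recipe form (not confirmed by this page):
#   pandas_func(ds.age, {"func": "isna"}) -> (ds.age_missing)
ds["age_missing"] = getattr(ds["age"], "isna")()
```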
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last "input" to the step, i.e. step(..., {"param": "value", ...}) -> (output).
func: The name of a pandas function to be applied. Must be accessible as a method of a pandas Series object.
in_type: The semantic type to cast the input column to before calling the specified function func.
Values must be one of the following:
category
date
number
boolean
url
sex
text
list[number]
list[category]
list[url]
list[boolean]
list[date]
out_type: The semantic type to cast the result to after calling the specified function func.
Values must be one of the following:
category
date
number
boolean
url
sex
text
list[number]
list[category]
list[url]
list[boolean]
list[date]
acc: A pandas accessor used on the input column before calling the specified function func.
For further information see accessors.
Values must be one of the following:
str
dt
cat
elem_func: When func is apply, the name of a function to be applied to the elements of the input column.
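To show how the parameters might fit together, the sketch below emulates the step in plain pandas: cast the input, go through the accessor if one is given, call the function, then cast the result. The function emulate_step and the CASTS mapping are hypothetical and cover only three of the listed types; the step's real implementation is not described on this page.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical stand-ins for a few of the semantic types.
CASTS = {
    "number": pd.to_numeric,
    "text": lambda s: s.astype(str),
    "category": lambda s: s.astype("category"),
}


def emulate_step(series: pd.Series, config_json: str):
    """Hypothetical emulation of the parameter order described above."""
    params = json.loads(config_json)
    if "in_type" in params:  # cast before calling func
        series = CASTS[params["in_type"]](series)
    target = getattr(series, params["acc"]) if "acc" in params else series
    func = getattr(target, params["func"])
    if params["func"] == "apply":  # element-wise function by name
        elem = params["elem_func"]
        result = func(getattr(np, elem, None) or getattr(pd, elem))
    else:
        result = func()
    if "out_type" in params:  # cast after calling func
        result = CASTS[params["out_type"]](result)
    return result


raw = pd.Series(["1", "4", "9"])
roots = emulate_step(
    raw, '{"func": "apply", "elem_func": "sqrt", "in_type": "number"}'
)

names = pd.Series(["ada", "grace"])
lengths_as_text = emulate_step(
    names, '{"func": "len", "acc": "str", "out_type": "text"}'
)
```

Like the step itself, this sketch performs no validation, so an unsupported combination (for example an aggregating function together with out_type) simply raises an error.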