Pandas func

Applies an arbitrary pandas-supported function to the values of an input column.

Note, this is a somewhat advanced step. In particular, due to its generality, its parameters will not be validated before execution, and so it is possible to call this step with parameters that will lead to failure.

The function to be applied must be accessible as a method of a pandas Series. For further detail see the corresponding pandas documentation. Only functions compatible with the column's type should be used (e.g. not the function sum when the input column contains text). To ensure the correct type for a desired function, you may cast the input column to a different type before applying the function (see the in_type parameter below).
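As a rough illustration of what "accessible as a method of a pandas Series" means, the following Python sketch looks a function up by name on a Series and calls it. The helper call_series_method is hypothetical and only mimics the lookup; it is not the step's actual implementation. Using pd.to_numeric as a stand-in for casting with in_type is likewise an assumption.

```python
import pandas as pd

def call_series_method(series: pd.Series, func: str, **kwargs):
    """Hypothetical helper: look up `func` by name on the Series and call it."""
    if not hasattr(series, func):
        raise AttributeError(f"pandas Series has no method named {func!r}")
    return getattr(series, func)(**kwargs)

numbers = pd.Series([1.0, None, 3.0])
missing = call_series_method(numbers, "isna")  # boolean Series, one flag per row
total = call_series_method(numbers, "sum")     # missing values are skipped by default

# Numbers stored as text should be cast before a numeric function is applied.
# pd.to_numeric stands in here for what "in_type": "number" might do.
as_text = pd.Series(["1", "2", "3"])
text_total = call_series_method(pd.to_numeric(as_text), "sum")
```

A name that is not a Series method fails at lookup time, which is the kind of runtime failure the note above warns about.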

Some additional functions specific to datetime, text and categorical columns are available under pandas' dt, str, and cat accessors. See the acc parameter below.
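In plain pandas, the accessors are attributes of a Series that expose type-specific functions. The sketch below, built around a hypothetical helper call_with_accessor, shows one function from each accessor; how the step resolves acc internally is an assumption.

```python
import pandas as pd

def call_with_accessor(series, acc, func, **kwargs):
    """Hypothetical helper: resolve the accessor first, then the function on it."""
    target = getattr(series, acc) if acc else series
    return getattr(target, func)(**kwargs)

words = pd.Series(["apple", "fig"])
dates = pd.Series(pd.to_datetime(["2021-01-15", "2022-06-30"]))
labels = pd.Series(["a", "b", "a"], dtype="category")

lengths = call_with_accessor(words, "str", "len")         # characters per string
day_names = call_with_accessor(dates, "dt", "day_name")   # weekday name per date
ordered = call_with_accessor(labels, "cat", "as_ordered") # mark categories as ordered
```

Each accessor only exists on columns of the matching type, so calling str on a datetime column (or dt on a text column) raises an error in pandas.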

Also, any function available in numpy's or pandas' global namespace (i.e. as np.func or pd.func) that transforms a single element (rather than a whole column) may be applied to the elements of the input by using apply as the func parameter and the name of the specific function as the elem_func parameter.
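The pandas equivalent of this element-wise mode is Series.apply with a numpy or pandas function. The sketch below resolves a dotted name such as "np.log" to the function it refers to; the resolve_elem_func helper and its namespace table are hypothetical and shown only to make the idea concrete.

```python
import numpy as np
import pandas as pd

# Hypothetical table of namespaces that an elem_func name may start with.
NAMESPACES = {"np": np, "pd": pd}

def resolve_elem_func(name: str):
    """Turn a dotted name such as "np.log" into the callable it refers to."""
    prefix, _, attr = name.partition(".")
    return getattr(NAMESPACES[prefix], attr)

values = pd.Series([1.0, np.e, np.nan])
logs = values.apply(resolve_elem_func("np.log"))     # NaN stays NaN
flags = values.apply(resolve_elem_func("pd.isna"))   # element-wise missing check
```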

Finally, the result of applying the desired function can be forced to a specific output type using the out_type parameter.

See the examples below for usage in the different scenarios.


To create a column indicating whether a value in the input is missing or not:

pandas_func(ds.input, {"func": "isna"}) -> (ds.output)

More examples

Using a custom configuration to apply the np.log function to all (non-NaN) elements of a numeric(!) input column:

pandas_func(ds.input, {
  "func": "apply",
  "elem_func": "np.log"
}) -> (ds.log_var)

Cast a numeric input column to text, then use pandas' str accessor to get the length of the number strings, i.e. the number of characters:

pandas_func(ds.input, {
  "in_type": "Text",
  "acc": "str",
  "func": "len",
  "out_type": "number"
}) -> (ds.number_digits)
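For comparison, a plain-pandas sketch of the same chain is shown below. The astype calls are stand-ins for the step's in_type and out_type casts, whose exact behaviour is an assumption here.

```python
import pandas as pd

numbers = pd.Series([7, 42, 1234])

as_text = numbers.astype(str)         # stand-in for "in_type": "Text"
n_chars = as_text.str.len()           # "acc": "str", "func": "len"
n_chars = n_chars.astype("float64")   # stand-in for "out_type": "number"
```

Since the length is taken from the string form of each number, any sign or decimal point in that string is counted as a character too.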

To calculate the length of lists (note that in this case it would be simpler and better to use the dedicated step length):

pandas_func(ds.input_lists, {
  "func": "apply",
  "elem_func": "np.size"
}) -> (ds.list_length)
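A plain-pandas sketch of this configuration follows. It also shows one reason a dedicated length step may be preferable: np.size treats a scalar, including a missing value, as having size 1.

```python
import numpy as np
import pandas as pd

lists = pd.Series([[1, 2, 3], [], [4]])
lengths = lists.apply(np.size)   # np.size is called once per list

# Caveat: a missing value is a scalar, so np.size reports 1 rather than 0.
size_of_missing = np.size(np.nan)
```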


The following are the step's expected inputs and outputs and their specific types.

pandas_func(input, {"param": value}) -> (output)

where the object {"param": value} is optional in most cases and if present may contain any of the parameters described in the corresponding section below.


input: column

An arbitrary input column. Note, however, that the pandas function to be called must be compatible with the data type of the input column.


output: column

The result of calling the specified pandas function on the input series.


Any additional parameters not mentioned explicitly below will be passed on as arguments to the specified function func.
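One way to picture this forwarding: a configuration such as {"func": "fillna", "value": 0} would presumably end up as the pandas call series.fillna(value=0). The Python sketch below mimics that with a hypothetical helper run_pandas_func; it ignores in_type, out_type and elem_func and is not the step's real implementation.

```python
import pandas as pd

def run_pandas_func(series, params):
    """Hypothetical sketch: separate the step's own parameters, forward the rest to func."""
    params = dict(params)                  # work on a copy, leave the caller's dict intact
    func = params.pop("func")
    acc = params.pop("acc", None)
    for own in ("in_type", "out_type", "elem_func"):
        params.pop(own, None)              # not handled in this sketch
    target = getattr(series, acc) if acc else series
    return getattr(target, func)(**params) # remaining keys become keyword arguments

filled = run_pandas_func(pd.Series([1.0, None, 3.0]), {"func": "fillna", "value": 0})
rounded = run_pandas_func(pd.Series([1.234, 5.678]), {"func": "round", "decimals": 1})
replaced = run_pandas_func(
    pd.Series(["banana"]),
    {"acc": "str", "func": "replace", "pat": "a", "repl": "o", "regex": False},
)
```

Because the extra keys are passed through unchecked, a misspelled argument name only surfaces as an error when the underlying pandas function is called.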

func: string

The name of a pandas function to be applied. Must be accessible as a method of a pandas Series object.

in_type: string | null

The semantic type to cast the input column to before calling the specified function func.

Must be one of: "Category", "Date", "Number", "Boolean", "Url", "Sex", "Text", "List[Number]", "List[Category]", "List[Url]", "List[Boolean]", "List[Date]", "number", "boolean", "url", "sex", "text", "list[number]", "list[category]", "list[url]", "list[boolean]", "list[date]"

out_type: string

The semantic type to cast the result to after calling the specified function func.

Must be one of: "category", "date", "number", "boolean", "url", "sex", "text", "list[number]", "list[category]", "list[url]", "list[boolean]", "list[date]"

acc: string | null

A pandas accessor used on the input column before calling the specified function func. For further information, see accessors.

Must be one of: "str", "dt", "cat"

elem_func: string | null

When func is apply, the name of a function to be applied to the elements of the input column.