
# pandas_func

> Applies an arbitrary pandas-supported function to the values of an input column.

Note that this is a somewhat advanced step. Because it is so general, its parameters are not
validated before execution, so it is possible to call this step with parameters that cause it to
fail.

The function to be applied must be accessible as a method of a pandas `Series`.
For further detail see the corresponding [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html).
However, only functions compatible with the column's type should be used (not e.g. the function `sum` when
the input column contains texts). To ensure the correct type given a desired function, you may cast the input
column to a different type before applying the function (see the `in_type` parameter below).
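
As a rough illustration in plain pandas, the sketch below looks up a method by name on a `Series` and casts a text column before applying a numeric function. It assumes the step resolves `func` by name in a similar way; the step's actual implementation is not shown here and may differ. The column contents are made up for the example.

```python
import pandas as pd

# Hypothetical input column containing a missing value.
column = pd.Series([1.0, None, 3.0])

# Resolve the method by name, roughly as a `func` value of "isna" would be.
func = "isna"
is_missing = getattr(column, func)()

# A text column holding numbers must be cast before a numeric function
# such as `sum` makes sense (the role of `in_type`).
text_column = pd.Series(["1", "2", "3"])
total = getattr(text_column.astype("float64"), "sum")()

print(is_missing.tolist())
print(total)
```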

Some additional functions specific to datetime, text and categorical columns are available under pandas'
`dt`, `str`, and `cat` [accessors](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#accessors).
See the `acc` parameter below.
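
In plain pandas, going through an accessor amounts to one extra attribute lookup before the method call. The helper `call_with_accessor` below is hypothetical and only illustrates the idea; it is not part of the step or of pandas.

```python
import pandas as pd

texts = pd.Series(["graph", "Text"])
dates = pd.Series(pd.to_datetime(["2021-01-15", "2022-06-30"]))


def call_with_accessor(column, acc, func):
    """Call the method `func` on the accessor `acc` of `column` (hypothetical helper)."""
    return getattr(getattr(column, acc), func)()


# Equivalent to texts.str.upper() and dates.dt.day_name().
upper = call_with_accessor(texts, "str", "upper")
weekdays = call_with_accessor(dates, "dt", "day_name")

print(upper.tolist())
print(weekdays.tolist())
```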

Also, any function available in numpy's or pandas' global namespace (i.e. as `np.func` or `pd.func`) that
transforms a single element (rather than a whole column) may be applied to the elements of the input by
passing `apply` as the `func` parameter and the name of the specific function as the `elem_func` parameter.
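
The element-wise case can be pictured as `Series.apply` with a function looked up in the `np` or `pd` namespace. The `resolve` helper and the `NAMESPACES` mapping below are assumptions made for this sketch, not the step's documented internals.

```python
import numpy as np
import pandas as pd

# Namespaces in which an `elem_func` name such as "np.log" is looked up
# (an assumption for this sketch).
NAMESPACES = {"np": np, "pd": pd}


def resolve(elem_func):
    """Turn a dotted name like "np.log" into the function it refers to."""
    prefix, name = elem_func.split(".", 1)
    return getattr(NAMESPACES[prefix], name)


# Missing values stay missing after applying np.log.
values = pd.Series([1.0, np.e, np.nan])
logs = values.apply(resolve("np.log"))

# np.size returns the number of elements in each list.
lists = pd.Series([[1, 2, 3], [4], []])
sizes = lists.apply(resolve("np.size"))

print(logs.tolist())
print(sizes.tolist())
```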

Finally, the result of applying the desired function can be forced to a specific output type using the
`out_type` parameter.
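
Putting the pieces together, a cast before the call, an accessor, a function, and a cast after the call could look like this in plain pandas. The mapping from the semantic types `text` and `number` to the pandas dtypes used here is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical numeric input column.
numbers = pd.Series([7, 42, 1234])

as_text = numbers.astype(str)          # in_type: text (assumed to map to strings)
lengths = as_text.str.len()            # acc: str, func: len
as_number = lengths.astype("float64")  # out_type: number (assumed to map to float64)

print(as_number.tolist())
```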

See the examples below for usage in the different scenarios.

## Usage

The following examples show how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      To create a column indicating whether each value in the input is missing:

      ```stan theme={null}
      pandas_func(ds.input, {"func": "isna"}) -> (ds.output)
      ```
    </Tab>

    <Tab title="Example 2">
      Using a custom configuration to apply the `np.log` function to all (non-NaN) elements of a numeric(!) input column:

      ```stan theme={null}
      pandas_func(ds.input, {
        "func": "apply",
        "elem_func": "np.log"
      }) -> (ds.log_var)
      ```
    </Tab>

    <Tab title="Example 3">
      Cast a numeric input column to text, then use pandas' `str` accessor to get the length of each number string, i.e. its number of characters:

      ```stan theme={null}
      pandas_func(ds.input, {
        "in_type": "Text",
        "acc": "str",
        "func": "len",
        "out_type": "number"
      }) -> (ds.number_digits)
      ```
    </Tab>

    <Tab title="Example 4">
      To calculate the length of lists (note that in this case it would be simpler and better to use the dedicated step `length`):

      ```stan theme={null}
      pandas_func(ds.input_lists, {
        "func": "apply",
        "elem_func": "np.size"
      }) -> (ds.list_length)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects and the outputs it produces. For further details see the sections below.

      ```stan theme={null}
      pandas_func(input: column, {
          "param": value,
          ...
      }) -> (output: column)
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="input" type="column" required>
    An arbitrary input column. Note, however, that the pandas function to be called must be compatible with the data type of the input column.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="output" type="column" required>
    The result of calling the specified pandas function on the input series.
  </ParamField>
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a json object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="func" type="string" required>
    The name of a pandas function to be applied.
    Must be accessible as a method of a pandas [Series object](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html).
  </ParamField>

  <ParamField path="in_type" type="[string, null]">
    The semantic type to cast the input column to *before* calling the specified function `func`.

    Values must be one of the following:

    `Category` `Date` `Number` `Boolean` `Url` `Sex` `Text` `List[Number]` `List[Category]` `List[Url]` `List[Boolean]` `List[Date]` `number` `boolean` `url` `sex` `text` `list[number]` `list[category]` `list[url]` `list[boolean]` `list[date]`
  </ParamField>

  <ParamField path="out_type" type="string" required>
    The semantic type to cast the result to *after* calling the specified function `func`.

    Values must be one of the following:

    `category` `date` `number` `boolean` `url` `sex` `text` `list[number]` `list[category]` `list[url]` `list[boolean]` `list[date]`
  </ParamField>

  <ParamField path="acc" type="[string, null]">
    A pandas accessor used on the input column before calling the specified function `func`.
    For further information see [accessors](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#accessors).

    Values must be one of the following:

    * `str`
    * `dt`
    * `cat`
  </ParamField>

  <ParamField path="elem_func" type="[string, null]">
    When `func` is `apply`, the name of a function to be applied to the *elements* of the input column.
  </ParamField>
</Accordion>
