# aggregate_neighbours

> For each node in a network, group and aggregate over its neighbours. 

For each row, the step calculates the requested aggregations over all its direct (first-degree) neighbours,
using the link columns in the provided dataset (at minimum a targets column containing, for each row,
the list of row numbers it connects to).

If the dataset's metadata contains several sets of link columns, the first set encountered is used.
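Conceptually, the step turns the targets column into a neighbour list for each row and reduces the values of an input column over those neighbours. The Python sketch below illustrates this for a single `mean` aggregation. It is not Graphext's implementation: the plain-list data layout, the helper name `aggregate_neighbours_mean`, and returning `None` for rows without neighbours are all assumptions for illustration.

```python
from statistics import mean
from typing import List, Optional


def aggregate_neighbours_mean(
    values: List[float], targets: List[List[int]]
) -> List[Optional[float]]:
    """For each row, average `values` over the rows listed in its targets.

    Hypothetical helper: `targets[i]` holds the row numbers that row i links to.
    Rows without neighbours get None (an assumption).
    """
    result: List[Optional[float]] = []
    for neighbours in targets:
        neighbour_values = [values[j] for j in neighbours]
        result.append(mean(neighbour_values) if neighbour_values else None)
    return result


prices = [1.0, 2.0, 4.0, 8.0]
links = [[1, 2], [0], [0, 1, 3], []]  # row 3 has no neighbours
print(aggregate_neighbours_mean(prices, links))
```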

## Usage

The following example shows how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      Assuming a dataset `products` where each row represents a supermarket product (with at least a `price` and an `aisle` column),
      and a targets column representing links between similar products, the following example calculates for
      each product

      * the average price of similar products
      * the percentage of similar products assigned to aisles "produce", "deli" and "drinks"

      ```stan theme={null}
      aggregate_neighbours(products, {
        "aggregations": {
          "price": {
            "similar_price_avg": {"func": "mean"}
          },
          "aisle": {
            "similar_pct_produce": {"func": "percent_where", "value": "produce"},
            "similar_pct_deli": {"func": "percent_where", "value": "deli"},
            "similar_pct_drinks": {"func": "percent_where", "value": "drinks"}
          }
        }
      }) -> (products_agg)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs and outputs the step is expected to receive and will produce respectively. For further details, see the sections below.

      ```stan theme={null}
      aggregate_neighbours(ds_in: dataset, {
          "param": value,
          ...
      }) -> (ds_out: dataset)
      ```
    </Tab>
  </Tabs>
</Accordion>
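To make Example 1 concrete, the Python sketch below computes the same four output columns on a tiny, made-up `products` table held in plain lists. It only illustrates the intended semantics: the data are invented, and both reporting `percent_where` on a 0 to 100 scale and returning `None` for products without neighbours are assumptions rather than documented behaviour.

```python
from typing import List, Optional

# Hypothetical toy version of the `products` dataset from Example 1.
price = [2.0, 3.0, 5.0, 1.5]
aisle = ["produce", "deli", "produce", "drinks"]
# targets[i]: row numbers of the products similar to product i
targets = [[1, 2], [0, 2, 3], [0], [1]]


def neighbour_mean(values: List[float], nbrs: List[int]) -> Optional[float]:
    """Mean of `values` over the given neighbour rows."""
    vals = [values[j] for j in nbrs]
    return sum(vals) / len(vals) if vals else None


def neighbour_percent_where(values: List[str], nbrs: List[int], value: str) -> Optional[float]:
    """Share of neighbour rows whose value equals `value`.

    Assumption: percentages are reported on a 0 to 100 scale.
    """
    if not nbrs:
        return None
    hits = sum(1 for j in nbrs if values[j] == value)
    return 100.0 * hits / len(nbrs)


products_agg = []
for i, nbrs in enumerate(targets):
    products_agg.append({
        "price": price[i],
        "aisle": aisle[i],
        "similar_price_avg": neighbour_mean(price, nbrs),
        "similar_pct_produce": neighbour_percent_where(aisle, nbrs, "produce"),
        "similar_pct_deli": neighbour_percent_where(aisle, nbrs, "deli"),
        "similar_pct_drinks": neighbour_percent_where(aisle, nbrs, "drinks"),
    })

for row in products_agg:
    print(row)
```

The output keeps the original columns and adds one column per aggregation, mirroring the description of `ds_out` below.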

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="ds_in" type="dataset" required>
    A dataset containing the nodes (rows) to group and aggregate, and their corresponding links.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="ds_out" type="dataset" required>
    The original dataset plus the newly aggregated columns: one new column per specified aggregation
    (more than one aggregation can be specified for each input column).
  </ParamField>
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a json object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="presort" type="object">
    Pre-aggregation row sorting.
    Sort the dataset rows before aggregating, e.g. when the order in which values are encountered matters for a particular aggregation function (such as `list`, `first` or `last`).

    <Accordion title="Properties">
      <ParamField path="columns" type="[null, string, array]">
        The sort column name(s).
        These column(s) will be used to sort the dataset before aggregating (if multiple, in the specified order).
        For example, to sort links by their weight, where the weight column is called "gx\_weight", use `"gx_weight"`.

        <Accordion title="Options">
          <Tabs>
            <Tab title="No sorting">
              <ParamField path="{_}" type="null">
                When `null`, no sorting is applied.
              </ParamField>
            </Tab>

            <Tab title="Single column">
              <ParamField path="{_}" type="string (ds_in.column)">
                Single column name. Sort by this column.
              </ParamField>
            </Tab>

            <Tab title="Multiple columns">
              <ParamField path="{_}" type="array[string]">
                List of column names. Orders by the first column, then the second, etc.

                <Accordion title="Array items">
                  <ParamField path="Item" type="string (ds_in.column)">
                    Each item in the array.
                  </ParamField>
                </Accordion>
              </ParamField>
            </Tab>
          </Tabs>
        </Accordion>

        <Accordion title="Examples">
          * `"date_added"`
          * `["lastname", "firstname"]`
        </Accordion>
      </ParamField>

      <ParamField path="ascending" type="boolean" default="true">
        Whether to sort in ascending order (or in descending order if false).
      </ParamField>
    </Accordion>

    <Accordion title="Examples">
      * For example, to sort first by price, then dimension, and in descending order:

      ```json theme={null}
      {
        "columns": ["price", "dimension"],
        "ascending": false
      }
      ```
    </Accordion>
  </ParamField>
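To see why presorting matters, the Python sketch below assumes that sorting determines the order in which each row's neighbours are visited. Under that assumption, order-sensitive functions such as `list`, `first` and `last` change with the sort, while `sum` or `mean` do not. The helper `neighbour_list` and the sample data are hypothetical; this is an illustration of the idea, not Graphext's implementation.

```python
from typing import List, Optional, Sequence


def neighbour_list(
    values: Sequence[str],
    targets: List[List[int]],
    sort_key: Optional[Sequence[int]] = None,
    ascending: bool = True,
) -> List[List[str]]:
    """Collect neighbour values per row, optionally ordered by `sort_key`.

    Hypothetical: mimics a `list` aggregation after presorting rows by a column.
    """
    out = []
    for nbrs in targets:
        order = list(nbrs)
        if sort_key is not None:
            # sorted() is stable (also with reverse=True), so ties keep link order.
            order = sorted(order, key=lambda j: sort_key[j], reverse=not ascending)
        out.append([values[j] for j in order])
    return out


names = ["apple", "brie", "cola", "dates"]
date_added = [3, 1, 2, 0]  # hypothetical sort column, one value per row
targets = [[1, 2, 3], [0]]

print(neighbour_list(names, targets))                     # link order
print(neighbour_list(names, targets, date_added))         # ascending
print(neighbour_list(names, targets, date_added, False))  # descending
```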

  <ParamField path="aggregations" type="object" required>
    Definition of desired aggregations.
    A dictionary mapping original columns to new aggregated columns, specifying an aggregation function for each.
    *Aggregations* are functions that reduce all the values in a particular column of a single group to a single summary value for that group.
    In this step, each row's group consists of its direct neighbours, so e.g. a `sum` aggregation of column A adds up the values of A across all neighbours of each row.

    The aggregation functions accepted as `func` values are:

    * `n`, `size` or `count`: calculate number of rows in group
    * `sum`: sum total of values
    * `mean`: take mean of values
    * `max`: take max of values
    * `min`: take min of values
    * `mode`: find most frequent value (returns first mode if multiple exist)
    * `first`: take first item found
    * `last`: take last item found
    * `unique`: collect a list of unique values
    * `n_unique`: count the number of unique values
    * `list`: collect a list of all values
    * `concatenate`: convert all values to text and concatenate them into one long text
    * `concat_lists`: concatenate lists in all rows into a single larger list
    * `count_where`: number of rows in which the column matches a given value
    * `percent_where`: percentage of rows in which the column matches a given value

    Note that in the case of `count_where` and `percent_where` an additional `value` parameter is required.
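The list above can be read as a table of reducers, each turning the values of one group into a single result. The Python dispatch table below sketches those semantics under several assumptions: empty groups yield `None` (or an empty result), `percent_where` uses a 0 to 100 scale, `concatenate` joins with a single space, and `unique` and `mode` favour the first-seen value. The names `AGGREGATIONS` and `aggregate` are hypothetical and not part of Graphext.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

# Hypothetical reference semantics for the listed `func` values; Graphext's
# actual behaviour (e.g. null handling, percent scale, separators) may differ.
AGGREGATIONS: Dict[str, Callable[..., Any]] = {
    "n": len,
    "size": len,
    "count": len,
    "sum": sum,
    "mean": lambda xs: sum(xs) / len(xs) if xs else None,
    "max": lambda xs: max(xs) if xs else None,
    "min": lambda xs: min(xs) if xs else None,
    # most_common() lists tied counts in first-encountered order.
    "mode": lambda xs: Counter(xs).most_common(1)[0][0] if xs else None,
    "first": lambda xs: xs[0] if xs else None,
    "last": lambda xs: xs[-1] if xs else None,
    "unique": lambda xs: list(dict.fromkeys(xs)),  # keeps first-seen order
    "n_unique": lambda xs: len(set(xs)),
    "list": list,
    "concatenate": lambda xs: " ".join(str(x) for x in xs),
    "concat_lists": lambda xs: [item for sub in xs for item in sub],
    "count_where": lambda xs, value: sum(1 for x in xs if x == value),
    "percent_where": lambda xs, value: (
        100.0 * sum(1 for x in xs if x == value) / len(xs) if xs else None
    ),
}


def aggregate(spec: Dict[str, Any], xs: List[Any]) -> Any:
    """Apply one spec such as {"func": "count_where", "value": 2} to a group."""
    func = AGGREGATIONS[spec["func"]]
    params = {k: v for k, v in spec.items() if k != "func"}
    return func(xs, **params) if params else func(xs)


neighbour_aisles = ["produce", "deli", "produce"]
print(aggregate({"func": "mode"}, neighbour_aisles))
print(aggregate({"func": "percent_where", "value": "deli"}, neighbour_aisles))
```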

    <Accordion title="Item properties">
      <ParamField path="input_aggregations" type="object">
        One item per input column.
        Each key should be the name of an input column, and each value an object defining one or more aggregations for that column.
        An individual aggregation consists of the name of a desired output column, mapped to a specific aggregation function.
        For example:

        ```json theme={null}
        {
          "input_col": {
            "output_col": {"func": "sum"}
          }
        }
        ```

        <Accordion title="Item properties">
          <ParamField path="aggregation_func" type="object">
            Object defining how to aggregate a single output column.
            Requires at least the `"func"` parameter. If the aggregation function takes further arguments,
            such as the `"value"` parameter of `count_where` and `percent_where`, these must also be provided.
            For example:

            ```json theme={null}
            {
              "output_col": {"func": "count_where", "value": 2}
            }
            ```

            <Accordion title="Properties">
              <ParamField path="func" type="string">
                Aggregation function.

                Values must be one of the following:

                `n` `size` `count` `sum` `mean` `n_unique` `count_where` `percent_where` `concatenate` `max` `min` `first` `last` `mode` `concat_lists` `unique` `list`
              </ParamField>
            </Accordion>
          </ParamField>
        </Accordion>
      </ParamField>
    </Accordion>

    <Accordion title="Examples">
      * Including an aggregation function with additional parameters:

      ```json theme={null}
      {
        "product_id": {
          "products": {"func": "list"},
          "size": {"func": "count"}
        },
        "item_total": {
          "total": {"func": "sum"}
        },
        "item_category": {
          "num_food_items": {"func": "count_where", "value": "food"}
        }
      }
      ```
    </Accordion>
  </ParamField>
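Because each output spec must name one of the accepted `func` values, and `count_where`/`percent_where` additionally need a `value`, an `aggregations` object can be sanity-checked before it is used in a recipe. The hypothetical Python helper below sketches such a check using only the rules stated above; it is not a Graphext feature.

```python
from typing import Any, Dict, List

VALID_FUNCS = {
    "n", "size", "count", "sum", "mean", "n_unique", "count_where",
    "percent_where", "concatenate", "max", "min", "first", "last",
    "mode", "concat_lists", "unique", "list",
}
NEEDS_VALUE = {"count_where", "percent_where"}


def check_aggregations(aggregations: Dict[str, Dict[str, Dict[str, Any]]]) -> List[str]:
    """Return the problems found in an `aggregations` object (empty if none)."""
    problems = []
    for input_col, outputs in aggregations.items():
        for output_col, spec in outputs.items():
            func = spec.get("func")
            if func not in VALID_FUNCS:
                problems.append(f"{input_col}.{output_col}: unknown func {func!r}")
            elif func in NEEDS_VALUE and "value" not in spec:
                problems.append(f"{input_col}.{output_col}: {func} requires a 'value'")
    return problems


# The example configuration from this section.
example = {
    "product_id": {"products": {"func": "list"}, "size": {"func": "count"}},
    "item_total": {"total": {"func": "sum"}},
    "item_category": {"num_food_items": {"func": "count_where", "value": "food"}},
}
print(check_aggregations(example))
print(check_aggregations({"aisle": {"pct_deli": {"func": "percent_where"}}}))
```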

  <ParamField path="directed" type="boolean" default="false">
    Whether the links provided should be interpreted as directed.
    *Directed* here means that the link A→B (from node A to B) may differ from the link B→A (e.g. they may
    have different weight attributes). When `"directed": false`, in contrast, links are *undirected*:
    the link A→B is assumed to always be identical to B→A (i.e. A↔B). This is usually the case when
    links represent a *similarity* between nodes.
  </ParamField>
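One plausible reading of `directed` is sketched below in Python: with undirected links, a link listed as i→j in the targets column also makes i a neighbour of j, so the neighbour sets are symmetrised before aggregating. Whether Graphext builds neighbour sets in exactly this way is an assumption, and `neighbour_sets` is a hypothetical helper.

```python
from typing import List, Set


def neighbour_sets(targets: List[List[int]], directed: bool = False) -> List[Set[int]]:
    """Build each row's first-degree neighbour set from a targets column.

    Hypothetical: when undirected, a link i -> j also makes i a neighbour of j.
    """
    nbrs = [set(t) for t in targets]
    if not directed:
        for i, t in enumerate(targets):
            for j in t:
                nbrs[j].add(i)
    return nbrs


targets = [[1], [2], []]
print(neighbour_sets(targets, directed=True))   # [{1}, {2}, set()]
print(neighbour_sets(targets, directed=False))  # [{1}, {0, 2}, {1}]
```

In this sketch the undirected case only adds neighbours when the targets column stores each link in a single direction; if both directions are already present, the result is unchanged.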
</Accordion>
