aggregate_neighbours
For each node in a network, group and aggregate over its neighbours.
Using the link columns in the provided dataset (including at least a targets column containing lists of the target row numbers that each row connects to), calculate the requested aggregations for each row over all its direct (first-degree) neighbours.
The step uses the first set of link columns encountered in the dataset's metadata.
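As a rough illustration of the idea (not the step's actual implementation), the following Python sketch aggregates one value column over each row's direct neighbours. The targets lists follow the description above; the helper `aggregate_over_neighbours` and the sample data are hypothetical.

```python
def aggregate_over_neighbours(values, targets, func):
    """For each row, apply func to the values of the rows it links to.

    values:  one value per row
    targets: one list of target row numbers per row
    """
    return [func([values[t] for t in row_targets]) for row_targets in targets]


# Hypothetical data: 4 nodes with a price each, and their outgoing links.
prices = [10, 20, 30, 40]
targets = [[1, 2], [0], [0, 3], []]

# Sum and count of the neighbours' prices, one result per row.
neighbour_totals = aggregate_over_neighbours(prices, targets, sum)
neighbour_counts = aggregate_over_neighbours(prices, targets, len)
```

In this sketch a row without links (the last one) simply aggregates over an empty group; how the step itself treats such rows is not specified here.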
A dataset containing the nodes (rows) to group and aggregate, and its corresponding links.
The original dataset plus newly aggregated columns. Will have one column per specified aggregation function (more than one aggregation can be specified for each original input column).
Pre-aggregation row sorting.
Sort the dataset rows before aggregating, e.g. when the encountered order matters to a particular aggregation function (such as list).
The sort column name(s).
These column(s) will be used to sort the dataset before aggregating (if multiple, in specified order).
E.g. to sort links by their weight, where the weight column is called "gx_weight", use "gx_weight".
- date_added
- ["lastname", "firstname"]
Whether to sort in ascending order (or in descending order if false).
- For example, to sort first by price, then dimension, and in descending order:
{
"columns": ["price", "dimension"],
"ascending": false
}
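To see why the sort matters, here is a minimal Python sketch of order-sensitive aggregations (list and first) applied after sorting. It only mimics the "columns" and "ascending" parameters described above; the rows and variable names are made up for illustration, and the step's real sorting logic may differ.

```python
# Hypothetical rows forming one group of neighbours.
rows = [
    {"name": "a", "price": 30},
    {"name": "b", "price": 10},
    {"name": "c", "price": 20},
]
sort_params = {"columns": ["price"], "ascending": False}

# Sort by the given columns, in the specified order and direction.
ordered = sorted(
    rows,
    key=lambda r: tuple(r[c] for c in sort_params["columns"]),
    reverse=not sort_params["ascending"],
)

# Order-sensitive aggregations now see the rows in sorted order.
names_as_list = [r["name"] for r in ordered]  # like a "list" aggregation
first_name = names_as_list[0]                 # like a "first" aggregation
```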
Definition of desired aggregations.
A dictionary mapping original columns to new aggregated columns, specifying an aggregation function for each.
Aggregations are functions that reduce all the values in a particular column of a single group to a single summary value of that group.
E.g. a sum aggregation of column A calculates a single total by adding up all the values in A belonging to each group.
Possible aggregation functions accepted as func parameters are:
- n, size or count: calculate number of rows in group
- sum: sum total of values
- mean: take mean of values
- max: take max of values
- min: take min of values
- first: take first item found
- last: take last item found
- unique: collect a list of unique values
- n_unique: count the number of unique values
- list: collect a list of all values
- concatenate: convert all values to text and concatenate them into one long text
- concat_lists: concatenate lists in all rows into a single larger list
- count_where: number of rows in which the column matches a value, needs parameter value with the value that you want to count
- percent_where: percentage of the column where the column matches a value, needs parameter value with the value that you want to count
Note that in the case of count_where and percent_where an additional value parameter is required.
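The two parameterised functions can be sketched in Python as follows. This is an illustration only: the 0 to 100 scale for percent_where and the `None` result for an empty group are assumptions, not documented behaviour of the step.

```python
def count_where(values, value):
    """Number of items in the group equal to `value`."""
    return sum(1 for v in values if v == value)


def percent_where(values, value):
    """Share of items equal to `value`, on an assumed 0-100 scale.

    Returns None for an empty group (an assumption; the step may differ).
    """
    if not values:
        return None
    return 100 * count_where(values, value) / len(values)


# Hypothetical group of category values.
group = ["food", "drink", "food", "toys"]
n_food = count_where(group, "food")
pct_food = percent_where(group, "food")
```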
One item per input column. Each key should be the name of an input column, and each value an object defining one or more aggregations for that column. An individual aggregation consists of the name of a desired output column, mapped to a specific aggregation function. For example:
{
"input_col": {
"output_col": {"func": "sum"}
}
}
Object defining how to aggregate a single output column.
Needs at least the "func" parameter. If the aggregation function accepts further arguments, like the "value" parameter in case of count_where and percent_where, these need to be provided also.
For example:
{
"output_col": {"func": "count_where", "value": 2}
}
Aggregation function.
Values must be one of the following:
n
size
count
sum
mean
n_unique
count_where
percent_where
concatenate
max
min
first
last
concat_lists
unique
list
- Including an aggregation function with additional parameters:
{
"product_id": {
"products": {"func": "list"},
"size": {"func": "count"}
},
"item_total": {
"total": {"func": "sum"}
},
"item_category": {
"num_food_items": {"func": "count_where", "value": "food"}
}
}
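The structure of such a definition (input column, then output column, then function plus optional parameters) can be read as in the Python sketch below. It covers only four of the functions, and `AGGREGATIONS`, `apply_spec` and the sample rows are hypothetical names and data introduced for illustration, not part of the step.

```python
# Minimal stand-ins for a few of the documented functions.
AGGREGATIONS = {
    "list": lambda values, **_: list(values),
    "count": lambda values, **_: len(values),
    "sum": lambda values, **_: sum(values),
    "count_where": lambda values, value, **_: sum(1 for v in values if v == value),
}


def apply_spec(group, spec):
    """Compute every requested output column for one group of rows."""
    result = {}
    for input_col, outputs in spec.items():
        values = [row[input_col] for row in group]
        for output_col, params in outputs.items():
            # Everything except "func" is passed on as an extra argument.
            extra = {k: v for k, v in params.items() if k != "func"}
            result[output_col] = AGGREGATIONS[params["func"]](values, **extra)
    return result


spec = {
    "product_id": {"products": {"func": "list"}, "size": {"func": "count"}},
    "item_total": {"total": {"func": "sum"}},
    "item_category": {"num_food_items": {"func": "count_where", "value": "food"}},
}

# Hypothetical neighbours of a single node.
group = [
    {"product_id": "p1", "item_total": 5.0, "item_category": "food"},
    {"product_id": "p2", "item_total": 2.5, "item_category": "toys"},
    {"product_id": "p3", "item_total": 1.5, "item_category": "food"},
]
aggregated = apply_spec(group, spec)
```

Each output column name becomes one key in the result, so several aggregations of the same input column sit side by side.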
Whether the links provided should be interpreted as being directed.
Directed here means that the link A→B (from node A to B) may differ from the link B→A (e.g. they may have different weight attributes). When "directed": false, in contrast, i.e. links are undirected, it is assumed that the link A→B is always identical to B→A (i.e. A↔B always). This is usually the case when links represent a similarity between nodes.
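One plausible reading of the undirected case, sketched in Python: every link A→B also makes A a neighbour of B. Whether the step builds its neighbour lists exactly this way is an assumption; `symmetrise` and the sample links are hypothetical.

```python
def symmetrise(targets):
    """Return neighbour lists in which every link A→B also implies B→A."""
    neighbours = [set(row) for row in targets]
    for source, row in enumerate(targets):
        for target in row:
            neighbours[target].add(source)
    return [sorted(n) for n in neighbours]


# Directed links: 0→1 and 1→2.
directed_targets = [[1], [2], []]

# Treated as undirected, node 1 is connected to both 0 and 2.
undirected_neighbours = symmetrise(directed_targets)
```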