Filter network


Filter nodes (rows) from the network and fix dangling links.

Supports multiple kinds of filters (rows, range, and values), as well as a choice of the operation used to combine them.


E.g., to apply a row filter excluding the nodes in rows 0, 2 and 4:

filter_network(ds, links, {
  "filters": [
    {"kind": "rows", "ids": [0, 2, 4], "exclude": true}
  ]
}) -> (data_flt, links_flt)


The following are the step's expected inputs and outputs and their specific types.

filter_network(
    data: dataset,
    links: dataset,
    {
        "param": value
    }
) -> (data_flt: dataset, links_flt: dataset)

where the object {"param": value} is optional in most cases and if present may contain any of the parameters described in the corresponding section below.


data: dataset

A dataset containing the nodes (rows) to filter.

links: dataset

A dataset containing the links between nodes (rows) of the input dataset.


data_flt: dataset

A new dataset containing the same columns as the input dataset but only those rows passing the filter.

links_flt: dataset

A new dataset containing the same columns as the input links, but having connections only between the filtered nodes.
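The relationship between the two outputs can be illustrated with a minimal Python sketch. This is not the step's implementation: it assumes, purely for illustration, that nodes are a plain list and that links are `(source, target)` pairs of row numbers, and the helper name `filter_rows_and_links` is hypothetical. A link is treated as dangling when either of its endpoints was filtered out.

```python
def filter_rows_and_links(nodes, links, row_ids, exclude=False):
    """Keep (or, with exclude=True, drop) the given rows and remove dangling links.

    Hypothetical helper: `links` is assumed to be a list of
    (source_row, target_row) pairs, which may differ from the real format.
    """
    selected = set(row_ids)
    # With exclude=True the selection is inverted: the listed rows are dropped.
    kept = [i for i in range(len(nodes)) if (i in selected) != exclude]
    kept_set = set(kept)
    nodes_flt = [nodes[i] for i in kept]
    # A link dangles when at least one endpoint no longer exists.
    links_flt = [(s, t) for (s, t) in links if s in kept_set and t in kept_set]
    return kept, nodes_flt, links_flt


nodes = ["a", "b", "c", "d", "e", "f"]
links = [(0, 1), (1, 3), (3, 5), (2, 4), (1, 5)]

# Mirrors the example above: exclude rows 0, 2 and 4.
kept, nodes_flt, links_flt = filter_rows_and_links(
    nodes, links, [0, 2, 4], exclude=True
)
```

In this sketch the surviving links still refer to the original row numbers. Whether the actual step renumbers rows in `links_flt` is not stated here, so treat that detail as open.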


filters: array[object]

A list of filters to apply.

Items in filters

kind: string

Kind of filter to apply. "rows" filters nodes by their row number in the input dataset, "range" filters by a minimum ("min") and a maximum ("max") value, and "values" filters on exact matches.

Must be one of: "rows", "range", "values"

exclude: boolean = False

If true, the selection will be inverted.
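As a rough illustration of what each filter kind selects, the following Python sketch expresses every kind as a boolean mask over the rows, with `exclude` inverting the mask. All function names here are hypothetical, the inclusive treatment of the range bounds is an assumption, and how the real step picks the column to test is not covered.

```python
def rows_mask(n_rows, ids):
    """'rows' kind: select rows by their row number."""
    wanted = set(ids)
    return [i in wanted for i in range(n_rows)]


def range_mask(values, lo=None, hi=None):
    """'range' kind: select values between a minimum and a maximum.

    Assumption: both bounds are inclusive, and a missing bound is unbounded.
    """
    return [
        (lo is None or v >= lo) and (hi is None or v <= hi)
        for v in values
    ]


def values_mask(values, allowed):
    """'values' kind: select rows whose value matches one of `allowed` exactly."""
    allowed = set(allowed)
    return [v in allowed for v in values]


def apply_exclude(mask, exclude=False):
    """Invert the selection when exclude is true."""
    return [not m for m in mask] if exclude else list(mask)


ages = [15, 22, 37, 64, 80]
labels = ["x", "y", "x", "z", "y"]

in_range = range_mask(ages, lo=18, hi=65)
not_even_rows = apply_exclude(rows_mask(5, [0, 2, 4]), exclude=True)
x_or_z = values_mask(labels, ["x", "z"])
```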

combine: string

How to logically combine the individual filters, i.e. whether to use the AND or the OR operation.

Must be one of: "and", "or"
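The effect of the two options can be sketched in Python, assuming each filter has already been evaluated to one boolean per row. The helper `combine_masks` is hypothetical and only illustrates the logic: with "and" a row must pass every filter, with "or" it must pass at least one.

```python
def combine_masks(masks, combine):
    """Combine per-filter boolean masks row by row.

    Hypothetical helper: `masks` is a list of equal-length lists of booleans,
    one list per filter.
    """
    if combine not in ("and", "or"):
        raise ValueError('combine must be one of: "and", "or"')
    reducer = all if combine == "and" else any
    # zip(*masks) yields, for each row, its result under every filter.
    return [reducer(flags) for flags in zip(*masks)]


filter_a = [True, True, False, False]
filter_b = [True, False, True, False]

both = combine_masks([filter_a, filter_b], "and")
either = combine_masks([filter_a, filter_b], "or")
```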