# link_similar_rows

> Create network links by calculating the similarity between multidimensional, multi-type documents.

Creates a link between each row and the N rows most similar to it. Broadly, the similarity between two rows
is a weighted combination of the similarities of their individual columns. The step accepts columns of all
data types, e.g. text, quantitative and categorical columns.
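
To make the idea of a weighted per-column similarity concrete, here is a minimal Python sketch. It is not the step's implementation: the per-column similarity functions (`jaccard`, the scaled age difference, exact category match), the equal column weights and the helper names `row_similarity` and `link_rows` are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Similarity = Callable[[object, object], float]


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, between 0.0 and 1.0."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def row_similarity(a: Row, b: Row, sims: Dict[str, Similarity],
                   col_weights: Dict[str, float]) -> float:
    """Weighted average of the per-column similarities (assumed combination rule)."""
    total = sum(col_weights.values())
    return sum(w * sims[c](a[c], b[c]) for c, w in col_weights.items()) / total


def link_rows(rows: List[Row], sims: Dict[str, Similarity],
              col_weights: Dict[str, float],
              n_similar: int = 10) -> Tuple[List[List[int]], List[List[float]]]:
    """For each row, return the indices and similarities of its n most similar rows."""
    targets, weights = [], []
    for i, a in enumerate(rows):
        scored = [(row_similarity(a, b, sims, col_weights), j)
                  for j, b in enumerate(rows) if j != i]
        scored.sort(key=lambda s: (-s[0], s[1]))  # most similar first, ties by index
        top = scored[:n_similar]
        targets.append([j for _, j in top])
        weights.append([s for s, _ in top])
    return targets, weights


rows = [
    {"bio": "data engineer python", "age": 30, "department": "tech"},
    {"bio": "python data scientist", "age": 32, "department": "tech"},
    {"bio": "sales manager", "age": 50, "department": "sales"},
]
sims = {
    "bio": jaccard,
    "age": lambda a, b: 1.0 - abs(a - b) / 20.0,  # 20 = age range of this toy data
    "department": lambda a, b: 1.0 if a == b else 0.0,
}
col_weights = {"bio": 1.0, "age": 1.0, "department": 1.0}

targets, weights = link_rows(rows, sims, col_weights, n_similar=1)
print(targets)
```

With `n_similar=1` each row is linked only to its single closest neighbour; the two "tech" rows link to each other, and the "sales" row links to whichever of them is nearest in age.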

## Usage

The following example shows how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      ```stan theme={null}
      link_similar_rows(ds[["bio", "salary", "age", "department"]]) -> (links)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects to receive and the outputs it will produce. For further details see the sections below.

      ```stan theme={null}
      link_similar_rows(ds: dataset, {
          "param": value,
          ...
      }) -> (targets: column, weights: column)
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="ds" type="dataset" required>
    A dataset containing the columns to be included in the calculation of pair-wise similarities.
    Note: a subset of columns can always be selected in a recipe using the ds\[\["column1", "column2", ...]] syntax.
    To exclude columns instead, use ds\[!\["column1", "column2", ...]].
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="targets" type="column" required>
    A column containing, for each row, a list of row numbers identifying all other rows it is similar to.
  </ParamField>

  <ParamField path="weights" type="column" required>
    A column containing, for each row, a list of weights quantifying the "importance" of each
    link to the rows listed in the `targets` column (i.e. how similar the rows are).
  </ParamField>
</Accordion>
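
As an illustration of how the two output columns relate, the following Python sketch pairs each row's `targets` list with its `weights` list to build a flat edge list. The values are made up, the helper name `to_edge_list` is hypothetical, and zero-based row numbers are an assumption rather than something this page specifies.

```python
from typing import List, Tuple


def to_edge_list(targets: List[List[int]],
                 weights: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Flatten per-row target/weight lists into (source, target, weight) tuples."""
    edges = []
    for source, (row_targets, row_weights) in enumerate(zip(targets, weights)):
        # The k-th weight of a row belongs to the k-th target of the same row.
        for target, weight in zip(row_targets, row_weights):
            edges.append((source, target, weight))
    return edges


# Made-up example values: row 0 links to rows 1 and 2, rows 1 and 2 link back to row 0.
example_targets = [[1, 2], [0], [0]]
example_weights = [[0.8, 0.3], [0.8], [0.3]]

edges = to_edge_list(example_targets, example_weights)
print(edges)
```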

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="n_similar_docs" type="integer" default="10">
    Number of most similar rows (docs) to link each row to.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ n_similar_docs < inf
    ```
  </ParamField>

  <ParamField path="minhash" type="boolean" default="false">
    Whether to use minhash as a similarity measure.
  </ParamField>

  <ParamField path="n_terms" type="integer" default="15">
    Number of terms to use.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ n_terms < inf
    ```
  </ParamField>

  <ParamField path="min_term_freq" type="integer" default="2">
    Minimum term frequency (for TF-IDF).

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ min_term_freq < inf
    ```
  </ParamField>

  <ParamField path="min_doc_freq" type="integer" default="2">
    Minimum doc frequency (for TF-IDF).

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ min_doc_freq < inf
    ```
  </ParamField>

  <ParamField path="max_doc_perc" type="number" default="0.9">
    Maximum doc percentage (for TF-IDF).

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ max_doc_perc ≤ 1
    ```
  </ParamField>

  <ParamField path="min_should_match" type="integer" default="2">
    Minimum number of terms that should match.

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ min_should_match < inf
    ```
  </ParamField>

  <ParamField path="separator" type="string" default="[\W0-9]{1,100}">
    Regex to recognize as string separator.
  </ParamField>

  <ParamField path="stopwords_langs" type="string" default="ES,EN">
    Languages to use for stopwords.
    Supports ES, EN, or both separated by a comma (`"ES,EN"`).

    Values must be one of the following:

    * `ES`
    * `EN`
    * `ES,EN`
    * `EN,ES`
  </ParamField>
</Accordion>
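
The text-related parameters are easiest to read together. The Python sketch below shows one plausible interpretation, not the step's actual code: text is split with the default `separator` regex, stopwords are dropped, and the vocabulary is then restricted using `min_doc_freq` and `max_doc_perc`. The lower-casing, the tiny `STOPWORDS` set, the inclusive comparisons and the helper names `tokenize` and `filter_vocabulary` are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List, Set

SEPARATOR = r"[\W0-9]{1,100}"  # default value of the `separator` parameter
# Tiny illustrative stopword list; the step's real ES/EN lists are not shown on this page.
STOPWORDS = {"the", "a", "and", "of", "el", "la", "y", "de"}


def tokenize(text: str) -> List[str]:
    """Split on runs of non-word characters and digits, dropping stopwords."""
    parts = re.split(SEPARATOR, text.lower())
    return [t for t in parts if t and t not in STOPWORDS]


def filter_vocabulary(docs: List[str], min_doc_freq: int = 2,
                      max_doc_perc: float = 0.9) -> Set[str]:
    """Keep terms that occur in enough documents, but not in too large a share of them."""
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    n_docs = len(docs)
    return {
        term for term, df in doc_freq.items()
        if df >= min_doc_freq and df / n_docs <= max_doc_perc
    }


docs = [
    "Senior data engineer, 10 years of Python",
    "Data scientist and Python developer",
    "Sales manager with 5 years of experience",
    "Data analyst",
]

tokens = tokenize(docs[0])
vocabulary = filter_vocabulary(docs)
print(tokens)
print(sorted(vocabulary))
```

Because the default separator also matches digits, numbers such as "10" never become terms. Lowering `max_doc_perc` removes very common terms: in this toy corpus "data" appears in three of four documents, so a value of `0.7` would exclude it.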
