
# embed_items

> Trains an _item2vec_ model on provided lists of items (or sentences of words, etc.). 

This is essentially the [*word2vec*](https://en.wikipedia.org/wiki/Word2vec) algorithm applied to arbitrary lists
of items. *Word2vec* computes vectors representing words such that nearby (similar) vectors represent words that are
often found in a similar context. *Item2vec* refers to using the exact same algorithm but applying it to arbitrary
lists of items in which the order of items has a comparable interpretation to words in a sentence (the items may be
categories, tags, IDs etc.).

Note that if the order of items in the list (session, basket, etc.) is not important, and you simply want item vectors
to be similar when the corresponding items usually occur together in the same list, set the `window` parameter (see
below) to `"all"`.
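
As a rough illustration of what the window size changes, the following Python sketch lists the (target, context) pairs that a fixed window and a whole-session window would produce for one basket. The helper `context_pairs` is hypothetical and is not part of the step; actual word2vec training also shrinks the window at random for each target, which this sketch ignores.

```python
from typing import List, Sequence, Tuple, Union


def context_pairs(session: Sequence[str], window: Union[int, str]) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for a single session.

    `window` is either an integer (neighbours considered on each side)
    or the string "all" (every other item in the session is context).
    """
    n = len(session)
    pairs = []
    for i, target in enumerate(session):
        if window == "all":
            lo, hi = 0, n
        else:
            lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an item is not its own context
                pairs.append((target, session[j]))
    return pairs


basket = ["milk", "bread", "eggs", "jam"]
narrow = context_pairs(basket, 1)    # only direct neighbours
full = context_pairs(basket, "all")  # order within the basket no longer matters
```

With a window of 1, "milk" and "jam" never appear as a pair, whereas with `"all"` every item is paired with every other item in the same basket.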

We use [gensim](https://radimrehurek.com/gensim/) to train the *item2vec* model, so for further details also see its
[word2vec page](https://radimrehurek.com/gensim/models/word2vec.html).
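
When reading the gensim documentation alongside this page, note that gensim 4 renamed the `size` argument to `vector_size` and `iter` to `epochs`. The sketch below shows one plausible way the parameters documented on this page could map onto `gensim.models.Word2Vec` keyword arguments. The function `to_gensim_kwargs` is hypothetical, and both the mapping and the handling of `"all"` (using the longest session length as the window) are assumptions, not the step's actual implementation.

```python
from typing import Any, Dict, List, Sequence


def to_gensim_kwargs(params: Dict[str, Any], sessions: Sequence[List[str]]) -> Dict[str, Any]:
    """Translate this page's parameter names into gensim 4 keyword names.

    Defaults are the ones documented on this page.
    """
    window = params.get("window", 5)
    if window in ("auto", "max", "all"):
        # Assumption: a whole-session window is emulated with the longest session length.
        window = max((len(s) for s in sessions), default=1)
    return {
        "vector_size": params.get("size", 48),  # called `size` before gensim 4
        "sg": params.get("sg", 1),
        "negative": params.get("negative", 20),
        "alpha": params.get("alpha", 0.025),
        "window": window,
        "min_count": params.get("min_count", 3),
        "epochs": params.get("iter", 10),       # called `iter` before gensim 4
        "sample": params.get("sample", 0),
    }


sessions = [["a", "b", "c"], ["b", "c"], ["a", "c", "d", "e"]]
kwargs = to_gensim_kwargs({"window": "all"}, sessions)
# With gensim installed, these could be passed on as:
#   from gensim.models import Word2Vec
#   model = Word2Vec(sentences=sessions, **kwargs)
```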

## Usage

The following example shows how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      The following uses default parameter values only, and thus would be equivalent to using the step without specifying
      any parameters.

      ```stan theme={null}
      embed_items(products.id, baskets.product_ids, {
        "size": 48,
        "sg": 1,
        "negative": 20,
        "alpha": 0.025,
        "window": 5,
        "min_count": 3,
        "iter": 10,
        "sample": 0
      }) -> (products.embedding)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects to receive and the outputs it will produce. For further details see the sections below.

      ```stan theme={null}
      embed_items(items: category|number, sessions: list[category]|list[number], {
          "param": value,
          ...
      }) -> (embeddings: list[number])
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="items" type="column[category|number]" required>
    A column containing item identifiers (IDs).
  </ParamField>

  <ParamField path="sessions" type="column[list[category]|list[number]]" required>
    A column containing lists, where each row is a session, and each session a list of item identifiers (IDs) compatible
    with the values of the items column.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="embeddings" type="column[list[number]]" required>
    A list column containing item embeddings in the same order as the items input column. Embeddings are lists of numbers
    (vectors).
  </ParamField>
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="size" type="integer" default="48">
    Length of resulting embedding vectors.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ size < inf
    ```
  </ParamField>

  <ParamField path="sg" type="integer" default="1">
    Whether to use the skip-gram or CBOW algorithm.
    Set this to 1 for skip-gram, and 0 for CBOW.

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ sg ≤ 1
    ```
  </ParamField>

  <ParamField path="negative" type="integer" default="20">
    Maximum number of updates for negative sampling.
    For each training example, only this many negatively sampled item vectors are updated.
  </ParamField>

  <ParamField path="alpha" type="number" default="0.025">
    Initial learning rate.

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ alpha ≤ 1
    ```
  </ParamField>

  <ParamField path="window" type="[integer, string]" default="5">
    Size of the item context window.
    Must be either an integer (the number of neighbouring items to consider), or one of "auto", "max" or "all",
    in which case the window is equal to the whole list/session/basket.

    <Accordion title="Options">
      <Tabs>
        <Tab title="integer">
          <ParamField path="{_}" type="integer">
            integer.

            Values must be in the following range:

            ```javascript theme={null}
            1 ≤ {_} < inf
            ```
          </ParamField>
        </Tab>

        <Tab title="string">
          <ParamField path="{_}" type="string">
            string.

            Values must be one of the following:

            * `auto`
            * `max`
            * `all`
          </ParamField>
        </Tab>
      </Tabs>
    </Accordion>
  </ParamField>

  <ParamField path="min_count" type="integer" default="3">
    Minimum number of times an item must occur in the dataset.
    An item that occurs fewer times than this will be ignored.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ min_count < inf
    ```
  </ParamField>

  <ParamField path="iter" type="integer" default="10">
    Iterations.
    How many epochs to run the algorithm for.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ iter < inf
    ```
  </ParamField>

  <ParamField path="sample" type="number" default="0">
    Percentage of most-common items to filter out (equivalent to "stop words").

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ sample ≤ 1
    ```
  </ParamField>

  <ParamField path="normalize" type="boolean" default="true">
    Whether to return normalized item vectors.
  </ParamField>
</Accordion>
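
The `normalize` parameter does not state which norm is used. Assuming it means scaling each vector to unit Euclidean (L2) length, which is the usual convention for embeddings, the effect can be sketched as follows. The helper `l2_normalize` is hypothetical and only illustrates the idea.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


unit = l2_normalize([3.0, 4.0])
other = l2_normalize([4.0, 3.0])
similarity = dot(unit, other)  # for unit vectors this equals the cosine similarity
```

Under this assumption, comparing normalized embeddings with a dot product gives the same result as cosine similarity on the raw vectors.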
