
# embed_sessions

> Trains an _item2vec_ model on provided lists of items. 

Lists of items may represent pages visited in a browsing session, shopping baskets and the products they contain,
sentences of words, etc. The step calculates embedding vectors for all item lists, such that two vectors are similar
if their corresponding lists of items are similar. Similarity here is measured as an average over the individual
items. Essentially, we first calculate embedding vectors representing individual items (using [word2vec](https://en.wikipedia.org/wiki/Word2vec)),
and then average over all items belonging to the same list/session.

As an example, consider a dataset containing shopping baskets. In this case the step will first calculate embeddings
for individual products. The resulting vectors will be similar if they represent objects that are often bought together.
For example, the vectors for sausages and hot dog bread may be more similar to each other than those representing shampoo and
toys. Then, to arrive at an embedding vector for each basket, we simply average over all its individual products. The result
captures the similarity between baskets in terms of the mix of products they contain. The vectors representing
baskets of people buying a significant amount of baby products will therefore be more similar to each other than to vectors
representing baskets of people buying products for a BBQ party.
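The averaging stage described above can be sketched in plain Python. This is only an illustration: the item vectors are made-up 2-dimensional toy values standing in for what a trained word2vec model would produce, `embed_session` and `cosine` are hypothetical helper names rather than part of the step, and returning `None` for sessions without any known item is an assumption about edge-case handling.

```python
import math

# Toy item vectors (invented for illustration; a real model would learn these).
item_vectors = {
    "diapers": [0.9, 0.1],
    "baby_food": [0.8, 0.2],
    "sausages": [0.1, 0.9],
    "hot_dog_bread": [0.2, 0.8],
}


def embed_session(items, vectors):
    """Average the vectors of all known items in a session."""
    known = [vectors[item] for item in items if item in vectors]
    if not known:
        return None  # assumption: sessions without any known item get no embedding
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


baby_basket_1 = embed_session(["diapers", "baby_food"], item_vectors)
baby_basket_2 = embed_session(["diapers"], item_vectors)
bbq_basket = embed_session(["sausages", "hot_dog_bread"], item_vectors)

# The two baby baskets should be closer to each other than to the BBQ basket.
print(cosine(baby_basket_1, baby_basket_2) > cosine(baby_basket_1, bbq_basket))
```

With these toy values, the two baskets dominated by baby products end up far more similar to each other than either is to the BBQ basket, which is the behaviour the step aims for on real data.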

To only calculate individual item embeddings see the complementary [embed\_items](../embed_items/) step.

The *item2vec* model is trained with [gensim](https://radimrehurek.com/gensim/), so for further details see its
[word2vec page](https://radimrehurek.com/gensim/models/word2vec.html).
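For readers comparing this step with gensim directly: the step's `size` and `iter` parameters carry the names used by gensim 3.x, which gensim 4.x renamed to `vector_size` and `epochs`. The sketch below shows one way a step-style configuration could be translated into keyword arguments for `gensim.models.Word2Vec`. The helper `to_gensim_kwargs` is hypothetical, and whether the step performs such a mapping internally is an assumption, not something this page documents.

```python
# Parameter names that changed between gensim 3.x and gensim 4.x.
RENAMED = {"size": "vector_size", "iter": "epochs"}


def to_gensim_kwargs(config):
    """Translate a step-style config into gensim 4.x Word2Vec keyword arguments.

    Only integer `window` values are handled here; how the step resolves
    "auto", "max" or "all" is not covered by this sketch.
    """
    return {RENAMED.get(name, name): value for name, value in config.items()}


config = {"size": 48, "sg": 1, "negative": 20, "alpha": 0.025,
          "window": 5, "min_count": 3, "iter": 10, "sample": 0}
kwargs = to_gensim_kwargs(config)
print(kwargs)

# With gensim 4.x installed, a model could then be trained along these lines:
#   from gensim.models import Word2Vec
#   model = Word2Vec(sentences=sessions, **kwargs)
# where `sessions` is a list of item lists.
```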

## Usage

The following example shows how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      ```stan theme={null}
      embed_sessions(baskets.products, {
        "size": 48,
        "sg": 1,
        "negative": 20,
        "alpha": 0.025,
        "window": 5,
        "min_count": 3,
        "iter": 10,
        "sample": 0
      }) -> (baskets.embedding)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects to receive and the outputs it will produce. For further details see the sections below.

      ```stan theme={null}
      embed_sessions(sessions: list[category]|list[number], {
          "param": value,
          ...
      }) -> (embeddings: list[number])
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="sessions" type="column[list[category]|list[number]]" required>
    A column containing lists, where each row is a session, and each session a list of items.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="embeddings" type="column[list[number]]" required>
    A column containing one embedding per session, in the same order as the rows of the input sessions column. Embeddings are lists of numbers.
  </ParamField>
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="size" type="integer" default="48">
    Length of embedding vectors.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ size < inf
    ```
  </ParamField>

  <ParamField path="sg" type="integer" default="1">
    Use Skip-Gram or CBOW.
    Set this to 1 to use Skip-Gram, 0 for CBOW.

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ sg ≤ 1
    ```
  </ParamField>

  <ParamField path="negative" type="integer" default="20">
    Update maximum for negative sampling.
    Only this many word vectors are updated.
  </ParamField>

  <ParamField path="alpha" type="number" default="0.025">
    Initial Learning Rate.

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ alpha ≤ 1
    ```
  </ParamField>

  <ParamField path="window" type="[integer, string]" default="5">
    Word context window.
    Must be either an integer or "auto", "max" or "all".

    <Accordion title="Options">
      <Tabs>
        <Tab title="integer">
          <ParamField path="{_}" type="integer">
            integer.

            Values must be in the following range:

            ```javascript theme={null}
            1 ≤ {_} < inf
            ```
          </ParamField>
        </Tab>

        <Tab title="string">
          <ParamField path="{_}" type="string">
            string.

            Values must be one of the following:

            * `auto`
            * `max`
            * `all`
          </ParamField>
        </Tab>
      </Tabs>
    </Accordion>
  </ParamField>

  <ParamField path="min_count" type="integer" default="3">
    Minimum number of times an item must occur in the dataset. Items occurring less often are filtered out.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ min_count < inf
    ```
  </ParamField>

  <ParamField path="iter" type="integer" default="10">
    Iterations.
    How many epochs to run the algorithm for.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ iter < inf
    ```
  </ParamField>

  <ParamField path="sample" type="number" default="0">
    Sample.
    Percentage of most-common items to filter out.

    Values must be in the following range:

    ```javascript theme={null}
    0 ≤ sample ≤ 1
    ```
  </ParamField>

  <ParamField path="normalize" type="boolean" default="true">
    Whether to return normalized item vectors.
  </ParamField>
</Accordion>
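The ranges listed above can be checked before running a recipe. The following is a minimal sketch of such a check, assuming the configuration is available as a Python dict; `validate_config` is a hypothetical helper and not part of the step. `negative` is not checked because no range is documented for it.

```python
def validate_config(config):
    """Return a list of error messages for values outside the documented ranges."""
    errors = []

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    def is_number(value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    # Parameters documented as integers with a lower bound of 1.
    for name in ("size", "min_count", "iter"):
        if name in config and not (is_int(config[name]) and config[name] >= 1):
            errors.append(f"{name} must be an integer >= 1")

    if "sg" in config and not (is_int(config["sg"]) and config["sg"] in (0, 1)):
        errors.append("sg must be 0 (CBOW) or 1 (Skip-Gram)")

    # Parameters documented as numbers between 0 and 1 inclusive.
    for name in ("alpha", "sample"):
        if name in config and not (is_number(config[name]) and 0 <= config[name] <= 1):
            errors.append(f"{name} must be a number between 0 and 1")

    if "window" in config:
        window = config["window"]
        valid_int = is_int(window) and window >= 1
        valid_str = isinstance(window, str) and window in ("auto", "max", "all")
        if not (valid_int or valid_str):
            errors.append('window must be an integer >= 1 or "auto", "max", "all"')

    if "normalize" in config and not isinstance(config["normalize"], bool):
        errors.append("normalize must be a boolean")

    return errors


print(validate_config({"size": 48, "sg": 1, "window": "auto"}))
print(validate_config({"size": 0, "alpha": 1.5, "window": "wide"}))
```

The first call uses only values within the documented ranges, while the second violates the constraints on `size`, `alpha` and `window`.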
