embed_text
Parse and calculate a (word-averaged) embedding vector for each text.
An embedding vector is a numerical representation of a text, such that different numerical components of the vector capture different dimensions of the text's meaning. Embeddings can be used, for example, to calculate the semantic similarity between pairs of texts (see the link_embeddings step, which uses such similarities to create a network of texts connected by similarity).
In this step, the embedding of each text is calculated as a (weighted) average of the embeddings of the text's individual words. The individual word embeddings are GloVe vectors, as provided by spaCy's language models.
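To make the idea concrete, the following sketch computes word-averaged embeddings directly with spaCy and compares two texts by cosine similarity. It illustrates the general technique rather than this step's internal implementation (which adds weighting); the model name en_core_web_md is just an example of a spaCy model that ships with word vectors:

```python
import numpy as np
import spacy

# Assumes a spaCy model with word vectors is installed, e.g. via:
#   python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

doc_a = nlp("The movie was a beautiful, moving story.")
doc_b = nlp("The film told a touching and lovely tale.")

# For models with vectors, doc.vector is the average of the token
# vectors, i.e. an unweighted word-averaged text embedding.
vec_a, vec_b = doc_a.vector, doc_b.vector

# Cosine similarity between the two text embeddings
similarity = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
print(f"cosine similarity: {similarity:.3f}")
```

Texts with similar meanings but few shared words still end up close in the embedding space, which is what makes these vectors useful for linking texts by similarity.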
Use either the language parameter or a second input column to specify the language of the input texts. If neither is provided, the language will be inferred automatically from the texts themselves (which is equivalent to first creating a language column using the infer_language step).
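As a rough illustration of what automatic language inference involves, the sketch below detects each text's language with the third-party langdetect package and then loads a matching spaCy model. The detector choice and the model mapping are assumptions for illustration only, not this step's actual inference logic:

```python
import spacy
from langdetect import detect  # third-party detector: pip install langdetect

# Hypothetical mapping from detected language codes to spaCy models with vectors
MODELS = {"en": "en_core_web_md", "es": "es_core_news_md"}

def embed(text: str):
    lang = detect(text)             # e.g. "en" or "es"
    nlp = spacy.load(MODELS[lang])  # load the matching language model
    return nlp(text).vector        # word-averaged embedding

print(embed("Un texto corto en español.").shape)
```

In practice the detected language simply selects which language-specific word vectors are used for the averaging, which is why supplying a language column up front gives the same result as running infer_language first.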