Uses Spotify’s Annoy to perform approximate nearest neighbour search.
Usage
The following examples show how the step can be used in a recipe.
Examples
To link similar embeddings with default configuration
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced
by name e.g. "churn-clf").
Inputs
A categorical column containing embeddings (numerical vectors/lists). Usually the result of
previously executing a step embed_[entity].
Outputs
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Number of nearest neighbours to connect to. Values must be in the following range:
Minimum similarity for connecting two nodes. Values must be in the following range:
Minimum similarity for connecting two nodes, expressed as a quantile of the similarity distribution. Values must be in the following range:
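To illustrate how the three parameters above could interact, here is a minimal sketch using exact (brute-force) search rather than Annoy. The function name link_embeddings_sketch, the parameter names n_neighbours, min_similarity and min_similarity_quantile, and the use of plain cosine similarity are all assumptions made for illustration; they are not taken from the step itself, and the order in which the step applies its thresholds may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def link_embeddings_sketch(embeddings, n_neighbours=2,
                           min_similarity=None,
                           min_similarity_quantile=None):
    """Hypothetical helper: connect each row to its nearest neighbours,
    then drop links whose similarity falls below the threshold(s)."""
    candidates = {}
    for i, u in enumerate(embeddings):
        sims = [(cosine_similarity(u, v), j)
                for j, v in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)  # most similar first
        for sim, j in sims[:n_neighbours]:
            # Store each undirected link once, keyed by (smaller, larger) index.
            candidates[(min(i, j), max(i, j))] = sim

    threshold = -math.inf
    if min_similarity is not None:
        threshold = max(threshold, min_similarity)
    if min_similarity_quantile is not None and candidates:
        # Assumption: the quantile is taken over the candidate link similarities.
        threshold = max(threshold,
                        quantile(list(candidates.values()), min_similarity_quantile))
    return {edge: sim for edge, sim in candidates.items() if sim >= threshold}


embeddings = [[1, 0], [0.9, 0.1], [0, 1], [-1, 0]]
all_links = link_embeddings_sketch(embeddings, n_neighbours=1)
strong_links = link_embeddings_sketch(embeddings, n_neighbours=1, min_similarity=0.5)
upper_half = link_embeddings_sketch(embeddings, n_neighbours=1,
                                    min_similarity_quantile=0.5)
print(sorted(all_links), sorted(strong_links), sorted(upper_half))
```

In this sketch the absolute threshold and the quantile threshold are combined by taking the stricter of the two, which is only one of several plausible designs.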
Number of trees.
Affects the build time and the index size. A larger value gives more accurate results, but the index
takes longer to build and is larger.
Accuracy multiplier.
A larger value gives more accurate results, but queries take longer to return.
Metric to use. Only angular is supported for now.
Annoy’s angular metric is equivalent to sqrt(2*(1-cos(u,v))), whose maximum is sqrt(2*2) = 2.
I.e. the distance between (1,0) and (-1,0), at maximum angular separation, should be exactly 2.
Note that for the weights of the resulting network links Annoy’s distances are converted to
similarities in the interval [0,1]. Values must be one of the following:
angular, euclidean, manhattan, hamming, dot
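The angular distance formula quoted above can be checked with a few lines of plain Python. The function angular_distance below reproduces sqrt(2*(1-cos(u,v))) directly. The function distance_to_similarity is a hypothetical linear mapping (1 - d/2) shown only to illustrate how a distance in [0,2] could be rescaled to [0,1]; the conversion the step actually uses is not specified here.

```python
import math


def angular_distance(u, v):
    """Angular distance as described above: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, cos))
    return math.sqrt(2.0 * (1.0 - cos))


def distance_to_similarity(d):
    """Hypothetical conversion (an assumption, not the step's documented formula):
    maps distance 0 to similarity 1 and distance 2 to similarity 0."""
    return 1.0 - d / 2.0


opposite = angular_distance((1, 0), (-1, 0))    # maximum angular separation
identical = angular_distance((1, 0), (1, 0))    # same direction
orthogonal = angular_distance((1, 0), (0, 1))   # right angle, sqrt(2)

print(opposite, identical, orthogonal)
print(distance_to_similarity(opposite), distance_to_similarity(identical))
```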
Used to seed the random number generator, creating deterministic results.