Usage
The following example shows how the step can be used in a recipe.
Examples
- Example 1
- Signature
To configure a minimum similarity between embeddings to create a link
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
Outputs
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Number of nearest embeddings to take into account. Values must be in the following range:
Minimum similarity for connecting two nodes (similarity ∈ [0, 1]). Values must be in the following range:
Minimum similarity for connecting two nodes, expressed as a quantile of the similarity distribution (similarity ∈ [0, 1]). Values must be in the following range:
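The difference between an absolute and a quantile-based minimum similarity can be sketched as follows. This is only an illustration in plain Python, not the step's actual implementation: the names `quantile_threshold`, `filter_links`, `min_similarity` and `min_quantile` are hypothetical, and applying the stricter of the two thresholds when both are given is an assumption.

```python
def quantile_threshold(similarities, q):
    """Return the similarity at quantile q (0 <= q <= 1), using linear interpolation."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(similarities)
    if not ordered:
        raise ValueError("similarities must not be empty")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def filter_links(links, min_similarity=None, min_quantile=None):
    """Keep (source, target, similarity) links that pass the given thresholds.

    Assumption: if both thresholds are given, the stricter one applies.
    """
    threshold = 0.0
    if min_similarity is not None:
        threshold = max(threshold, min_similarity)
    if min_quantile is not None:
        sims = [s for _, _, s in links]
        threshold = max(threshold, quantile_threshold(sims, min_quantile))
    return [link for link in links if link[2] >= threshold]


# Hypothetical candidate links between five nodes.
links = [(0, 1, 0.9), (0, 2, 0.4), (1, 2, 0.7), (2, 3, 0.2), (3, 4, 0.6)]

strong = filter_links(links, min_similarity=0.5)   # absolute cut-off
top_half = filter_links(links, min_quantile=0.5)   # cut-off at the median similarity
```

An absolute threshold keeps the same cut-off regardless of the data, whereas a quantile threshold adapts to the similarity distribution, so a fixed share of the candidate links survives.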
Number of trees.
Affects the build time and the index size. A larger value gives more accurate results, but the index takes longer to build and is larger.
Accuracy multiplier.
A larger value will give more accurate results, but queries will take longer to return.
Metric to use; only angular is supported for now.
Annoy’s angular metric is equivalent to sqrt(2*(1-cos(u,v))), whose maximum is sqrt(2*2) = 2.
I.e. the distance between (1,0) and (-1,0), at maximum angular separation, should be exactly 2.
Note that for the weights of the resulting network links, Annoy’s distances are converted to similarities in the interval [0,1]. Values must be one of the following:
angular
euclidean
manhattan
hamming
dot
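The angular distance formula quoted for this parameter can be checked with a few lines of plain Python. The `angular_distance` function follows the formula sqrt(2*(1-cos(u,v))) directly. The `distance_to_similarity` function is an assumption: the text only says that distances are converted to similarities in [0,1], so a simple linear rescaling is shown here as one possible mapping, not necessarily the one the step uses.

```python
import math


def angular_distance(u, v):
    """Angular distance sqrt(2 * (1 - cos(u, v))), which ranges from 0 to 2."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.sqrt(2.0 * (1.0 - cos))


def distance_to_similarity(distance):
    """Assumed linear mapping: distance 0 -> similarity 1, distance 2 -> similarity 0."""
    return 1.0 - distance / 2.0


opposite = angular_distance((1, 0), (-1, 0))    # maximum angular separation
identical = angular_distance((1, 0), (1, 0))    # same direction
orthogonal = angular_distance((1, 0), (0, 1))   # 90 degrees apart

print(opposite, distance_to_similarity(opposite))
print(identical, distance_to_similarity(identical))
print(orthogonal, distance_to_similarity(orthogonal))
```

Under this assumed mapping, vectors pointing in the same direction get a link weight of 1 and vectors pointing in opposite directions get a weight of 0.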
Used to seed the random number generator, creating deterministic results.