link_embeddings

Uses Spotify’s Annoy library to perform an approximate nearest neighbour search over embeddings, linking each node to its most similar neighbours.
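The core idea is to link each node to its most similar neighbours. An exact brute-force search shows this in its simplest form. The sketch below is not Annoy, which builds a forest of random-projection trees to approximate the search instead of scanning every pair. It is a minimal stand-in that makes three assumptions: embeddings are plain lists of floats, link weights are cosine similarities, and the helper `knn_links` is purely hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_links(embeddings, n_nearest=15):
    """Hypothetical helper: exact k-nearest-neighbour links.

    Returns (source, target, similarity) tuples, linking each row to its
    n_nearest most similar other rows. Annoy approximates this search
    rather than comparing every pair of rows.
    """
    links = []
    for i, u in enumerate(embeddings):
        scored = [(cosine(u, v), j) for j, v in enumerate(embeddings) if j != i]
        scored.sort(reverse=True)  # most similar first
        for sim, j in scored[:n_nearest]:
            links.append((i, j, sim))
    return links

# Usage: three 2-D embeddings, one neighbour each.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for link in knn_links(emb, n_nearest=1):
    print(link)
```

The brute-force version costs O(n²) comparisons. That cost is the reason an approximate index such as Annoy is used for larger datasets.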

Usage

The following examples show how the step can be used in a recipe.

Examples

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

n_nearest

integer

default:"15"

Number of nearest neighbours to connect each node to. Values must be in the following range:

1 ≤ n_nearest < inf

similarity_min

number

default:"0"

Minimum similarity for connecting two nodes. Values must be in the following range:

0 ≤ similarity_min ≤ 1

similarity_min_q

number

default:"0"

Minimum similarity for connecting two nodes, expressed as a quantile of the similarity distribution. Values must be in the following range:

0 ≤ similarity_min_q ≤ 1
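The two thresholds can be read as filters on candidate links. This page does not confirm how the step combines them, so the sketch below rests on three assumptions:

- similarity_min is applied directly.
- similarity_min_q becomes a cutoff by taking that quantile of the candidate links' similarities.
- A link must pass both cutoffs.

The function `filter_links` and the nearest-rank quantile method are illustrative choices, not the step's implementation.

```python
import math

def quantile(values, q):
    """Nearest-rank quantile of a non-empty list, for 0 <= q <= 1."""
    ordered = sorted(values)
    # q = 0 maps to the smallest value, q = 1 to the largest.
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]

def filter_links(links, similarity_min=0.0, similarity_min_q=0.0):
    """Hypothetical filter: keep (source, target, sim) links passing both cutoffs."""
    if not links:
        return []
    sims = [sim for _, _, sim in links]
    # ASSUMPTION: the stricter of the absolute and quantile-based cutoffs wins.
    cutoff = max(similarity_min, quantile(sims, similarity_min_q))
    return [link for link in links if link[2] >= cutoff]

links = [(0, 1, 0.9), (1, 2, 0.4), (2, 3, 0.7), (3, 0, 0.2)]
print(filter_links(links, similarity_min=0.3))     # drops the 0.2 link
print(filter_links(links, similarity_min_q=0.75))  # keeps links >= 0.7
```

With both parameters at their default of 0, no links are dropped.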

n_trees

integer

default:"30"

Number of trees in the Annoy index. This affects build time and index size: a larger value gives more accurate results but takes longer to build and produces a larger index.

search_k_mult

integer

default:"2"

Accuracy multiplier for queries. A larger value gives more accurate results but makes queries slower.
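Annoy's Python API lets neighbour queries such as `get_nns_by_item` take a `search_k` argument. It defaults to `n_trees * n`, where n is the number of neighbours requested, and larger values make Annoy inspect more nodes per query. This page does not say how the step derives `search_k` from search_k_mult. The sketch below assumes the multiplier simply scales Annoy's default, and the function `effective_search_k` is hypothetical.

```python
def effective_search_k(n_nearest=15, n_trees=30, search_k_mult=2):
    """Hypothetical: scale Annoy's default search_k (n_trees * n) by a multiplier.

    A larger result makes each query inspect more tree nodes, trading
    query time for accuracy. It would be passed to Annoy as, e.g.,
    index.get_nns_by_item(i, n_nearest, search_k=effective_search_k(...)).
    """
    return search_k_mult * n_trees * n_nearest

# With this step's documented defaults: 2 * 30 * 15
print(effective_search_k())  # 900
```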

metric

string

default:"angular"

Metric to use; only angular is supported for now. Annoy’s angular metric is equivalent to sqrt(2 * (1 - cos(u, v))), whose maximum is sqrt(2 * 2) = 2. That is, the distance between (1, 0) and (-1, 0), at maximum angular separation, is exactly 2. Note that for the weights of the resulting network links, Annoy’s distances are converted to similarities in the interval [0, 1]. Values must be one of the following:

angular
euclidean
manhattan
hamming
dot
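The angular distance formula and its maximum of 2 can be checked numerically. The sketch below implements sqrt(2 * (1 - cos(u, v))) as stated in the metric description. This page does not say how the step converts distances to similarities in [0, 1]. The linear mapping 1 - d / 2 in `to_similarity` is one plausible choice and is an assumption, not the step's documented conversion.

```python
import math

def angular_distance(u, v):
    """Annoy-style angular distance: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = dot / norm
    # Clamp so rounding can never make the argument of sqrt negative.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))

def to_similarity(distance):
    """ASSUMPTION: linear map from [0, 2] onto [0, 1]; not the documented conversion."""
    return 1.0 - distance / 2.0

print(angular_distance((1, 0), (-1, 0)))                # 2.0, maximum separation
print(round(angular_distance((1, 0), (0, 1)), 4))       # 1.4142, orthogonal vectors
print(to_similarity(angular_distance((1, 0), (1, 0))))  # 1.0, identical direction
```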

seed

number

Used to seed the random number generator, creating deterministic results.
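The configuration object could be assembled and checked as follows. This minimal sketch merges user-supplied values with the defaults and ranges listed above and serializes the overrides to JSON for the step's last argument. The helper `build_config` and its validation rules are hypothetical, and seed is treated as optional because it has no documented default. The step's own validation may differ.

```python
import json

# Defaults as documented for this step (seed has no documented default).
DEFAULTS = {
    "n_nearest": 15,
    "similarity_min": 0,
    "similarity_min_q": 0,
    "n_trees": 30,
    "search_k_mult": 2,
    "metric": "angular",
}
METRICS = {"angular", "euclidean", "manhattan", "hamming", "dot"}

def build_config(**overrides):
    """Hypothetical helper: merge overrides with defaults and check documented ranges."""
    unknown = set(overrides) - set(DEFAULTS) - {"seed"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    cfg = {**DEFAULTS, **overrides}
    if not (isinstance(cfg["n_nearest"], int) and cfg["n_nearest"] >= 1):
        raise ValueError("n_nearest must be an integer >= 1")
    for key in ("similarity_min", "similarity_min_q"):
        if not 0 <= cfg[key] <= 1:
            raise ValueError(f"{key} must be in [0, 1]")
    if cfg["metric"] not in METRICS:
        raise ValueError(f"metric must be one of {sorted(METRICS)}")
    return cfg

# Only the overrides need to appear in the JSON object passed to the step.
overrides = {"n_nearest": 10, "similarity_min": 0.5}
build_config(**overrides)  # raises ValueError if a value is out of range
print(json.dumps(overrides))  # {"n_nearest": 10, "similarity_min": 0.5}
```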
