A link (or association) A->B is created between items A and B if the presence of A makes the presence of B in the same session N times more likely. For further details about the algorithm see e.g. association rule learning.

Usage

The following example shows how the step can be used in a recipe.

Examples

The following call creates links between pairs of items A and B, if:
  • A occurs in at least 7 sessions
  • B occurs in at least 25% of sessions containing A
  • The presence of A in a session makes the presence of B in the same session at least twice as likely.
Note that the last condition ties the confidence to the background frequency of B: a minimum lift of 2 means that the frequency of B in sessions already containing A must be at least twice the frequency of B across all sessions. For a rule sitting exactly at the 25% minimum confidence, for example, the overall frequency of B must be at most 12.5% (half of 25%).
As an example, suppose 10% of shopping baskets contain milk (item B). Amongst baskets already containing cereal, the percentage containing milk is likely to be higher. If milk occurred in, e.g., 30% of the baskets that also contain cereal, then the lift of the rule cereal->milk would be 3: buying cereal makes buying milk 3 times more likely.
link_session_items(items.id, sessions.item_ids, {
  "min_support": 7,
  "min_confidence": 25,
  "min_lift": 2
}) -> (items.targets, items.weights)
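
To make the three thresholds concrete, here is a minimal Python sketch that recomputes them by hand for the cereal and milk example. It is not the step's implementation; the helper `rule_metrics` and the toy `baskets` data are hypothetical names introduced only for illustration.

```python
def rule_metrics(sessions, a, b):
    """Return (support of A as a session count,
               confidence of A->B in percent,
               lift of A->B as a ratio)."""
    baskets = [set(s) for s in sessions]
    with_a = [s for s in baskets if a in s]
    support_a = len(with_a)
    count_b = sum(1 for s in baskets if b in s)
    if support_a == 0 or count_b == 0:
        return support_a, 0.0, 0.0
    # Share of sessions containing A that also contain B.
    confidence = 100 * sum(1 for s in with_a if b in s) / support_a
    # Background frequency of B across all sessions.
    background = 100 * count_b / len(baskets)
    return support_a, confidence, confidence / background


# Toy data (assumed): 100 baskets, 10 with cereal, 10 with milk, 3 with both.
baskets = (
    [["cereal", "milk"]] * 3
    + [["cereal"]] * 7
    + [["milk"]] * 7
    + [["bread"]] * 83
)

support, confidence, lift = rule_metrics(baskets, "cereal", "milk")
# Apply the same thresholds as the call above.
keep_link = support >= 7 and confidence >= 25 and lift >= 2
print(support, confidence, lift, keep_link)
```

With this data, milk appears in 10% of all baskets but in 30% of the baskets containing cereal, so the rule cereal->milk has a lift of 3 and passes all three thresholds.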

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
items
column[category|number]
required
A column containing the IDs of items to analyze.
sessions
column[list[category]|list[number]]
required
A column containing lists of IDs corresponding to items in the same session, basket, etc.
targets
column
required
A column containing, for each item, a list of IDs (row numbers) identifying the other items it will be linked to.
weights
column
required
A column containing, for each item, a list of weights indicating the “importance” of each link to the other items identified in the targets column.
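
The shape of the two outputs can be illustrated with a small Python sketch. It is an approximation under stated assumptions, not the step's actual implementation: the function `link_items` is hypothetical, it only considers rules between single items, and it uses the plain lift as the link weight, which may differ from what the default `weight_metric` produces.

```python
from collections import Counter
from itertools import permutations


def link_items(item_ids, sessions, min_support=10, min_confidence=20.0, min_lift=None):
    """Return (targets, weights): one list per row of item_ids.

    targets[r] holds the row numbers of items linked from row r,
    weights[r] holds the matching link weights (here: the lift, an assumption).
    """
    row_of = {item: row for row, item in enumerate(item_ids)}
    baskets = [set(s) for s in sessions]
    n = len(baskets)
    item_count = Counter(i for b in baskets for i in b)
    # Count co-occurrences for every ordered pair (a, b) with a != b.
    pair_count = Counter(p for b in baskets for p in permutations(sorted(b), 2))

    targets = [[] for _ in item_ids]
    weights = [[] for _ in item_ids]
    for (a, b), both in sorted(pair_count.items()):
        if a not in row_of or b not in row_of:
            continue
        if item_count[a] < min_support:
            continue
        confidence = 100 * both / item_count[a]
        lift = confidence / (100 * item_count[b] / n)
        if confidence < min_confidence:
            continue
        if min_lift is not None and lift < min_lift:
            continue
        targets[row_of[a]].append(row_of[b])
        weights[row_of[a]].append(lift)
    return targets, weights


# Toy data (assumed): rows 0, 1, 2 are cereal, milk, bread.
item_ids = ["cereal", "milk", "bread"]
sessions = (
    [["cereal", "milk"]] * 3
    + [["cereal"]] * 7
    + [["milk"]] * 7
    + [["bread"]] * 83
)
targets, weights = link_items(item_ids, sessions, min_support=7,
                              min_confidence=25, min_lift=2)
print(targets)
print(weights)
```

In this toy data cereal (row 0) and milk (row 1) end up linked to each other, while bread (row 2) gets empty lists because it never shares a basket with another item.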

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

min_support
[number, integer]
default:"10"
Minimum Support. Minimum support of the rule antecedent: create link A->B only if A occurred in at least this many sessions. A value below 1 is interpreted as a proportion of all sessions; any other value must be a positive integer giving the absolute number of sessions.
Accepted types:
  • number: Values must be in the following range: 0 < min_support < 1
  • integer
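
The two ways of specifying `min_support` can be sketched as a small normalisation helper in Python. The function `resolve_min_support` is hypothetical, and rounding a proportion up to the next whole session is an assumption; the step may round differently.

```python
import math


def resolve_min_support(min_support, n_sessions):
    """Turn min_support into an absolute session count.

    Values strictly between 0 and 1 are treated as a proportion of all
    sessions; any other value must be a positive whole number.
    """
    if min_support <= 0:
        raise ValueError("min_support must be positive")
    if min_support < 1:
        # Assumption: round up so the proportion is never undercut.
        return math.ceil(min_support * n_sessions)
    if min_support != int(min_support):
        raise ValueError("min_support >= 1 must be a whole number")
    return int(min_support)


print(resolve_min_support(0.25, 40))  # proportion of 40 sessions
print(resolve_min_support(7, 40))     # absolute count, returned unchanged
```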
min_confidence
number
default:"20"
Minimum Confidence. Expressed as a percentage. Include link A->B only if B occurred in at least this percentage of sessions also containing A. Values must be between 0 and 100.
min_lift
[number, null]
Minimum Lift. Expressed as a multiplier/ratio. Include link A->B only if A makes the presence of B in the same session at least this many times more likely.
weight_metric
string
default:"rule_lift_pct"
Metric for link weight. Values must be one of the following:
  • itemset_support_abs
  • itemset_support_pct
  • filter_metric_abs
  • filter_metric_pct
  • antecedent_support_abs
  • antecedent_support_pct
  • consequent_support_abs
  • consequent_support_pct
  • rule_confidence_pct
  • rule_lift_abs
  • rule_lift_pct