Usage
The following example shows how the step can be used in a recipe.
Examples
The following call creates rules between pairs of items A and B, if:
- A occurs in at least 7 sessions
- B occurs in at least 25% of sessions containing A
- The presence of A in a session makes the presence of B in the same session at least twice as likely.
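The three conditions above correspond to the usual support, confidence and lift thresholds. As a rough illustration only (this is not the step's own implementation or call syntax), the following pure-Python sketch applies them to pairs of items. The function name pairwise_rules, its argument names, and the use of a fraction (0.25) rather than a percentage for confidence are assumptions made for this example.

```python
from itertools import permutations


def pairwise_rules(sessions, min_support=7, min_confidence=0.25, min_lift=2.0):
    """Return rules A -> B between single items that pass all three thresholds.

    sessions: iterable of collections of items, one per session/basket.
    min_confidence is a fraction here (0.25 == 25%), an assumption of this sketch.
    """
    sessions = [set(s) for s in sessions]
    n_sessions = len(sessions)

    # Count in how many sessions each item and each ordered pair occurs.
    item_count = {}
    pair_count = {}
    for items in sessions:
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in permutations(items, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), n_ab in pair_count.items():
        support_a = item_count[a]                     # sessions containing A
        confidence = n_ab / support_a                 # share of A-sessions with B
        base_rate_b = item_count[b] / n_sessions      # share of all sessions with B
        lift = confidence / base_rate_b               # how much A raises B's likelihood
        if (support_a >= min_support
                and confidence >= min_confidence
                and lift >= min_lift):
            rules.append({
                "antecedent": a,
                "consequent": b,
                "support": support_a,
                "confidence": confidence,
                "lift": lift,
            })
    return rules


# Hypothetical data: 8 baskets with bread and butter, 12 with milk only.
example_sessions = [{"bread", "butter"}] * 8 + [{"milk"}] * 12
example_rules = pairwise_rules(example_sessions)
```

With this data, bread and butter each occur in 8 sessions (at least 7), always together, and seeing one makes the other 2.5 times as likely, so both bread -> butter and butter -> bread pass the default thresholds.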
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally
columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced
by name, e.g. "churn-clf").
Inputs
A long input dataset with one row per item (product) and session (basket). In other words, sessions
or baskets should be _dis_aggregated, but each row should uniquely identify the item/product and
session/basket by id or name.
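To make the expected "long" shape concrete, here is a minimal sketch that disaggregates baskets stored as lists into one row per session and item. The column names session_id and item_id and the helper to_long_format are hypothetical, and dropping repeated items within a session is an assumption of this example, not documented behaviour of the step.

```python
def to_long_format(baskets):
    """Turn {session_id: [items]} into one row per (session, item) pair."""
    rows = []
    for session_id, items in baskets.items():
        # Assumption: an item repeated within a session yields a single row.
        for item in sorted(set(items)):
            rows.append({"session_id": session_id, "item_id": item})
    return rows


# Hypothetical aggregated baskets.
baskets = {
    "s1": ["bread", "butter"],
    "s2": ["milk", "bread", "milk"],
}
long_rows = to_long_format(baskets)
for row in long_rows:
    print(row)
```

Each printed row identifies exactly one item in one session, which is the form the input dataset is described as needing.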
Outputs
A new output dataset containing products and rules, connected into a network such that products are
linked to the association rules in which they occur.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Name of column uniquely identifying all items/products.
Name(s) of column(s) uniquely identifying all sessions/baskets/orders.
Array items
Each item in array.
Column used to label items in a user-friendly manner.
Minimum size of itemsets to identify.
E.g. an itemsize of 3 means association rules will have 2 antecedents (e.g. A, B)
and 1 consequent (C), resulting in rules of the form (A, B) -> C. The step will
currently generate only single items as consequents.
Values must be in the following range:
Maximum size of itemsets to identify.
E.g. an itemsize of 3 means association rules will have 2 antecedents (e.g. A, B)
and 1 consequent (C), resulting in rules of the form (A, B) -> C. The step will
currently generate only single items as consequents.
Values must be in the following range:
Minimum Support.
Minimum support of a rule antecedent. A value below 1 is interpreted as a proportion of sessions.
Any other value must be a positive integer and is interpreted as an absolute count of sessions.
Create rule A->B only if A occurred in at least this many sessions.
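The two interpretations of minimum support can be sketched as a small helper that converts either form into an absolute session count. The function name min_support_count is hypothetical, and rounding a proportion up to the next whole session is an assumption of this example; the step itself may round differently.

```python
import math


def min_support_count(min_support, n_sessions):
    """Convert a minimum-support setting into an absolute number of sessions."""
    if min_support <= 0:
        raise ValueError("min_support must be positive")
    if min_support < 1:
        # Proportion of all sessions (assumption: round up to a whole session).
        return math.ceil(min_support * n_sessions)
    if int(min_support) != min_support:
        raise ValueError("min_support >= 1 must be a whole number")
    # Already an absolute count.
    return int(min_support)


print(min_support_count(7, 200))     # absolute count
print(min_support_count(0.25, 200))  # proportion of 200 sessions
```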
Options
number. Values must be in the following range:
Minimum Confidence.
Expressed as a percentage. Include link A->B only if B occurred in at least this
percentage of sessions also containing A.
Values must be in the following range:
Minimum Lift.
Expressed as a multiplier/ratio. Include link A->B only if A makes the presence of B in the same
session at least this many times more likely.
Metric for link weight.
Which association rule metric to use as the weight of links in the network generated by this step.
Values must be one of the following:
itemset_support_abs
itemset_support_pct
antecedent_support_abs
antecedent_support_pct
consequent_support_abs
consequent_support_pct
rule_confidence_pct
rule_lift_abs
rule_lift_pct
Whether to link items to rules.
If not, a product (antecedent) will be linked only to other products (consequents).
Only keep N links with largest weight.
This applies individually to each node in the network, filtering its outgoing links to keep only
the N links with the largest weight. The weight is selected using the weight_metric parameter,
i.e. it corresponds to one of the association rule metrics (support, confidence etc.). If null,
all links will be kept.
Definition of desired aggregations for (consequent) items.
A dictionary mapping original columns to new aggregated columns, specifying an aggregation function for each.
Aggregations are functions that reduce all the values in a particular column of a single item/product
to a single summary value for that item/product. E.g. a sum aggregation of column A calculates a single
total by adding up all the values in A belonging to each item.
Possible aggregation functions accepted as func parameters are:
- n, size or count: calculate number of rows in group
- sum: sum total of values
- mean: take mean of values
- max: take max of values
- min: take min of values
- first: take first item found
- last: take last item found
- unique: collect a list of unique values
- n_unique: count the number of unique values
- list: collect a list of all values
- concatenate: convert all values to text and concatenate them into one long text
- concat_lists: concatenate lists in all rows into a single larger list
- count_where: number of rows in which the column matches a value, needs parameter value with the value that you want to count
- percent_where: percentage of the column where the column matches a value, needs parameter value with the value that you want to count
For count_where and percent_where, an additional value parameter is required.
Definition of desired aggregations for rules (all items in rule).
A dictionary mapping original columns to new aggregated columns, specifying an aggregation function for each.
Aggregations are functions that reduce all the values in a particular column of a single item/product
to a single summary value for that item/product. E.g. a sum aggregation of column A calculates a single
total by adding up all the values in A belonging to each item.
Possible aggregation functions accepted as func parameters are:
- n, size or count: calculate number of rows in group
- sum: sum total of values
- mean: take mean of values
- max: take max of values
- min: take min of values
- first: take first item found
- last: take last item found
- unique: collect a list of unique values
- n_unique: count the number of unique values
- list: collect a list of all values
- concatenate: convert all values to text and concatenate them into one long text
- concat_lists: concatenate lists in all rows into a single larger list
- count_where: number of rows in which the column matches a value, needs parameter value with the value that you want to count
- percent_where: percentage of the column where the column matches a value, needs parameter value with the value that you want to count
For count_where and percent_where, an additional value parameter is required.
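To illustrate what such aggregations compute, the following sketch groups rows by item and reduces one column per group. It covers only a few of the functions and does not reproduce the step's configuration format. The helper aggregate, the row keys and the sample data are hypothetical, and returning percent_where on a 0 to 100 scale is an assumption of this example.

```python
from collections import defaultdict

# A subset of the aggregation functions, expressed as plain Python callables.
AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "mean": lambda values: sum(values) / len(values),
    "max": max,
    "min": min,
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "unique": lambda values: list(dict.fromkeys(values)),
    "n_unique": lambda values: len(set(values)),
    "list": list,
}


def aggregate(rows, group_key, column, func, value=None):
    """Reduce `column` to one summary value per distinct `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[column])

    if func == "count_where":
        # Number of rows in each group whose value equals `value`.
        return {k: sum(1 for x in v if x == value) for k, v in groups.items()}
    if func == "percent_where":
        # Assumption: expressed on a 0-100 scale.
        return {
            k: 100 * sum(1 for x in v if x == value) / len(v)
            for k, v in groups.items()
        }
    return {k: AGGREGATIONS[func](v) for k, v in groups.items()}


# Hypothetical rows: one per item and session, with extra columns.
rows = [
    {"item": "bread", "price": 2.0, "channel": "web"},
    {"item": "bread", "price": 3.0, "channel": "store"},
    {"item": "milk", "price": 1.0, "channel": "web"},
]
total_price = aggregate(rows, "item", "price", "sum")
web_share = aggregate(rows, "item", "channel", "percent_where", value="web")
```

Here total_price holds one summed price per item, and web_share shows why count_where and percent_where need the extra value: without it there is nothing to match against.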