explain_predictions

Explains the predictions of a trained machine learning model. Currently the only supported method is SHAP, which provides a unified measure of feature importance and feature effects. For more information see the SHAP documentation.

Usage

The following example shows how the step can be used in a recipe.

Examples

To get json-encoded explanations for a test set:

Copy

Ask AI

explain_predictions(ds_test, "my-model", {"positive_class": "True", "verbose": false}) -> (ds.explanation)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

dataset

required

A dataset containing features (but not target column) to calculate explanations for.

model

file[model_classification[ds]]

required

A trained model to explain.

Outputs

explanation

column

required

A json-encoded, verbose, or list of explanations of the model’s predictions for dataset ds.

*prediction

column

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

positive_class

[string, null]

required

Positive class. Name/label of the target class to generate explanations for if model is a classifier.

groups

object

Feature groups. A dictionary mapping feature names to group names. If provided, explanations will be calculated for each group of features, rather than for individual features. The resulting explanations will be the sum of the SHAP values of all features in each group. This can be useful for understanding the overall effect of a group of features.

Item properties

Items

array[string]

One or more additional parameters.

Array items

Item

string

Each item in array.

sign

[integer, null]

Sign of the SHAP values. Can be 1 or -1 to focus on SHAP values contributing positively or negatively to the predictions. This will be taken into account when:

grouping: only features with the specified sign will be included in the configured groups, while features not in the group mapping or of a a different sign will be grouped separately as “uncategorized”. Note that this means that the original variables summed in each group can be different across data points.
ranking: when selecting the top N features (see below parameter), SHAP values will be ordered and filtered using the signed values, rather than the absolute values.

0 or null means the sign is ignored when grouping, filtering or ranking SHAP values.Values must be one of the following:

-1
1
0
None

topn

[integer, null]

Top N features. Number of top features to include in the explanation. If not provided, all features will be included.

round

[integer, null]

default:"4"

Round numerical explanations. How many decimal places to round the explanations to. If not provided, or null will not round.

output

string

default:"json"

Output format of the explanations. If json, the default, explanations will be json-encoded. For each row in the dataset, the explanation consists of an array containing one object for each of the topn features, with each object in turn containing the feature name, the SHAP value, and the feature value (e.g. "[{'name': 'events': 'data': 5071, 'value': 0.15}, {...}, ...]"). The resulting json-encoded output column can be processed further in Graphext using the extract_json_values step.If verbose, explanations will be more verbal, using a configurable template to generate a human-readable explanation. The default format is shown in the format parameter below.If columns, the explanations will be returned as separate columns. The first column will contain in each row a list of the feature names of the topn features, sorted descending by SHAP value. The second (optional) column will contain the corresponding SHAP values in the same order. A third (optional) column will contain the corresponding feature values.Values must be one of the following:

columns
json
verbose

flat_records

boolean

default:"true"

Flat or nested records. If true, and the output parameter is "json", entries in each output row are flat lists of objects, each containing the name, the value and SHAP contribution of a feature in the dataset. Additional information, such as the sum of remaining SHAP contributions (when topn or groups is set, see include_tail below), or the base value, will be included with special names "<tail>" and "<base>", respectively, as if they were features themselved.If false, each output row will contain an object instead, where proper SHAP values are nested under the “shap_values” key, while the tail and base value are top-level key-value pairs.

include_tail

boolean

default:"true"

Include tail. If true, the sum of SHAP values of features not included in the topn items or groups will also be included in the output.

include_base

boolean

default:"false"

Include base value. If true, the base value of the model will be included in the output. Note that this value is usually identical for all rows in the dataset.

format

[string, null]

default:"{name}(={data}): {shap}"

Verbal explanation format. A template string to generate a human-readable explanation (applicable only if parameter "output": "verbose"). The template can contain placeholders for the feature name, the SHAP value, and the feature value (data). The default format is ”(=): ”. An even more verbose explanation format could be "{name} has a SHAP value of {value} and a feature value of {data}", for example. The topn features will be converted using this format and then concatenated using the below separator parameter.

separator

[string]

default:", "

Verbal explanation separator. A string to separate the explanations of the topn features (applicable only if parameter "verbose": true).

space

string

default:"probability"

Explanation space. The space in which to calculate the explanations. “raw” corresponds to the internal prediction space of the model, e.g. log-odds in the case of a Catboost classifier. “normalized” will re-normalize the explanations to the range [0, 1] for each feature. SHAP values for all features in a single row will sum to 1.0 in this case. “probability” will convert SHAP values to probabilities by rescaling the sum of SHAP values for each row such that they sum to the difference between the base probability and the model’s prediction.Values must be one of the following:

raw
normalized
probability

base_value

[number, null]

Base value for to use in explanations. The base value to use when converting SHAP values to probabilities if the model is a classifier. If not provided, the mean of the model’s predictions on the dataset will be used. Only relevant if space is set to “probability”.

method

string

default:"shap"

Explanation method.Values must be one of the following:

shap

Prepare

Report

Analyse

Usage

Inputs & Outputs

Configuration

Prepare

Report

Analyse

​Usage

​Inputs & Outputs

​Configuration

Usage

Inputs & Outputs

Configuration