An n-gram here means a contiguous sequence of n words in the original text. The step extracts all n-grams of a given text, i.e. the sequences starting at each word in turn, in original order. The result is one list of n-grams per input text, where each n-gram is a single string with its words separated by spaces (unless configured otherwise). The maximum size and kind of n-grams extracted, as well as how they are represented in the result, can be configured via the parameters described below. The step can also filter n-grams based on their frequency in the dataset.
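
To make the description above concrete, here is a minimal Python sketch of word n-gram extraction followed by frequency filtering. It is an illustration only, not the step's actual implementation: the function names extract_ngrams and filter_by_frequency, the whitespace tokenization, and the choice to count total occurrences across all texts are assumptions made for this example.

```python
from collections import Counter


def extract_ngrams(text, n, sep=" "):
    """Return all contiguous n-word sequences in `text`, in original order."""
    # Assumption: words are delimited by whitespace.
    words = text.split()
    # One n-gram starts at each word that still has n - 1 words after it.
    return [sep.join(words[i:i + n]) for i in range(len(words) - n + 1)]


def filter_by_frequency(ngram_lists, min_count):
    """Keep only n-grams that occur at least `min_count` times across all texts."""
    # Assumption: frequency means total occurrences in the whole dataset.
    counts = Counter(ng for ngrams in ngram_lists for ng in ngrams)
    return [[ng for ng in ngrams if counts[ng] >= min_count] for ngrams in ngram_lists]


texts = ["the cat sat on the mat", "the cat ran"]
bigrams = [extract_ngrams(t, 2) for t in texts]
frequent = filter_by_frequency(bigrams, min_count=2)
```

In this sketch, bigrams holds one list per input text, and frequent keeps only "the cat", the one bigram that appears twice in the example data. A text with fewer than n words simply yields an empty list.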

Usage

The following example shows how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").

Configuration

The following parameters configure the behaviour of the step. Include them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).