
Usage

The following examples show how the step can be used in a recipe.

Examples

Extract positive and negative aspects from a text column
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "model": "openai/gpt-4.1",
}) -> (ds.positive_aspects, ds.negative_aspects)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]), or models (referenced by name, e.g. "churn-clf").
texts
column[category|text]
required
Column containing the texts to extract aspect polarity from.
*aspects
column
Output columns for the aspect polarity results. With two column names, the step returns the positive and negative aspects. With four column names, it additionally returns the reasons for the positive and negative classifications. With five column names, and only if aspect_categories is set, it also returns the aggregated categories.
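A sketch of the four-output form, assuming the outputs follow the order described above (positive aspects, negative aspects, then the reasons for each). The output column names are illustrative, not required names.

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1"
}) -> (ds.positive_aspects, ds.negative_aspects, ds.positive_reasons, ds.negative_reasons)
```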

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

integration
string
required
Name of the associated integration used to access the AI model, e.g. "open-ai-1".
model
string
default:"openai/gpt-4.1"
AI Model. AI model used for aspect-based polarity extraction. Each text is processed individually to identify entities and classify them as positive or negative. Values must be one of the following:
  • openai/gpt-4.1
  • openai/gpt-4.1-mini
  • openai/gpt-4.1-nano
  • openai/gpt-5
  • openai/gpt-5-mini
  • openai/gpt-5-nano
  • openai/gpt-5.1
  • openai/gpt-5.2
instructions
string
default:""
Additional Instructions. Additional instructions to guide the polarity extraction process. Use this to provide domain-specific guidance or to focus on particular types of entities.
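For instance, domain-specific guidance could be passed as follows. The instruction text and the hotel-review domain are hypothetical; any of the listed models can be substituted.

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "model": "openai/gpt-4.1-mini",
    "instructions": "Texts are hotel reviews. Focus on rooms, staff and amenities; ignore comments about the booking website."
}) -> (ds.positive_aspects, ds.negative_aspects)
```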
aspect_categories
array[string]
Aspect Categories. Optional list of aspect categories to focus on during extraction. When provided, the model classifies each extracted entity into one of these categories, and an additional categories output column becomes available. For example: ["food", "service", "pricing", "cleanliness", "wait times"].
Item
string
Each item in the array.
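A sketch of the five-output form with aspect categories, assuming the categories column comes last, after the positive and negative reasons. The output column names are illustrative.

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "aspect_categories": ["food", "service", "pricing", "cleanliness", "wait times"]
}) -> (ds.positive_aspects, ds.negative_aspects, ds.positive_reasons, ds.negative_reasons, ds.aspect_categories)
```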
deduplicate
[boolean, array[string]]
default:"['entities', 'reasons']"
Deduplicate Results. Whether to deduplicate extracted entities and/or reasons. Can be false (no deduplication), true (deduplicate reasons only), or a list specifying which fields to deduplicate (e.g. ["entities", "reasons"]).
Item
string
Each item in the array. Values must be one of the following:
  • entities
  • reasons
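For example, to deduplicate only the extracted entities while leaving the reasons as produced, pass a one-element list (output names are illustrative):

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "deduplicate": ["entities"]
}) -> (ds.positive_aspects, ds.negative_aspects, ds.positive_reasons, ds.negative_reasons)
```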
dedupe_model
string
default:"openai/gpt-4.1"
Deduplication Model. AI model used for deduplication when deduplicate is enabled. Values must be one of the following:
  • openai/gpt-4.1
  • openai/gpt-4.1-mini
  • openai/gpt-4.1-nano
  • openai/gpt-5
  • openai/gpt-5-mini
  • openai/gpt-5-nano
  • openai/gpt-5.1
  • openai/gpt-5.2
dedupe_batch_size
integer
default:"2000"
Deduplication Batch Size. Maximum number of entities per deduplication LLM call. Increase it for larger datasets; decrease it if calls hit the model's context limit. Values must be in the following range: 100 ≤ dedupe_batch_size ≤ 10000.
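As an illustration, a smaller deduplication model combined with a larger batch size (within the allowed 100 to 10000 range) might look like this; the specific values are arbitrary choices, not recommendations.

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "dedupe_model": "openai/gpt-4.1-mini",
    "dedupe_batch_size": 5000
}) -> (ds.positive_aspects, ds.negative_aspects)
```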
dedupe_instructions
string
default:""
Deduplication Instructions. Additional instructions to guide the deduplication clustering process. Use this to specify domain-specific clustering rules, e.g., “Keep ride-related complaints separate from food-related complaints” or “Treat pricing complaints for different items as distinct”.
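Using one of the clustering rules quoted above, a configuration with deduplication instructions could be sketched as:

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "deduplicate": ["entities", "reasons"],
    "dedupe_instructions": "Treat pricing complaints for different items as distinct"
}) -> (ds.positive_aspects, ds.negative_aspects, ds.positive_reasons, ds.negative_reasons)
```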
consolidate
boolean
default:"true"
Consolidate Clusters. Whether to run an extra merge pass to consolidate similar clusters. This helps reduce near-duplicate clusters even within a single batch.
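To skip the extra merge pass (for example, to keep clusters as produced by the per-batch deduplication), consolidate can be switched off. Output names are illustrative.

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "consolidate": false
}) -> (ds.positive_aspects, ds.negative_aspects)
```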
params
object
API Parameters. Additional parameters passed to the responses API.
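A hedged sketch of passing extra API parameters, assuming the params object is forwarded unchanged to the OpenAI Responses API. The reasoning effort setting shown is a Responses API option for reasoning models such as the GPT-5 family; whether this step accepts it is an assumption.

```
infer_aspect_polarity(ds.texts, {
    "integration": "open-ai-1",
    "model": "openai/gpt-5-mini",
    "params": {"reasoning": {"effort": "low"}}
}) -> (ds.positive_aspects, ds.negative_aspects)
```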