Prompt ai


Call OpenAI's models on each row of the dataset for a given prompt.

Use any of OpenAI's models on a row-by-row basis. This step doesn't feed the whole dataset into the model, so it can't perform operations that require more than one row at a time. It can be used for a wide variety of tasks. Keep in mind that OpenAI's models are generative AI technologies, and may therefore give incorrect responses. The step comes with a predefined budget of $5 USD, and will not execute if its estimated cost exceeds that budget. We advise using a filter step first to test the prompt on a few rows, and only then launching it on the whole dataset.

Keep in mind that our budget is a rough estimate; if you're concerned about cost, you should also set limits on OpenAI's side. Your prompt is configured using two parameters: 'prompt' and 'response_format'. prompt is a text field, while response_format lets you specify a JSON format for the model's response, in the form {[expected_column]: "description"}.

Both in the prompt and in the response format descriptions, you may refer to the row's attributes using ${attribute_name}. Check the examples and parameter documentation below for more information.
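To make the substitution behavior concrete, here is a minimal Python sketch of how a ${attribute_name} placeholder could be filled in from a row's values. This is purely illustrative and is not Graphext's actual implementation; the function name and fallback behavior are assumptions.

```python
import re

def render_prompt(template: str, row: dict) -> str:
    """Replace each ${attribute_name} placeholder with the row's value.

    Illustrative sketch only: missing attributes are replaced with an
    empty string here, which may differ from the step's real behavior.
    """
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: str(row.get(m.group(1), "")),
        template,
    )

row = {"address": "10 Downing St, London"}
print(render_prompt("What is the country for ${address}", row))
# What is the country for 10 Downing St, London
```

Note that attribute names may contain spaces (e.g. ${Local Address}), which is why the pattern matches everything up to the closing brace rather than a bare identifier.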

API integration

To use this step your team needs to have the OpenAI integration configured in Graphext. The corresponding credentials are required to connect to a third-party API. You can configure API integrations following the INTEGRATIONS or ADD INTEGRATION link in the top-left corner of your Team's page, selecting API keys, and then the name of the desired third-party service.

First, create an OpenAI account or sign in. Next, navigate to the API keys page and click "Create new secret key", optionally naming the key. Make sure to save it somewhere safe and do not share it with anyone. Optionally, you can specify the organization the key belongs to. On OpenAI's page, you can set overall budgets for your API key and adjust other settings that may interest you.


The following are the step's expected inputs and outputs and their specific types.

Step signature
prompt_ai(ds: dataset, {
    "param": value
}) -> (*outputs: column)

where the object {"param": value} is optional in most cases and if present may contain any of the parameters described in the corresponding section below.


Get country for each address, as a category

Example call (in recipe editor)
prompt_ai(ds, { # contains column address
    "integration": "MY_INTEGRATION_ID",
    "prompt": "What is the country for ${address}"
}) -> (ds.country)
More examples

Specify model

Example call (in recipe editor)
prompt_ai(ds, { # contains column 'Local Address'
    "integration": "MY_INTEGRATION_ID",
    "model": {
      "id": "gpt-4",
      "temperature": 0.2
    },
    "prompt": "What is the country for ${Local Address}"
}) -> (ds.country)

Get the zip code from a series of messy addresses, as a number

Example call (in recipe editor)
prompt_ai(ds, { # contains column address
    "integration": "MY_INTEGRATION_ID",
    "prompt": "Extract the ZIP code from ${address}",
    "out_types": {
      "zip_code": "number"
    }
}) -> (ds.zip_code)

Perform sentiment analysis on some reviews.

Example call (in recipe editor)
prompt_ai(ds, { # contains column 'Client Review'
    "integration": "MY_INTEGRATION_ID",
    "prompt": "Classify the following text in neutral, positive or negative, and assign a score: ${Client Review}",
    "response_format": {
        "sentiment": "score consisting of an integer from -10 (most negative) to 10 (most positive)",
        "language": "language identified in the given text"
    },
    "out_types": {
        "sentiment": "number",
        "language": "category"
    }
}) -> (ds.sentiment, ds.language)


ds: dataset

Dataset to enrich. Make sure it contains the necessary columns.


*outputs: column

Columns to output. By default, the step outputs a single column of type category.


prompt: string

Main prompt for the API call. The main body of instructions you wish to perform.

integration: string

Associated integration.

response_format: object

Response format. Further prompt instructions for each output column.

Items in response_format

*param: string

One or more additional parameters. Note that all parameters should have the same type.

out_types: object

Output types. Desired types for each output column. By default, they will all be categories.

Items in out_types

*param: string = "category"

One or more additional parameters. Note that all parameters should have the same type.

Must be one of: "category", "date", "number", "boolean", "url", "sex", "text", "list[number]", "list[category]", "list[url]", "list[boolean]", "list[date]"
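Once the model replies with JSON matching the response format, each field presumably needs to be cast to its declared output type. The following Python sketch shows what that coercion could look like for a few of the supported types; the coercer functions and their rules are assumptions, not the step's documented behavior.

```python
import json

# Hypothetical coercers for a subset of the supported out_types;
# the step's actual coercion rules are not documented here.
COERCERS = {
    "number": float,
    "boolean": lambda v: str(v).strip().lower() in ("true", "yes", "1"),
    "category": str,
    "text": str,
}

def coerce_response(raw: str, out_types: dict) -> dict:
    """Parse the model's JSON reply and cast each field to its declared type."""
    data = json.loads(raw)
    return {k: COERCERS.get(t, str)(data[k]) for k, t in out_types.items()}

reply = '{"sentiment": "7", "language": "English"}'
print(coerce_response(reply, {"sentiment": "number", "language": "category"}))
# {'sentiment': 7.0, 'language': 'English'}
```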

model: object

Model Configuration. Configuration for OpenAI's model.

Items in model

id: string = "gpt-3.5-turbo-1106"

OpenAI model to choose.

Must be one of: "gpt-3.5-turbo-1106", "gpt-3.5-turbo", "gpt-4", "gpt-4-32k", "gpt-4-0125-preview", "gpt-4-1106-preview"

temperature: number = 0.7

Temperature. Higher means more creativity, but also makes the model more likely to hallucinate. Lower temperature yields more deterministic results.

Range: 0 ≤ temperature ≤ 1

budget: number = 5

Budget. If set, the step will not execute if the estimated token cost exceeds this amount in USD. If max_out_tokens is not set, we can only estimate a minimum cost (based on input tokens); if it is set, we can also give a ceiling. Actual cost may vary depending on a number of factors, such as your OpenAI plan. Check your plan before executing.
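As a rough illustration of why max_out_tokens turns the estimate from a floor into a ceiling, consider this hypothetical pre-flight check. The formula, function name, and prices are made up for the example; real pricing depends on the chosen model and your OpenAI plan.

```python
def check_budget(n_rows, avg_prompt_tokens, price_per_1k_in, budget_usd,
                 max_out_tokens=None, price_per_1k_out=None):
    """Hypothetical pre-flight budget check (illustration only).

    Without max_out_tokens we can only bound the input-token cost,
    which is a minimum; with it, output cost is bounded too, giving
    a theoretical ceiling.
    """
    input_cost = n_rows * avg_prompt_tokens / 1000 * price_per_1k_in
    if max_out_tokens is None:
        estimate = input_cost  # lower bound only
    else:
        estimate = input_cost + n_rows * max_out_tokens / 1000 * price_per_1k_out
    return estimate <= budget_usd, estimate

# 10,000 rows of ~50-token prompts at a made-up $0.001 per 1k input tokens:
ok, est = check_budget(10_000, 50, 0.001, budget_usd=5)
# est = 0.5 USD, well within the default $5 budget
```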

max_out_tokens: number

Maximum output tokens. If set, each individual response will contain at most this many tokens. This allows a theoretical budget ceiling to be calculated before executing.