# prompt_ai

> Call OpenAI's models on each row of the dataset for a given prompt. 

Use one of the supported OpenAI models on a row-by-row basis. This step doesn't feed the whole dataset into the model, so you won't be able to
perform operations that require more than one row at a time.
It can be used for a variety of tasks. Keep in mind that OpenAI's models are generative AI technologies and can therefore give incorrect responses.
It comes with a default budget of 5 USD, which prevents the step from executing if its estimated cost exceeds that budget.
We advise using a filter step first to test the prompt on a few rows, then launching it on the whole dataset.

Keep in mind that the budget is only a rough estimate; if you're concerned about cost, you should also set limits on OpenAI's side.
Your prompt is configured with two parameters: `prompt` and `response_format`.
`prompt` is a text field, while `response_format` lets you specify a JSON format for the model's
response, in the form `{"expected_column": "description"}`.

In both the prompt and the response format descriptions, you can refer to the row's attributes (column values) with
`${attribute_name}`. Check the examples and parameter documentation below for more information.
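For instance, a minimal sketch assuming a dataset with hypothetical `product` and `review` columns, which references attributes both in the prompt and in one of the `response_format` descriptions. `MY_INTEGRATION_ID` stands for your own integration ID, and the outputs are listed in the same order as the `response_format` keys, as in the examples below:

```stan theme={null}
prompt_ai(ds[["product", "review"]], { # hypothetical columns 'product' and 'review'
    "integration": "MY_INTEGRATION_ID",
    "prompt": "The following is a customer review of ${product}: '${review}'",
    "response_format": {
        "summary": "a one-sentence summary of the review",
        "complaint": "the main complaint about ${product}, if any"
    }
}) -> (ds.summary, ds.complaint)
```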

<Accordion title="API integration" defaultOpen="true">
  To use this step, your team needs to have the *OpenAI* integration configured in Graphext. The corresponding credentials
  are required to connect to the third-party API. You can configure API integrations by following the `INTEGRATIONS` or `ADD INTEGRATION`
  link in the top-left corner of your Team's page, selecting `API keys`, and then the name of the desired third-party service.

  To get an OpenAI API key, first create an OpenAI account or sign in.
  Next, navigate to the API keys page and click "Create new secret key", optionally giving the key a name.
  Save the key somewhere safe and do not share it with anyone.
  Optionally, you can specify the organization the key belongs to.
  On the [OpenAI platform](https://platform.openai.com/) page, you can set general budgets for your API key, among other settings that may interest you.
</Accordion>

## Usage

The following examples show how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      Specify model

      ```stan theme={null}
      prompt_ai(ds[["Local Address"]], { # contains column 'Local Address'
          "integration": "MY_INTEGRATION_ID",
          "model": {
            "id": "gpt-4.1-mini",
            "temperature": 0.2
          },
          "prompt": "What is the country for ${Local Address}"
      }) -> (ds.country)
      ```
    </Tab>

    <Tab title="Example 2">
      Extract attributes from Disneyland reviews

      ```stan theme={null}
      prompt_ai(ds[["Review_Text"]],
            {
                "integration": "open-ai-1-70",
                "prompt": "The following is a review from Disneyland. I want you to extract the topics mentioned, the Names of the Disney Characters mentioned and the Names of rides mentioned in this paragraph: '${Review_Text}'. If you do not find or recognize any name of people, company, or rides, simply do not answer anything. NEVER ANSWER WITH 'NULL' VALUE. IMPORTANT: DO NOT ANSWER ANYTHING ELSE IN ANY OTHER CIRCUMSTANCE. DO NOT ANSWER ANYTHING ELSE APART FROM THE JSON",
                "model": {
                    "id": "gpt-4.1-nano"
                },
                "response_format": {
                    "topics": "topics mentioned",
                    "names_of_characters": "names of Disney Characters",
                    "names_of_rides": "names of rides"
                },
                "force_format": {
                  "topics": ["fun", "children", "ride", "food"]
                },
                "out_types": {
                  "topics": "list[category]"
                }
            }) -> (ds.topics,
                  ds.names_of_characters,
                  ds.names_of_rides)
      ```
    </Tab>

    <Tab title="Example 3">
      Classify Tweets

      ```stan theme={null}
      prompt_ai(ds[["authorName", "text"]],
                {
                    "integration": "victoriano-apikey",
                    "budget": 15,
                    "model": {
                        "id": "gpt-4.1-mini",
                        "temperature": 0.2
                    },
                    "prompt": "Classify the following tweet text if it implicitly: criticizes, benefits, is neutral, or is unrelated to each of the main political parties in Spain or any of their members and leaders: ${text} considering the bias of the medium that wrote it with the medium's name: ${authorName}",
                    "response_format": {
                        "Clasificacion_PP": "classify the tweet text into only one of these 4 categories related to the Partido Popular (PP), its leader (Alberto Núñez Feijóo), or any of its members: criticizes PP, benefits PP, neutral for PP, does not mention PP",
                        "Clasificacion_PSOE": "classify the tweet text into only one of these 4 categories related to the Spanish Socialist Workers' Party (PSOE), its leader (Pedro Sánchez), or any of its members: criticizes PSOE, benefits PSOE, neutral for PSOE, does not mention PSOE",
                        "Clasificacion_VOX": "classify the tweet text into only one of these 4 categories related to the VOX party, its leader (Santiago Abascal), or any of its members: criticizes VOX, benefits VOX, neutral for VOX, does not mention VOX",
                        "Clasificacion_SUMAR": "classify the tweet text into only one of these 4 categories related to the SUMAR party, its leader (Yolanda Díaz), or any of its members: criticizes SUMAR, benefits SUMAR, neutral for SUMAR, does not mention SUMAR",
                        "media_bias": "classify the bias of the medium as: right, center, left"
                    },
                    "out_types": {
                        "Clasificacion_PP": "category",
                        "Clasificacion_PSOE": "category",
                        "Clasificacion_VOX": "category",
                        "Clasificacion_SUMAR": "category",
                        "media_bias": "category"
                    }
                }) -> (ds.Clasificacion_PP,
                      ds.Clasificacion_PSOE,
                      ds.Clasificacion_VOX,
                      ds.Clasificacion_SUMAR,
                      ds.media_bias)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs and outputs the step is expected to receive and will produce respectively. For further details, see the sections below.

      ```stan theme={null}
      prompt_ai(ds: dataset, {
          "param": value,
          ...
      }) -> (*outputs: column)
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="ds" type="dataset" required>
    Dataset to enrich. Make sure it contains the necessary columns.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="*outputs" type="column">
    Output column(s) to create. By default, the step produces a single column of type `category`; when `response_format` is used, specify one output column per key, as in the examples above.
  </ParamField>
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a json object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="prompt" type="string">
    Main prompt for the API call.
    The main body of instructions you want the model to follow.
  </ParamField>

  <ParamField path="integration" type="string">
    ID of the OpenAI integration to use.
  </ParamField>

  <ParamField path="response_format" type="object">
    Prompt instructions for each output column.
    Each key is the name of an expected output column, and its value is a description telling the model what to put in that column.

    <Accordion title="Item properties">
      <ParamField path="Items" type="string">
        Description of the value expected in the output column named by the key.
      </ParamField>
    </Accordion>
  </ParamField>

  <ParamField path="force_format" type="object">
    Values allowed in each output column.
    If provided, the values of each listed column will be restricted to the given options.
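    For example, a sketch assuming a hypothetical `review` column, restricting a single `category` output to three labels:

    ```stan theme={null}
    prompt_ai(ds[["review"]], { # hypothetical column 'review'
        "integration": "MY_INTEGRATION_ID",
        "prompt": "Classify the sentiment of this review: '${review}'",
        "response_format": {
            "sentiment": "sentiment of the review: positive, negative or neutral"
        },
        "force_format": {
            "sentiment": ["positive", "negative", "neutral"]
        }
    }) -> (ds.sentiment)
    ```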

    <Accordion title="Item properties">
      <ParamField path="Items" type="array[string]">
        Allowed values for the output column named by the key.

        <Accordion title="Array items">
          <ParamField path="Item" type="string">
            Each item in array.
          </ParamField>
        </Accordion>
      </ParamField>
    </Accordion>
  </ParamField>

  <ParamField path="out_types" type="object">
    Types for the output column(s).
    Desired types for each output column. By default, they will all be categories.
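    A sketch with hypothetical column names, requesting numeric, boolean and date outputs instead of the default categories (the type names are taken from the list of allowed values below):

    ```stan theme={null}
    prompt_ai(ds[["review"]], { # hypothetical column 'review'
        "integration": "MY_INTEGRATION_ID",
        "prompt": "Read this hotel review: '${review}'",
        "response_format": {
            "rating": "the rating from 1 to 5 implied by the review",
            "mentions_price": "true if the review mentions the price, false otherwise",
            "visit_date": "the date of the stay, if mentioned"
        },
        "out_types": {
            "rating": "number",
            "mentions_price": "boolean",
            "visit_date": "date"
        }
    }) -> (ds.rating, ds.mentions_price, ds.visit_date)
    ```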

    <Accordion title="Item properties">
      <ParamField path="Items" type="string" default="category">
        Type of the output column named by the key.

        Values must be one of the following:

        `category` `date` `number` `boolean` `url` `sex` `text` `list[number]` `list[category]` `list[url]` `list[boolean]` `list[date]`
      </ParamField>
    </Accordion>
  </ParamField>

  <ParamField path="model" type="object">
    Model Configuration.
    Configuration for OpenAI's model.
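    For example, a sketch using a reasoning model with a hypothetical `review` column. Since `temperature` is ignored for `gpt-5-mini`, `gpt-5-nano` and `o4-mini`, it is simply omitted:

    ```stan theme={null}
    prompt_ai(ds[["review"]], { # hypothetical column 'review'
        "integration": "MY_INTEGRATION_ID",
        "model": {
            "id": "gpt-5-mini" # reasoning model: a "temperature" value would be ignored
        },
        "prompt": "Summarize this review in one sentence: '${review}'"
    }) -> (ds.summary)
    ```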

    <Accordion title="Properties">
      <ParamField path="id" type="string" default="gpt-4.1-nano">
        OpenAI model to choose.

        Values must be one of the following:

        `gpt-4.1` `gpt-4.1-mini` `gpt-4.1-nano` `gpt-5-mini` `gpt-5-nano` `o4-mini`
      </ParamField>

      <ParamField path="temperature" type="number" default="0.7">
        Temperature. Higher means more creativity, but also makes the model more likely to hallucinate. Lower temperature yields more deterministic results. Ignored for reasoning models (gpt-5-mini, gpt-5-nano, o4-mini).

        Values must be in the following range:

        ```javascript theme={null}
        0 ≤ temperature ≤ 1
        ```
      </ParamField>
    </Accordion>
  </ParamField>

  <ParamField path="budget" type="number" default="5">
    Budget.
    If present, the step will not execute if the estimated cost exceeds this amount in USD.
    If `max_out_tokens` is not set, the estimate only covers input tokens and is therefore a minimum of the cost. If it is set, the estimate is a ceiling.
    Actual cost may vary depending on a number of factors, such as your OpenAI plan. Check your plan before executing.
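    As a simplified illustration of the two cases (assuming plain per-token pricing; this is not Graphext's exact estimator), write $N$ for the number of rows, $T_{\text{in}}$ for the total input tokens across all prompts, $M$ for `max_out_tokens`, and $p_{\text{in}}$, $p_{\text{out}}$ for the model's per-token input and output prices:

    $$
    \text{cost}_{\min} = T_{\text{in}}\,p_{\text{in}}, \qquad
    \text{cost}_{\max} = T_{\text{in}}\,p_{\text{in}} + N\,M\,p_{\text{out}}
    $$

    Without `max_out_tokens`, the length of each response is unknown in advance, so only the input part can be estimated; with it, $N\,M\,p_{\text{out}}$ bounds the output spend.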
  </ParamField>

  <ParamField path="max_out_tokens" type="number">
    Maximum output tokens.
    If set, each individual response will be limited to at most this many tokens. This allows a theoretical ceiling on the cost to be calculated before executing.
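    A sketch combining `budget` and `max_out_tokens` with a hypothetical `review` column (the values are illustrative): capping each response makes the pre-execution estimate a ceiling, and the step will not run if that estimate exceeds 20 USD:

    ```stan theme={null}
    prompt_ai(ds[["review"]], { # hypothetical column 'review'
        "integration": "MY_INTEGRATION_ID",
        "budget": 20,         # do not execute if the estimated cost exceeds 20 USD
        "max_out_tokens": 50, # cap each response so the estimate becomes a ceiling
        "prompt": "Summarize this review in one sentence: '${review}'"
    }) -> (ds.summary)
    ```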
  </ParamField>

  <ParamField path="batch_size" type="integer" default="100">
    Number of concurrent requests sent at a time.
    If your plan has very low rate limits, lowering this value may prevent empty responses.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ batch_size ≤ 1000
    ```
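    A sketch for a plan with low rate limits, using a hypothetical `review` column (the values are illustrative, not recommendations). The unit of `timeout` is not stated in this page and is assumed here to be seconds:

    ```stan theme={null}
    prompt_ai(ds[["review"]], { # hypothetical column 'review'
        "integration": "MY_INTEGRATION_ID",
        "batch_size": 10, # fewer concurrent requests for low rate limits
        "timeout": 120,   # per-request timeout, assumed to be in seconds
        "prompt": "Summarize this review in one sentence: '${review}'"
    }) -> (ds.summary)
    ```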
  </ParamField>

  <ParamField path="timeout" type="integer" default="60">
    Timeout for requests to OpenAI.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ timeout < inf
    ```
  </ParamField>
</Accordion>
