Use any of OpenAI’s models on a row-by-row basis. The step doesn’t feed the whole dataset into the model, so it can’t perform operations that need more than one row at a time. Within that limit, it can be used for a variety of tasks. Keep in mind that OpenAI’s models are generative AI technologies and can therefore give incorrect responses. The step comes with a predefined budget of 5 USD and will not execute if its cost would exceed that budget. We advise using a filter step first to test the prompt on a few rows, then launching it on the whole dataset.

Keep in mind that our budget is only a rough estimate; if you’re concerned about cost, you should also set limits on OpenAI’s side. Your prompt is configured using two parameters: ‘prompt’ and ‘response_format’. prompt is a text field, while response_format lets you specify a JSON format for the model’s response, in the form {[expected_column]: "description"}.

In both the prompt and the response_format descriptions, you can refer to the row’s attributes using ${attribute_name}. Check the examples and parameter documentation below for more information.
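To make the row-by-row behaviour concrete, the following Python sketch shows how ${attribute_name} placeholders could be filled in for each row independently. It is not Graphext’s implementation and does not call OpenAI: the column names (`product`, `review`), the example prompt, and the `render` helper are all hypothetical, and Python’s `string.Template` is used only because it happens to share the ${name} syntax.

```python
from string import Template

# Hypothetical parameters, shaped like the 'prompt' and 'response_format'
# parameters described above. Column names and wording are illustrative only.
params = {
    "prompt": "Classify the sentiment of this review of ${product}: ${review}",
    "response_format": {
        "sentiment": "One of positive, negative or neutral",
        "reason": "Short justification mentioning ${product}",
    },
}

# Two made-up rows standing in for a dataset.
rows = [
    {"product": "kettle", "review": "Boils fast, very happy."},
    {"product": "toaster", "review": "Stopped working after a week."},
]


def render(value, row):
    """Fill ${attribute_name} placeholders in a string or a dict of strings."""
    if isinstance(value, dict):
        return {key: render(text, row) for key, text in value.items()}
    return Template(value).substitute(row)


# Each row produces its own independent request; no row sees any other row.
row_requests = [
    {
        "prompt": render(params["prompt"], row),
        "response_format": render(params["response_format"], row),
    }
    for row in rows
]

for request in row_requests:
    print(request["prompt"])
```

Because every request is built from a single row, a prompt such as "rank these reviews against each other" cannot work here; anything that compares rows has to be computed in a separate step.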

???+ info "API integration"
    To use this step, your team needs to have the OpenAI integration configured in Graphext. The corresponding credentials are required to connect to a third-party API. You can configure API integrations by following the INTEGRATIONS or ADD INTEGRATION link in the top-left corner of your Team’s page, selecting API keys, and then the name of the desired third-party service.

First, create an OpenAI account or sign in. Next, navigate to the API key page and select “Create new secret key”, optionally naming the key. Save the key somewhere safe and do not share it with anyone. Optionally, you can specify the organization the key belongs to. On OpenAI’s page, you can also set general budgets for your API key, along with other settings that may interest you.

Usage

The following examples show how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).