Extracts values from JSON objects using a simplified JsonPath-like syntax. Supported syntax:
  • Dot notation: address.city, address.anotherLevel.key
  • Array index: phoneNumbers[0].type, phoneNumbers[1].number
  • Array slice (all elements): phoneNumbers[:], phoneNumbers[::]
  • Array slice (range): phoneNumbers[0:2], phoneNumbers[0:2:1]
  • Array slice (with step): phoneNumbers[::2]
  • Quoted keys (for special characters): address["other info"]
  • Root array access: $[:].firstName
Not supported:
  • Wildcard [*] — use [:] instead
  • Negative indices [-1]
  • Recursive descent (..)
Support for these operators is tracked in darro#243.
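The rules above can be illustrated with a minimal Python sketch of a path parser and evaluator. It is not the step's actual implementation. The names parse_path, evaluate and extract_column are hypothetical, and so are several behaviours the sketch assumes:
  • Unquoted keys are identifier-like.
  • Segments after a slice are applied to every selected element.
  • Missing values become None, or are dropped from sliced results.
  • Rows that are not valid JSON yield None.

```python
import json
import re
from typing import Any, List, Tuple, Union

# A parsed path segment: ("key", name), ("index", n) or ("slice", slice(...)).
Segment = Tuple[str, Union[str, int, slice]]

# Assumption: unquoted keys are identifier-like; anything else needs ["quoted key"].
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_INDEX = re.compile(r"\[(\d+)\]")                    # [0]; no sign allowed, so [-1] never matches
_SLICE = re.compile(r"\[(\d*):(\d*)(?::(\d*))?\]")  # [:], [::], [0:2], [0:2:1], [::2]
_QUOTED = re.compile(r"""\[(?:"([^"]*)"|'([^']*)')\]""")  # ["other info"] or ['other info']


def parse_path(path: str) -> List[Segment]:
    """Split a simplified JsonPath-like string into segments, rejecting unsupported syntax."""
    pos, segments = 0, []
    if path.startswith("$"):
        pos = 1  # optional root marker, as in "$[:].firstName"
    while pos < len(path):
        if path.startswith("..", pos):
            raise ValueError("recursive descent (..) is not supported")
        if path[pos] == ".":
            pos += 1  # dot separator before a plain key
        if m := _NAME.match(path, pos):
            segments.append(("key", m.group()))
        elif m := _INDEX.match(path, pos):
            segments.append(("index", int(m.group(1))))
        elif m := _SLICE.match(path, pos):
            start, stop, step = (int(g) if g else None for g in m.groups())
            segments.append(("slice", slice(start, stop, step)))
        elif m := _QUOTED.match(path, pos):
            segments.append(("key", m.group(1) if m.group(1) is not None else m.group(2)))
        else:
            # Catches [*], negative indices and any other unsupported token.
            raise ValueError(f"unsupported syntax at position {pos}: {path[pos:]!r}")
        pos = m.end()
    return segments


def evaluate(data: Any, segments: List[Segment]) -> Any:
    """Apply parsed segments to decoded JSON; after a slice, later segments apply to each element."""
    nodes, fanned_out = [data], False
    for kind, arg in segments:
        if kind == "key":
            nodes = [n[arg] for n in nodes if isinstance(n, dict) and arg in n]
        elif kind == "index":
            nodes = [n[arg] for n in nodes if isinstance(n, list) and arg < len(n)]
        else:  # slice: keep every selected element
            nodes = [x for n in nodes if isinstance(n, list) for x in n[arg]]
            fanned_out = True
    if fanned_out:
        return nodes  # assumption: elements lacking a key are dropped, not kept as None
    return nodes[0] if nodes else None  # assumption: a missing value becomes None


def extract_column(rows: List[str], path: str) -> List[Any]:
    """Evaluate one path on every JSON string of a column; unparseable rows yield None."""
    segments = parse_path(path)  # an invalid path fails once, before any row is read
    values = []
    for text in rows:
        try:
            values.append(evaluate(json.loads(text), segments))
        except (TypeError, json.JSONDecodeError):
            values.append(None)
    return values
```

Consider a hypothetical row such as {"phoneNumbers": [{"type": "iPhone"}, {"type": "home"}]}. Here phoneNumbers[0].type yields the single value "iPhone", whereas phoneNumbers[:].type yields the list ["iPhone", "home"]. Under this sketch's assumptions, sliced paths therefore pair naturally with one of the list[...] output types.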

Usage

The following examples show how the step can be used in a recipe.

Examples

Extract a simple nested value.
extract_json_values(ds.json_col, {
  "path": "address.city",
  "type": "text"
}) -> (ds.cities)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
text
column[text|category]
required
A text column containing the JSON values to extract parts from.
value_extracted
column
required
The column resulting from evaluating the JsonPath expression on the input column.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

path
string
required
JsonPath-like string used to extract values from the JSON column. Supports dot notation, array indices, slices and quoted keys. Does not support wildcard [*], negative indices or recursive descent (..). Examples:
  • address.city
  • phoneNumbers[:].type
type
string
required
Output column type. Select the desired type using a shortened yet fully specified name. Values must be one of the following: boolean, category, date, number, text, url, list[number], list[category], list[url].
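Both parameters are required, and type accepts only a fixed set of names. This means a configuration object can be checked before the step runs. The Python sketch below is a hypothetical client-side check: the validate_params function and the ALLOWED_TYPES set are illustrative and not part of the product. The step itself may report configuration errors differently.

```python
from typing import Any, Dict

# Allowed values for "type", copied from the parameter reference above.
ALLOWED_TYPES = {
    "boolean", "category", "date", "number", "text", "url",
    "list[number]", "list[category]", "list[url]",
}


def validate_params(params: Dict[str, Any]) -> None:
    """Raise ValueError if the step's configuration object is incomplete or invalid."""
    for name in ("path", "type"):  # both parameters are documented as required
        if name not in params:
            raise ValueError(f"missing required parameter: {name!r}")
    if not isinstance(params["path"], str):
        raise ValueError("'path' must be a string")
    if params["type"] not in ALLOWED_TYPES:
        raise ValueError(
            f"unknown type {params['type']!r}; expected one of {sorted(ALLOWED_TYPES)}"
        )
```

For instance, the configuration used in the example above, {"path": "address.city", "type": "text"}, passes this check. Replacing the type with "string", which is not in the list, raises an error.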