Writing recipes

Writing a recipe is one way to tell Graphext how to build a project/visualization from tabular data. The alternatives are using the Wizard or applying a pre-defined recipe.

A recipe is nothing more than a sequence of steps: functions that accept some data and output new, transformed or enriched data. A recipe can have any number of steps and can generate any number of intermediate datasets, but its output must always be a single dataset, which serves as the basis for visual exploration in the resulting project.
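As a rough analogy only (this is plain Python, not Graphext's recipe syntax or implementation), a recipe can be pictured as a list of functions applied in order to a dataset. The step names add_text_length and uppercase_text, and the modelling of a dataset as a list of rows, are hypothetical and chosen purely for illustration.

```python
from typing import Callable

# Assumption: a dataset is modelled as a plain list of rows (dicts).
Dataset = list[dict[str, object]]
Step = Callable[[Dataset], Dataset]


def add_text_length(ds: Dataset) -> Dataset:
    """Hypothetical step: enrich each row with the length of its 'text' value."""
    return [{**row, "text_length": len(str(row["text"]))} for row in ds]


def uppercase_text(ds: Dataset) -> Dataset:
    """Hypothetical step: transform the 'text' column to upper case."""
    return [{**row, "text": str(row["text"]).upper()} for row in ds]


def run_recipe(ds: Dataset, steps: list[Step]) -> Dataset:
    """Apply each step in order; the final dataset is the single output."""
    for step in steps:
        ds = step(ds)
    return ds


rows = [{"text": "hello"}, {"text": "graph"}]
result = run_recipe(rows, [add_text_length, uppercase_text])
```

Each step receives the dataset produced by the previous one, and only the final dataset would be handed on to project creation.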

You open the Recipe Editor after first selecting a dataset, which serves as the main input for the recipe. This dataset is available by default under the name ds, so the simplest possible recipe is

create_project(ds)

i.e. a recipe with a single step called create_project which accepts a dataset as input and has no output. This is a special case. Since the result of this step is the creation of a project, it doesn't generate any output that can be further processed inside the recipe.

In practice, however, you'll almost always want to transform or enrich your dataset, so you'll add one or more of the many steps available in Graphext before the final project creation step.

Steps

In general, the syntax for adding a step is very simple and always of the form:

step_name(inputs, ..., {params}) -> (outputs, ...)

That is, you provide the name of the step and, in parentheses, any inputs it will consume. The inputs may be specific columns of a dataset, a dataset itself, or a model. The expected types of inputs depend on the specific step and are documented on that step's page (see the categorized step documentation in the left sidebar).

As the last argument inside the parentheses, you can provide parameters that configure exactly how the step transforms the input data (if the step has parameters; most, but not all, do). More on those below.

Finally, in a second set of parentheses (separated by ->), you provide names for the outputs that the step will generate. Again, the outputs may be one or more columns or datasets.

To differentiate between datasets and columns, column names must be prefixed with the name of the dataset they belong to, while datasets are referred to by their name only. In other words, ds refers to the dataset with the name "ds", and to pick out a specific column you'd use either ds.my_column or ds["my_column"]. The two forms are generally interchangeable, but the latter is required if a column name contains spaces.

To give an example, a simple step that splits the texts in a given column in two at the first comma might be written as

split_string(ds.text, {"pattern": ","}) -> (ds.left_part, ds.right_part)

The result of the split will be two new columns named "left_part" and "right_part" in the dataset "ds".
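To make the effect of such a split concrete, here is a minimal Python sketch of "split in two at the first comma" applied to a column of texts. It illustrates the described behaviour only and is not Graphext's implementation. The helper split_at_first, the sample texts, the treatment of the pattern as a literal string, and the handling of texts without a comma (the right part becomes None) are all assumptions.

```python
from typing import Optional


def split_at_first(text: str, pattern: str = ",") -> tuple[str, Optional[str]]:
    """Split text in two at the first occurrence of pattern."""
    left, sep, right = text.partition(pattern)
    # Assumption: when the pattern is absent, the right part is None.
    return left, (right if sep else None)


# Stand-in for the "text" column of the dataset "ds".
text_column = ["Madrid, Spain", "a,b,c", "no comma here"]

# Stand-ins for the two output columns named in the step.
left_part = [split_at_first(t)[0] for t in text_column]
right_part = [split_at_first(t)[1] for t in text_column]
```

Note that only the first comma is used as the split point, so any later commas stay in the right part.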

A very simple recipe including a transformation could thus be

# Transform the input data
split_string(ds.text, {"pattern": ","}) -> (ds.left_part, ds.right_part)

# Select which dataset forms the basis for the final project
create_project(ds)

where the columns resulting from the split will now be included in the final project created.

Input/output names

Usually, when you start typing a step's name in the Recipe Editor, the rest of the step's signature is autocompleted, including default names for any outputs it creates. You only need to change these names if you don't like the defaults, or if they clash with outputs you have already generated.

Parameters

Parameters let you configure how a step will process its inputs. For those familiar with JSON or JavaScript, the parameter syntax is that of a valid JSON object. For those who are not, it is simply a number of quoted parameter names and corresponding values between curly braces. We have already seen the example

{"pattern": ","}

where "pattern" is the parameter's name and "," its value.

In general, all parameter names must be quoted strings, while values may be

  • quoted strings
  • numbers
  • lists of numbers or strings
  • another, nested object in curly braces, following the above rules
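Because the parameter block is a JSON object, one way to check a block outside the editor is to parse it with any JSON parser. The following Python sketch uses the standard json module. The helper parse_params and the parameter names "name", "size", "items" and "options" are hypothetical, used only to show the value types listed above, and do not belong to any particular step.

```python
import json


def parse_params(text: str) -> dict:
    """Parse a parameter block and check that it is a JSON object."""
    params = json.loads(text)  # raises json.JSONDecodeError on invalid syntax
    if not isinstance(params, dict):
        raise ValueError("parameters must be an object in curly braces")
    return params


# The example from the text: one quoted name with a quoted string value.
valid = parse_params('{"pattern": ","}')

# Hypothetical names showing a string, a number, a list and a nested object.
nested = parse_params(
    '{"name": "abc", "size": 3, "items": [1, 2], "options": {"flag": "on"}}'
)

try:
    # An unquoted name and single quotes are not valid JSON.
    parse_params("{pattern: ','}")
    unquoted_ok = True
except json.JSONDecodeError:
    unquoted_ok = False
```

In strict JSON, names and string values take double quotes, which is why the last block is rejected.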

Each step's documentation describes its valid parameters. In addition, the Recipe Editor helps you configure the step by highlighting any invalid parameters you may have entered by accident.

Variables

To make it easier to re-use the same name for a column or parameter in multiple places, you may define such a name as a variable at any place in the recipe, and any reference to that variable will be substituted automatically. E.g.

TODO ...