Cast

fast step  data types

Interprets and changes a column's data to another (semantic) type.

This has two consequences:

  1. It allows the resulting column to be used by steps that only accept the new type. For example, casting a column of concatenated texts to the "url" type makes it usable wherever Urls are expected (e.g. in the step fetch_url_content).
  2. It replaces any values that do not conform to the new type with the missing value (NaN). E.g. casting a column of mixed data to the "number" type replaces all values that cannot be read as numbers with NaN.
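
The second consequence can be pictured with a minimal Python sketch. This is not Graphext's implementation; the helper name to_number_or_nan is hypothetical and only illustrates how values that cannot be read as numbers end up as NaN.

```python
import math

def to_number_or_nan(value):
    """Return value as a float, or NaN if it cannot be read as a number.

    Hypothetical helper for illustration only, not part of the cast step.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

# A column of mixed data: numbers, numeric strings, text and a missing entry.
mixed = [3, "4.5", "n/a", None, "12"]
casted = [to_number_or_nan(v) for v in mixed]
```

Here the entries "n/a" and None become NaN, while the remaining values are kept as numbers.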

Note that for each possible type a column can be cast to (via the "type" parameter, e.g. "number", "category" etc.), the step accepts different configuration parameters. See the subsections under Parameters below for further details.

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
cast(input: column, {
    "param": value
}) -> (output: column)

where the object {"param": value} is optional in most cases and if present may contain any of the parameters described in the corresponding section below.

Example

E.g. to simply convert a text column to a category column, use:

Example call (in recipe editor)
cast(ds.text, {"type": "category"}) -> (ds.new_cat)

Inputs


input: column

The column you wish to cast.

Outputs


output: column

A new column with original data cast to the desired type.

Parameters


The input column can be cast to any of the specified types, each having their own parameters controlling exactly how to cast the input values.

type: string = "number"

Desired semantic type of the converted data. Make data numerical with "type": "number"


decimal: string = "."

Separator to mark the decimal part. Use "." or "," to indicate how decimal values are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the thousands separator. E.g. "decimal": "." assumes that the period "." is used to separate decimals and "," thousands, as in the number string "12,173.12".

Must be one of: ".", ","


thousand: string = ","

Separator to mark the thousands. Use "." or "," to indicate how thousands are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the decimal separator. E.g. "thousand": "." assumes that the period "." is used to separate thousands and "," decimals, as in the number string "12.173,12".

Must be one of: ".", ","
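
As a rough illustration of how the two separators interact, the following Python sketch parses number strings under the rule described above (whichever of "." and "," is not the decimal separator is treated as the thousands separator). The function parse_number is a hypothetical name, and the sketch only approximates what the step does internally.

```python
import math

def parse_number(text, decimal="."):
    """Parse a number string; the other of "." and "," marks thousands.

    Hypothetical helper for illustration only.
    """
    if decimal not in (".", ","):
        raise ValueError('decimal must be "." or ","')
    thousand = "," if decimal == "." else "."
    # Drop thousands separators, then normalise the decimal mark to ".".
    cleaned = text.strip().replace(thousand, "").replace(decimal, ".")
    try:
        return float(cleaned)
    except ValueError:
        return math.nan  # values that cannot be read as numbers become NaN

english = parse_number("12,173.12", decimal=".")
european = parse_number("12.173,12", decimal=",")
```

Both calls yield the same number, since each string is read with the separator convention that matches it.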

type: string = "list[number]"

Desired semantic type of the converted data. Make data numerical with "type": "list[number]"


brackets: string | null

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.


decimal: string = "."

Separator to mark the decimal part. Use "." or "," to indicate how decimal values are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the thousands separator. E.g. "decimal": "." assumes that the period "." is used to separate decimals and "," thousands, as in the number string "12,173.12".

Must be one of: ".", ","


thousand: string = ","

Separator to mark the thousands. Use "." or "," to indicate how thousands are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the decimal separator. E.g. "thousand": "." assumes that the period "." is used to separate thousands and "," decimals, as in the number string "12.173,12".

Must be one of: ".", ","
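
The list parameters can be combined as in the following Python sketch, which removes the brackets, splits on the separator, strips spaces and then parses each element as a number. The function parse_number_list is hypothetical. Its handling of a missing bracket pair and of unparsable elements (returned as NaN) are assumptions made for illustration, not documented behaviour.

```python
import math

def parse_number_list(text, brackets=None, separator=",", decimal="."):
    """Parse a string such as "[1, 2.5]" into a list of floats (illustrative only)."""
    thousand = "," if decimal == "." else "."
    text = text.strip()
    if brackets is None:
        # Assumption: remove any common bracket characters at both ends.
        text = text.strip("[](){}")
    else:
        opening, closing = brackets  # a 2-character string such as "()"
        if text.startswith(opening) and text.endswith(closing):
            text = text[1:-1]
    if not text.strip():
        return []
    values = []
    for element in text.split(separator):
        # Spaces are always stripped from individual elements.
        cleaned = element.strip().replace(thousand, "").replace(decimal, ".")
        try:
            values.append(float(cleaned))
        except ValueError:
            values.append(math.nan)  # assumption: bad elements become NaN
    return values

european = parse_number_list("(1.200,5; 3,25)", brackets="()", separator=";", decimal=",")
plain = parse_number_list("[1, 2.5, x]")
```

Note that in this sketch a separator of "," cannot be combined with "," as the decimal or thousands mark inside elements, which is why the first example splits on ";".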

type: string = "currency"

Desired semantic type of the converted data. Make data a currency with "type": "currency"


decimal: string = "."

Separator to mark the decimal part. Use "." or "," to indicate how decimal values are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the thousands separator. E.g. "decimal": "." assumes that the period "." is used to separate decimals and "," thousands, as in the number string "12,173.12".

Must be one of: ".", ","


thousand: string = ","

Separator to mark the thousands. Use "." or "," to indicate how thousands are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the decimal separator. E.g. "thousand": "." assumes that the period "." is used to separate thousands and "," decimals, as in the number string "12.173,12".

Must be one of: ".", ","

type: string = "list[currency]"

Desired semantic type of the converted data. Make data a currency with "type": "list[currency]"


brackets: string | null

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.


decimal: string = "."

Separator to mark the decimal part. Use "." or "," to indicate how decimal values are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the thousands separator. E.g. "decimal": "." assumes that the period "." is used to separate decimals and "," thousands, as in the number string "12,173.12".

Must be one of: ".", ","


thousand: string = ","

Separator to mark the thousands. Use "." or "," to indicate how thousands are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the decimal separator. E.g. "thousand": "." assumes that the period "." is used to separate thousands and "," decimals, as in the number string "12.173,12".

Must be one of: ".", ","

type: string = "date"

Desired semantic type of the converted data. Convert data to the Date type with "type": "date". This will allow e.g. the extraction of particular components of the date, like year, month, or day of week (with extract_date_components), the calculation of elapsed time since a given date (time_interval), as well as enable the use of the Trends section in Graphext's interface.


format: string

Format to parse date strings. When input data contains strings (dates in text format), indicate how these strings are constructed. E.g. if dates are in the format "21/07/2020", use "format": "%d/%m/%Y" to indicate the day, month, year order and the use of "/" as the separator of date components. For more details on how to indicate the different components of the date format see e.g. Python's strftime.


unit: string

Unit of timestamp data. When input data is numeric, indicates whether the numbers correspond to days, seconds, milliseconds, microseconds or nanoseconds. Dates will be interpreted as so many elapsed units since the origin (see origin parameter below).

For example, with "unit": "ms" and "origin": "unix" (the default), this would calculate the date corresponding to x milliseconds since 01/01/1970, where x denotes the input numbers.

Must be one of: "D", "s", "ms", "us", "ns"
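
Both ways of reading dates can be sketched with Python's standard datetime module, whose strptime uses the same format codes referred to above. The helper names parse_date_string and date_from_number are hypothetical, the origin is fixed to the unix epoch, and returning None for unparsable strings merely stands in for a missing value.

```python
from datetime import datetime, timedelta, timezone

# Origin assumed to be the unix epoch, 01/01/1970 (UTC).
UNIX_ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Length of each supported unit, expressed in microseconds.
MICROSECONDS_PER_UNIT = {
    "D": 86_400_000_000,
    "s": 1_000_000,
    "ms": 1_000,
    "us": 1,
    "ns": 0.001,
}

def parse_date_string(text, fmt):
    """Parse a date string with a strptime format, or return None if it does not match."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

def date_from_number(x, unit):
    """Interpret x as that many elapsed units since the unix origin."""
    return UNIX_ORIGIN + timedelta(microseconds=x * MICROSECONDS_PER_UNIT[unit])

from_text = parse_date_string("21/07/2020", "%d/%m/%Y")
from_millis = date_from_number(1595289600000, "ms")
```

Both results refer to 21 July 2020; the second is the date reached 1595289600000 milliseconds after the origin.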

type: string = "list[date]"

Desired semantic type of the converted data. Convert data to the Date type with "type": "list[date]". This will allow e.g. the extraction of particular components of the date, like year, month, or day of week (with extract_date_components), the calculation of elapsed time since a given date (time_interval), as well as enable the use of the Trends section in Graphext's interface.


brackets: string | null

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.


format: string

Format to parse date strings. When input data contains strings (dates in text format), indicate how these strings are constructed. E.g. if dates are in the format "21/07/2020", use "format": "%d/%m/%Y" to indicate the day, month, year order and the use of "/" as the separator of date components. For more details on how to indicate the different components of the date format see e.g. Python's strftime.


unit: string

Unit of timestamp data. When input data is numeric, indicates whether the numbers correspond to days, seconds, milliseconds, microseconds or nanoseconds. Dates will be interpreted as so many elapsed units since the origin (see origin parameter below).

For example, with "unit": "ms" and "origin": "unix" (the default), this would calculate the date corresponding to x milliseconds since 01/01/1970, where x denotes the input numbers.

Must be one of: "D", "s", "ms", "us", "ns"

type: string = "text"

Desired semantic type of the converted data. Convert data to the Text type with "type": "text". This allows the resulting column to be used e.g. in steps involving natural language processing (NLP).

type: string = "category"

Desired semantic type of the converted data. Convert data to the Category type with "type": "category". This will influence how the column is presented in Graphext's interface, and enables the use of steps like trim_frequencies, merge_categories etc.

type: string = "list[category]"

Desired semantic type of the converted data. Convert data to the Category type with "type": "list[category]". This will influence how the column is presented in Graphext's interface, and enables the use of steps like trim_frequencies, merge_categories etc.


brackets: string | null

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.

type: string = "url"

Desired semantic type of the converted data. Convert data to the Url type with "type": "url". This will allow e.g. fetching of any textual content found at the specified Url (with fetch_url_content), or linking of a network node in the interface to the given website (configure_node_url).

type: string = "list[url]"

Desired semantic type of the converted data. Convert data to the Url type with "type": "list[url]"


brackets: string | null

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.

type: string = "sex"

Desired semantic type of the converted data. Convert data to the Sex type with "type": "sex". This is essentially a categorical type with two predefined values for male and female. How the two categories are detected/parsed in raw data, and with which label to represent them can be configured with below parameters.


labels: object = {"female": "Female", "male": "Male"}

The labels used to identify female and male categories. An object of the form {"female": "female_label", "male": "male_label"}, indicating how to represent each sex in the data. E.g. as F/M or ♀️/♂️ etc.

Items in labels

female: string = "Female"

Label for the female category.


male: string = "Male"

Label for the male category.

type: string = "list[sex]"

Desired semantic type of the converted data. Convert data to the Sex type with "type": "list[sex]"


brackets: string | null

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.


labels: object = {"female": "Female", "male": "Male"}

The labels used to identify female and male categories. An object of the form {"female": "female_label", "male": "male_label"}, indicating how to represent each sex in the data. E.g. as F/M or ♀️/♂️ etc.

Items in labels

female: string = "Female"

Label for the female category.


male: string = "Male"

Label for the male category.

type: string = "boolean"

Desired semantic type of the converted data. Convert data to the Boolean (logical) type with "type": "boolean". If the input data is numeric, 0s will be treated as False and all other values as True. If the input data contains text strings, the values {"t", "true", "1", "1.0"} in lower- or uppercase will be interpreted as True, and the values {"f", "false", "0", "0.0"} as False. Any remaining values will be converted to NaN (missing).
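
The rules above can be summarised in a short Python sketch. The function to_boolean is a hypothetical name. Treating numeric NaN inputs as missing is an assumption, and the sketch is not meant to mirror the step's actual implementation.

```python
import math

TRUE_STRINGS = {"t", "true", "1", "1.0"}
FALSE_STRINGS = {"f", "false", "0", "0.0"}

def to_boolean(value):
    """Cast a single value to True, False or NaN (missing); illustrative only."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        if isinstance(value, float) and math.isnan(value):
            return math.nan  # assumption: missing numbers stay missing
        return value != 0    # 0 is False, every other number is True
    if isinstance(value, str):
        lowered = value.lower()  # matching ignores upper/lower case
        if lowered in TRUE_STRINGS:
            return True
        if lowered in FALSE_STRINGS:
            return False
    return math.nan  # any remaining value becomes missing

results = [to_boolean(v) for v in [0, 2.5, "TRUE", "0.0", "maybe"]]
```

The first four inputs map to False, True, True and False respectively, while "maybe" matches neither set of strings and is therefore converted to NaN.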

type: string = "list[boolean]"

Desired semantic type of the converted data. Convert data to the Boolean (logical) type with "type": "list[boolean]". If the input data is numeric, 0s will be treated as False and all other values as True. If the input data contains text strings, the values {"t", "true", "1", "1.0"} in lower- or uppercase will be interpreted as True, and the values {"f", "false", "0", "0.0"} as False. Any remaining values will be converted to NaN (missing).


brackets: string | null

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.