cast
Interprets and changes a column’s data to another (semantic) type.
This has two consequences:
- It allows the resulting column to be used by steps that only accept the new type, e.g. casting a column of concatenated texts to the "url" type so that it can be used where URLs are expected (e.g. the step fetch_url_content).
- It changes any values that do not conform to the new type to the missing value (NaN). E.g., casting a column of mixed data containing numbers to the "number" type will replace all values that cannot be read as numbers with NaN.
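The second consequence can be illustrated outside the platform with a minimal Python sketch. This is not the step's actual implementation: the helper name cast_to_number is hypothetical, and the rules for what counts as "readable as a number" are assumed to match Python's built-in float(), which may differ from the step's own parser.

```python
import math


def cast_to_number(values):
    """Hypothetical helper: return floats, using NaN for unreadable values."""
    result = []
    for value in values:
        try:
            # float() accepts numbers and numeric strings such as "3" or "12e2".
            result.append(float(value))
        except (TypeError, ValueError):
            # Non-conformant values (e.g. "n/a" or None) become the missing value.
            result.append(math.nan)
    return result


mixed = ["3", 4.5, "n/a", None, "12e2"]
converted = cast_to_number(mixed)
```

Here the conformant entries are kept as numbers, while "n/a" and None are replaced with NaN rather than raising an error.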
Note that for each possible type a column can be cast to (via the "type" parameter, e.g. "number", "category" etc.), the step accepts different configuration parameters. See the subsections under Parameters below for further details.
Usage
The following example shows how the step can be used in a recipe.
E.g. to simply convert a text column to a category column, use:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Desired semantic type of the converted data.
Make data numerical with "type": "number".
Separator to mark the decimal part.
Use "." or "," to indicate how decimal values are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the thousands separator. E.g. "decimal": "." assumes that the period "." is used to separate decimals and "," thousands, as in the number string "12,173.12".
Values must be one of the following: ".", ","
Separator to mark the thousands.
Use "." or "," to indicate how thousands are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the decimal separator. E.g. "thousand": "." assumes that the period "." is used to separate thousands and "," decimals, as in the number string "12.173,12".
Values must be one of the following: ".", ","
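The interplay between the decimal and thousands separators can be sketched in Python as follows. This is an illustration under stated assumptions, not the step's real parser: the function parse_number and its decimal argument are hypothetical names that mirror the "decimal" parameter, and the sketch simply strips the thousands separator without checking that digits are grouped in threes.

```python
import math


def parse_number(text, decimal="."):
    """Hypothetical parser: read a number string given its decimal separator."""
    if decimal not in (".", ","):
        raise ValueError('decimal must be "." or ","')
    # The other character is assumed to be the thousands separator.
    thousand = "," if decimal == "." else "."
    # Drop thousands separators, then normalise the decimal mark to ".".
    cleaned = text.strip().replace(thousand, "").replace(decimal, ".")
    try:
        return float(cleaned)
    except ValueError:
        # Unreadable strings become the missing value (NaN).
        return math.nan


english_style = parse_number("12,173.12", decimal=".")
european_style = parse_number("12.173,12", decimal=",")
unreadable = parse_number("not a number", decimal=".")
```

Both number strings resolve to the same value, 12173.12, because each is read with the separator convention it was written in. Passing "thousand": "." to the step would correspond to calling this sketch with decimal=",".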