Prepare
Transform
Step | Fast | Description |
---|---|---|
add_noise | | Add noise to a column with numbers or lists of numbers |
calculate | | Evaluates a formula containing basic arithmetic over a dataset’s columns |
cast | ⚡ | Interprets and changes a column’s data to another (semantic) type |
concatenate | ⚡ | Concatenate columns as text or lists with optional separator as well as pre- and postfix |
count_unique | ⚡ | Counts the number of unique elements in each list/array of the input column |
derive_column | ⚡ | Derive a new column with a custom JS script |
discretize_on_quantiles | ⚡ | Discretize column into bins based on quantiles |
discretize_on_values | | Discretize column by binning its values using explicitly specified cut points |
divide | ⚡ | Divide two or more numeric columns in given order |
equal | ⚡ | Check the row-wise equality of all input columns |
explode | | Explode (extract) items from column(s) of lists into separate rows |
extract_date_component | ⚡ | Extract a component such as day, week, weekday etc. from a date column |
extract_emoji | | Parse texts and extract their emoji |
extract_entities | | Parse texts and extract the entities mentioned (persons, organizations etc.) |
extract_hashtags | | Parse texts and extract any hashtags mentioned |
extract_json_values | ⚡ | Extract values from JSON columns using JsonPath |
extract_keywords | | Parse and extract keywords from texts |
extract_mentions | | Parse texts and extract any mentions detected |
extract_ngrams | | Parse texts and extract their n-grams |
extract_range | ⚡ | Create a copy of a column nullifying values outside a specified range |
extract_regex | ⚡ | Extract parts of texts detected using regular expressions |
extract_text_features | | Parse and process texts to extract multiple features at once |
extract_url_components | | Extract components from a URL |
is_missing | ⚡ | Check for missing values in a given column |
label_bios | | Categorize people into fields of occupation using their bios (biographies) |
label_categories | ⚡ | Relabel categories based on the top terms in each category |
label_encode | | Encode categories with values between 0 and N-1, where N is the number of unique categories |
label_holidays | | Indicate if there are any holidays for given date, location pairs |
label_political_subtopics | | Categorize the political sub-topics of texts in Spanish |
label_political_topics | | Categorize the political topics of texts in Spanish |
label_texts_containing | | Categorize texts containing specific keywords with custom labels |
label_texts_containing_from_query | | Label texts given an elastic-like query string |
length | ⚡ | Calculates the length of lists (number of elements) or texts/categories (number of characters) |
make_constant | ⚡ | Creates a new constant column (with a single unique value) of the same length as the input column |
math_func | | Applies a mathematical function to the values of a (single) numeric column |
merge_similar_semantics | | Group categories with similar meanings |
merge_similar_spellings | | Group categories with similar spellings |
multiply | ⚡ | Multiply two or more numeric columns |
normalize | ⚡ | Normalizes a numerical column by subtracting the mean and dividing by its standard deviation |
observed_duration | ⚡ | Calculate the duration between two dates and determine whether an event was observed before a specified observation da… |
order_categories | ⚡ | (Re-)order the categories of a categorical column |
pandas_func | | Applies an arbitrary pandas supported function to the values of an input column |
pct_change | | Calculate percentage change between consecutive numbers in a numeric column |
percentile_rank | ⚡ | Convert the values in a numeric or date column into their percentile rank |
query | ⚡ | Generate a boolean column based on a query string, marking rows that match the condition |
replace_missing | ⚡ | Replace missing values (NaNs) with either a specified constant value or the result of a given function |
replace_regex | ⚡ | Replace parts of text detected with a regular expression |
replace_values | ⚡ | Replace specified values in a column with new ones |
scale | ⚡ | Scales the values of a numerical column to lie between a specified minimum and maximum |
segment_rows | ⚡ | Create a segmentation using Graphext’s advanced query syntax (similar to Elasticsearch) |
slice | ⚡ | Extract a range/slice of elements from a column of texts or lists |
split_string | ⚡ | Split a single column containing texts into two |
subtract | ⚡ | Subtract two or more numeric columns |
sum | ⚡ | Calculate the row-wise sum of numeric columns |
time_interval | ⚡ | Calculates the duration of a time interval between two dates (datetimes/timestamps) |
tokenize | | Parse texts and separate them into lists of tokens (words, lemmas, etc.) |
trim_frequencies | | Remove values whose frequencies (counts) are above/below a given threshold |
unique | ⚡ | Extracts the unique elements in each list/array |
unpack_list | | Unpack (extract) items from a column of lists into separate columns |
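
Several of the descriptions above reduce to simple formulas. The sketch below illustrates the arithmetic behind `normalize`, `scale`, and `label_encode` in plain Python. It is not Graphext's implementation: the function signatures, the use of the population standard deviation, the handling of constant columns, and the first-seen ordering of category codes are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    # Assumption: population standard deviation; the table does not say
    # whether the real step uses the population or the sample estimate.
    sigma = pstdev(values)
    if sigma == 0:
        # Assumption: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so they lie between new_min and new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column maps to new_min.
        return [float(new_min) for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


def label_encode(categories):
    """Encode categories as integers between 0 and N-1."""
    # Assumption: codes are assigned in order of first appearance.
    codes = {}
    for c in categories:
        codes.setdefault(c, len(codes))
    return [codes[c] for c in categories]
```

For example, `scale([2, 4, 6])` maps the smallest value to the lower bound and the largest to the upper bound, and `label_encode(["b", "a", "b", "c"])` assigns one integer per distinct category.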