| add_noise | | Add noise to a column of numbers or lists of numbers |
| calculate | | Evaluates a formula containing basic arithmetic over a dataset’s columns |
| cast | ⚡ | Interprets and changes a column’s data to another (semantic) type |
| concatenate | ⚡ | Concatenate columns as text or lists, with an optional separator, prefix, and suffix |
| count_unique | ⚡ | Counts the number of unique elements in each list/array of the input column |
| derive_column | ⚡ | Derive a new column with a custom JS script |
| discretize_on_quantiles | ⚡ | Discretize column into bins based on quantiles |
| discretize_on_values | ⚡ | Discretize column by binning its values using explicitly specified cut points |
| divide | ⚡ | Divide two or more numeric columns in the given order |
| equal | ⚡ | Check the row-wise equality of all input columns |
| explode | ⚡ | Explode (extract) items from column(s) of lists into separate rows |
| extract_date_component | ⚡ | Extract a component such as day, week, weekday etc. from a date column |
| extract_emoji | | Parse texts and extract their emoji |
| extract_entities | | Parse texts and extract the entities mentioned (persons, organizations etc.) |
| extract_hashtags | | Parse texts and extract any hashtags mentioned |
| extract_json_values | ⚡ | Extract values from JSON columns using JsonPath |
| extract_keywords | | Parse and extract keywords from texts |
| extract_mentions | | Parse texts and extract any mentions detected |
| extract_ngrams | | Parse texts and extract their n-grams |
| extract_range | ⚡ | Create a copy of a column nullifying values outside a specified range |
| extract_regex | ⚡ | Extract parts of texts detected using regular expressions |
| extract_text_features | | Parse and process texts to extract multiple features at once |
| extract_url_components | | Extract components from a URL |
| is_missing | ⚡ | Check for missing values in a given column |
| label_bios | | Categorize people into fields of occupation using their bios (biographies) |
| label_categories | ⚡ | Relabel categories based on the top terms in each category |
| label_encode | | Encode categories with values between 0 and N-1, where N is the number of unique categories |
| label_holidays | | Indicate whether any holidays fall on the given (date, location) pairs |
| label_political_subtopics | | Categorize the political sub-topics of texts in Spanish |
| label_political_topics | | Categorize the political topics of texts in Spanish |
| label_texts_containing | | Categorize texts containing specific keywords with custom labels |
| label_texts_containing_from_query | | Label texts given an Elasticsearch-like query string |
| length | ⚡ | Calculates the length of lists (number of elements) or texts/categories (number of characters) |
| make_constant | ⚡ | Creates a new constant column (with a single unique value) of the same length as the input column |
| math_func | | Applies a mathematical function to the values of a (single) numeric column |
| merge_similar_semantics | | Group categories with similar meanings |
| merge_similar_spellings | | Group categories with similar spellings |
| multiply | ⚡ | Multiply two or more numeric columns |
| normalize | ⚡ | Normalizes a numerical column by subtracting the mean and dividing by its standard deviation |
| observed_duration | ⚡ | Calculate the duration between two dates and determine whether an event was observed before a specified observation da… |
| order_categories | ⚡ | (Re-)order the categories of a categorical column |
| pandas_func | | Applies an arbitrary pandas supported function to the values of an input column |
| pct_change | | Calculate percentage change between consecutive numbers in a numeric column |
| percentile_rank | ⚡ | Convert the values in a numeric or date column into their percentile rank |
| query | ⚡ | Generate a boolean column based on a query string, marking rows that match the condition |
| replace_missing | ⚡ | Replace missing values (NaNs) with either a specified constant value or the result of a given function |
| replace_regex | ⚡ | Replace parts of text detected with a regular expression |
| replace_values | ⚡ | Replace specified values in a column with new ones |
| scale | ⚡ | Scales the values of a numerical column to lie between a specified minimum and maximum |
| segment_rows | ⚡ | Create a segmentation using Graphext’s advanced query syntax (similar to Elasticsearch) |
| slice | ⚡ | Extract a range/slice of elements from a column of texts or lists |
| split_string | ⚡ | Split a single column containing texts into two |
| subtract | ⚡ | Subtract two or more numeric columns |
| sum | ⚡ | Calculate the row-wise sum of numeric columns |
| time_interval | ⚡ | Calculates the duration of a time interval between two dates (datetimes/timestamps) |
| tokenize | | Parse texts and separate them into lists of tokens (words, lemmas, etc.) |
| trim_frequencies | | Remove values whose frequencies (counts) are above/below a given threshold |
| unique | ⚡ | Extracts the unique elements in each list/array |
| unpack_list | | Unpack (extract) items from a column of lists into separate columns |
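
To make a few of the one-line descriptions above more concrete, here is a minimal Python sketch of what `normalize`, `scale` and `label_encode` compute according to the table. It is not the steps' actual implementation. The function signatures, the use of the population standard deviation, the default range of 0 to 1, and the first-appearance ordering of the codes are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize(values):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    mu = mean(values)
    # Assumption: population standard deviation; the real step may use the sample one.
    sigma = pstdev(values)
    # Note: a constant column has sigma == 0 and would raise ZeroDivisionError here.
    return [(v - mu) / sigma for v in values]


def scale(values, lo=0.0, hi=1.0):
    """Rescale values linearly so they lie between lo and hi."""
    vmin, vmax = min(values), max(values)
    # Assumption: lo/hi default to 0 and 1; a constant column is not handled.
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def label_encode(categories):
    """Map each category to an integer between 0 and N-1 (N = unique categories)."""
    codes = {}
    for c in categories:
        # Assumption: codes are assigned in order of first appearance.
        codes.setdefault(c, len(codes))
    return [codes[c] for c in categories]


if __name__ == "__main__":
    print(normalize([1, 2, 3]))
    print(scale([2, 4, 6]))
    print(label_encode(["b", "a", "b", "c"]))
```

The contrast between the first two is the useful part: `normalize` centres the column on zero with unit spread, while `scale` only guarantees that the smallest and largest values land on the chosen bounds.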