add_noise | | Add noise to a column with numbers or lists of numbers |
calculate | | Evaluates a formula containing basic arithmetic over a dataset’s columns |
cast | ⚡ | Interprets and changes a column’s data to another (semantic) type |
concatenate | ⚡ | Concatenate columns as text or lists with optional separator as well as pre- and postfix |
count_unique | ⚡ | Counts the number of unique elements in each list/array of the input column |
derive_column | ⚡ | Derive a new column with a custom JS script |
discretize_on_quantiles | ⚡ | Discretize column into bins based on quantiles |
discretize_on_values | ⚡ | Discretize column by binning its values using explicitly specified cut points |
divide | ⚡ | Divide two or more numeric columns in given order |
equal | ⚡ | Check the row-wise equality of all input columns |
explode | ⚡ | Explode (extract) items from column(s) of lists into separate rows |
extract_date_component | ⚡ | Extract a component such as day, week, weekday etc. from a date column |
extract_emoji | | Parse texts and extract their emoji |
extract_entities | | Parse texts and extract the entities mentioned (persons, organizations etc.) |
extract_hashtags | | Parse texts and extract any hashtags mentioned |
extract_json_values | ⚡ | Extract values from JSON columns using JsonPath |
extract_keywords | | Parse and extract keywords from texts |
extract_mentions | | Parse texts and extract any mentions detected |
extract_ngrams | | Parse texts and extract their n-grams |
extract_range | ⚡ | Create a copy of a column nullifying values outside a specified range |
extract_regex | ⚡ | Extract parts of texts detected using regular expressions |
extract_text_features | | Parse and process texts to extract multiple features at once |
extract_url_components | | Extract components from a URL |
is_missing | ⚡ | Check for missing values in a given column |
label_bios | | Categorize people into fields of occupation using their bios (biographies) |
label_categories | ⚡ | Relabel categories based on the top terms in each category |
label_encode | | Encode categories with values between 0 and N-1, where N is the number of unique categories |
label_holidays | | Indicate if there are any holidays for given date, location pairs |
label_political_subtopics | | Categorize the political sub-topics of texts in Spanish |
label_political_topics | | Categorize the political topics of texts in Spanish |
label_texts_containing | | Categorize texts containing specific keywords with custom labels |
label_texts_containing_from_query | | Label texts given an elastic-like query string |
length | ⚡ | Calculates the length of lists (number of elements) or texts/categories (number of characters) |
make_constant | ⚡ | Creates a new constant column (with a single unique value) of the same length as the input column |
math_func | | Applies a mathematical function to the values of a (single) numeric column |
merge_similar_semantics | | Group categories with similar meanings |
merge_similar_spellings | | Group categories with similar spellings |
multiply | ⚡ | Multiply two or more numeric columns |
normalize | ⚡ | Normalizes a numerical column by subtracting the mean and dividing by its standard deviation |
observed_duration | ⚡ | Calculate the duration between two dates and determine whether an event was observed before a specified observation da… |
order_categories | ⚡ | (Re-)order the categories of a categorical column |
pandas_func | | Applies an arbitrary pandas supported function to the values of an input column |
pct_change | | Calculate percentage change between consecutive numbers in a numeric column |
percentile_rank | ⚡ | Convert the values in a numeric or date column into their percentile rank |
query | ⚡ | Generate a boolean column based on a query string, marking rows that match the condition |
replace_missing | ⚡ | Replace missing values (NaNs) with either a specified constant value or the result of a given function |
replace_regex | ⚡ | Replace parts of text detected with a regular expression |
replace_values | ⚡ | Replace specified values in a column with new ones |
scale | ⚡ | Scales the values of a numerical column to lie between a specified minimum and maximum |
segment_rows | ⚡ | Create a segmentation using Graphext’s advanced query syntax (similar to Elasticsearch) |
slice | ⚡ | Extract a range/slice of elements from a column of texts or lists |
split_string | ⚡ | Split a single column containing texts into two |
subtract | ⚡ | Subtract two or more numeric columns |
sum | ⚡ | Calculate the row-wise sum of numeric columns |
time_interval | ⚡ | Calculates the duration of a time interval between two dates (datetimes/timestamps) |
tokenize | | Parse texts and separate them into lists of tokens (words, lemmas, etc.) |
trim_frequencies | | Remove values whose frequencies (counts) are above/below a given threshold |
unique | ⚡ | Extracts the unique elements in each list/array |
unpack_list | | Unpack (extract) items from a column of lists into separate columns |
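
Several of the descriptions above state a formula in words. As a rough illustration of what `normalize`, `scale`, and `label_encode` compute, here is a minimal plain-Python sketch. It is not the platform's implementation: the function signatures, the use of the population standard deviation, the default 0 to 1 target range, and the first-appearance ordering of category codes are all assumptions made for this example.

```python
from statistics import mean, pstdev


def normalize(values):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population (not sample) standard deviation
    if sigma == 0:
        raise ValueError("cannot normalize a constant column")
    return [(v - mu) / sigma for v in values]


def scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so they lie between new_min and new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def label_encode(categories):
    """Map each category to an integer between 0 and N-1 (N = unique categories)."""
    codes = {}  # assumption: codes are assigned in order of first appearance
    encoded = []
    for category in categories:
        if category not in codes:
            codes[category] = len(codes)
        encoded.append(codes[category])
    return encoded
```

For example, `scale([2, 4, 6])` maps the smallest value to 0 and the largest to 1, and `label_encode(["b", "a", "b", "c"])` assigns one integer per distinct category. How the actual steps treat missing values or constant columns is not covered by the table, so the sketch simply raises an error in the constant case.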