| Step | Fast | Description |
|---|---|---|
| add_noise | | Add noise to a column with numbers or lists of numbers |
| calculate | | Evaluates a formula containing basic arithmetic over a dataset’s columns |
| cast | ⚡ | Interprets and changes a column’s data to another (semantic) type |
| concatenate | ⚡ | Concatenate columns as text or lists with optional separator as well as pre- and postfix |
| count_unique | ⚡ | Counts the number of unique elements in each list/array of the input column |
| derive_column | ⚡ | Derive a new column with a custom JS script |
| discretize_on_quantiles | ⚡ | Discretize column into bins based on quantiles |
| discretize_on_values | ⚡ | Discretize column by binning its values using explicitly specified cut points |
| divide | ⚡ | Divide two or more numeric columns in given order |
| equal | ⚡ | Check the row-wise equality of all input columns |
| explode | ⚡ | Explode (extract) items from column(s) of lists into separate rows |
| extract_date_component | ⚡ | Extract a component such as day, week, weekday etc. from a date column |
| extract_emoji | | Parse texts and extract their emoji |
| extract_entities | | Parse texts and extract the entities mentioned (persons, organizations etc.) |
| extract_hashtags | | Parse texts and extract any hashtags mentioned |
| extract_json_values | ⚡ | Extract values from JSON columns using JsonPath |
| extract_keywords | | Parse and extract keywords from texts |
| extract_mentions | | Parse texts and extract any mentions detected |
| extract_ngrams | | Parse texts and extract their n-grams |
| extract_range | ⚡ | Create a copy of a column nullifying values outside a specified range |
| extract_regex | ⚡ | Extract parts of texts detected using regular expressions |
| extract_text_features | | Parse and process texts to extract multiple features at once |
| extract_url_components | | Extract components from a URL |
| is_missing | ⚡ | Check for missing values in a given column |
| label_bios | | Categorize people into fields of occupation using their bios (biographies) |
| label_categories | ⚡ | Relabel categories based on the top terms in each category |
| label_encode | | Encode categories with values between 0 and N-1, where N is the number of unique categories |
| label_holidays | | Indicate if there are any holidays for given date, location pairs |
| label_political_subtopics | | Categorize the political sub-topics of texts in Spanish |
| label_political_topics | | Categorize the political topics of texts in Spanish |
| label_texts_containing | | Categorize texts containing specific keywords with custom labels |
| label_texts_containing_from_query | | Label texts given an elastic-like query string |
| length | ⚡ | Calculates the length of lists (number of elements) or texts/categories (number of characters) |
| make_constant | ⚡ | Creates a new constant column (with a single unique value) of the same length as the input column |
| math_func | | Applies a mathematical function to the values of a (single) numeric column |
| merge_similar_semantics | | Group categories with similar meanings |
| merge_similar_spellings | | Group categories with similar spellings |
| multiply | ⚡ | Multiply two or more numeric columns |
| normalize | ⚡ | Normalizes a numerical column by subtracting the mean and dividing by its standard deviation |
| observed_duration | ⚡ | Calculate the duration between two dates and determine whether an event was observed before a specified observation da… |
| order_categories | ⚡ | (Re-)order the categories of a categorical column |
| pandas_func | | Applies an arbitrary pandas supported function to the values of an input column |
| pct_change | | Calculate percentage change between consecutive numbers in a numeric column |
| percentile_rank | ⚡ | Convert the values in a numeric or date column into their percentile rank |
| query | ⚡ | Generate a boolean column based on a query string, marking rows that match the condition |
| replace_missing | ⚡ | Replace missing values (NaNs) with either a specified constant value or the result of a given function |
| replace_regex | ⚡ | Replace parts of text detected with a regular expression |
| replace_values | ⚡ | Replace specified values in a column with new ones |
| scale | ⚡ | Scales the values of a numerical column to lie between a specified minimum and maximum |
| segment_rows | ⚡ | Create a segmentation using Graphext’s advanced query syntax (similar to Elasticsearch) |
| slice | ⚡ | Extract a range/slice of elements from a column of texts or lists |
| split_string | ⚡ | Split a single column containing texts into two |
| subtract | ⚡ | Subtract two or more numeric columns |
| sum | ⚡ | Calculate the row-wise sum of numeric columns |
| time_interval | ⚡ | Calculates the duration of a time interval between two dates (datetimes/timestamps) |
| tokenize | | Parse texts and separate them into lists of tokens (words, lemmas, etc.) |
| trim_frequencies | | Remove values whose frequencies (counts) are above/below a given threshold |
| unique | ⚡ | Extracts the unique elements in each list/array |
| unpack_list | | Unpack (extract) items from a column of lists into separate columns |
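
The one-line descriptions of `normalize` and `scale` in the table can be made concrete with a small sketch. The Python below is not Graphext's implementation or recipe syntax. It is a standalone illustration of the arithmetic those descriptions name. The function names mirror the step names only for readability. The parameters `new_min` and `new_max`, the use of the population standard deviation, and the handling of constant columns are assumptions made for this example.

```python
from statistics import mean, pstdev


def normalize(values):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    mu = mean(values)
    # Assumption: population standard deviation; the actual step may differ.
    sigma = pstdev(values)
    if sigma == 0:
        # Assumption: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to lie between new_min and new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column maps to new_min.
        return [float(new_min) for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


if __name__ == "__main__":
    column = [1, 2, 3]
    print(normalize(column))
    print(scale(column))
    print(scale(column, new_min=-1.0, new_max=1.0))
```

After `normalize`, the column has mean 0 and unit standard deviation. After `scale`, its smallest value equals the chosen minimum and its largest value equals the chosen maximum.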