An advanced query allows you to filter data using a simple text query, the syntax of which may be familiar to users of SQL or Elasticsearch. For example, you may want to find the tweets with more than a certain amount of retweets - “RT > 12”; all customers that have paid their invoices - “invoice_paid: True”; specific patterns in the title of documents - “title: Obama”; or find the votes whose values are greater than the median - “num_votes > MEDIAN”.Advances queries allow you to succinctly express such conditions, as well as compose multiple conditions in a single query.
In categorical or text columns you can search for specific words (tokens) or categories. You can combine searches using boolean logic (i.e. string together multiple conditions using the AND/OR keywords).Select rows where the “department” column contains “engineering”:
Copy
Ask AI
department: engineering
or where the “text” column contains “he” and “she”:
Copy
Ask AI
text: he AND she
or filter the four most frequent categories in the “department” column:
You can compare the values in numeric and date columns against constants to test for equality or against a desired range (mininum and/or maximum). For queries in date columns you can use dates expressed in the ISO format 8061 standard (e.g. “2019-01-01”).
In addition to using constants and ranges, you may also compare date and numeric values against certain column statistics. Specifically, all the following statistics are supported:
MIN: minimum value
MAX: maximum value
MEAN: the average
P25: or Q1, 25% of data lies below this point
MEDIAN: that is, P50 or Q2. This is the median of the data and 50% of it lies below this point
Dates have to be specified in the ISO format 8061 standard, where the date part is required and the time part optional. I.e. both the following dates are valid: 2020-02-23 and 2020-02-23T14:30:00. In the first example, only year, month and date are specified, while the second includes the time also (hours, minutes and seconds)To select all rows whose “date” field falls into the year 2019:
Copy
Ask AI
date: >= 2019-01-01 AND <= 2019-12-31
Or to match a specific date:
Copy
Ask AI
date: 2019-01-01
The = operator is implicit in the above query, so the following query produces the same result:
Copy
Ask AI
date: =2019-01-01
Select dates before 2020, January 1st, 12:35 PM:
Copy
Ask AI
date: <= 2020-01-01T12:35:00
Dates before the median date in the same column:
Copy
Ask AI
date: >= 2020-02-23 AND <= MEDIAN
And you may of course also combine constants with calculated statistics:
Copy
Ask AI
date: >= 2020-02-23 AND <= P75
Note that queries in date columns only support =, >= or operators, while > and < are not supported.
In numeric columns, numbers can also be specified using scientific notation. The following two numbers are both valid and represent the same number: 145000 and 1.45e5.To select rows whose age field falls into the range 10 to 55:
Copy
Ask AI
age: >= 10 AND <= 55
To match a specific number:
Copy
Ask AI
age: 12
The = operator is implicit in the above query, so the following query produces the same result:
Copy
Ask AI
age: =12
Include ages from 10 up to but not including 55 (exclusive smaller/greater than):
Copy
Ask AI
age: >= 10 AND < 55
Using scientific notation:
Copy
Ask AI
age: >= 5.6e-4 AND <= 9.35e-2
Using a computed statistic:
Copy
Ask AI
age: >= P25
Combining both constants and computed statistics:
Copy
Ask AI
age: >= 5 AND < MEAN
Queries in numeric columns support =, >, >=, < and operators.