Query Types
Couchbase Full Text Search supports multiple types of query.
Introduction to Query Types
Full Text Search allows text data to be queried, and provides multiple options for obtaining the right kinds of match. This page describes the purpose of each query type, and provides sample JSON objects that indicate how queries can be constructed.
Available query-types include:
- Simple Queries: Accept input text in the form of words and phrases, and attempt to find matches across bodies of text that have been indexed. Analyzers are applied to both input and target, potentially to strip out unnecessary characters, reduce words to the basic stems on which matching should occur, handle punctuation, and more. Additionally, match accuracy levels can be specified, and multiple queries can be expressed together, with their respective priorities boosted to ensure their results' prominence in the eventual result set.
- Compound Queries: Accept multiple queries simultaneously, and return either the conjunction or the disjunction of their result sets.
- Range Queries: Accept ranges for dates, numbers, and terms, and return documents that contain values within those ranges.
- Query String Queries: Accept query strings, which express query requirements in a special syntax.
- Geospatial Queries: Accept longitude-latitude coordinate pairs, in order to return documents that specify a geographical location.
- Non-Analytic Queries: Accept words and phrases for which only exact matches are returned. No analysis is performed.
- Special Queries: For testing purposes, return either all of the documents in an index, or none.
These query-types are explained in greater detail below.
Examples are provided, using the Couchbase REST API query-syntax.
(Note that Full Text Search can also be performed with the Couchbase Web Console and the Couchbase SDK.)
The JSON data refers to the travel-sample bucket, and assumes that demonstration full text indexes have been created, as described in Demonstration Indexes.
To run the examples using curl, use the following syntax:
$ curl -u Administrator:password -X POST -H "Content-Type: application/json" \
-d '{your query in JSON here...}' \
http://localhost:8094/api/index/index_name/query
Note that the examples below show only the JSON fragments that constitute non-generic parts of the queries they describe. For actual use in a Full Text Search, these JSON fragments should be wrapped in the following generic configuration:
{
"explain": false,
"fields": [
"*"
],
"highlight": {},
"query":{ your_query_details_here }
}
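The wrapping step can be scripted. The following Python sketch uses only the standard library. It places a query fragment inside the generic configuration shown above, then prepares a POST to the endpoint from the curl example but does not send it. The host, the credentials, and the helper names `wrap_query` and `build_request` are placeholders assumed for illustration.

```python
import base64
import json
import urllib.request


def wrap_query(fragment):
    """Place a query fragment inside the generic request configuration."""
    return {
        "explain": False,
        "fields": ["*"],
        "highlight": {},
        "query": fragment,
    }


def build_request(index_name, fragment,
                  host="http://localhost:8094",
                  user="Administrator", password="password"):
    """Prepare (but do not send) a POST equivalent to the curl example.

    host, user, and password are placeholder values.
    """
    url = f"{host}/api/index/{index_name}/query"
    body = json.dumps(wrap_query(fragment)).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # HTTP basic auth, like curl -u
        },
    )


req = build_request("index_name", {"match": "location hostel", "field": "reviews.content"})
# Passing req to urllib.request.urlopen() would send the query and return a JSON response.
```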
For more information on using the REST API to perform queries, see Searching with the REST API.
Simple Queries
Match Query
A match query analyzes input text, and uses the results to query an index. Options include specifying an analyzer, performing a fuzzy match, and performing a prefix match. By default, the analyzer applied to the search text is the one set for the specified field during index creation. If no field is specified, the match query targets the _all field within the index; whether content is included in the _all field is a setting chosen during index creation. For information on analyzers, see Understanding Analyzers.
When fuzzy matching is used, if the fuzziness parameter is set to a non-zero integer, the analyzed text is matched with a corresponding level of fuzziness. The maximum supported fuzziness is 2.
When a prefix match is used, the prefix_length parameter specifies that, for a match to occur, a prefix of the specified length must be shared by the input term and the target text element.
When the operator field is used, it decides the boolean logic used to interpret the text in the match field. For example, an operator value of "and" means the match query text would be treated like location AND hostel.
The default operator value of "or" means the match query text would be treated like location OR hostel.
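The effect of operator can be shown with a simplified Python sketch. It assumes the query text has already been analyzed into terms, and it treats each document field as a set of terms. Scoring, fuzziness, and prefix handling are left out. The function name `match_with_operator` is hypothetical.

```python
def match_with_operator(query_terms, doc_terms, operator="or"):
    """Return True if a document's terms satisfy the match query's operator.

    Simplified model: query_terms are the already-analyzed query tokens,
    and doc_terms is the set of tokens indexed for the target field.
    """
    hits = [term in doc_terms for term in query_terms]
    if operator == "and":
        return all(hits)  # every term must be present: location AND hostel
    if operator == "or":
        return any(hits)  # at least one term must be present: location OR hostel
    raise ValueError(f"unsupported operator: {operator!r}")


doc = {"great", "location", "near", "station"}
print(match_with_operator(["location", "hostel"], doc, "or"))   # True
print(match_with_operator(["location", "hostel"], doc, "and"))  # False
```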
The following JSON object demonstrates specification of a match query:
{
"match": "location hostel",
"field": "reviews.content",
"analyzer": "standard",
"fuzziness": 2,
"prefix_length": 4,
"operator": "and"
}
A match query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Match Phrase Query
The input text is analyzed, and a phrase query is built with the terms resulting from the analysis. This type of query searches for terms in the target that occur in the positions and offsets indicated by the input: this depends on term vectors, which must have been included in the creation of the index used for the search.
For example, a match phrase query for location for functions is matched with locate the function, if the standard analyzer is used. This analyzer uses a stemmer, which reduces both location and locate to locat, and both functions and function to function.
Additionally, the analyzer employs stop-word removal, which removes small and less significant words from input and target text, so that matches are attempted only on the more significant elements of vocabulary. In this case, for and the are removed.
After this processing, the tokens locat and function are recognized as common to both input and target. They also occur in the same sequence and at the same distance from one another in both, so a match is made.
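The position check can be sketched in Python. The sketch starts from tokens that have already been analyzed by hand, as described above: stemmed, with stop words removed, but with the surviving tokens keeping their original positions. It models the idea only and does not read the index's term vectors. The function name is hypothetical.

```python
def phrase_matches(query_tokens, target_tokens):
    """Check whether analyzed query tokens occur in the target with the same relative spacing.

    Both arguments are lists of (term, position) pairs produced by analysis.
    Stop words are absent, but the remaining tokens keep their original positions.
    """
    if not query_tokens:
        return False
    # Map each term to every position at which it occurs in the target.
    positions = {}
    for term, pos in target_tokens:
        positions.setdefault(term, set()).add(pos)

    first_term, first_pos = query_tokens[0]
    for start in positions.get(first_term, ()):
        # Each later query term must appear at the same offset from the first term.
        if all(start + (pos - first_pos) in positions.get(term, ())
               for term, pos in query_tokens[1:]):
            return True
    return False


# "location for functions" and "locate the function", after stemming and stop-word removal.
query = [("locat", 0), ("function", 2)]
target = [("locat", 0), ("function", 2)]
print(phrase_matches(query, target))  # True
```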
{
"match_phrase": "very nice",
"field": "reviews.content"
}
A match phrase query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Fuzzy Query
A fuzzy query matches terms within a specified edit (or Levenshtein) distance. This means that terms are considered to match when they are similar to a specified degree, rather than identical. A common prefix of a stated length may also be specified as a requirement for matching.
Please note that the fuzzy query is a non-analytic query, meaning it won’t perform any text analysis on the query text.
Fuzziness is specified by means of a single integer. For example:
{
"term": "interest",
"field": "reviews.content",
"fuzziness": 2
}
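Edit distance can be illustrated with a short Python sketch. It computes the standard dynamic-programming Levenshtein distance and adds an optional check that the first prefix_length characters are shared. This models the matching rule for a single pair of terms. It does not describe how the index evaluates fuzzy queries internally, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query_term, indexed_term, fuzziness, prefix_length=0):
    """Model of a fuzzy term match with an optional required common prefix."""
    if indexed_term[:prefix_length] != query_term[:prefix_length]:
        return False
    return levenshtein(query_term, indexed_term) <= fuzziness


print(fuzzy_match("interest", "interests", 2))    # True: one insertion
print(fuzzy_match("interest", "interesting", 2))  # False: three insertions
```

With fuzziness 2, "inverest" would also match "interest" (one substitution), but not if prefix_length were 4, because the first four characters differ.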
Fuzziness is demonstrated by means of the Java SDK, in the context of the term query (see below), in Searching from the SDK. Note that two such queries are specified, with the difference in fuzziness between them resulting in different forms of match, and different sizes of result-sets.
Prefix Query
A prefix query finds documents containing terms that start with the specified prefix. Please note that the prefix query is a non-analytic query, meaning it won’t perform any text analysis on the query text.
{
"prefix": "inter",
"field": "reviews.content"
}
Regexp Query
A regexp query finds documents containing terms that match the specified regular expression. Please note that the regexp query is a non-analytic query, meaning it won't perform any text analysis on the query text.
{
"regexp": "inter.+",
"field": "reviews.content"
}
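As a rough model, the Python sketch below applies a regular expression to each indexed term on its own. It uses `re.fullmatch` on the assumption that the expression must match the whole term rather than a substring of it. Python's `re` syntax is not identical to the regular-expression syntax that FTS accepts, so treat this only as an illustration of term-level matching. The term list is made up.

```python
import re


def regexp_matches(pattern, indexed_terms):
    """Return the indexed terms that the pattern matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [term for term in indexed_terms if compiled.fullmatch(term)]


terms = ["inter", "interest", "internet", "winter"]
print(regexp_matches("inter.+", terms))  # ['interest', 'internet']
```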
A regexp query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Wildcard Query
A wildcard query uses a wildcard expression to search within individual terms for matches.
A wildcard can stand for any single character (?) or for zero or more characters (*).
Wildcards can appear in the middle or at the end of a term, but not at the beginning.
Please note that the wildcard query is a non-analytic query, meaning it won’t perform any text analysis on the query text.
{
"wildcard": "inter*",
"field": "reviews.content"
}
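One way to picture a wildcard query is as a regular expression applied to each indexed term. In the translation, ? becomes any single character, * becomes any run of characters, and everything else is matched literally. The Python sketch below performs that translation under the assumption that the whole term must match. It is an illustration, not the engine's implementation, and the helper names are hypothetical.

```python
import re


def wildcard_to_regex(expression):
    """Translate a wildcard expression into an equivalent regular expression."""
    parts = []
    for ch in expression:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def wildcard_matches(expression, indexed_terms):
    """Return the indexed terms matched in full by the wildcard expression."""
    pattern = re.compile(wildcard_to_regex(expression))
    return [term for term in indexed_terms if pattern.fullmatch(term)]


terms = ["inter", "interest", "intern", "winter"]
print(wildcard_matches("inter*", terms))  # ['inter', 'interest', 'intern']
print(wildcard_matches("int?rn", terms))  # ['intern']
```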
A wildcard query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Boolean Field Query
A boolean field query searches fields that contain boolean true or false values.
A boolean field query searches the actual content of the field, and should not be confused with the boolean queries (described below, in the section on compound queries) that modify whether a query must, should, or must not be present.
{
"bool": true,
"field": "free_breakfast"
}
Compound Queries
Conjunction Query (AND)
A conjunction query contains multiple child queries. Its result documents must satisfy all of the child queries.
{
"conjuncts":[
{"field":"reviews.content", "match": "location"},
{"field":"free_breakfast", "bool": true}
]
}
A conjunction query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Disjunction Query (OR)
A disjunction query contains multiple child queries.
Its result documents must satisfy a configurable min number of child queries.
By default this min is set to 1.
For example, if three child queries (A, B, and C) are specified, a min of 1 means that a document is returned if it satisfies at least one of A, B, or C; a min of 2 requires it to satisfy at least two of them.
{
"disjuncts":[
{"field":"reviews.content", "match": "location"},
{"field":"free_breakfast", "bool": true}
]
}
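In terms of result sets, a conjunction keeps the documents matched by every child query. A disjunction keeps the documents matched by at least min child queries. The Python sketch below models each child query's results as a set of document IDs and ignores scoring. The parameter is called min_matches rather than min only because min is a Python builtin. The IDs are made up.

```python
from collections import Counter


def conjunction(*child_results):
    """Documents that satisfy every child query."""
    if not child_results:
        return set()
    return set.intersection(*map(set, child_results))


def disjunction(*child_results, min_matches=1):
    """Documents that satisfy at least min_matches of the child queries."""
    counts = Counter(doc for result in child_results for doc in set(result))
    return {doc for doc, n in counts.items() if n >= min_matches}


a = {"hotel_1", "hotel_2", "hotel_3"}
b = {"hotel_2", "hotel_4"}
c = {"hotel_2", "hotel_3", "hotel_5"}
print(sorted(conjunction(a, b, c)))                 # ['hotel_2']
print(sorted(disjunction(a, b, c)))                 # all five hotels
print(sorted(disjunction(a, b, c, min_matches=2)))  # ['hotel_2', 'hotel_3']
```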
A disjunction query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Boolean Query
A boolean query is a combination of conjunction and disjunction queries. A boolean query takes three lists of queries:
- must: Result documents must satisfy all of these queries.
- should: Result documents should satisfy these queries.
- must_not: Result documents must not satisfy any of these queries.
{
"must": {
"conjuncts":[{"field":"reviews.content", "match": "location"}]},
"must_not": {
"disjuncts": [{"field":"free_breakfast", "bool": false}]},
"should": {
"disjuncts": [{"field":"free_breakfast", "bool": true}]}
}
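The following Python sketch is a simplified model of how the three lists combine. It represents each clause's results as a set of document IDs and rests on these assumptions, chosen for illustration:

- must_not removes documents.
- When a must clause is present, should clauses only raise a document's ranking.
- With no must clause, a document must satisfy at least one should clause.
- With only must_not, every document not excluded is kept.

A count of satisfied should clauses stands in for real relevance scoring, and the function name is hypothetical.

```python
def boolean_query(all_docs, must=None, should=None, must_not=None):
    """Return (doc_id, should_hits) pairs, best-ranked first.

    must, should, and must_not are lists of sets of document IDs, one set
    per child query. Counting satisfied should clauses stands in for scoring.
    """
    must = must or []
    should = should or []
    must_not = must_not or []

    if must:
        candidates = set.intersection(*map(set, must))
    elif should:
        candidates = set().union(*should)  # assumed: at least one should clause required
    else:
        candidates = set(all_docs)         # assumed: only must_not given

    candidates -= set().union(*must_not)

    ranked = [(doc, sum(doc in s for s in should)) for doc in candidates]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


all_docs = {"hotel_1", "hotel_2", "hotel_3", "hotel_4"}
location = {"hotel_1", "hotel_2", "hotel_3"}  # reviews.content matches "location"
no_breakfast = {"hotel_3"}                    # free_breakfast is false
breakfast = {"hotel_1", "hotel_4"}            # free_breakfast is true
print(boolean_query(all_docs, must=[location], must_not=[no_breakfast], should=[breakfast]))
# [('hotel_1', 1), ('hotel_2', 0)]
```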
Doc ID Query
A doc ID query returns the indexed document or documents among the specified set. This is typically used in conjunction queries, to restrict the scope of other queries’ output.
{ "ids": [ "hotel_10158", "hotel_10159" ] }
A doc ID query is demonstrated by means of the Java SDK, in Searching from the SDK.
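Because a doc ID query is typically combined with other queries in a conjunction, a helper that builds such a fragment can be useful. The Python sketch below pairs an ids query with another query fragment inside conjuncts, using only keys that appear in the examples on this page. The helper name is hypothetical. The result would still need to be wrapped in the generic configuration before being sent.

```python
import json


def restrict_to_ids(query_fragment, doc_ids):
    """Build a conjunction that limits query_fragment's results to the given document IDs."""
    return {"conjuncts": [{"ids": list(doc_ids)}, query_fragment]}


fragment = restrict_to_ids(
    {"field": "reviews.content", "match": "location"},
    ["hotel_10158", "hotel_10159"],
)
print(json.dumps(fragment, indent=2))
```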
Range Queries
Date Range Query
A date range query finds documents containing a date value in the specified field within the specified range.
Dates should be in the format specified by RFC 3339, which is a specific profile of ISO 8601.
Define the endpoints using the fields start and end.
One endpoint can be omitted, but not both.
The inclusive_start and inclusive_end properties in the query JSON control whether the endpoints are included or excluded.
{
"start": "2001-10-09T10:20:30-08:00",
"end": "2016-10-31",
"inclusive_start": false,
"inclusive_end": false,
"field": "review_date"
}
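The endpoint rules can be illustrated with a Python sketch. It parses timestamps with `datetime.fromisoformat` and applies the inclusive_start and inclusive_end flags, which the caller must pass explicitly. One assumption is made for illustration: a date-only value such as "2016-10-31" is treated as midnight UTC. Before Python 3.11, `fromisoformat` does not accept every RFC 3339 form (for example a trailing Z), so this is not a complete RFC 3339 parser. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_rfc3339(value):
    """Parse an RFC 3339 date or timestamp into a timezone-aware datetime.

    Assumption for illustration: a date-only value is taken as midnight UTC.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def in_date_range(value, start=None, end=None, *, inclusive_start, inclusive_end):
    """Check a date against optional endpoints; at least one endpoint is required."""
    if start is None and end is None:
        raise ValueError("at least one of start or end must be given")
    moment = parse_rfc3339(value)
    if start is not None:
        lower = parse_rfc3339(start)
        if moment < lower or (moment == lower and not inclusive_start):
            return False
    if end is not None:
        upper = parse_rfc3339(end)
        if moment > upper or (moment == upper and not inclusive_end):
            return False
    return True


# Endpoints from the JSON example above, both exclusive.
print(in_date_range("2010-05-01T12:00:00+00:00",
                    "2001-10-09T10:20:30-08:00", "2016-10-31",
                    inclusive_start=False, inclusive_end=False))  # True
print(in_date_range("2016-10-31",
                    "2001-10-09T10:20:30-08:00", "2016-10-31",
                    inclusive_start=False, inclusive_end=False))  # False: end is exclusive
```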
Numeric Range Query
A numeric range query finds documents containing a numeric value in the specified field within the specified range.
Define the endpoints using the fields min and max.
You can omit one endpoint, but not both.
The inclusive_min and inclusive_max properties control whether or not the endpoints are included or excluded.
By default, min is inclusive and max is exclusive.
{
"min": 100, "max": 1000,
"inclusive_min": false,
"inclusive_max": false,
"field": "id"
}
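The endpoint rules, including the stated defaults (min inclusive, max exclusive), fit in a small Python predicate. This is a sketch of the range semantics only, and the function name is hypothetical.

```python
def in_range(value, min_value=None, max_value=None,
             inclusive_min=True, inclusive_max=False):
    """Range check with the stated defaults: min inclusive, max exclusive."""
    if min_value is None and max_value is None:
        raise ValueError("at least one endpoint must be given")
    if min_value is not None:
        if value < min_value or (value == min_value and not inclusive_min):
            return False
    if max_value is not None:
        if value > max_value or (value == max_value and not inclusive_max):
            return False
    return True


print(in_range(100, 100, 1000))   # True: min is inclusive by default
print(in_range(1000, 100, 1000))  # False: max is exclusive by default
print(in_range(100, 100, 1000, inclusive_min=False, inclusive_max=False))  # False
```

Python compares strings lexicographically, so the same predicate also roughly illustrates a term range, assuming the index orders terms the same way. With min "foo" and max "foof", "food" falls inside the range while "foog" does not.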
A numeric range query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Term Range Query
A term range query finds documents containing a term in the specified field within the specified range.
Define the endpoints using the fields min and max.
You can omit one endpoint, but not both.
The inclusive_min and inclusive_max properties control whether or not the endpoints are included or excluded.
By default, min is inclusive and max is exclusive.
{
"min": "foo", "max": "foof",
"inclusive_min": false,
"inclusive_max": false,
"field": "desc"
}
Query String Query
A query string can be used, to express a given query by means of a special syntax.
{ "query": "+nice +view" }
A query string query is demonstrated by means of the Java SDK, in Searching from the SDK. Note also that the Full Text Searches conducted with the Couchbase Web Console themselves use query strings. (See Searching from the UI.)
Certain queries supported by FTS are not yet supported by the query string syntax. These include wildcards and regular expressions.
More detailed information is provided in Query String Queries.
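The example above uses prefix operators. In the query string syntax, a leading + marks a term that must match, a leading - marks one that must not, and an unprefixed term is optional. The Python sketch below splits a query string on whitespace and sorts the terms into those three groups, mirroring the must, should, and must_not lists of a boolean query. It ignores quoted phrases, field prefixes, boosts, and everything else the full syntax supports, and the function name is hypothetical.

```python
def split_query_string(query):
    """Group whitespace-separated terms by their + / - prefix."""
    groups = {"must": [], "should": [], "must_not": []}
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            groups["must"].append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            groups["must_not"].append(token[1:])
        else:
            groups["should"].append(token)
    return groups


print(split_query_string("+nice +view"))
# {'must': ['nice', 'view'], 'should': [], 'must_not': []}
print(split_query_string("+view pool -noisy"))
# {'must': ['view'], 'should': ['pool'], 'must_not': ['noisy']}
```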
Non-Analytic Queries
Term and Phrase queries support no analysis on their inputs. This means that only exact matches are returned.
In most cases, given the benefits of using analyzers, use of match and match phrase queries is preferable to that of term and phrase. For information on analyzers, see Understanding Analyzers.
Term Query
A term query is the simplest possible query. It performs an exact match in the index for the provided term.
{
"term": "locate",
"field": "reviews.content"
}
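No analysis is applied, so a term query amounts to looking up the exact term in the field's inverted index. The Python sketch below builds a toy inverted index from tokens that are assumed to have already been analyzed at indexing time (for example, stemmed). It then shows that the query term must equal the indexed form exactly. The document IDs and tokens are made up, and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each indexed token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


def term_query(index, term):
    """Exact lookup: the term is not analyzed before searching."""
    return sorted(index.get(term, set()))


# Tokens as they might be stored after stemming at indexing time.
docs = {
    "hotel_1": ["great", "locat"],
    "hotel_2": ["locat", "function"],
}
index = build_inverted_index(docs)
print(term_query(index, "locat"))   # ['hotel_1', 'hotel_2']
print(term_query(index, "locate"))  # []: the unanalyzed term differs from the indexed token
```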
Term queries are also demonstrated by means of the Java SDK, in Searching from the SDK.
Phrase Query
A phrase query searches for terms occurring at the specified positions and offsets. It performs an exact term match for all the phrase constituents, without using an analyzer.
{
"terms": ["nice", "view"],
"field": "reviews.content"
}
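A phrase query can be modeled with a positional inverted index, which records where each term occurs in each document. The Python sketch below returns the documents in which the given terms appear at consecutive positions, without analyzing the terms. The positional data is invented for illustration, and the function name is hypothetical.

```python
def phrase_query(positional_index, terms):
    """Return document IDs where terms occur at consecutive positions.

    positional_index maps term -> {doc_id: set of positions}.
    """
    if not terms:
        return []
    postings = [positional_index.get(term, {}) for term in terms]
    # Only documents containing every term can match.
    candidates = set(postings[0]).intersection(*postings[1:])
    matches = []
    for doc_id in sorted(candidates):
        for start in postings[0][doc_id]:
            if all(start + offset in postings[offset][doc_id]
                   for offset in range(1, len(terms))):
                matches.append(doc_id)
                break
    return matches


positional_index = {
    "nice": {"hotel_1": {0}, "hotel_2": {3}},
    "view": {"hotel_1": {1}, "hotel_2": {0}},
}
print(phrase_query(positional_index, ["nice", "view"]))  # ['hotel_1']
```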
A phrase query is also demonstrated by means of the Java SDK, in Searching from the SDK.
Geospatial Queries
Geospatial queries return documents that each specify a geographical location. The location-data provided by a geospatial query can be any of the following:
-
A location, specified as a longitude-latitude coordinate pair; and a distance, in miles. The location determines the center of a circle whose radius-length is the specified distance. Documents are returned if they reference a location within the circle.
-
Two longitude-latitude coordinate pairs. These are respectively taken to indicate the top left and bottom right corners of a rectangular bounding box. Documents are returned if they reference a location within the box.
-
An array of three or more longitude-latitude coordinate pairs. Each of the pairs is taken to indicate one corner of a polygonal bounding box. Documents are returned if they reference a location within the box.
A geospatial query must be applied to an index that applies the geopoint type mapping to the document-field that contains the target longitude-latitude coordinate pair.
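The three shapes can be illustrated with plain geometry. The Python sketch below makes these choices:

- Point and distance: the haversine great-circle formula, with a mean Earth radius of 3958.8 miles.
- Rectangle: a simple longitude and latitude comparison, ignoring boxes that cross the 180th meridian.
- Polygon: ray casting, treating coordinates as planar.

These are standard approximations chosen for illustration, so results may differ slightly from what FTS computes. Points are (longitude, latitude) tuples, following the order used above. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def distance_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_distance(point, center, miles):
    """Point-and-distance case: is point inside the circle around center?"""
    return distance_miles(*point, *center) <= miles


def within_box(point, top_left, bottom_right):
    """Rectangle case; ignores boxes that cross the 180th meridian."""
    lon, lat = point
    return (top_left[0] <= lon <= bottom_right[0]
            and bottom_right[1] <= lat <= top_left[1])


def within_polygon(point, corners):
    """Polygon case via ray casting, treating coordinates as planar."""
    lon, lat = point
    inside = False
    for (x1, y1), (x2, y2) in zip(corners, corners[1:] + corners[:1]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            crossing_lon = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < crossing_lon:
                inside = not inside
    return inside


sf = (-122.4194, 37.7749)
oakland = (-122.2711, 37.8044)
print(within_distance(oakland, sf, 20))               # True: roughly 8 miles apart
print(within_box(sf, (-123.0, 38.0), (-122.0, 37.0)))  # True
```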
More detailed information is provided in Geospatial Queries.
Special Queries
Special queries are usually employed either in combination with other queries, or to test the system.
Match All Query
Matches all documents in an index, irrespective of terms.
For example, if an index is created on the travel-sample bucket for documents of type zucchini, the match all query returns all document IDs from the travel-sample bucket, even though the bucket contains no documents of type zucchini.
{ "match_all": {} }