You are viewing the documentation for a prerelease version.

Query Types

Couchbase Full Text Search supports multiple types of query.

Introduction to Query Types

Full Text Search allows text-data to be queried. Multiple options are provided for ensuring the right kinds of match. This page describes the purpose of each query-type, and provides sample JSON objects that indicate how queries can be constructed.

Available query-types include:

Simple Queries

Accept input-text in the form of words and phrases, and attempt to find matches across bodies of text that have been indexed. Analyzers are applied to both input and target, potentially to strip out unnecessary characters, reduce words to the basic stems on which matching should occur, handle punctuation, and more. Additionally, match accuracy-levels can be specified, and multiple queries can be expressed together, with individual queries boosted to make their results more prominent in the eventual result-set.

Compound Queries

Accept multiple queries simultaneously, and return either the conjunction of results from the result-sets, or a disjunction.

Range Queries

Accept ranges for dates and numbers, and return documents that contain values within those ranges.

Query String Queries

Accept query strings, which express query-requirements in a special syntax.

Geospatial Queries

Accept longitude-latitude coordinate pairs, in order to return documents that specify a geographical location.

Non-Analytic Queries

Accept words and phrases on which exact matches only are returned. No analysis is performed.

Special Queries

For testing purposes, return either all of the documents in an index, or none.

These query-types are explained in greater detail below. Examples are provided, using the Couchbase REST API query-syntax. (Note that Full Text Search can also be performed with the Couchbase Web Console and the Couchbase SDK.) The JSON data refers to the travel-sample bucket, and assumes that demonstration full text indexes have been created, as described in Demonstration Indexes.

To run the examples using curl, use the following syntax:

$ curl -u Administrator:password -X POST -H "Content-Type: application/json" \
  -d '{your query in JSON here...}' \
  http://localhost:8094/api/index/index_name/query

Note that the examples below show only the JSON fragments that constitute non-generic parts of the queries they describe. For actual use in a Full Text Search, these JSON fragments should be wrapped in the following generic configuration:

{
  "explain": false,
  "fields": [
    "*"
  ],
  "highlight": {},
  "query":{ your_query_details_here }
}
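The wrapping step and the curl call can also be sketched in Python, using only the standard library. This is a minimal illustration rather than an official client: the helper names wrap_query and build_request are hypothetical, and the host, port, index name, and credentials are simply the placeholder values from the curl example above.

```python
import base64
import json
import urllib.request


def wrap_query(fragment, fields=("*",), explain=False):
    """Wrap a query fragment in the generic request body shown above."""
    return {
        "explain": explain,
        "fields": list(fields),
        "highlight": {},
        "query": fragment,
    }


def build_request(index_name, fragment, host="localhost", port=8094,
                  user="Administrator", password="password"):
    """Build (but do not send) a POST request equivalent to the curl call."""
    url = f"http://{host}:{port}/api/index/{index_name}/query"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(wrap_query(fragment)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder index name, as in the curl example.
request = build_request(
    "index_name",
    {"match": "location hostel", "field": "reviews.content"},
)
# urllib.request.urlopen(request) would send it to a running cluster.
```

Building the request separately from sending it makes it possible to inspect the JSON body before anything reaches the server.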

For more information on using the REST API to perform queries, see Searching with the REST API.

Simple Queries

Match Query

A match query analyzes input text, and uses the results to query an index. Options include specifying an analyzer, performing a fuzzy match, and performing a prefix match. By default, the analyzer used for the search text is the one that was set for the specified field during index creation. Note that if the field isn’t specified, the match query targets the _all field within the index. Whether content is included in the _all field is determined by a setting chosen during index creation. For information on analyzers, see Understanding Analyzers.

When fuzzy matching is used, if the fuzziness parameter is set to a non-zero integer, the analyzed text is matched with a corresponding level of fuzziness. The maximum supported fuzziness is 2.

When a prefix match is used, the prefix_length parameter specifies that for a match to occur, a prefix of specified length must be shared by the input-term and the target text-element.

The following JSON object demonstrates specification of a match query:

{
 "match": "location hostel",
 "field": "reviews.content",
 "analyzer": "standard",
 "fuzziness": 2,
 "prefix_length": 4
}
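When match queries are generated programmatically, the constraints described above can be checked before the query is sent. The following is a minimal sketch: the function name match_query is hypothetical, and the only validation it performs is the fuzziness limit of 2 stated above, plus a sanity check that prefix_length is not negative.

```python
def match_query(text, field=None, analyzer=None, fuzziness=0, prefix_length=0):
    """Build a match query fragment, omitting options left at their defaults."""
    if not 0 <= fuzziness <= 2:
        raise ValueError("fuzziness must be 0, 1, or 2")
    if prefix_length < 0:
        raise ValueError("prefix_length must not be negative")

    query = {"match": text}
    if field is not None:
        query["field"] = field  # without a field, the _all field is targeted
    if analyzer is not None:
        query["analyzer"] = analyzer
    if fuzziness:
        query["fuzziness"] = fuzziness
    if prefix_length:
        query["prefix_length"] = prefix_length
    return query


example = match_query("location hostel", field="reviews.content",
                      analyzer="standard", fuzziness=2, prefix_length=4)
```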

A match query is also demonstrated by means of the Java SDK, in Searching with the SDK.

Match Phrase Query

The input text is analyzed, and a phrase query is built with the terms resulting from the analysis. This type of query searches for terms in the target that occur in the positions and offsets indicated by the input: this depends on term vectors, which must have been included in the creation of the index used for the search.

For example, a match phrase query for location for functions is matched with locate the function, if the standard analyzer is used: this analyzer uses a stemmer, which reduces location and locate to locat, and reduces functions and function to function. Additionally, the analyzer employs stop removal, which removes small and less significant words from input and target text, so that matches are attempted on only the more significant elements of vocabulary: in this case, for and the are removed. Following this processing, the tokens locat and function are recognized as common to both input and target, as occurring in the same sequence in both, and as being the same distance from one another in both; therefore, a match is made.

{
 "match_phrase": "very nice",
 "field": "reviews.content"
}

A match phrase query is also demonstrated by means of the Java SDK, in Searching with the SDK.
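The location for functions example can be traced with a toy model of analysis. The sketch below is illustrative only and is not how the search engine is implemented: the stop-word set and the stem lookup table are hypothetical stand-ins for a real analyzer, hard-coded to cover just the words in the example. The point it demonstrates is that removed stop words still occupy a position, so the distance between the surviving tokens is preserved.

```python
# Hypothetical stand-ins for a real analyzer's stop list and stemmer.
STOP_WORDS = {"for", "the"}
TOY_STEMS = {"location": "locat", "locate": "locat", "functions": "function"}


def analyze(text):
    """Return (token, position) pairs after lowercasing, stop removal, and stemming."""
    tokens = []
    for position, word in enumerate(text.lower().split()):
        if word in STOP_WORDS:
            continue  # the stop word is dropped, but its position is still consumed
        tokens.append((TOY_STEMS.get(word, word), position))
    return tokens


def phrase_matches(query_text, target_text):
    """True if the analyzed query tokens occur in the target at the same relative positions."""
    query = analyze(query_text)
    target = analyze(target_text)
    if not query:
        return False
    first_term, first_position = query[0]
    target_tokens = set(target)
    for term, position in target:
        if term != first_term:
            continue
        shift = position - first_position
        if all((t, p + shift) in target_tokens for t, p in query):
            return True
    return False
```

With these definitions, phrase_matches("location for functions", "locate the function") is true, whereas a target with the two tokens reversed, or adjacent to one another, does not match.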

Fuzzy Query

A fuzzy query matches terms within a specified edit (or Levenshtein) distance: that is, terms are considered to match when they are similar to a specified degree, rather than identical. A common prefix of a stated length may also be specified as a requirement for matching.

Fuzziness is specified by means of a single integer. For example:

{
 "term": "interest",
 "field": "reviews.content",
 "fuzziness": 2
}
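To make the notion of edit distance concrete, the following sketch computes the Levenshtein distance between two terms, and then applies a fuzziness limit together with an optional shared-prefix requirement. It illustrates the matching rule described above, not the engine's internal implementation; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate, fuzziness, prefix_length=0):
    """True if candidate shares the required prefix and is within the edit distance."""
    if term[:prefix_length] != candidate[:prefix_length]:
        return False
    return edit_distance(term, candidate) <= fuzziness
```

For the term interest, a fuzziness of 1 admits interests (one insertion), while internet requires a fuzziness of 2.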

Fuzziness is demonstrated by means of the Java SDK, in the context of the term query (see below), in Searching with the SDK. Note that two such queries are specified, with the difference in fuzziness between them resulting in different forms of match, and different sizes of result-sets.

Prefix Query

A prefix query finds documents containing terms that start with the specified prefix.

{
 "prefix": "inter",
 "field": "reviews.content"
}

Regexp Query

A regexp query finds documents containing terms that match the specified regular expression.

{
 "regexp": "inter.+",
 "field": "reviews.content"
}

A regexp query is also demonstrated by means of the Java SDK, in Searching with the SDK.

Wildcard Query

A wildcard query uses a wildcard expression to search within individual terms for matches. A wildcard expression can contain ?, which matches any single character, and *, which matches zero or more characters. Wildcards can appear in the middle or at the end of a term, but not at the beginning.

{
 "wildcard": "inter*",
 "field": "reviews.content"
}
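The meaning of the two wildcard characters can be illustrated by translating a wildcard expression into an equivalent regular expression. This is a sketch of the semantics only, assuming that the whole term must match the expression; the helper names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? and * into regular-expression equivalents; escape everything else."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")  # zero or more characters
        elif char == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def wildcard_matches(pattern, term):
    """True if the entire term matches the wildcard expression."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None
```

Under this reading, inter* matches both inter and interesting, but not winter, because the match is anchored at the start of the term.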

A wildcard query is also demonstrated by means of the Java SDK, in Searching with the SDK.

Boolean Field Query

A boolean field query searches fields that contain boolean true or false values. A boolean field query searches the actual content of the field, and should not be confused with the boolean queries (described below, in the section on compound queries) that modify whether a query must, should, or must not be present.

{
 "bool": true,
 "field": "free_breakfast"
}

Compound Queries

Conjunction Query (AND)

A conjunction query contains multiple child queries. Its result documents must satisfy all of the child queries.

{
 "conjuncts":[
   {"field":"reviews.content", "match": "location"},
   {"field":"free_breakfast", "bool": true}
 ]
}

A conjunction query is also demonstrated by means of the Java SDK, in Searching with the SDK.

Disjunction Query (OR)

A disjunction query contains multiple child queries. Its result documents must satisfy a configurable min number of child queries. By default, this min is set to 1. For example, if three child queries (A, B, and C) are specified, a min of 1 returns every document that satisfies at least one of A, B, or C; a min of 2 returns only documents that satisfy at least two of the three.

{
 "disjuncts":[
   {"field":"reviews.content", "match": "location"},
   {"field":"free_breakfast", "bool": true}
 ]
}

A disjunction query is also demonstrated by means of the Java SDK, in Searching with the SDK.
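The relationship between conjunction, disjunction, and the min setting can be modelled with sets of document IDs. The following sketch treats each child query as the set of IDs it would return; it illustrates the selection logic only (scoring is ignored), and the function names and document IDs are hypothetical.

```python
from collections import Counter


def conjunction(*result_sets):
    """Documents returned by every child query."""
    if not result_sets:
        return set()
    return set.intersection(*map(set, result_sets))


def disjunction(*result_sets, min_count=1):
    """Documents returned by at least min_count child queries."""
    counts = Counter(
        doc_id for results in result_sets for doc_id in set(results)
    )
    return {doc_id for doc_id, count in counts.items() if count >= min_count}


# Hypothetical result-sets for three child queries A, B, and C.
a = {"hotel_1", "hotel_2", "hotel_3"}
b = {"hotel_2", "hotel_3", "hotel_4"}
c = {"hotel_3", "hotel_5"}
```

Here disjunction(a, b, c) returns all five IDs, disjunction(a, b, c, min_count=2) returns hotel_2 and hotel_3, and conjunction(a, b, c) returns hotel_3 only.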

Boolean Query

A boolean query is a combination of conjunction and disjunction queries. A boolean query takes three lists of queries:

  • must: Result documents must satisfy all of these queries.

  • should: Result documents should satisfy these queries; those that do are scored higher.

  • must_not: Result documents must not satisfy any of these queries.

{
 "must": {
   "conjuncts":[{"field":"reviews.content", "match": "location"}]},
 "must_not": {
   "disjuncts": [{"field":"free_breakfast", "bool": false}]},
 "should": {
   "disjuncts": [{"field":"free_breakfast", "bool": true}]}
}

Doc ID Query

A doc ID query returns the indexed document or documents among the specified set. This is typically used in conjunction queries, to restrict the scope of other queries’ output.

{ "ids": [ "hotel_10158", "hotel_10159" ] }

A doc ID query is also demonstrated by means of the Java SDK, in Searching with the SDK.
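Restricting another query's output to a known set of documents amounts to placing that query and a doc ID query side by side in a conjunction. A minimal sketch, in which the helper name restrict_to_ids is hypothetical:

```python
def restrict_to_ids(query, doc_ids):
    """Combine a query with a doc ID query, so only the listed documents can match."""
    return {"conjuncts": [query, {"ids": list(doc_ids)}]}


restricted = restrict_to_ids(
    {"field": "reviews.content", "match": "location"},
    ["hotel_10158", "hotel_10159"],
)
```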

Range Queries

Date Range Query

A date range query finds documents containing a date value in the specified field within the specified range. Dates should be in the format specified by RFC-3339, which is a specific profile of ISO-8601. Define the endpoints using the fields start and end. One endpoint can be omitted, but not both. The inclusive_start and inclusive_end properties in the query JSON control whether the endpoints are included in the range.

{
 "start": "2001-10-09T10:20:30-08:00",
 "end": "2016-10-31",
 "inclusive_start": false,
 "inclusive_end": false,
 "field": "review_date"
}
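Endpoints in the required format can be produced from timezone-aware datetime values, whose isoformat() output includes the UTC offset. The following sketch builds a date range query and enforces the rule that at least one endpoint is present; the function name date_range_query is hypothetical, and rejecting naive datetimes is a choice made here so that the offset is never missing.

```python
from datetime import datetime, timedelta, timezone


def date_range_query(field, start=None, end=None,
                     inclusive_start=None, inclusive_end=None):
    """Build a date range query fragment from timezone-aware datetimes."""
    if start is None and end is None:
        raise ValueError("at least one of start and end is required")

    query = {"field": field}
    for key, value in (("start", start), ("end", end)):
        if value is None:
            continue
        if value.tzinfo is None:
            raise ValueError(f"{key} must be timezone-aware")
        query[key] = value.isoformat()  # e.g. 2001-10-09T10:20:30-08:00
    # Flags are only written when set, so the server defaults apply otherwise.
    if inclusive_start is not None:
        query["inclusive_start"] = inclusive_start
    if inclusive_end is not None:
        query["inclusive_end"] = inclusive_end
    return query


pacific = timezone(timedelta(hours=-8))
example = date_range_query(
    "review_date",
    start=datetime(2001, 10, 9, 10, 20, 30, tzinfo=pacific),
    inclusive_start=False,
)
```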

Numeric Range Query

A numeric range query finds documents containing a numeric value in the specified field within the specified range. Define the endpoints using the fields min and max. You can omit one endpoint, but not both. The inclusive_min and inclusive_max properties control whether or not the endpoints are included or excluded. By default, min is inclusive and max is exclusive.

{
 "min": 100, "max": 1000,
 "inclusive_min": false,
 "inclusive_max": false,
 "field": "id"
}

A numeric range query is also demonstrated by means of the Java SDK, in Searching with the SDK.

Term Range Query

A term range query finds documents containing a term in the specified field within the specified range. Define the endpoints using the fields min and max. You can omit one endpoint, but not both. The inclusive_min and inclusive_max properties control whether or not the endpoints are included or excluded. By default, min is inclusive and max is exclusive.

{
 "min": "foo", "max": "foof",
 "inclusive_min": false,
 "inclusive_max": false,
 "field": "desc"
}
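The endpoint rules shared by numeric and term range queries can be expressed as a single predicate. The sketch below applies the defaults stated above (min inclusive, max exclusive) and the requirement that at least one endpoint is given. The function name in_range is hypothetical, and Python's own string ordering is assumed here as a stand-in for however the index orders terms.

```python
def in_range(value, minimum=None, maximum=None,
             inclusive_min=True, inclusive_max=False):
    """True if value lies within the range, honouring the inclusive flags."""
    if minimum is None and maximum is None:
        raise ValueError("at least one of minimum and maximum is required")
    if minimum is not None:
        if value < minimum or (value == minimum and not inclusive_min):
            return False
    if maximum is not None:
        if value > maximum or (value == maximum and not inclusive_max):
            return False
    return True
```

For example, with the defaults, in_range(100, 100, 1000) is true and in_range(1000, 100, 1000) is false; with both flags set to false, as in the JSON examples above, neither endpoint matches.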

Query String Query

A query string can be used to express a given query by means of a special syntax.

{ "query": "+nice +view" }

A query string query is also demonstrated by means of the Java SDK, in Searching with the SDK. Note also that the Full Text Searches conducted with the Couchbase Web Console themselves use query strings. (See Searching from the UI.)

Certain queries supported by FTS are not yet supported by the query string syntax. These include wildcards and regular expressions.

More detailed information is provided in Query String Queries.

Non-Analytic Queries

Term and phrase queries perform no analysis on their inputs. This means that only exact matches are returned.

In most cases, given the benefits of using analyzers, use of match and match phrase queries is preferable to that of term and phrase. For information on analyzers, see Understanding Analyzers.

Term Query

A term query is the simplest possible query. It performs an exact match in the index for the provided term.

{
  "term": "locate",
  "field": "reviews.content"
}

Term queries are also demonstrated by means of the Java SDK, in Searching with the SDK.

Phrase Query

A phrase query searches for terms occurring in the specified positions and offsets. It performs an exact term-match for all the phrase-constituents, without using an analyzer.

{
  "terms": ["nice", "view"],
  "field": "reviews.content"
}

A phrase query is also demonstrated by means of the Java SDK, in Searching with the SDK.

Geospatial Queries

Geospatial queries return documents that each specify a geographical location. Each query contains either:

  • A single longitude-latitude coordinate pair; and a distance value, in miles, which determines a radius measured from the location specified by the coordinate pair. Documents are returned if they specify (by means of a longitude-latitude coordinate pair) a location that lies within the radius.

  • Two longitude-latitude coordinate pairs. These are respectively taken to indicate the upper left and lower right corners of a bounding box. Documents are returned if they specify a location that lies within the bounding box.

A geospatial query must be applied to an index that applies the geopoint type mapping to the document-field that contains the target longitude-latitude coordinate pair.
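The two forms of geospatial test described above can be sketched as follows. This is an illustration of the geometry, not of the engine's implementation: it assumes the haversine great-circle formula with a mean Earth radius of 3958.8 miles, represents each point as a (longitude, latitude) tuple, and does not handle bounding boxes that cross the antimeridian. All names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two longitude-latitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def within_radius(point, center, radius_miles):
    """True if point lies within radius_miles of center."""
    return haversine_miles(*point, *center) <= radius_miles


def within_bounding_box(point, top_left, bottom_right):
    """True if point lies inside the box given by its upper-left and lower-right corners."""
    lon, lat = point
    return (top_left[0] <= lon <= bottom_right[0]
            and bottom_right[1] <= lat <= top_left[1])
```

One degree of latitude is roughly 69 miles under this model, so a point one degree north of the center falls inside a 70-mile radius but outside a 60-mile one.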

More detailed information is provided in Geospatial Queries.

Special Queries

Special queries are usually employed either in combination with other queries, or to test the system.

Match All Query

Matches all documents in an index, irrespective of terms. For example, if an index is created on the travel-sample bucket for documents of type zucchini, the match all query returns all document IDs from the travel-sample bucket, even though the bucket contains no documents of type zucchini.

{ "match_all": {} }

Match None Query

Matches no documents in the index.

{ "match_none": {} }