Understanding Queries

    +
    Full Text Search allows queries to be performed on Full Text Indexes. Queries can be performed by means of the Couchbase Web Console, the Couchbase REST API, N1QL (using search functions in the Query service), or the Couchbase SDK.

    Query-Specification Options

    Full Text Search allows a range of query options. These include:

    • Input-text and target-text can be analyzed: this transforms input-text into token-streams, according to different specified criteria, so allowing richer and more finely controlled forms of text-matching.

    • The fuzziness of a query can be specified, so that the scope of matches can be constrained to a particular level of exactitude. A high degree of fuzziness means that a large number of partial matches may be returned.

    • Multiple queries can be specified for simultaneous processing, with one given a higher boost than another; so ensuring that its results are returned at the top of the set.

    • Regular expressions and wildcards can be used in text-specification for search-input.

    • Compound queries can be designed, such that an appropriate conjunction or disjunction of the total result-set can be returned.

    All the above options, and others, are explained in detail on the page Query Types.

    For information on how to execute queries, see Performing Searches.

    Sorting Results

    The results returned from a Full Text Search can be sorted: by id, score, field, and more. Details are provided in Sorting Query Results.

    Pagination Couchbase Server 6.6.1

    The number of results obtained for a Full Text Search request can be large at times. Pagination of these results becomes essential for sorting and displaying a subset of these results. There are multiple ways to achieve pagination with settings within a search request. Pagination will fetch a deterministic set of results when the results are sorted in a certain fashion.

    size/limit, from/offset

    These settings can be used to obtain a subset of results and works deterministically when combined with a certain sort order. The default sort order is based on score (relevance) where the results are ordered from the highest to the lowest score.

    Here’s an example query that fetches results from the 11th onwards to the 15th that have been ordered by score ..

    {
      "query": {
          "match": "California",
          "field": "state"
      },
      "size": 5,
      "from": 10
    }
    search_after, search_before

    Using size/limit and offset/from would fetch at least size + from ordered results from a partition and then return the size number of results starting at offset from. Deep pagination can therefore get pretty expensive when using size + from on a sharded index due to each shard having to possibly return large resultsets (at least size + from) over the network for merging at the coordinating node before returning the size number of results starting at offset from.

    For more efficient pagination - the search_after/search_before settings can be leveraged. search_after is designed to fetch the size number of results after the key specified and search_before is designed to fetch the size number of results before the key specified. These settings allow for the client to maintain state while paginating - the sort key of the last result (for search_after) or the first result (for search_before) in the current page.

    Note that both the attributes accept an array of strings (sort keys) - the length of this array will need to be the same length of the "sort" array within the search request. Both search_after and search_before CANNOT be used witin the same search request.

    Here are some examples using search_after/search_before over sort key "_id" (an internal field that carries the document ID).

    {
      "query": {
          "match": "California",
          "field": "state"
      },
      "sort": ["_id"],
      "search_after": ["hotel_10180"],
      "size": 3
    }
    {
      "query": {
          "match": "California",
          "field": "state"
      },
      "sort": ["_id"],
      "search_before": ["hotel_17595"],
      "size": 4
    }

    Note: A Full Text Search request that doesn’t carry any pagination settings will return the first 10 results ("size: 10", "from": 0) ordered by score sequentially from the highest to lowest.

    Scoring Couchbase Server 6.6.1

    Search result scoring occurs at query time. The results for a search request are ordered by score (relevance) with the sort order being descending, unless explicitly set not to do so. Couchbase (via bleve) uses a slightly modified version of the standard tf-idf algorithm. This deviation is to normalize the score by various relevant factors.

    By setting explain to true within the search request, you will obtain the explanation of how the score was calculated for a result alongside it.

    Scores for result document hits are calculated by default, whether the score is used for ordering or not. This means calculation of various frequency and normalization information for the search terms at query time. Scoring can be disabled by setting score to none in the search request. This may be desirable in the situation when scoring (document relevancy) isn’t needed by the application. Note that using "score": "none" is expected to boost query performance in certain situations. Here’s an example search request with scoring disabled.

    {
      "query": {
          "match": "California",
          "field": "state"
      },
      "score": "none",
      "size": 100
    }

    Query Response Objects

    Every Full Text Search query provides a response object. This contains the query-result; and consists of individual child-objects that respectively contain status, a request-copy, the number of hits (or matches) achieved, and (optionally) facets, which provide aggregation information.

    Scan Consistency

    Optional

    To specify the consistency guarantee/constraint for index scanning, the accepted values are:

    ``

    This is the default mode of operation. An empty string represents the "not_bounded" setting - no timestamp vector is used in the index scan. This is the fastest mode, because it avoids the costs of obtaining the vector and waiting for the index to catch up to the vector.

    at_plus

    This implements bounded consistency. The request includes a scan_vector parameter and value, which is used as a lower bound. This can be used to implement read-your-own-writes (RYOW).

    {
      "explain": true,
      "fields": [],
      "query": {
          ...
      },
      "ctl": {
        "consistency": {
          "level": "at_plus"
        }
      }
    }

    For full information, see Handling Response Objects.