Full Text Search (FTS) Using the PHP SDK with Couchbase Server

You can use the Full Text Search service (FTS) to create queryable full-text indexes in Couchbase Server.

Couchbase offers Full-text search support, allowing you to search for documents that contain certain words or phrases. In the PHP SDK you can search full-text indexes by using the CouchbaseBucket.query(CouchbaseSearchQuery) API.

Querying a FTS index through the PHP client is performed through the CouchbaseBucket.query(CouchbaseSearchQuery) method, providing a CouchbaseSearchQuery. Building a CouchbaseSearchQuery takes two parameters, the index name to query and the actual search query itself. In this way, it is a sort of a statement. Additional search options may be specified by using the CouchbaseSearchQuery as a builder, chaining setters for each relevant option.

Depending on the third argument, this method returns an object of which the hits property represents array of either stdClass instances or plain arrays. Each of the entries represent a hit for the query and contains the index, id and score properties respectively, identifying the exact FTS index that returned the hit, the id of the document that matched and a decimal score for the match. It also contains optional sections depending on the request and the availability of all relevant settings in the FTS mapping. Those are in the explanation (an explanation of the plan followed by the FTS index to execute the query), locations (a map-like listing of the location of all matching terms inside each relevant field that was queried), fragments (a map-like listing of occurrences of the search terms in each field, with the context of the terms) and fields (a map of the complete value of each requested field). Most of these need an index configured to store the data of a searched field. In addition to hits property, any result also contains a status for the request, some execution metrics and facets if facets have been requested.

$cluster = new CouchbaseCluster('couchbase://localhost');
$bucket = $cluster->openBucket("travel-sample");
$fts = CouchbaseSearchQuery::match("term");
$result = $bucket->query(new CouchbaseSearchQuery("travel-search", $fts));
foreach ($result->hits as $search_hit) {
  var_dump($search_hit);
}

Query Types

There are many different flavors of search queries, and each can be constructed through static factory methods in the CouchbaseSearchQuery class. All of these types derive from the CouchbaseAbstractSearchQuery. It contains query classes corresponding to those listed in the FTS generic documentation.

It is important to distinguish between query options and general search options. Some options affect the search process in general (such as the limit, indicating how many results to return) while others only affect a specific query (such as fuzziness for a given query). Because multiple queries can be combined in a single search operation, query specific options can be specified only in the query object itself, while search options are specified at the level of the CouchbaseSearchQuery class, using builder methods.

$cluster = new CouchbaseCluster('couchbase://localhost');
$bucket = $cluster->openBucket("travel-sample");
$fts = CouchbaseSearchQuery::match("term");
$query = new CouchbaseSearchQuery("travel-search", $fts);
//search options:
//will show value for activity and country fields
$query->fields(array("activity", "country"))
       //will have max 3 hits
       ->limit(3);
$result = $bucket->query($query);
foreach ($result->hits as $search_hit) {
  var_dump($search_hit);
}

Here’s some sample output for the previous query against the travel-sample dataset:

object(stdClass)#6 (5) {
  ["index"]=>
  string(39) "travel-sample_683e328089809988_a4a23588"
  ["id"]=>
  string(13) "landmark_4955"
  ["score"]=>
  float(0.8133278848105)
  ["locations"]=>
  object(stdClass)#7 (1) {
    ["content"]=>
    object(stdClass)#8 (1) {
      ["term"]=>
      array(1) {
        [0]=>
        object(stdClass)#9 (4) {
          ["pos"]=>
          int(24)
          ["start"]=>
          int(159)
          ["end"]=>
          int(163)
          ["array_positions"]=>
          NULL
        }
      }
    }
  }
  ["fields"]=>
  object(stdClass)#10 (2) {
    ["activity"]=>
    string(5) "drink"
    ["country"]=>
    string(14) "United Kingdom"
  }
}
object(stdClass)#11 (5) {
  ["index"]=>
  string(39) "travel-sample_683e328089809988_37fdcf3c"
  ["id"]=>
  string(14) "landmark_15854"
  ["score"]=>
  float(0.74464445323204)
  ["locations"]=>
  object(stdClass)#12 (1) {
    ["content"]=>
    object(stdClass)#13 (1) {
      ["term"]=>
      array(1) {
        [0]=>
        object(stdClass)#14 (4) {
          ["pos"]=>
          int(21)
          ["start"]=>
          int(137)
          ["end"]=>
          int(141)
          ["array_positions"]=>
          NULL
        }
      }
    }
  }
  ["fields"]=>
  object(stdClass)#15 (2) {
    ["activity"]=>
    string(3) "see"
    ["country"]=>
    string(14) "United Kingdom"
  }
}
object(stdClass)#16 (5) {
  ["index"]=>
  string(39) "travel-sample_683e328089809988_9e13fa43"
  ["id"]=>
  string(14) "landmark_16265"
  ["score"]=>
  float(0.68760768406444)
  ["locations"]=>
  object(stdClass)#17 (1) {
    ["content"]=>
    object(stdClass)#18 (1) {
      ["term"]=>
      array(1) {
        [0]=>
        object(stdClass)#19 (4) {
          ["pos"]=>
          int(41)
          ["start"]=>
          int(209)
          ["end"]=>
          int(213)
          ["array_positions"]=>
          NULL
        }
      }
    }
  }
  ["fields"]=>
  object(stdClass)#20 (2) {
    ["activity"]=>
    string(3) "see"
    ["country"]=>
    string(14) "United Kingdom"
  }
}

Query Facets

Query facets may also be added to the general search parameters by using the addFacet($name, $facet) builder method on CouchbaseSearchQuery. You can create facet queries by instantiating facets through factory methods in the CouchbaseSearchFacet class.

$cluster = new CouchbaseCluster('couchbase://localhost');
$bucket = $cluster->openBucket("travel-sample");
$fts = CouchbaseSearchQuery::match("term");
$query = new CouchbaseSearchQuery("travel-search", $fts);
//search options:
//will show value for activity and country fields
$query->fields(array("activity", "country"))
       //will have max 3 hits
       ->limit(3)
       //will have a "countries" facet on the top 3 countries in terms of hits
       ->addFacet("countries", CouchbaseSearchFacet::term("country", 3));

$result = $bucket->query($query);
var_dump($result->facets);

Here is the facet part of the result from the query above:

array(1) {
  ["countries"]=>
  array(5) {
    ["field"]=>
    string(7) "country"
    ["total"]=>
    int(42)
    ["missing"]=>
    int(0)
    ["other"]=>
    int(0)
    ["terms"]=>
    array(3) {
      [0]=>
      array(2) {
        ["term"]=>
        string(6) "united"
        ["count"]=>
        int(21)
      }
      [1]=>
      array(2) {
        ["term"]=>
        string(7) "kingdom"
        ["count"]=>
        int(14)
      }
      [2]=>
      array(2) {
        ["term"]=>
        string(6) "states"
        ["count"]=>
        int(7)
      }
    }
  }
}

Detailed Examples

The code example below demonstrates the PHP SDK Full Text Search API. The example assumes that Couchbase Server is running, and that the username Administrator and the password password provide authorization for performing the searches. It also assumes that the travel-sample bucket has been installed. For information on creating users and managing roles, see 6.0@server:security:security-authorization.adoc. For information on installing sample buckets, see the Settings pages.

The example also assumes the existence of three specific Full Text Indexes, defined on the travel-sample bucket. These are:

  • travel-sample-index-unstored: Uses only the default settings.

  • travel-sample-index-stored: Uses default settings, with one exception: dynamic fields are stored, for the whole index.

  • travel-sample-index-hotel-description: Indexes only the description fields of hotel documents, and disables the default type mapping. The index has a custom analyzer named myUnicodeAnalyzer defined on it: the analyzer’s main characteristic is that it uses the unicode tokenizer.

See Creating Indexes for details on how to create these indexes: they can be created interactively, by means of the Couchbase Web Console; however, there may be greater efficiency in using the Couchbase REST API, as described in the section Index Creation with the REST API. The JSON objects that constitute index-definitions (for inclusion as bodies to the index-creation REST calls), are provided in Demonstration Indexes.

<?php

use \Couchbase\Cluster;
use \Couchbase\Bucket;
use \Couchbase\SearchQuery;

function printResult($label, $resultObject) {
    echo("\n");
    echo("= = = = = = = = = = = = = = = = = = = = = = =\n");
    echo("= = = = = = = = = = = = = = = = = = = = = = =\n");
    echo("\n");
    echo($label);
    echo(' (total hits: ' . $resultObject->metrics['total_hits'] . ')');
    echo("\n");

    foreach ($resultObject->hits as $row) {
        echo("id=" . $row->id . ", score=" . $row->score);
        if (property_exists($row, 'fields')) {
            echo(", fields=" . json_encode($row->fields));
        }
        if (property_exists($row, 'locations')) {
            echo(", locations=" . json_encode($row->locations));
        }
        if (property_exists($row, 'fragments')) {
            echo(", fragments=" . json_encode($row->fragments));
        }
        echo("\n");
    }
}

// Simple Text Query on a single word, targeting an index with dynamic fields
// unstored.
function simpleTextQuery(Bucket $bucket) {
    $indexName = "travel-sample-index-unstored";
    $query = new SearchQuery($indexName,
                             SearchQuery::match("swanky"));
    $query->limit(10);
    $result = $bucket->query($query);
    printResult("Simple Text Query", $result);
}

// Simple Text Query on Stored Field, specifying the field to be searched;
// targeting an index with dynamic fields stored, to ensure that field-content
// is included in the return object.
function simpleTextQueryOnStoredField(Bucket $bucket) {
    $indexName = "travel-sample-index-stored";
    $query = new SearchQuery($indexName,
                             SearchQuery::match("MDG")->field("destinationairport"));
    $query->limit(10)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $result = $bucket->query($query);
    printResult("Simple Text Query on Stored Field", $result);
}

// Simple Text Query on Non-Default Index, specifying an index that consists
// only of content derived from a specific field from a specific document-type.
function simpleTextQueryOnNonDefaultIndex(Bucket $bucket) {
    $indexName = "travel-sample-index-hotel-description";
    $query = new SearchQuery($indexName,
                             SearchQuery::match("swanky"));
    $query->limit(10);
    $result = $bucket->query($query);
    printResult("Simple Text Query on Non-Default Index", $result);
}

// Match Query with Facet, showing how query-results can be displayed either by
// row or by hits; and demonstrating use of a facet, which provides
// aggregation-data.
function textQueryOnStoredFieldWithFacet(Bucket $bucket) {
    $indexName = "travel-sample-index-stored";
    $query = new SearchQuery($indexName,
                             SearchQuery::match("La Rue Saint Denis!!")->field("reviews.content"));
    $query->limit(10)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $query->addFacet('Countries Referenced', SearchQuery::termFacet('country', 5));
    $result = $bucket->query($query);
    printResult("Match Query with Facet, Result by Row", $result);

    echo("Match Query with Facet, Result by facet:\n");
    echo(json_encode($result->facets) . "\n");
}

// DocId Query, showing results of a query on two document IDs.
function docIdQueryMethod(Bucket $bucket) {
    $indexName = "travel-sample-index-unstored";
    $query = new SearchQuery($indexName,
                             SearchQuery::docId("hotel_26223", "hotel_28960"));
    $result = $bucket->query($query);
    printResult("DocId Query", $result);
}

// Unanalyzed Term Query with Fuzziness Level, demonstrating how to query on a
// term with no analysis. Zero fuzziness is specified, to ensure that matches
// are exact. With a fuzziness factor of 2, allowing partial matches to be made.
function unAnalyzedTermQuery(Bucket $bucket, int $fuzzinessLevel) {
    $indexName = "travel-sample-index-stored";
    $query = new SearchQuery($indexName,
                             SearchQuery::term("sushi")->field("reviews.content")->fuzziness($fuzzinessLevel));
    $query->limit(50)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $result = $bucket->query($query);
    printResult("Unanalyzed Term Query with Fuzziness Level of " . $fuzzinessLevel . ":", $result);
}

// Match Phrase Query, using Analysis, for searching on a phrase.
function matchPhraseQueryOnStoredField(Bucket $bucket) {
    $indexName = "travel-sample-index-stored";
    $query = new SearchQuery($indexName,
                             SearchQuery::matchPhrase("Eiffel Tower")->field("description"));
    $query->limit(10)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $result = $bucket->query($query);
    printResult("Match Phrase Query, using Analysis", $result);
}

// Phrase Query, without Analysis, for searching on a phrase without analysis supported.
function unAnalyzedPhraseQuery(Bucket $bucket) {
    $indexName = "travel-sample-index-stored";
    $query = new SearchQuery($indexName,
                             SearchQuery::phrase("dorm", "rooms")->field("description"));
    $query->limit(10)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $result = $bucket->query($query);
    printResult("Phrase Query, without Analysis", $result);
}

// Conjunction Query, whereby two separate queries are defined and then run as
// part of the search, with only the matches returned by both included in the
// result-object.
function conjunctionQuery(Bucket $bucket) {
    $indexName = "travel-sample-index-stored";
    $firstQuery = SearchQuery::match("La Rue Saint Denis!!")->field("reviews.content");
    $secondQuery = SearchQuery::match("boutique")->field("description");
    $query = new SearchQuery($indexName,
                             SearchQuery::conjuncts($firstQuery, $secondQuery));
    $query->limit(10)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $result = $bucket->query($query);
    printResult("Conjunction Query", $result);
}

// Query String Query, showing how a query string is specified as search-input.
function queryStringQuery(Bucket $bucket) {
    $indexName = "travel-sample-index-unstored";
    $query = new SearchQuery($indexName,
                             SearchQuery::queryString("description: Imperial"));
    $query->limit(10);
    $result = $bucket->query($query);
    printResult("Query String Query", $result);
}

// Wild Card Query, whereby a wildcard is used in the string submitted for the
// search.
function wildCardQuery(Bucket $bucket) {
    $indexName = "travel-sample-index-stored";
    $query = new SearchQuery($indexName,
                             SearchQuery::wildcard("bouti*ue")->field("description"));
    $query->limit(10)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $result = $bucket->query($query);
    printResult("Wild Card Query", $result);
}

// Numeric Range Query, whereby minimum and maximum numbers are specified, and
// matches within the range returned.
function numericRangeQuery(Bucket $bucket) {
    $indexName = "travel-sample-index-unstored";
    $query = new SearchQuery($indexName,
                             SearchQuery::numericRange()->min(10100)->max(10200)->field("id"));
    $query->limit(10);
    $result = $bucket->query($query);
    printResult("Numeric Range Query", $result);
}

// Regexp Query, whereby a regular expression is submitted, to generate the
// conditions for successful matches.
function regexpQuery(Bucket $bucket) {
    $indexName = "travel-sample-index-stored";
    $query = new SearchQuery($indexName,
                             SearchQuery::regexp("[a-z]")->field("description"));
    $query->limit(10)->highlight(SearchQuery::HIGHLIGHT_HTML);
    $result = $bucket->query($query);
    printResult("Regexp Query", $result);
}

$cluster = new Cluster('couchbase://localhost');
$cluster->authenticateAs('Administrator', 'password');
$bucket = $cluster->openBucket('travel-sample');

simpleTextQuery($bucket);
simpleTextQueryOnStoredField($bucket);
simpleTextQueryOnNonDefaultIndex($bucket);
textQueryOnStoredFieldWithFacet($bucket);
docIdQueryMethod($bucket);
unAnalyzedTermQuery($bucket, 0);
unAnalyzedTermQuery($bucket, 2);
matchPhraseQueryOnStoredField($bucket);
unAnalyzedPhraseQuery($bucket);
conjunctionQuery($bucket);
queryStringQuery($bucket);
wildCardQuery($bucket);
numericRangeQuery($bucket);
regexpQuery($bucket);