Full Text Search (FTS) Using the Python SDK with Couchbase Server

You can use the Full Text Search service (FTS) to create queryable full-text indexes in Couchbase Server.

Couchbase offers full-text search support, allowing you to search for documents that contain certain words or phrases. In the Python SDK you can search full-text indexes by using the iterator-based Bucket.search() API.

Querying an FTS index from the Python client is done with the Bucket.search() method. It takes two positional arguments: the name of the index to query and the search query itself. Additional search options may be passed as keyword arguments.

import couchbase.fulltext as FT
results = cb.search('travel-search', FT.TermQuery('office'), limit=25)
for result in results:
    print(result['id'])

The Bucket.search() method returns an object which may be iterated over to retrieve the results. Each result is a dictionary comprising the layout defined in Handling Response Objects.

Other search result data may be accessed using the iterator’s meta and facets properties:

results = cb.search(indexname, query)
for result in results:
    handle_result(result)
print(results.meta)
print(results.facets)
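
The examples here read meta and facets only after the loop has finished. The toy class below sketches one way an iterator-based result object can behave like that: metadata arrives after the last row, so it becomes visible only once iteration completes. FakeSearchResults is a hypothetical stand-in written for illustration, not the SDK's class, and the real object may behave differently if meta is read early.

```python
class FakeSearchResults:
    """Toy stand-in for an iterator-based result object (not the SDK class)."""

    def __init__(self, rows, meta):
        self._rows = rows
        self._pending_meta = meta
        self.meta = None  # filled in only once iteration finishes

    def __iter__(self):
        for row in self._rows:
            yield row
        # Trailing metadata becomes visible after the last row is consumed.
        self.meta = self._pending_meta


# Made-up rows and metadata, for illustration only.
results = FakeSearchResults(
    rows=[{'id': 'hotel_1'}, {'id': 'hotel_2'}],
    meta={'total_hits': 2},
)
ids = [row['id'] for row in results]
print(ids, results.meta)
```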

Query Types

Query types are found in the couchbase.fulltext module. The module contains query classes corresponding to those enumerated in Query Types. A query object is instantiated by passing the search term (usually a string) as the first argument, followed by any query modifiers as keyword arguments.

It is important to distinguish between query options and general search options. Some options affect the search as a whole (such as limit, which sets how many results to return), while others affect only a specific query (such as fuzziness for a given query). Because multiple queries can be combined in a single search operation, query-specific options can be specified only on the query object itself, while search options are specified as keyword arguments to search().
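
To make the split concrete, the sketch below assembles a request body by hand, roughly in the shape the Search service accepts over REST. The helper build_search_body is hypothetical, and the key names (query, match, field, fuzziness, size) are assumptions based on the REST request format; the SDK builds the real request internally.

```python
import json


def build_search_body(query, **search_options):
    """Combine one query object with search-wide options into a request body."""
    body = {'query': query}
    body.update(search_options)
    return body


# Query-specific options (field, fuzziness) live inside the query object.
match_query = {'match': 'wine', 'field': 'description', 'fuzziness': 1}

# Search-wide options sit beside the query; 'size' is assumed to be the
# REST-level name for the result limit.
body = build_search_body(match_query, size=25)
print(json.dumps(body, sort_keys=True))
```

Because fuzziness is nested under the query, a second query combined into the same search could carry a different fuzziness, while size applies once to the whole search.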

Query Facets

Query facets may also be added to the general search parameters by using the facets={} keyword argument. The facets keyword argument accepts a dictionary with facet names as keys and facets themselves as values. You can create facet queries by instantiating Facet objects found in the couchbase.fulltext module.

results = cb.search(
        'travel-search', FT.MatchQuery('wine'),
        facets={'countries': FT.TermFacet('country', limit=5)}, limit=0)

# Exhaust the iterator
for _ in results:
    pass

for info in results.facets['countries']['terms']:
    print('Got {} results from {}'.format(info['count'], info['term']))
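
As a small follow-on sketch, the helper below ranks the terms of one facet by count. The function top_terms and the sample data are made up for illustration; the data only mimics the terms/term/count layout used in the loop above.

```python
def top_terms(facets, facet_name, n=3):
    """Return up to n (term, count) pairs for one facet, highest count first."""
    terms = facets.get(facet_name, {}).get('terms', [])
    ranked = sorted(terms, key=lambda t: t['count'], reverse=True)
    return [(t['term'], t['count']) for t in ranked[:n]]


# Made-up facet data shaped like results.facets in the example above.
sample_facets = {
    'countries': {
        'terms': [
            {'term': 'france', 'count': 12},
            {'term': 'united states', 'count': 30},
            {'term': 'united kingdom', 'count': 21},
        ]
    }
}

for term, count in top_terms(sample_facets, 'countries', n=2):
    print('{}: {}'.format(term, count))
```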

Using Full Text Search from the Python SDK

With the Python SDK, you can run Full Text Search queries against Full Text Indexes and sort the result sets.

A general introduction to Full Text Search, with pointers to detailed descriptions of its principal features, is provided in Full Text Search: Fundamentals.

The current page features a code example that demonstrates the Python SDK Full Text Search API. The example assumes that Couchbase Server is running, and that the username Administrator and the password password provide authorization for performing the searches. It also assumes that the travel-sample bucket has been installed. For information on creating users and managing roles, see Authorization. For information on installing sample buckets, see Install Sample Buckets.

The example also assumes the existence of three specific Full Text Indexes, defined on the travel-sample bucket. These are:

  • travel-sample-index-unstored: Uses only the default settings.

  • travel-sample-index-stored: Uses default settings, with one exception: dynamic fields are stored, for the whole index.

  • travel-sample-index-hotel-description: Indexes only the description fields of hotel documents, and disables the default type mapping. The index has a custom analyzer named myUnicodeAnalyzer defined on it: the analyzer’s main characteristic is that it uses the unicode tokenizer.

See Creating Indexes for details on how to create these indexes. They can be created interactively in the Couchbase Web Console, although using the Couchbase REST API, as described in the section Index Creation with the REST API, may be more efficient. The JSON objects that constitute the index definitions (for inclusion as bodies of the index-creation REST calls) are provided in Demonstration Indexes.

The example features the following Full Text Searches on the travel-sample bucket, within Couchbase Server:

  • Simple Text Query on a single word, targeting an index with dynamic fields unstored.

  • Simple Text Query on Non-Default Index, specifying an index that consists only of content derived from a specific field from a specific document-type.

  • Simple Text Query on Stored Field, specifying the field to be searched; targeting an index with dynamic fields stored, to ensure that field-content is included in the return object.

  • Match Query with Facet, showing how query-results can be displayed either by row or by hits; and demonstrating use of a facet, which provides aggregation-data.

  • DocId Query, showing results of a query on two document IDs.

  • Unanalyzed Term Query with Fuzziness Level of 0, demonstrating how to query on a term with no analysis. Zero fuzziness is specified, to ensure that matches are exact.

  • Unanalyzed Term Query with Fuzziness Level of 2, which is almost identical to the immediately preceding query; but which this time specifies a fuzziness factor of 2, allowing partial matches to be made. The output from this query can be compared to that of the one immediately preceding.

  • Match Phrase Query, using Analysis, for searching on a phrase.

  • Phrase Query, without Analysis, for searching on a phrase with no analysis applied to the search terms.

  • Query String Query, showing how a query string is specified as search-input.

  • Conjunction Query, whereby two separate queries are defined and then run as part of the search, with only the matches returned by both included in the result-object.

  • Wild Card Query, whereby a wildcard is used in the string submitted for the search.

  • Numeric Range Query, whereby minimum and maximum numbers are specified, and matches within the range returned.

  • Regexp Query, whereby a regular expression is submitted, to generate the conditions for successful matches.
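
The difference between the two fuzziness levels listed above can be illustrated without a server. The sketch below assumes that fuzziness is measured as Levenshtein edit distance between the search term and an indexed term; edit_distance and fuzzy_match are illustrative helpers, not part of the SDK, and the candidate words are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate, fuzziness):
    """True when candidate is within `fuzziness` edits of term."""
    return edit_distance(term, candidate) <= fuzziness


# Compare each candidate against 'sushi' at fuzziness 0 and fuzziness 2.
for word in ['sushi', 'sushis', 'shushi', 'sashimi']:
    print(word, fuzzy_match('sushi', word, 0), fuzzy_match('sushi', word, 2))
```

With fuzziness 0 only the identical term passes, while fuzziness 2 also admits terms one or two edits away.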

Detailed Example

The following example demonstrates Full Text Search queries that can be made with the Python SDK. It is written as a test class and can be run with the nose test runner.

from pprint import pprint

import couchbase.cluster
import couchbase.fulltext as FT

class FTStringsTest:
    def setUp(self):
        cluster = couchbase.cluster.Cluster("couchbase://10.142.180.102")
        # Credentials follow the Administrator/password assumption stated earlier on this page.
        cluster.authenticate(couchbase.cluster.PasswordAuthenticator("Administrator", "password"))
        self.cb = cluster.open_bucket("travel-sample")

    @staticmethod
    def printResult(label, resultObject):
        print()
        print("= = = = = = = = = = = = = = = = = = = = = = =")
        print("= = = = = = = = = = = = = = = = = = = = = = =")
        print()
        print(label)
        print()

        for row in resultObject:
            print(row)

    def test_demo(self):
        results = self.cb.search(
            'travel-search',
            FT.MatchQuery('part', fuzziness=0, field='content'),
            limit=3,
            facets={'countries': FT.TermFacet('country', limit=3)})

        for row in results:
            pprint(row)

        # Facet data is read after the rows have been consumed.
        print('Facet results:')
        print(results.facets['countries'])

    def test_simple_text_query(self):
        indexName = "travel-sample-index-unstored"
        query = FT.MatchQuery("swanky")

        result = self.cb.search(indexName, query, limit=10)

        FTStringsTest.printResult("Simple Text Query", result)

    def test_simple_text_query_on_stored_field(self):
        indexName = "travel-sample-index-stored"
        query = FT.MatchQuery("MDG")
        query.field = "destinationairport"

        # Search options are keyword arguments to search(), not chained calls.
        result = self.cb.search(indexName, query, limit=10, highlight_style="ansi")

        FTStringsTest.printResult("Simple Text Query on Stored Field", result)

    def test_simple_text_query_on_non_default_index(self):
        indexName = "travel-sample-index-hotel-description"
        query = FT.MatchQuery("swanky")

        result = self.cb.search(indexName, query, limit=10)

        FTStringsTest.printResult("Simple Text Query on Non-Default Index", result)

    def test_text_query_on_stored_field_with_facet(self):
        indexName = "travel-sample-index-stored"
        query = FT.MatchQuery("La Rue Saint Denis!!")
        query.field = "reviews.content"

        result = self.cb.search(indexName, query, limit=10, highlight_style="ansi",
                                facets={"Countries Referenced": FT.TermFacet("country", 5)})

        FTStringsTest.printResult("Match Query with Facet, Result by Row", result)

        print()
        print("Match Query with Facet, Result by hits:")
        print(result.meta)  # search metadata, read after the rows are consumed

        print()
        print("Match Query with Facet, Result by facet: ")
        print(result.facets)  # facets is a property, not a method

    def test_doc_id_query_method(self):
        indexName = "travel-sample-index-unstored"
        query = FT.DocIdQuery(["hotel_26223", "hotel_28960"])

        result = self.cb.search(indexName, query)

        FTStringsTest.printResult("DocId Query", result)

    def test_un_analyzed_term_query(self):
        indexName = "travel-sample-index-stored"

        # Run the same term query twice: 0 requires exact matches, 2 allows near matches.
        for fuzzinessLevel in (0, 2):
            query = FT.TermQuery("sushi", field="reviews.content", fuzziness=fuzzinessLevel)

            result = self.cb.search(indexName, query, limit=50, highlight_style="ansi")

            FTStringsTest.printResult("Unanalyzed Term Query with Fuzziness Level of " + str(fuzzinessLevel) + ":", result)

    def test_match_phrase_query_on_stored_field(self):
        indexName = "travel-sample-index-stored"
        query = FT.MatchPhraseQuery("Eiffel Tower", field="description")

        result = self.cb.search(indexName, query, limit=10, highlight_style="ansi")

        FTStringsTest.printResult("Match Phrase Query, using Analysis", result)

    def test_un_analyzed_phrase_query(self):
        indexName = "travel-sample-index-stored"
        query = FT.PhraseQuery("dorm", "rooms", field="description")

        result = self.cb.search(indexName, query, limit=10, highlight_style="ansi")
        FTStringsTest.printResult("Phrase Query, without Analysis", result)

    def test_conjunction_query_method(self):
        indexName = "travel-sample-index-stored"
        firstQuery = FT.MatchQuery("La Rue Saint Denis!!", field="reviews.content")
        secondQuery = FT.MatchQuery("boutique", field="description")

        conjunctionQuery = FT.ConjunctionQuery(firstQuery, secondQuery)

        result = self.cb.search(indexName, conjunctionQuery, limit=10, highlight_style="ansi")

        FTStringsTest.printResult("Conjunction Query", result)

    def test_query_string_method(self):
        indexName = "travel-sample-index-unstored"
        query = FT.QueryStringQuery("description: Imperial")

        result = self.cb.search(indexName, query, limit=10)

        FTStringsTest.printResult("Query String Query", result)

    def test_wild_card_query_method(self):
        indexName = "travel-sample-index-stored"
        # field is passed as a keyword argument; it is not a chainable method.
        query = FT.WildcardQuery("bouti*ue", field="description")

        result = self.cb.search(indexName, query, limit=10, highlight_style="ansi")

        FTStringsTest.printResult("Wild Card Query", result)

    def test_numeric_range_query_method(self):
        indexName = "travel-sample-index-unstored"
        query = FT.NumericRangeQuery(min=10100, max=10200, field="id")

        result = self.cb.search(indexName, query, limit=10)

        FTStringsTest.printResult("Numeric Range Query", result)

    def test_regexp_query_method(self):
        indexName = "travel-sample-index-stored"
        query = FT.RegexQuery("[a-z]", field="description")

        result = self.cb.search(indexName, query, limit=10, highlight_style="ansi")

        FTStringsTest.printResult("Regexp Query", result)