Add Synonyms to a Search Index

  • Couchbase Server 8.0
    Add synonyms to a Search index to return matches for words with similar meanings when you run a search with the Search Service. A synonym is a word or phrase that has the same or nearly the same meaning as another word or phrase in the same language.

    In Couchbase Server version 8.0 and later, the Search Service supports a user-defined thesaurus on each Search index. A thesaurus is divided into synonym collections, which contain synonym documents. You can use a thesaurus to define which words function as synonyms in your documents.

    For example, you could create a synonym document that defines cat as a synonym for feline. A Search query that contains the term cat would then also return matches for feline. Depending on how you configure the synonym document, a Search query for feline could also return matches for cat.
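    The sketch below models how a synonym document could become thesaurus entries for the cat and feline example. It is a simplified Python model, not the Search Service's implementation. It assumes that a synonym document holds a synonyms array and an optional input array. Without input, every listed term is a synonym of every other term. With input, only the input terms map to the synonyms. These field names are an assumption, so check Create a Synonym Collection and Documents for the exact syntax. The function name build_thesaurus is hypothetical.

```python
from collections import defaultdict


def build_thesaurus(synonym_docs):
    """Turn synonym documents into a term -> set-of-synonyms map.

    Assumed (hypothetical) document shapes, for illustration only:
      {"synonyms": [...]}                  every term maps to every other term
      {"input": [...], "synonyms": [...]}  only the input terms map to the synonyms
    """
    thesaurus = defaultdict(set)
    for doc in synonym_docs:
        synonyms = doc["synonyms"]
        # Without an input array, every listed term acts as an input term.
        inputs = doc.get("input", synonyms)
        for term in inputs:
            thesaurus[term].update(s for s in synonyms if s != term)
    return thesaurus


# Two-way: cat finds feline, and feline finds cat.
two_way = build_thesaurus([{"synonyms": ["cat", "feline"]}])
# One-way: cat finds feline, but feline has no synonyms.
one_way = build_thesaurus([{"input": ["cat"], "synonyms": ["feline"]}])

print(sorted(two_way["cat"]), sorted(two_way["feline"]))  # ['feline'] ['cat']
print(sorted(one_way["cat"]), sorted(one_way.get("feline", set())))  # ['feline'] []
```

    In this model, the one-way document is what makes a query for feline miss documents that contain only cat.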

    Defining Synonyms

    You can define synonyms for multiple languages in a single thesaurus, divided into synonym collections. For example, you could have one synonym collection for English words, one for Spanish words, and one for German words.

    If your Search index is partitioned, the Search Service distributes synonym collections across your index partitions. It then gathers the synonym collections into a single thesaurus, which it uses when running Search queries.
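    The following sketch shows how a partitioned index can still end up with one thesaurus. It treats each partition as holding a partial term-to-synonyms map and merges the maps with a set union. The partition contents are made-up sample data, including the extra synonym hasty. The merge is a simplified model, not a description of how the Search Service stores or gathers synonyms internally. The function name gather_thesaurus is hypothetical.

```python
from collections import defaultdict

# Hypothetical partial thesauri: each index partition holds the synonym
# entries from the synonym documents that landed on it (sample data only).
partitions = [
    {"fast": {"quick", "swift"}, "rápido": {"veloz"}},  # partition 0
    {"fast": {"hasty"}, "schnell": {"rasch"}},          # partition 1
]


def gather_thesaurus(partitions):
    """Union every partition's entries into a single term -> synonyms map."""
    thesaurus = defaultdict(set)
    for partition in partitions:
        for term, synonyms in partition.items():
            thesaurus[term] |= synonyms
    return dict(thesaurus)


thesaurus = gather_thesaurus(partitions)
print(sorted(thesaurus["fast"]))  # ['hasty', 'quick', 'swift']
```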

    To use synonyms in a Search index, do not include your synonym collections or synonym documents as fields in your Search index definition. Synonym documents and the indexed fields in a Search index must use the same analyzer for search results to return the correct matches. Synonym documents must also follow the correct syntax.
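    The following sketch illustrates why the analyzer has to match. It uses two toy analyzers, which are assumptions standing in for real Couchbase analyzers. standard_like splits on whitespace and lowercases, and keyword_like keeps the input unchanged. The query side uses standard_like. If the synonym document is analyzed with keyword_like instead, the thesaurus key stays Cat and never matches the query token cat.

```python
def standard_like(text):
    """Toy analyzer (hypothetical): split on whitespace and lowercase."""
    return [tok.lower() for tok in text.split()]


def keyword_like(text):
    """Toy analyzer (hypothetical): keep the whole input as one unchanged token."""
    return [text]


def thesaurus_from(pairs, analyzer):
    """Build a term -> synonyms map, analyzing both sides with `analyzer`."""
    thesaurus = {}
    for term, synonym in pairs:
        for token in analyzer(term):
            thesaurus.setdefault(token, set()).update(analyzer(synonym))
    return thesaurus


pairs = [("Cat", "Feline")]
query_tokens = standard_like("cat")  # the indexed field and query use standard_like

matching = thesaurus_from(pairs, standard_like)   # same analyzer as the field
mismatched = thesaurus_from(pairs, keyword_like)  # different analyzer

print(matching.get(query_tokens[0]))    # {'feline'}
print(mismatched.get(query_tokens[0]))  # None
```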

    For more information, see Create a Synonym Collection and Documents.

    Synonym searches do not run recursively. If you run a Search query for a single term, the Search Service will not run cascading searches for synonyms of that term’s synonyms.

    For example, if you defined fast with the synonyms quick and swift, a search for fast returns results for quick and swift. It does not return results for synonyms of quick or swift, like speedy or rapid.
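    The difference between one level of expansion and a recursive one can be sketched in Python. The thesaurus below is hypothetical sample data built from the fast, quick, and swift example. expand mirrors the documented behavior. expand_transitively shows the cascading expansion that the Search Service does not perform. Both function names are illustrative.

```python
# Hypothetical thesaurus built from the example above.
thesaurus = {
    "fast": {"quick", "swift"},
    "quick": {"speedy"},
    "swift": {"rapid"},
}


def expand(term, thesaurus):
    """Documented behavior: the term plus its direct synonyms only."""
    return {term} | thesaurus.get(term, set())


def expand_transitively(term, thesaurus):
    """Cascading expansion that the Search Service does NOT perform."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(thesaurus.get(current, ()))
    return seen


print(sorted(expand("fast", thesaurus)))               # ['fast', 'quick', 'swift']
print(sorted(expand_transitively("fast", thesaurus)))  # ['fast', 'quick', 'rapid', 'speedy', 'swift']
```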

    Exact matches to Search query terms will always score higher than synonym term matches. For fuzzy queries specifically:

    • Exact term matches have a full score of $x$

    • Synonyms for the exact term have a score of $\frac{x}{2}$

    • Fuzzy match terms have a score of $x \cdot \frac{1}{1 + \text{fuzziness}}$

    • Synonyms of a fuzzy term have a score of $\frac{x}{2} \cdot \frac{1}{1 + \text{fuzziness}}$
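    The following function applies the four rules, with x as the full score of an exact term match, to show the relative weights. It reproduces only the ratios listed above. The absolute value of x comes from the Search Service's scoring (see Scoring for Search Queries), and the function name term_score is hypothetical.

```python
def term_score(x, fuzziness=0, via_synonym=False):
    """Apply the documented score ratios to a full exact-match score x.

    fuzziness:   0 for an exact term, otherwise the fuzziness of the fuzzy match
    via_synonym: True when the term matched through a synonym
    """
    base = x / 2 if via_synonym else x
    return base * (1 / (1 + fuzziness))


x = 1.0
print(term_score(x))                                 # exact term: 1.0
print(term_score(x, via_synonym=True))               # synonym of exact term: 0.5
print(term_score(x, fuzziness=1))                    # fuzzy term: 0.5
print(term_score(x, fuzziness=1, via_synonym=True))  # synonym of fuzzy term: 0.25
```

    With a fuzziness of 1, a fuzzy term and a synonym of the exact term both score $\frac{x}{2}$ under these rules. Both still score below the exact match.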

    You cannot use synonyms in a Vector Search query. Any synonyms in a Search index do not affect or contribute to the results of a Vector Search query. Synonyms cannot be used in pre-filtered Vector Search queries.

    Synonym searches also work when you run a Search query through SQL++. For example, if a Search index has a thesaurus with entries for quality, the following query returns any documents that contain the word quality or its synonyms:

    SELECT META(t).id FROM `travel-sample`.`inventory`.`landmark` AS t WHERE SEARCH(t, "quality");

    Synonym Search Processing

    If you run a hybrid Search query, the Search Service executes your Vector Search query and accumulates synonyms before combining the results. For more information about the process for scoring results with the Search Service, see Scoring for Search Queries.

    For fuzzy queries, prefix queries, and wildcard queries, the Search Service:

    1. Collects synonyms to form the thesaurus

    2. Generates the terms to search for in your Search index by consulting the appropriate dictionary

    3. Runs the search
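    The three steps can be sketched for a prefix query as follows. The term dictionary, postings, and thesaurus are hypothetical in-memory stand-ins for a Search index. The sketch also assumes that each generated term contributes only its direct synonyms, which is consistent with the non-recursive behavior and the scoring rule for synonyms of fuzzy terms described earlier. It is a simplified model, not the Search Service's implementation, and prefix_search is a hypothetical name.

```python
def prefix_search(prefix, thesaurus, term_dictionary, postings):
    """Simplified prefix query with synonyms (illustrative model only)."""
    # Step 1 (assumed already done): `thesaurus` holds the collected synonyms.

    # Step 2: generate the terms to search for by consulting the term dictionary.
    terms = {t for t in term_dictionary if t.startswith(prefix)}
    # Add the direct synonyms of each generated term (no recursion).
    expanded = set(terms)
    for t in terms:
        expanded |= thesaurus.get(t, set())

    # Step 3: run the search by collecting documents that contain any term.
    hits = set()
    for t in expanded:
        hits |= postings.get(t, set())
    return hits


# Hypothetical index contents.
term_dictionary = ["quick", "quiet", "quality", "swift"]
postings = {"quick": {"doc1"}, "quiet": {"doc2"}, "quality": {"doc3"}, "swift": {"doc4"}}
thesaurus = {"quick": {"swift"}}

print(sorted(prefix_search("qui", thesaurus, term_dictionary, postings)))
# ['doc1', 'doc2', 'doc4']
```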