Run a Search With a Search Index
Run a Search query to search and return the contents of a Search index.
If you use the default search result sorting of _score, a document’s score determines where it appears in your search results.
| You must create a Search index before you can run a search with the Search Service. |
You can run a search against a Search index with:
- The Server Web Console.
- The REST API.
- A SQL++ query.
- The Couchbase SDKs.
To run a Search query against multiple Search indexes at once, create a Search index alias. For more information, see Create a Search Index Alias with the Web Console.
Scoring for Search Queries
As of Couchbase Server version 8.0, you can choose between 2 scoring algorithms for your Search index: tf-idf and bm25.
For more information about how to set your scoring model, see Set Search Index Advanced Settings.
| Scoring can also change based on whether you’re using synonyms in your Search index. For more information, see Running a Search for Synonyms. |
tf-idf Search Scoring
To determine a document’s score in search results, the Search Service can use the tf-idf algorithm.
tf-idf increases the score of a document based on term frequency, or the number of times a term occurs in a document divided by the total number of terms in the document.
It penalizes document frequency, or how often a term appears across all documents.
The tf-idf score is calculated at a partition level in a Search index.
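The following Python sketch illustrates term frequency and document frequency using the textbook tf-idf formulation. The logarithmic idf form, the pre-tokenized documents, and all function names are assumptions for illustration; the exact formula the Search Service uses is not stated here and may differ. Because the score is calculated at a partition level, `corpus` stands in for the documents of a single partition.

```python
import math

def term_frequency(term, doc_tokens):
    """Occurrences of the term divided by the total number of terms in the document."""
    return doc_tokens.count(term) / len(doc_tokens)

def inverse_document_frequency(term, corpus):
    """Textbook idf (assumed form): log(N / number of documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing) if containing else 0.0

def tf_idf(term, doc_tokens, corpus):
    """Reward term frequency, penalize document frequency."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)

# Hypothetical documents from one partition, already split into terms.
corpus = [
    "couchbase search index".split(),
    "search search query".split(),
    "vector index".split(),
]

rare = tf_idf("couchbase", corpus[0], corpus)   # term appears in 1 of 3 documents
common = tf_idf("search", corpus[0], corpus)    # term appears in 2 of 3 documents
print(rare > common)
```

Both terms occur once in the first document, so the difference between the two scores comes entirely from how common each term is across the corpus.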
The Search Service uses tf-idf to calculate a hit score for each query inside the query object, multiplies each score by any boost parameter applied to that query, and adds the results together:
hit_score = (query_1_boost * query_1_hit_score) + (query_2_boost * query_2_hit_score)
If one of your Search queries is a Vector Search query, the calculation changes to:
hit_score = (query_1_boost * query_1_hit_score) + (knn_boost * knn_distance)
When running a hybrid search with the Server Web Console or REST API, the Search Service displays results as a disjunct (OR) between your regular Search and Vector Search queries.
| When you run a hybrid Search query with the tf-idf algorithm, add a boost value to your regular Search query to level the tf-idf score with the knn distance. Otherwise, you might see unexpected search results, because the 2 query types use different scoring algorithms. |
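As a minimal sketch of the hit score formula and of why a boost can help, the following Python example sums boosted scores for one document. The raw score values and the boost of 5.0 are hypothetical numbers chosen for illustration, not values produced by the Search Service.

```python
def hit_score(parts):
    """Sum boost * score over (boost, score) pairs, following the formula above."""
    return sum(boost * score for boost, score in parts)

# Hypothetical raw values for one document: a small tf-idf score and a larger knn value.
query_1_hit_score = 0.12
knn_distance = 0.90

unboosted = hit_score([(1.0, query_1_hit_score), (1.0, knn_distance)])
boosted = hit_score([(5.0, query_1_hit_score), (1.0, knn_distance)])

# Share of the final hit score contributed by the regular Search query.
text_share_before = (1.0 * query_1_hit_score) / unboosted
text_share_after = (5.0 * query_1_hit_score) / boosted
print(text_share_after > text_share_before)
```

Without a boost, the Vector Search part dominates the sum in this example. Raising the boost on the regular Search query brings the two contributions closer together.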
bm25 Search Scoring
As of Couchbase Server version 8.0, you can choose to use the bm25 algorithm instead of tf-idf for your Search index.
bm25 ranks documents based on the query terms that appear in each document, regardless of proximity.
Like tf-idf, it counts the number of times a term occurs in a document and penalizes more common terms, but it also includes:
- Diminishing returns for a term that continues to appear frequently in a document, based on a saturation parameter (k1).
- An adjustment to the resulting score, based on the total length of the current document field, divided by a normalized average length of the field across all documents (b).
The value of k1 limits just how much a single query term’s frequency can affect the scoring of a document.
k1 reduces the effect of term repetition and the risk of documents with excessively repeated content inflating your relevance scores.
For example, spam or clickbait content scores lower with bm25 than the same content scored with tf-idf.
The Search Service chooses reasonable defaults for the value of k1 and b.
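The effect of k1 and b can be sketched with the textbook bm25 formula for a single term. The smoothed idf form, the default values `k1=1.2` and `b=0.75`, and the function name are assumptions taken from common bm25 descriptions; they are not confirmed as the values or formula the Search Service uses.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook bm25 score for one term in one document field (assumed form)."""
    # idf: terms that appear in fewer documents score higher.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: fields longer than average are penalized, scaled by b.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # Saturation: as tf grows, this fraction approaches (k1 + 1) and stops growing.
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Same field length, increasing term repetition.
scores = [
    bm25_term_score(tf, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50)
    for tf in (1, 2, 10, 100)
]

# Same term frequency, field twice the average length.
long_field = bm25_term_score(2, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=50)

print(scores[-1] / scores[0] < 100)   # 100x repetition gives far less than 100x score
print(long_field < scores[1])         # the longer field scores lower
```

In this sketch, repeating a term 100 times raises the score by a factor bounded by `k1 + 1`, which is how the saturation parameter limits the effect of repeated content.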
Like tf-idf, bm25 rewards term frequency, but penalizes document frequency.
The calculation for a basic Search query is still based on the document’s score for a query multiplied by any boost parameters:
hit_score = (query_1_boost * query_1_hit_score) + (query_2_boost * query_2_hit_score)
The calculation for a hybrid Search query that includes a Vector Search query is still:
hit_score = (query_1_boost * query_1_hit_score) + (knn_boost * knn_distance)
When running a hybrid search with the Server Web Console or REST API, the Search Service displays results as a disjunct (OR) between your regular Search and Vector Search queries.
bm25 supports better hybrid search results and richer result rankings.
It also gives more stable result ordering across Search index partitions, when global_scoring is enabled.
Run a Search with the Server Web Console
You can use the Server Web Console to test your Search index before you integrate search into your application.
You can enter a basic search query in the Server Web Console, or use a query object and other JSON properties for a more complex search.
If your cluster is running Couchbase Server version 7.6.2 or later, the Server Web Console lets you generate a command-line curl example or edit the JSON for your query using a built-in code editor.
For more information about how to run a search with the Server Web Console, see Run A Simple Search with the Web Console.
For more information about how to configure a Search index and search for geospatial data, see Run a Geospatial Search Query with the Web Console.
Run a Search with the REST API
You can also run a search through the REST API, using curl or any other HTTP client.
Use a Search request JSON payload to control how the Search Service returns results.
For more information about how to run a search with the REST API, see Run a Simple Search with the REST API and curl/HTTP.
For more information about how to configure a Search index and search for geospatial data, see Run a Geospatial Search Query with the REST API and curl/HTTP.
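As a rough sketch of what a REST API search looks like from code, the following Python example builds an HTTP POST request that carries a Search request JSON payload. It does not send the request. The host, port, index name `my-index`, and the endpoint path are assumptions for illustration, and authentication is omitted; check the REST API reference for the exact endpoint and credentials your cluster requires.

```python
import json
import urllib.request

def build_search_request(base_url, index_name, payload):
    """Build (but do not send) a POST request carrying a Search request JSON payload."""
    # Assumed endpoint shape; verify it against the REST API reference.
    url = f"{base_url}/api/index/{index_name}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The payload controls how results come back; size and fields are illustrative choices.
payload = {
    "query": {"match": "content", "field": "text"},
    "size": 10,
    "fields": ["*"],
}

request = build_search_request("http://localhost:8094", "my-index", payload)
print(request.get_method(), request.full_url)
```

To run the search, you would pass `request` to `urllib.request.urlopen` after adding credentials that are valid for your cluster.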
Run a Search with a SQL++ Query
Use the Query tab to search using natural-language search and SQL++ features in the same query.
When using SQL++ with a hybrid Vector Search query, you have more flexibility in how you choose to display your search results.
When running a hybrid search with the Server Web Console or REST API, the Search Service displays results as a disjunct (OR) between your 2 search queries.
For example:
{
  "query": {
    "match_phrase": "my regular query"
  }
}
OR
{
  "knn": [
    {
      "k": 5,
      "field": "vector_field",
      "vector": [0, 0, 128]
    }
  ]
}
SQL++ lets you choose whether to return search results as a conjunct (AND) or a disjunct (OR) between the queries in a hybrid search.
As a conjunct, the Search Service:
- Returns matches that score highly for both the regular Search query and the Vector Search query.
- Excludes matches that only match the Vector Search query.
For example:
SELECT meta().id FROM <key_space>
WHERE text = "content"
AND SEARCH(<key_space>, {"query": {"match": "content", "field": "text"}, "knn": [{"vector": <vector_embedding>, "field": "vector_field", "k": 5}]});
As a disjunct, the Search Service:
- Returns matches for the regular Search query, followed by matches for the Vector Search query.
As a result, you could see matches for the Vector Search query that do not contain matches for the regular Search query.
For example:
SELECT meta().id FROM <key_space>
WHERE SEARCH(<key_space>, {"query": {"match": "content", "field": "text"}, "knn": [{"vector": <vector_embedding>, "field": "vector_field", "k": 5}]});
For more information about how to use the Search Service from a SQL++ query, see Search Functions.
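Hand-written Search requests inside a SQL++ statement are easy to get wrong, for example with an unbalanced quote or bracket. The following Python sketch generates the disjunct and conjunct statements from a dictionary instead. The function name, the `text` and `vector_field` field names, and the `travel-sample` keyspace are hypothetical, and the statement is built by string formatting for illustration only; with untrusted input, use query parameters instead.

```python
import json

def hybrid_search_statement(keyspace, text, vector, conjunct=False, k=5):
    """Return a SQL++ statement string that embeds a hybrid Search request."""
    request = {
        "query": {"match": text, "field": "text"},
        "knn": [{"vector": vector, "field": "vector_field", "k": k}],
    }
    # json.dumps guarantees balanced quotes and brackets in the embedded request.
    search = f"SEARCH({keyspace}, {json.dumps(request)})"
    if conjunct:
        # AND the Search call with a regular predicate on the same field.
        return f"SELECT meta().id FROM {keyspace} WHERE text = {json.dumps(text)} AND {search};"
    return f"SELECT meta().id FROM {keyspace} WHERE {search};"

disjunct = hybrid_search_statement("`travel-sample`", "content", [0, 0, 128])
conjunct = hybrid_search_statement("`travel-sample`", "content", [0, 0, 128], conjunct=True)
print(disjunct)
print(conjunct)
```

The only difference between the two generated statements is the extra `text = "content" AND` predicate, which is what restricts the conjunct form to documents that also satisfy the regular condition.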