Use Scalar Columns to Filter Hyperscale Vector Index Scans

    You can reduce the number of vectors that a query using a Hyperscale Vector index has to compare by adding scalar values to the index. A Hyperscale Vector index has a single indexed column, which holds the vector. However, you can also include scalar values in the index and use them to filter the vector search. The index stores these included values alongside the vector but does not index them.

    In queries that use a Hyperscale Vector index, you add the included scalar values as predicates in the WHERE clause. During the index scan, Couchbase Capella uses the included scalar values to filter out entries whose values do not satisfy the predicates. Only for entries that pass the filter does Couchbase Capella perform the more expensive vector comparison, which determines the distance between the entry's vector and the search vector.
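
    The filter-then-compare order described above can be illustrated with a minimal Python sketch. It is not Couchbase Capella's implementation: the toy entries, the `filtered_scan` helper, and the brute-force L2 distance are all assumptions made for illustration. The sketch only shows that entries failing the scalar predicate never reach the distance computation.

```python
import math

# Hypothetical toy index entries: (document key, vector, included scalar).
ENTRIES = [
    ("a", [0.0, 0.0], 120.0),
    ("b", [1.0, 1.0], 180.0),
    ("c", [0.2, 0.1], 200.0),
    ("d", [5.0, 5.0], 90.0),
]


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def filtered_scan(entries, query_vec, min_brightness, limit):
    """Return the nearest keys that pass the scalar filter, plus the
    number of distance computations that were actually performed."""
    comparisons = 0
    candidates = []
    for key, vector, brightness in entries:
        # Cheap scalar check first: skip entries that fail the predicate.
        if not brightness > min_brightness:
            continue
        # Only surviving entries pay for the vector comparison.
        comparisons += 1
        candidates.append((l2_distance(vector, query_vec), key))
    candidates.sort()
    return [key for _, key in candidates[:limit]], comparisons


keys, comparisons = filtered_scan(ENTRIES, [0.0, 0.0], 170.0, 3)
print(keys, comparisons)
```

    With the threshold of 170.0, only two of the four toy entries pass the filter, so only two distances are computed. Passing `float("-inf")` as the threshold disables the filter and all four entries are compared.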

    Creating a Hyperscale Vector Index with Included Scalar Values

    When you create a Hyperscale Vector index, you can add scalar values to the index using the INCLUDE clause in the CREATE VECTOR INDEX statement.

    The following example creates a new Hyperscale Vector index for the embedding_vector_dot vector column from the Hyperscale Vector Index Query Example. It also includes the brightness key from the document.

    CREATE VECTOR INDEX `color_desc_hyperscale_brightness` 
           ON `color-vector-sample`.`color`.`rgb`(`embedding_vector_dot` VECTOR)
           INCLUDE (`brightness`)
           WITH { "dimension":1536, "similarity":"L2", "description":"IVF8,SQ4" }

    This example is the same as the example from Create Hyperscale Vector Index Example, except that it uses the INCLUDE clause to add the brightness key to the index.

    Querying a Hyperscale Vector Index with Included Scalar Values

    When you query a Hyperscale Vector index, you can use the included scalar values as predicates in the WHERE clause of your query. Couchbase Capella uses the predicates to filter index entries before performing the vector comparison.

    The following example performs the same query as the example from Query with a Hyperscale Vector Index, but adds the brightness key as a predicate in the WHERE clause:

    WITH question_vec AS (
            SELECT RAW couchbase_search_query.knn[0].vector
            FROM `color-vector-sample`.`color`.`rgb-questions`
            USE KEYS "#87CEEB")
    SELECT b.color, b.description, b.brightness
    FROM `color-vector-sample`.`color`.`rgb` AS b
    WHERE b.brightness > 170.0
    ORDER BY APPROX_VECTOR_DISTANCE(b.embedding_vector_dot, question_vec[0], "l2", 4)
    LIMIT 3;

    The result of running the query is:

    [{
            "color": "sky blue",
            "description": "Sky blue is a calming and serene color that evokes 
                            feelings of tranquility. It is a light shade of blue 
                            that resembles the color of the sky on a clear day. 
                            The color is often associated with peace, relaxation, 
                            and a sense of openness. It can also represent a sense 
                            of freedom and endless possibilities, as the sky seems 
                            to stretch on forever. Sky blue is a refreshing and 
                            soothing color that can bring a sense of calmness to 
                            any space.",
            "brightness": 188.077
        },
        {
            "color": "light sky blue",
            "description": "Light sky blue is a soft and delicate color that evokes 
                            a sense of tranquility and peace. It is a shade of blue 
                            that is reminiscent of a clear, sunny day with a few fluffy 
                            clouds scattered across the sky. This color is often 
                            associated with feelings of serenity and relaxation, making 
                            it a popular choice for bedrooms and spa-like environments. 
                            The lightness of this shade adds a touch of freshness and 
                            purity, making it a perfect color for creating a calming 
                            and inviting atmosphere.",
            "brightness": 189.787
        },
        {
            "color": "pale turquoise",
            "description": "Pale turquoise is a delicate and soothing color that can be
                            described as a soft blend of blue and green. It has a calming
                            effect and can evoke feelings of tranquility and serenity. The
                            color is often associated with the ocean and can bring to mind
                            images of clear, tropical waters. It has a gentle and subtle
                            quality, making it a popular choice for creating a peaceful and
                            serene atmosphere.",
            "brightness": 219.163
        }
    ]

    Adding the brightness predicate makes sky blue the top result, unlike the results of the example in Query with a Hyperscale Vector Index. The restriction can also make the query faster. On a laptop running a three-node cluster in Docker containers, the query ran in 43 ms, versus 219 ms for the query without the filter, roughly a fivefold speedup.
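
    A scalar filter can change which entries rank highest because entries that fail the predicate are removed from consideration, however close their vectors are to the search vector. The following Python sketch illustrates this with made-up data. The document names, 2-D vectors, brightness values, and the `top_k` helper are hypothetical and are not taken from the sample dataset or from Couchbase Capella.

```python
# Hypothetical toy data: name -> (2-D vector, brightness).
DOCS = {
    "p": ((0.1, 0.0), 150.0),
    "q": ((0.3, 0.0), 190.0),
    "r": ((0.5, 0.0), 210.0),
    "s": ((0.9, 0.0), 100.0),
}


def squared_l2(u, v):
    """Squared Euclidean distance; it orders results the same way as L2."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def top_k(docs, query_vec, k, predicate=lambda brightness: True):
    """Rank the documents whose brightness satisfies the predicate."""
    passing = [
        (squared_l2(vec, query_vec), name)
        for name, (vec, brightness) in docs.items()
        if predicate(brightness)
    ]
    return [name for _, name in sorted(passing)[:k]]


unfiltered = top_k(DOCS, (0.0, 0.0), 2)
filtered = top_k(DOCS, (0.0, 0.0), 2, lambda b: b > 170.0)
print(unfiltered)
print(filtered)
```

    In this toy data, the nearest document fails the brightness predicate, so the filtered ranking starts with the next nearest document that passes it.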