Pushdowns

The Query engine implements many query processing optimizations to achieve the best possible performance for N1QL queries. The GSI Indexes are a vital part of query performance and are tightly coupled with the Query engine.

Index Pushdowns are performance optimizations where the Query engine pushes more of the work down to the Indexer. Query Indexer not only indexes data, it also supports various operations such as point scans, range scans, array indexing, sort order, and pagination. The Query engine tries to leverage the indexer functionality as much as possible by pushing down operations to the indexer as part of the index scan. This helps performance, predominantly, in two ways:

  1. Minimize the amount of data transferred from Indexer nodes to Query nodes.

  2. Minimize the amount of processing done at Query nodes.

Index Projection

(Introduced in Couchbase Server 5.0)

When processing a SELECT or DML with a WHERE clause, the Query engine picks one or more qualified indexes to be used for the query. Note that each index will have document field names explicitly specified as index-keys, and some metadata fields, such as meta().id, implicitly stored in the index. In earlier releases, the Indexer used to return all index-keys available in the index for the matching documents. From Couchbase Server 5.0, the Query engine requests the exact list of fields needed for the query. For a covered query, it is the fields referred in projection, predicate, GROUP BY clauses, ORDER BY clauses, HAVING clauses, ON key, and subqueries of the query. For non-covered queries, it is just META().id.

For example, consider the following index and query:

Example 1. Index Projection

This example uses the def_route_src_dst_day index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX `def_route_src_dst_day`
ON `travel-sample`(`sourceairport`,`destinationairport`,
   (distinct (array (`v`.`day`) for `v` in `schedule` end)))
WHERE (`type` = "route") ;
Query
EXPLAIN SELECT sourceairport FROM `travel-sample`
USE INDEX (def_route_src_dst_day)
WHERE sourceairport = "SFO"
AND type = "route"
LIMIT 1;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "DistinctScan",
            "limit": "1",
            "scan": {
              "#operator": "IndexScan3",
              "covers": [
                "cover ((`travel-sample`.`sourceairport`))",
                "cover ((`travel-sample`.`destinationairport`))",
                "cover ((distinct (array (`v`.`day`) for `v` in (`travel-sample`.`schedule`) end)))",
                "cover ((meta(`travel-sample`).`id`))"
              ],
              "filter_covers": {
                "cover ((`travel-sample`.`type`))": "route"
              },
              "index": "def_route_src_dst_day",
              "index_id": "9191325cd5eda22e",
              "index_projection": {
                "entry_keys": [ (1)
                  0
                ],
                "primary_key": true (2)
              },
...

The query refers to fields sourceairport and type. The index is wider in scope, that is, it has sourceairport, destinationairport, schedule.day, and type fields.

So, for each matching document, the query requires only a subset of the data stored in the index. With index-projection support, the Query engine indicates the exact data requested as part of the index-scan. In this example,

1 The entry_keys in the EXPLAIN output indicate the exact index-key fields that should be returned in the index-scan result. This has only one entry (0) indicating the first index-key sourceairport.
2 Also, the primary_key indicates whether the index should return the primary key meta().id of the matching document.

Note that in some cases (such as when distinctScan or intersectScan are used, as in this example), the primary key meta().id may be retrieved even though the query doesn’t explicitly specify it in the query. Without this optimization, index-scan would return all the index-keys defined in the index. If the index_projection field is missing in the EXPLAIN output, then Indexer would return all index-keys.

Predicate Pushdown

The Query engine and GSI indexes support many optimizations for efficiently processing predicate push-downs. In general, this performance optimization is leveraged when the Query engine decides to use an Index-scan for processing a query, and whole or partial predicates can be pushed to the indexer to filter documents of interest to the query.

For example, in the above query Example 1 with a simple WHERE clause, the predicate (sourceairport = "SFO") is pushed to the index def_route_src_dst_day with the following single span and range. These attributes exactly define different characteristics of the index scan:

Example 2. Predicate Pushdown

The span and range from the EXPLAIN output of Example 1.

Result
...
              "spans": [
                {
                  "exact": true,
                  "range": [
                    {
                      "high": "\"SFO\"",
                      "inclusion": 3,
                      "low": "\"SFO\""
                    }
                  ]
                }
...
  • Each Span defines details about one index-key summarizing corresponding predicate conditions into a range-scan lookup for the index. In this example, the predicate condition (sourceairport = "SFO") translates to one span with one range that specifies both low and high values of "SFO" (to imply equals condition).

  • Refer to section Scans for more information.

Composite Predicate Pushdown

Compound or composite predicates are those with multiple conditions on different fields of the document. When the predicate is conjunctive with multiple AND conditions, then a single span with multiple ranges are specified in the index-scan request. When the predicate is disjunctive, then multiple spans are specified. See Scans for more details and examples on how predicate pushdown works for various types of index-scans as well as the conjunctive predicate AND and the disjunctive predicate OR.

Index key order and structure

Composite indexes have more than one index key, and the order of the index keys is important for any lookup or scan of the index, because the indexes structure all the indexed entries in linearized default collation sorted order of all the index-keys. For example, consider the following index:

CREATE INDEX `idx_age_name` ON users(age, name);
IndexKeyOrder

Various age and name values are stored in the index in a tree like structure (represented by the triangle in the above picture) with all the index key values linearly sorted as ordered pairs. For instance,

  • The above picture shows index-entries with all names in sorted order with an age of 20 followed by the entries for age 21 and related names.

  • The arrowed paths logically depicts how an index lookup or scan would find entries in the index.

  • A point lookup query for age=20 AND name="joe" may follow arrows labelled p1.

  • Similarly, a range scan for (age BETWEEN 20 and 21) AND (name="joe") may find entries of interest between the paths labelled p1 and p2 (highlighted in green).

    This range may include some unwanted entries (such as "mark", "abby", "anne") which will be filtered subsequently.
    Queries with predicates such as (age = 20) AND (name BETWEEN "joe" and "mark") will need all the entries found using range scans.

In general, when the predicate has a range condition on prefixing index-keys (such as age) may produce unwanted results from the range-scan index-lookups. In Couchbase Server 5.0, the Query service and Indexer are enhanced with complete and accurate predicate pushdown to filter such unnecessary results in the Indexer itself. This improves query performance as it saves the additional overhead in transferring the unwanted data/results to query nodes and subsequently filtering the results. This is explained with an example in the following document: Composite predicate with range-scan on prefix index-keys.

In Couchbase Server 4.x, the indexer would return such extraneous unwanted results and the Query service would filter them, thus guaranteeing accurate final query results.

Composite predicate with range-scan on prefix index-keys

The Query engine supports efficient predicate pushdown to indexes in the cases when the WHERE clause has a range predicate on any of the prefixing index-keys.

Consider the following query, which finds all destination airports within 2000 miles of LAX.

Index
CREATE INDEX `def_route_src_dst_dist`
ON `travel-sample`(`distance`,`sourceairport`,`destinationairport`)
WHERE (`type` = "route");
Query
EXPLAIN SELECT destinationairport
FROM `travel-sample`
USE INDEX (def_route_src_dst_dist)
WHERE type = "route"
AND distance < 2000 AND sourceairport = "LAX"; (1)
Results
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3",
        "covers": [
          "cover ((`travel-sample`.`distance`))",
          "cover ((`travel-sample`.`sourceairport`))",
          "cover ((`travel-sample`.`destinationairport`))",
          "cover ((meta(`travel-sample`).`id`))"
        ],
        "filter_covers": {
          "cover ((`travel-sample`.`type`))": "route"
        },
        "index": "def_route_src_dst_dist",
        "index_id": "5f58baa4c3b564d2",
        "index_projection": {
          "entry_keys": [
            0,
            1,
            2
          ]
        },
        "keyspace": "travel-sample",
        "namespace": "default",
        "spans": [ (2)
          {
            "exact": true,
            "range": [ (3)
              {
                "high": "2000", (4)
                "inclusion": 0,
                "low": "null"
              },
              {
                "high": "\"LAX\"", (5)
                "inclusion": 3,
                "low": "\"LAX\""
              }
            ]
          }
        ],
        "using": "gsi"
      },
...

In this query:

1 The predicate has the range condition on the first index-key distance and an equality predicate on the 2nd index-key sourceairport.
2 The predicate is accurately represented and pushed-down to the indexer, as shown in the spans attribute of the EXPLAIN query plan output.
3 The range[] attribute is an array of the predicate bounds for individual index-keys involved in the compound predicate.
4 The first element of range[] corresponds to the index-key distance with (low, high) values (null, 2000) respectively.
5 The second element of range[] corresponds to the index-key sourceairport with (low, high) values ("LAX", "LAX") representing equals condition.

The Indexer processes the lookup request and exactly returns only the documents matching the predicate conditions. For example, when you enable monitoring with the configuration parameter profile = "timings" for this query, you can see that the indexer returns 165 documents, which is the same as the final result set of the query.

...
        "~children": [
          {
            "#operator": "IndexScan3",
            "#stats": {
              "#itemsOut": 165,
              "#phaseSwitches": 663,
              "execTime": "973µs",
              "kernTime": "229µs",
              "servTime": "100.721ms"
            },
...
              {
                "#operator": "FinalProject",
                "#stats": {
                  "#itemsIn": 165,
                  "#itemsOut": 165,
                  "#phaseSwitches": 496,
                  "execTime": "607µs",
                  "kernTime": "2.236ms"
                },
...

Pagination Pushdown

Pagination in N1QL queries is achieved by using the LIMIT and OFFSET clauses, and both of the operators can be pushed to indexer whenever possible. These operators may not always be pushed to Indexer, depending on the following factors:

  • Whether or not the whole predicate in the WHERE clause can be completely and accurately pushed to a single index.

  • When using IntersectScan, the Query engine uses multiple indexes to process the query. As such, LIMIT/OFFSET will need to be processed in the Query engine at a later stage of query processing, and hence cannot be pushed to the Indexer.

  • Whether or not the SELECT query has other clauses that may impact pagination, such as ORDER BY or JOIN. For example,

    • When ORDER BY key in the query is different from that of the index order, then the query layer will need to process the sort; and hence, in those cases, the pagination cannot be pushed to the indexer as shown in Example 5 below.

    • For JOIN queries, index scans can be used only for the left side keyspace. Subsequent JOIN phrases may filter some documents, after which only LIMIT/OFFSET can be applied. Hence, the pagination operators cannot be pushed when a query has JOIN clauses.

(LIMIT pushdown is supported in Couchbase Server 4.5.0 versions.)

(OFFSET pushdown is introduced in Couchbase Server 5.0.)

Example 3. Pagination with Secondary Index

When using a secondary index, both LIMIT and OFFSET operators are pushed to the index. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
EXPLAIN SELECT * FROM `travel-sample`
WHERE city = "San Francisco"
OFFSET  4000  LIMIT 10000;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "index": "def_city",
            "index_id": "796a0018ff906a1a",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "travel-sample",
            "limit": "10000", (1)
            "namespace": "default",
            "offset": "4000", (2)
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"San Francisco\"",
                    "inclusion": 3,
                    "low": "\"San Francisco\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
...
1 The IndexScan3 operator handles limit.
2 The IndexScan3 operator handles offset.
Example 4. Pagination with Primary Index

When using a primary index, both LIMIT and OFFSET operators are pushed to the Indexer. This example uses the def_primary index that comes pre-installed and can be made by the statement:

Index
CREATE PRIMARY INDEX `def_primary` ON `travel-sample`;
Query
EXPLAIN SELECT * FROM `travel-sample`
OFFSET  4000  LIMIT 10000;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "PrimaryScan3",
            "index": "def_primary",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "travel-sample",
            "limit": "10000", (1)
            "namespace": "default",
            "offset": "4000", (2)
            "using": "gsi"
          },
...
1 The PrimaryScan3 operator handles limit.
2 The PrimaryScan3 operator handles offset.
Example 5. Pagination with Different Index Order

LIMIT and OFFSET operators are not pushed to the Indexer when the index order is different from that specified in the ORDER BY. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
EXPLAIN SELECT * FROM `travel-sample`
USE INDEX(def_city)
WHERE city = "San Francisco"
ORDER BY name
OFFSET  4000  LIMIT 10000;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3", (1)
            "index": "def_city",
            "index_id": "796a0018ff906a1a",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "travel-sample",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"San Francisco\"",
                    "inclusion": 3,
                    "low": "\"San Francisco\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
...
      {
        "#operator": "Order",
        "limit": "10000",
        "offset": "4000",
        "sort_terms": [
          {
            "expr": "(`travel-sample`.`name`)"
          }
        ]
      },
      {
        "#operator": "Offset", (2)
        "expr": "4000"
      },
      {
        "#operator": "Limit", (3)
        "expr": "10000"
      },
      {
        "#operator": "FinalProject"
      }
...
1 The IndexScan3 operator does not handle offset or limit.
2 The Offset operator is handled by the Query engine.
3 The Limit operator is handled by the Query engine.

Using Index Order

The Query engine may avoid ORDER BY processing in cases where the ordering of entries in the index can be leveraged for the query. The Query engine carefully evaluates each query to decide whether ORDER BY keys are aligned with the index key order. For example, ORDER BY may not be pushed down when the ORDER BY fields are not aligned with the index-key order defining the index.

Example 6. Ascending Sort by String Field

Find all cities that start with "San", and sort the results by the city name in ascending order. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
EXPLAIN SELECT city FROM `travel-sample`
WHERE city LIKE "San%"
ORDER BY city;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`city`))",
              "cover ((meta(`travel-sample`).`id`))"
            ],
            "index": "def_city",
...
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "cover ((`travel-sample`.`city`))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject" (1)
                }
              ]
            }
          }
...
1 In this example, you can see that the query plan does not have an ORDER operator before the final projection. That means order pushdown is being leveraged, and the query is relying on the index order.
Example 7. Ascending Sort by Primary Key

Find all cities that start with "San", and sort the results by the document primary key in ascending order. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
EXPLAIN SELECT city FROM `travel-sample`
WHERE city LIKE "San%"
ORDER BY meta().id;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`city`))",
              "cover ((meta(`travel-sample`).`id`))"
            ],
            "index": "def_city",
...
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "cover ((`travel-sample`.`city`))"
                    }
                  ]
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Order", (1)
        "sort_terms": [
          {
            "expr": "cover ((meta(`travel-sample`).`id`))"
          }
        ]
      },
      {
        "#operator": "FinalProject"
      }
...
1 In this example, you can see an additional ORDER operator before the final projection, because the ORDER BY field meta().id is different from the index order key city.

Limitation

Currently the Query engine supports order pushdown only when the ORDER BY keys are aligned with the Index order. But reverse-scan of the index is not supported.

For example, in Example 6 above, you can see that the query plan does not have an ORDER operator before the final projection, because the index order is the same as the ascending order specified in the query. Similarly, in the following example Example 8, the ORDER BY clause has DESC, and that matches with the index order defined by the index def_city_desc.

However, the ASC order in Example 6 will not be able to leverage the index order in the index def_city_desc, nor will the DESC order in Example 8 be able to leverage the index order in the index def_city.

Example 8. Descending Sort by String Field

Descending variation of Example 6.

Index
CREATE INDEX def_city_desc ON `travel-sample`(`city` DESC);
Query
EXPLAIN SELECT city FROM `travel-sample`
WHERE city LIKE "San%"
ORDER BY city DESC;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`city`))",
              "cover ((meta(`travel-sample`).`id`))"
            ],
            "index": "def_city_desc",
            "index_id": "eec274a0f271479c",
            "index_order": [
              {
                "desc": true,
                "keypos": 0
              }
...
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "cover ((`travel-sample`.`city`))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
...

Operator Pushdowns

The Query engine tries to avoid unnecessary processing operators such as MIN(), MAX(), and COUNT(), which can be processed by the Indexer much more efficiently. In such cases, the Query engine pushes down necessary hints or options to the Indexer to process the following operators.

MAX() Pushdown

(Introduced in Couchbase Server 5.0)

This function returns the highest value of the input field based on the default collation rules. (For details, see Data types and Comparison Operators.)

Prior to Couchbase Server 5.5, MAX() could be pushed to the Indexer only when the index was created with matching index keys in descending order. This limitation has been removed in Couchbase Server 5.5 and later.

Example 9. MAX of a String Field

Find the alphabetically highest city name in travel-sample. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
EXPLAIN SELECT MAX(city)
FROM `travel-sample`
USE INDEX (def_city)
WHERE city is not null;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3",
        "covers": [
          "cover ((`travel-sample`.`city`))",
          "cover ((meta(`travel-sample`).`id`))",
          "cover (max(cover ((`travel-sample`.`city`))))"
        ],
        "index": "def_city",
        "index_group_aggs": {
          "aggregates": [
            {
              "aggregate": "MAX",
              "depends": [
                0
              ],
              "expr": "cover ((`travel-sample`.`city`))",
              "id": 2,
              "keypos": 0
            }
...

MIN() Pushdown

(Introduced in Couchbase Server 5.0)

This function returns the lowest value of the input field based on the default collation rules. (For details, see Data types and Comparison Operators.)

Prior to Couchbase Server 5.5, MIN() could be pushed to the Indexer only when the index was created with matching index keys in ascending order. This limitation has been removed in Couchbase Server 5.5 and later.

Example 10. MIN of a String Field

Find the alphabetically lowest city name in travel-sample. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
EXPLAIN SELECT MIN(city)
FROM `travel-sample`
USE INDEX (def_city)
WHERE city is not null;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3",
        "covers": [
          "cover ((`travel-sample`.`city`))",
          "cover ((meta(`travel-sample`).`id`))",
          "cover (min(cover ((`travel-sample`.`city`))))"
        ],
        "index": "def_city",
        "index_group_aggs": {
          "aggregates": [
            {
              "aggregate": "MIN",
              "depends": [
                0
              ],
              "expr": "cover ((`travel-sample`.`city`))",
              "id": 2,
              "keypos": 0
            }
...

COUNT() Pushdown

(Introduced in Couchbase Server 5.0)

This function returns the total number of non-Null values of an input field from the matching documents of an index scan.

Example 11. Count of a String Field

Find the number of cities entered in travel-sample. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
SELECT COUNT(city) AS NumberOfCities
FROM `travel-sample`
use index (def_city)
WHERE city is not null;
Result
[
  {
    "NumberOfCities": 7341
  }
]
Example 12. Count Details

The details behind Example 11.

Query
EXPLAIN SELECT COUNT(city) AS NumberOfCities
FROM `travel-sample`
use index (def_city)
WHERE city is not null;
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3", (1)
        "covers": [
          "cover ((`travel-sample`.`city`))",
          "cover ((meta(`travel-sample`).`id`))",
          "cover (count(cover ((`travel-sample`.`city`))))"
        ],
        "index": "def_city",
        "index_group_aggs": {
          "aggregates": [
            {
              "aggregate": "COUNT",
              "depends": [
                0
              ],
              "expr": "cover ((`travel-sample`.`city`))",
              "id": 2,
              "keypos": 0
            }
...
1 In Couchbase Server 5.5 and later, the index operator IndexCountScan3 counts values so the Query Service does not need to do additional processing.

COUNT(DISTINCT) Pushdown

(Introduced in Couchbase Server 5.0)

This function returns the total number of unique non-Null values of an input field from the matching documents of an index scan.

Example 13. Count Distinct of a String Field

Find the number of unique city names in travel-sample. This example uses the def_city index that comes pre-installed and can be made by the statement:

Index
CREATE INDEX def_city ON `travel-sample`(city ASC);
Query
SELECT COUNT (DISTINCT city) AS NumberOfDistinctCities
FROM `travel-sample`
USE index (def_city)
WHERE city is not null;
Result
[
  {
    "NumberOfDistinctCities": 2301
  }
]
Example 14. Count Distinct Details

The details behind Example 13.

Query
EXPLAIN SELECT COUNT (DISTINCT city) AS NumberOfDistinctCities
FROM `travel-sample`
use index (def_city)
Result
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3", (1)
        "covers": [
          "cover ((`travel-sample`.`city`))",
          "cover ((meta(`travel-sample`).`id`))",
          "cover (count(distinct cover ((`travel-sample`.`city`))))"
        ],
        "index": "def_city",
        "index_group_aggs": {
          "aggregates": [
            {
              "aggregate": "COUNT",
              "depends": [
                0
              ],
              "distinct": true,
              "expr": "cover ((`travel-sample`.`city`))",
              "id": 2,
              "keypos": 0
            }
...
1 In Couchbase Server 5.5 and later, the index operator IndexCountScan3 counts distinct values so the Query Service does not need to do additional processing.