Early Filters, Order, and Pagination

  • concept
  • Couchbase Server 7.1.2
    +
    When covering indexes and index pushdowns are not available, the Query Service may use early filtering, early ordering, and early pagination to improve the query response time.

    Overview

    If a query has a covering index, and the query is on a single keyspace, then the index contains all the data needed to perform the query. The Query service does not need to fetch data from the Data service. Any filters, order, or pagination (offset and limit) are performed by the Index service.

    Covering index
    Figure 1. Covering index

    If a query is not covered by an index, then the query needs to fetch data from the Data service after performing the index scan, in order to return the requested results. By default, any filters, order, or pagination are performed by the Query service after the fetch.

    Default filters, order, and pagination
    Figure 2. Default filters, order, and pagination

    Fetching data from the Data service can be expensive. Any feature that can reduce the amount of data to be fetched will improve the query response time. It would obviously be inefficient to fetch an entire dataset, and then discard a large portion of it by applying filters and pagination. Couchbase Server therefore attempts to apply filters, order, and pagination before fetching data, wherever possible.

    Index pushdowns provide one way to reduce the amount of data to be fetched. Some filters, ordering, and pagination may be pushed down to the Index service, along with grouping and aggregates. The Index service applies these operations to the result of the index scan. The Query service then only needs to fetch the remaining data.

    Filter, order, and pagination pushdown
    Figure 3. Filter, order, and pagination pushdown

    However, there are some operations that cannot be pushed down to the Index service: for example, a filter that uses a function or a LIKE operator on top of an index term. These operations must be performed by the Query service. In versions prior to Couchbase Server 7.1.2, the Query Service first performs a fetch from the Data service, and then applies any operations that could not be pushed down to the Index service. This still leads to some inefficient queries.

    In Couchbase Server 7.1.2 and later, early filtering, early ordering, and early pagination provide another line of defense, in cases where covering indexes and index pushdowns are not available. With this feature, the Query service may apply filtering, ordering, and pagination directly to the results returned by the Index service, before performing the fetch.

    Early filters, order, and pagination
    Figure 4. Early filters, order, and pagination

    Note that in practice, an uncovered query may use a mixture of the methods illustrated by Figure 2, Figure 3, and Figure 4. In addition, early filtering may be performed independently of early ordering and pagination, and vice versa.

    Early Filtering

    Early filtering may be applied independently of early ordering and pagination. For an illustration of early filtering, consider the following index.

    Index I
    CREATE INDEX idx_eat_city_name
    ON `travel-sample`.inventory.landmark(city DESC, name)
    WHERE activity = "eat";

    This index contains all the data needed to perform Example 1; it therefore acts as a covering index to this query.

    Example 1. Covered query
    Query
    EXPLAIN SELECT city, name FROM `travel-sample`.inventory.landmark
    WHERE city = "Paris" AND activity = "eat";

    The query plan shows that the query is covered. In addition, the predicate city = "Paris" is applied as a span by the Index service.

    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan3",
              "bucket": "travel-sample",
              "covers": [ (1)
                "cover ((`landmark`.`city`))",
                "cover ((`landmark`.`name`))",
                "cover ((meta(`landmark`).`id`))"
              ],
              "filter_covers": {
                "cover ((`landmark`.`activity`))": "eat"
              },
              "index": "idx_eat_city_name",
              "index_id": "e5dbf76060ed1d19",
              "index_projection": {
                "entry_keys": [
                  0,
                  1
                ]
              },
              "keyspace": "landmark",
              "namespace": "default",
              "scope": "inventory",
              "spans": [
                {
                  "exact": true,
                  "range": [ (2)
                    {
                      "high": "\"Paris\"",
                      "inclusion": 3,
                      "index_key": "`city`",
                      "low": "\"Paris\""
                    }
                  ]
                }
              ],
              "using": "gsi"
            },
    // ...
    1 The covers section shows that the query is covered.
    2 The range section shows that this predicate is applied as a span by the Index service.

    Example 2 adds an extra field to the projection. Now the query is no longer covered, and the Query service needs to fetch documents based on keys returned from the index.

    Example 2. Predicate pushdown
    Query
    EXPLAIN SELECT city, name, address FROM `travel-sample`.inventory.landmark
    WHERE city = "Paris" AND activity = "eat";

    This query does not run too slowly, since the predicate city = "Paris" is pushed down to the Index service. The index only returns document keys for documents matching the predicate.

    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan3",
              "bucket": "travel-sample",
              "index": "idx_eat_city_name",
              "index_id": "e5dbf76060ed1d19",
              "index_projection": {
                "primary_key": true
              },
              "keyspace": "landmark",
              "namespace": "default",
              "scope": "inventory",
              "spans": [
                {
                  "exact": true,
                  "range": [ (1)
                    {
                      "high": "\"Paris\"",
                      "inclusion": 3,
                      "index_key": "`city`",
                      "low": "\"Paris\""
                    }
                  ]
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch", (2)
              "bucket": "travel-sample",
              "keyspace": "landmark",
              "namespace": "default",
              "scope": "inventory"
            },
    // ...
    1 The range section shows that this predicate is pushed down to the Index service.
    2 After the pushdown, the Query service fetches documents from the Data service.

    Now let’s consider the case when the filter is not on the index key exactly, but applies a function on top of the index key.

    Example 3. Early filtering
    Query
    EXPLAIN SELECT city, name, address FROM `travel-sample`.inventory.landmark
    WHERE LOWER(city) = "paris" AND activity = "eat";

    In this case, the filter cannot be pushed down to the Index service. This is where early filtering comes into play. The Query service can apply the specified filter to the results of the index scan, before fetching the matching documents from the Data service.

    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan3",
              "bucket": "travel-sample",
              "filter": "(lower(_index_key ((`landmark`.`city`))) = \"paris\")", (1)
              "index": "idx_eat_city_name",
              "index_conditions": { (2)
                "_index_condition ((`landmark`.`activity`))": "eat"
              },
              "index_id": "e5dbf76060ed1d19",
              "index_keys": [ (3)
                "_index_key ((`landmark`.`city`))",
                "_index_key ((meta(`landmark`).`id`))"
              ],
              "index_projection": {
                "entry_keys": [ (4)
                  0
                ],
                "primary_key": true
              },
              "keyspace": "landmark",
              "namespace": "default",
              "scope": "inventory",
              "spans": [
                {
                  "range": [
                    {
                      "inclusion": 0,
                      "index_key": "`city`",
                      "low": "null"
                    }
                  ]
                }
              ],
              "using": "gsi"
            },
    // ...
    1 The early index filters, including all the filters that can be evaluated on the index keys. Inside the filter, _index_key ((`landmark`.`city`)) is analogous to cover ((`landmark`.`city`)) for a covering index scan.
    2 The index_conditions section is analogous to the filter_covers section in the plan for a covered query.
    3 The index_keys section is analogous to the covers section in the plan for a covered query.
    4 The index_projection section contains the _index_key needed for evaluating the early index filters — in this case position 0 for "city".

    If necessary, the Query service may use a mixture of index pushdowns and early filtering, as shown in the following example.

    Example 4. Predicate pushdown and early filter
    Query
    EXPLAIN SELECT city, name, address FROM `travel-sample`.inventory.landmark
    WHERE city = "Paris" AND name LIKE "%Paris%" AND activity = "eat";

    The Index service evaluates the filters used to generate the index spans first, before returning to the Query service. As a result, the early filters only contain the predicates that were not pushed down to the Index service.

    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan3",
              "bucket": "travel-sample",
              "filter": "(_index_key ((`landmark`.`name`)) like \"%Paris%\")", (1)
              "index": "idx_eat_city_name",
              "index_conditions": {
                "_index_condition ((`landmark`.`activity`))": "eat"
              },
              "index_id": "e5dbf76060ed1d19",
              "index_keys": [
                "_index_key ((`landmark`.`name`))",
                "_index_key ((meta(`landmark`).`id`))"
              ],
              "index_projection": {
                "entry_keys": [
                  1
                ],
                "primary_key": true
              },
              "keyspace": "landmark",
              "namespace": "default",
              "scope": "inventory",
              "spans": [ (2)
                {
                  "range": [
                    {
                      "high": "\"Paris\"",
                      "inclusion": 3,
                      "index_key": "`city`",
                      "low": "\"Paris\""
                    },
                    {
                      "high": "[]",
                      "inclusion": 1,
                      "index_key": "`name`",
                      "low": "\"\""
                    }
                  ]
                }
              ],
              "using": "gsi"
            },
    // ...
    1 The filter name LIKE "%Paris%" does not generate an exact index span, and needs to be evaluated as an early index filter.
    2 The filter city = "Paris" generates an exact span, and thus can be pushed down to the Index service.

    Early Order and Pagination

    Early order and pagination may be applied independently of early filtering. As an illustration, consider the following index.

    Index II
    CREATE INDEX idx_landmark_city_name
    ON `travel-sample`.inventory.landmark(city DESC, name);

    This index acts as a covering index to Example 5.

    Example 5. Covered query
    Query
    EXPLAIN SELECT city, name FROM `travel-sample`.inventory.landmark
    WHERE city IS NOT MISSING
    ORDER BY city DESC, name OFFSET 100 LIMIT 5;

    The covers section of the query plan shows that the query is covered. In addition, the ordering, offset, and limit are performed by the Index service.

    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "covers": [
                    "cover ((`landmark`.`city`))",
                    "cover ((`landmark`.`name`))",
                    "cover ((meta(`landmark`).`id`))"
                  ],
                  "index": "idx_landmark_city_name",
                  "index_id": "c2a48a468ef63f23",
                  "index_order": [ (1)
                    {
                      "desc": true,
                      "keypos": 0
                    },
                    {
                      "keypos": 1
                    }
                  ],
                  "index_projection": {
                    "entry_keys": [
                      0,
                      1
                    ]
                  },
                  "keyspace": "landmark",
                  "limit": "5", (2)
                  "namespace": "default",
                  "offset": "100", (3)
                  "scope": "inventory",
                  "spans": [
                    {
                      "exact": true,
                      "range": [
                        {
                          "inclusion": 1,
                          "index_key": "`city`",
                          "low": "null"
                        }
                      ]
                    }
                  ],
                  "using": "gsi"
                },
    // ...
    1 Index order
    2 Limit
    3 Offset

    Example 6 adds an extra field to the projection. Now the query is no longer covered.

    Example 6. Order and pagination pushdown
    Query
    EXPLAIN SELECT city, name, address FROM `travel-sample`.inventory.landmark
    WHERE city IS NOT MISSING
    ORDER BY city DESC, name OFFSET 100 LIMIT 5;

    However the query runs pretty quickly, since the ordering, offset, and limit can still be pushed down to the Index service.

    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "index": "idx_landmark_city_name",
                  "index_id": "c2a48a468ef63f23",
                  "index_order": [ (1)
                    {
                      "desc": true,
                      "keypos": 0
                    },
                    {
                      "keypos": 1
                    }
                  ],
                  "index_projection": {
                    "primary_key": true
                  },
                  "keyspace": "landmark",
                  "limit": "5", (2)
                  "namespace": "default",
                  "offset": "100", (3)
                  "scope": "inventory",
                  "spans": [
                    {
                      "exact": true,
                      "range": [
                        {
                          "inclusion": 1,
                          "index_key": "`city`",
                          "low": "null"
                        }
                      ]
                    }
                  ],
                  "using": "gsi"
                },
    // ...
    1 Index order pushdown
    2 Limit pushdown
    3 Offset pushdown

    However, if we change the order of the ordering terms so that they no longer match the index key order, ordering cannot be pushed down to the Index service.

    Example 7. Early order and pagination
    Query
    EXPLAIN SELECT city, name, address FROM `travel-sample`.inventory.landmark
    WHERE city IS NOT MISSING
    ORDER BY name, city DESC OFFSET 100 LIMIT 5;

    So in this case, early ordering, limit, and offset are done by the Query service before fetch.

    [
      {
        "plan": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "index": "idx_landmark_city_name",
                  "index_id": "c2a48a468ef63f23",
                  "index_keys": [
                    "_index_key ((`landmark`.`city`))",
                    "_index_key ((`landmark`.`name`))",
                    "_index_key ((meta(`landmark`).`id`))"
                  ],
                  "keyspace": "landmark",
                  "namespace": "default",
                  "scope": "inventory",
                  "spans": [
                    {
                      "exact": true,
                      "range": [
                        {
                          "inclusion": 1,
                          "index_key": "`city`",
                          "low": "null"
                        }
                      ]
                    }
                  ],
                  "using": "gsi"
                },
                {
                  "#operator": "Order", (1)
                  "flags": 1,
                  "limit": "5", (2)
                  "offset": "100", (3)
                  "sort_terms": [
                    {
                      "expr": "_index_key ((`landmark`.`name`))"
                    },
                    {
                      "desc": "\"desc\"",
                      "expr": "_index_key ((`landmark`.`city`))"
                    }
                  ]
                },
    // ...
    1 Early ordering
    2 Early limit
    3 Early offset

    Similarly, if we change the direction of the sort from descending to ascending, the ordering cannot be pushed down to the Index service, since it does not support reverse index scan.

    Example 8. Early reverse order and pagination
    Query
    EXPLAIN SELECT city, name, address FROM `travel-sample`.inventory.landmark
    WHERE city IS NOT MISSING
    ORDER BY city ASC, name OFFSET 100 LIMIT 5;

    In this case also, early ordering, limit, and offset are done by the Query service before fetch.

    // ...
                {
                  "#operator": "Order",
                  "flags": 1,
                  "limit": "5",
                  "offset": "100",
                  "sort_terms": [ (1)
                    {
                      "desc": "\"asc\"",
                      "expr": "_index_key ((`landmark`.`city`))"
                    },
                    {
                      "expr": "_index_key ((`landmark`.`name`))"
                    }
                  ]
                },
    // ...
    1 The sort_terms section uses _index_key expressions to sort the results using the index keys.

    Limitations

    The optimizer only selects early ordering and pagination when the following conditions are met. Many of these conditions are similar to the conditions for index order pushdowns.

    • The query must be simple, referencing a single keyspace with no joins.

    • There can be no GROUP BY or aggregates in the query.

    • Index pushdown for ORDER, LIMIT and OFFSET must not be possible.

    • The ORDER BY clause and LIMIT clause must be present. The OFFSET clause is optional.

    • All expression references in the ORDER BY clause must be present in the index — similar to the covering requirement for index pushdown. However there is no requirement for exact index spans.

    • Projection alias can be used in the ORDER BY clause, if all the references in the aliased expression are available from the index.

    • Only simple IndexScan is supported for early ordering. More complex index scans such as IntersectScan, OrderedIntersectScan, UnionScan and DistinctScan are not supported.

    • When early ordering is in effect, parallelism is turned off.

    Summary

    The following table summarizes the different methods of performing filtering, order, or pagination, from most efficient to least.

    Covering index Index pushdown
    [note]
    Early filters, order, or pagination
    [note]
    Default filters, order, or pagination
    [note]

    Fetch required?

    No

    Yes

    Yes

    Yes

    Filters, order, and pagination
    performed:

    After scan

    After scan,
    before fetch

    After scan,
    before fetch

    After fetch

    …​ by:

    Index service

    Index service

    Query service

    Query service

    An uncovered query may use a mixture of these methods.