Early Filters, Order, and Pagination
- concept
When covering indexes and index pushdowns are not available, the Query Service may use early filtering, early ordering, and early pagination to improve the query response time.
Overview
If a query has a covering index, and the query is on a single keyspace, then the index contains all the data needed to perform the query. The Query service does not need to fetch data from the Data service. Any filters, order, or pagination (offset and limit) are performed by the Index service.
If a query is not covered by an index, then the query needs to fetch data from the Data service after performing the index scan, in order to return the requested results. By default, any filters, order, or pagination are performed by the Query service after the fetch.
Fetching data from the Data service can be expensive. Any feature that can reduce the amount of data to be fetched will improve the query response time. It would obviously be inefficient to fetch an entire dataset, and then discard a large portion of it by applying filters and pagination. Couchbase Server therefore attempts to apply filters, order, and pagination before fetching data, wherever possible.
Index pushdowns provide one way to reduce the amount of data to be fetched. Some filters, ordering, and pagination may be pushed down to the Index service, along with grouping and aggregates. The Index service applies these operations to the result of the index scan. The Query service then only needs to fetch the remaining data.
However, there are some operations that cannot be pushed down to the Index service: for example, a filter that uses a function or a LIKE operator on top of an index term. These operations must be performed by the Query service. If the Query Service has to perform a fetch from the Data service, and then apply any operations that could not be pushed down to the Index service, this still leads to some inefficient queries.
In Couchbase Server, early filtering, early ordering, and early pagination provide another line of defense, in cases where covering indexes and index pushdowns are not available. With this feature, the Query service may apply filtering, ordering, and pagination directly to the results returned by the Index service, before performing the fetch.
Early Filtering
Early filtering may be applied independently of early ordering and pagination. For an illustration of early filtering, consider the following index.
CREATE INDEX idx_eat_city_name
ON landmark(city DESC, name)
WHERE activity = "eat";
This index contains all the data needed to perform Example 1; it therefore acts as a covering index to this query.
EXPLAIN SELECT city, name FROM landmark
WHERE city = "Paris" AND activity = "eat";
The query plan shows that the query is covered.
In addition, the predicate city = "Paris"
is applied as a span by the Index service.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [ (1)
"cover ((`landmark`.`city`))",
"cover ((`landmark`.`name`))",
"cover ((meta(`landmark`).`id`))"
],
"filter_covers": {
"cover ((`landmark`.`activity`))": "eat"
},
"index": "idx_eat_city_name",
// ...
"spans": [
{
"exact": true,
"range": [ (2)
{
"high": "\"Paris\"",
"inclusion": 3,
"index_key": "`city`",
"low": "\"Paris\""
}
]
// ...
1 | The covers section shows that the query is covered. |
2 | The range section shows that this predicate is applied as a span by the Index service. |
Example 2 adds an extra field to the projection. Now the query is no longer covered, and the Query service needs to fetch documents based on keys returned from the index.
EXPLAIN SELECT city, name, address FROM landmark
WHERE city = "Paris" AND activity = "eat";
This query does not run too slowly, since the predicate city = "Paris"
is pushed down to the Index service.
The index only returns document keys for documents matching the predicate.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"index": "idx_eat_city_name",
"index_id": "e5dbf76060ed1d19",
"index_projection": {
"primary_key": true
},
"keyspace": "landmark",
"namespace": "default",
"scope": "inventory",
"spans": [
{
"exact": true,
"range": [ (1)
{
"high": "\"Paris\"",
"inclusion": 3,
"index_key": "`city`",
"low": "\"Paris\""
}
]
}
],
"using": "gsi"
},
{
"#operator": "Fetch", (2)
"bucket": "travel-sample",
"keyspace": "landmark",
"namespace": "default",
"scope": "inventory"
},
// ...
1 | The range section shows that this predicate is pushed down to the Index service. |
2 | After the pushdown, the Query service fetches documents from the Data service. |
Now let’s consider the case when the filter is not on the index key exactly, but applies a function on top of the index key.
EXPLAIN SELECT city, name, address FROM landmark
WHERE LOWER(city) = "paris" AND activity = "eat";
In this case, the filter cannot be pushed down to the Index service. This is where early filtering comes into play. The Query service can apply the specified filter to the results of the index scan, before fetching the matching documents from the Data service.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"filter": "(lower(_index_key ((`landmark`.`city`))) = \"paris\")", (1)
"index": "idx_eat_city_name",
"index_conditions": { (2)
"_index_condition ((`landmark`.`activity`))": "eat"
},
"index_id": "e5dbf76060ed1d19",
"index_keys": [ (3)
"_index_key ((`landmark`.`city`))",
"_index_key ((meta(`landmark`).`id`))"
],
"index_projection": {
"entry_keys": [ (4)
0
],
"primary_key": true
},
// ...
1 | The early index filters, including all the filters that can be evaluated on the index keys.
Inside the filter, _index_key ((`landmark`.`city`)) is analogous to cover ((`landmark`.`city`)) for a covering index scan. |
2 | The index_conditions section is analogous to the filter_covers section in the plan for a covered query. |
3 | The index_keys section is analogous to the covers section in the plan for a covered query. |
4 | The index_projection section contains the _index_key needed for evaluating the early index filters — in this case position 0 for "city" . |
If necessary, the Query service may use a mixture of index pushdowns and early filtering, as shown in the following example.
EXPLAIN SELECT city, name, address FROM landmark
WHERE city = "Paris" AND name LIKE "%Paris%" AND activity = "eat";
The Index service evaluates the filters used to generate the index spans first, before returning to the Query service. As a result, the early filters only contain the predicates that were not pushed down to the Index service.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"filter": "(_index_key ((`landmark`.`name`)) like \"%Paris%\")", (1)
"index": "idx_eat_city_name",
"index_conditions": {
"_index_condition ((`landmark`.`activity`))": "eat"
},
"index_id": "e5dbf76060ed1d19",
"index_keys": [
"_index_key ((`landmark`.`name`))",
"_index_key ((meta(`landmark`).`id`))"
],
// ...
"spans": [ (2)
{
"range": [
{
"high": "\"Paris\"",
"inclusion": 3,
"index_key": "`city`",
"low": "\"Paris\""
},
{
"high": "[]",
"inclusion": 1,
"index_key": "`name`",
"low": "\"\""
}
]
// ...
1 | The filter name LIKE "%Paris%" does not generate an exact index span, and needs to be evaluated as an early index filter. |
2 | The filter city = "Paris" generates an exact span, and thus can be pushed down to the Index service. |
Early Order and Pagination
Early order and pagination may be applied independently of early filtering. As an illustration, consider the following index.
CREATE INDEX idx_landmark_city_name
ON landmark(city DESC, name);
This index acts as a covering index to Example 5.
EXPLAIN SELECT city, name FROM landmark
WHERE city IS NOT MISSING
ORDER BY city DESC, name OFFSET 100 LIMIT 5;
The covers
section of the query plan shows that the query is covered.
In addition, the ordering, offset, and limit are performed by the Index service.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"covers": [
"cover ((`landmark`.`city`))",
"cover ((`landmark`.`name`))",
"cover ((meta(`landmark`).`id`))"
],
"index": "idx_landmark_city_name",
"index_id": "c2a48a468ef63f23",
"index_order": [ (1)
{
"desc": true,
"keypos": 0
},
{
"keypos": 1
}
],
"index_projection": {
"entry_keys": [
0,
1
]
},
"keyspace": "landmark",
"limit": "5", (2)
"namespace": "default",
"offset": "100", (3)
"scope": "inventory",
// ...
1 | Index order performed by index operator |
2 | Limit performed by index operator |
3 | Offset performed by index operator |
Example 6 adds an extra field to the projection. Now the query is no longer covered.
EXPLAIN SELECT city, name, address FROM landmark
WHERE city IS NOT MISSING
ORDER BY city DESC, name OFFSET 100 LIMIT 5;
However the query runs pretty quickly, since the ordering, offset, and limit can still be pushed down to the Index service.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"index": "idx_landmark_city_name",
"index_id": "c2a48a468ef63f23",
"index_order": [ (1)
{
"desc": true,
"keypos": 0
},
{
"keypos": 1
}
],
"index_projection": {
"primary_key": true
},
"keyspace": "landmark",
"limit": "5", (2)
"namespace": "default",
"offset": "100", (3)
"scope": "inventory",
// ...
1 | Index order pushdown |
2 | Limit pushdown |
3 | Offset pushdown |
However, if we change the order of the ordering terms so that they no longer match the index key order, ordering cannot be pushed down to the Index service.
EXPLAIN SELECT city, name, address FROM landmark
WHERE city IS NOT MISSING
ORDER BY name, city DESC OFFSET 100 LIMIT 5;
So in this case, early ordering, limit, and offset are done by the Query service before fetch.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"bucket": "travel-sample",
"index": "idx_landmark_city_name",
"index_id": "c2a48a468ef63f23",
"index_keys": [
"_index_key ((`landmark`.`city`))",
"_index_key ((`landmark`.`name`))",
"_index_key ((meta(`landmark`).`id`))"
],
"keyspace": "landmark",
"namespace": "default",
"scope": "inventory",
"spans": [
{
"exact": true,
"range": [
{
"inclusion": 1,
"index_key": "`city`",
"low": "null"
}
]
}
],
"using": "gsi"
},
{
"#operator": "Order", (1)
"flags": 1,
"limit": "5", (2)
"offset": "100", (3)
"sort_terms": [
{
"expr": "_index_key ((`landmark`.`name`))"
},
{
"desc": "\"desc\"",
"expr": "_index_key ((`landmark`.`city`))"
}
]
},
// ...
1 | Early ordering |
2 | Early limit |
3 | Early offset |
Similarly, if we change the direction of the sort from descending to ascending, the ordering cannot be pushed down to the Index service, since it does not support reverse index scan.
EXPLAIN SELECT city, name, address FROM landmark
WHERE city IS NOT MISSING
ORDER BY city ASC, name OFFSET 100 LIMIT 5;
In this case also, early ordering, limit, and offset are done by the Query service before fetch.
// ...
{
"#operator": "Order",
"flags": 1,
"limit": "5",
"offset": "100",
"sort_terms": [ (1)
{
"desc": "\"asc\"",
"expr": "_index_key ((`landmark`.`city`))"
},
{
"expr": "_index_key ((`landmark`.`name`))"
}
]
},
// ...
1 | The sort_terms section uses _index_key expressions to sort the results using the index keys. |
Limitations
The optimizer only selects early ordering and pagination when the following conditions are met. Many of these conditions are similar to the conditions for index order pushdowns.
-
The query must be simple, referencing a single keyspace with no joins.
-
There can be no
GROUP BY
or aggregates in the query. -
Index pushdown for
ORDER
,LIMIT
andOFFSET
must not be possible. -
The
ORDER BY
clause andLIMIT
clause must be present. TheOFFSET
clause is optional. -
All expression references in the
ORDER BY
clause must be present in the index — similar to the covering requirement for index pushdown. However there is no requirement for exact index spans. -
Projection alias can be used in the
ORDER BY
clause, if all the references in the aliased expression are available from the index. -
Only simple
IndexScan
is supported for early ordering. More complex index scans such asIntersectScan
,OrderedIntersectScan
,UnionScan
andDistinctScan
are not supported. -
When early ordering is in effect, parallelism is turned off.
Summary
The following table summarizes the different methods of performing filtering, order, or pagination, from most efficient to least.
Covering index | Index pushdown [note] |
Early filters, order, or pagination [note] |
Default filters, order, or pagination [note] |
|
---|---|---|---|---|
Fetch required? |
No |
Yes |
Yes |
Yes |
Filters, order, and pagination |
After scan |
After scan, |
After scan, |
After fetch |
… by: |
Index service |
Index service |
Query service |
Query service |
An uncovered query may use a mixture of these methods. |
Related Links
-
WHERE Clause — to apply a filter
-
ORDER BY Clause — to apply a sort
-
OFFSET Clause and LIMIT Clause — to apply pagination