A newer version of this documentation is available.

View Latest

Adaptive Index

      +

      Adaptive Indexes are a special type of GSI array index that can index all or specified fields of a document. Such an index is generic in nature, and it can efficiently index and lookup any of the index-key values. This enables efficient ad hoc queries (that may have WHERE clause predicates on any of the index-key fields) without requiring to create various composite indexes for different combinations of fields. Adaptive Index is a functional array index created using the N1QL function PAIRS().

      Basically, the idea is to be able to simply load data and start querying:

      • using a single secondary index, and

      • not worrying about creating appropriate secondary indexes for each query.

      Note that without Adaptive Indexes:

      • Only primary index can help run any ad hoc query. But using primary index can be expensive for queries with predicates on any of the non-key fields of the document.

      • Each query will need a compatible secondary index that can qualify for the predicates in the WHERE clause. See section Contrast with Composite Indexes for details.

      For instance, consider a user profile or hotel reservation search use case. A person’s profile may need to be searched based on any of the personal attributes such as first name, last name, age, city, address, job, title, company, etc. Similarly, a hotel room availability may be searched based on wide criteria, such as room facilities, amenities, price, and other features. In this scenario, traditional secondary indexes or composite indexes can’t be used effectively (see section Contrast with Composite Indexes to understand some of the concerns). Adaptive indexes can help effectively and efficiently run such ad hoc search queries.

      Syntax

      create-index ::= CREATE INDEX index-name ON keyspace-ref '(' index-key [ index-order ] [ ','
                       index-key [ index-order ] ]* ')' [ where-clause ] [ index-using ] [ index-with ]
      'CREATE' 'INDEX' index-name 'ON' keyspace-ref '(' index-key index-order? ( ',' index-key index-order? )* ')' where-clause? index-using? index-with?

      An adaptive index is a type of array index. To create an adaptive index, the overall syntax is the same as for an array index.

      index-name

      Specify a unique name to identify the index.

      Keyspace Reference

      keyspace-ref ::= keyspace-path | keyspace-partial
      keyspace-path | keyspace-partial
      keyspace-path ::= [ namespace ':' ] bucket [ '.' scope '.' collection ]
      ( namespace ':' )? bucket ( '.' scope '.' collection )?
      keyspace-partial ::= collection
      collection

      The simple name or fully-qualified name of the keyspace on which to create the index. Refer to the CREATE INDEX statement for details of the syntax.

      Index Key

      index-key ::= ( ALL | DISTINCT ) pairs-function
      ( 'ALL' | 'DISTINCT' ) pairs-function

      To create an adaptive index, the index key must be a simple array expression containing a PAIRS() function.

      PAIRS() Function

      pairs-function ::= PAIRS '(' SELF | index-key-object ')'
      'PAIRS' '(' ( 'SELF' | index-key-object ) ')'

      When the SELF keyword is used, the adaptive index is created with all fields in the documents of the keyspace.

      If you want to create an adaptive index on selected fields only, you must specify an index key object.

      Index Key Object

      index-key-object ::= '{' ['"' name '"' ':'] expr [',' ['"' name '"' ':'] expr ]* '}'
      '{' ('"' name '"' ':')? expr (',' ('"' name '"' ':')? expr )* '}'

      This is an object of name-value pairs of the document fields that should be indexed.

      name

      The field name that corresponds to expr.

      expr

      A N1QL expression that is allowed in CREATE INDEX. This must be an expression over any document fields.

      In a simplified form, the names in the object can be omitted, in which case the corresponding names will be same as expressions, and the expressions must be field names in the document that are being indexed.

      When using PAIRS() with an OBJECT construction, you need to keep in mind:

      • {a, c.a} – when evaluated, both will inherit the same name of a causing one value to overwrite the other. Both values will not be indexed. A better way to handle this is to name one with an alias explicitly, such as {a, "ca":c.a}.

      • {abs(a)} – name of the object field is null and will raise an error. A better way to handle this is to use an alias explicitly, such as {"abs_a":abs(a)}.

      Index Order

      index-order ::= ASC | DESC
      'ASC' | 'DESC'

      Specifies the sort order of the index key.

      ASC

      The index key is sorted in ascending order.

      DESC

      The index key is sorted in descending order.

      This clause is optional; if omitted, the default is ASC.

      WHERE Clause

      where-clause ::= WHERE cond
      'WHERE' cond
      cond

      Specifies WHERE clause predicates to qualify the subset of documents to include in the index.

      USING Clause

      index-using ::= USING GSI
      'USING' 'GSI'

      The index type for an adaptive index must be Global Secondary Index (GSI). The USING GSI keywords are optional and may be omitted.

      WITH Clause

      index-with ::= WITH expr
      'WITH' expr

      Use the WITH clause to specify additional options.

      expr

      An object specifying additional options for the query.

      See the CREATE INDEX statement for more details on the syntax.

      Examples

      For example, consider the travel-sample data that is shipped with the product and the following indexes. Sample buckets can be installed as explained here.

      C1
      CREATE INDEX `def_inventory_airport_airportname`
      ON `travel-sample`.`inventory`.`airport`(`airportname`) WITH { "defer_build":true }
      C2
      CREATE INDEX `def_inventory_airport_city`
      ON `travel-sample`.`inventory`.`airport`(`city`) WITH { "defer_build":true }
      C3
      CREATE INDEX `def_inventory_airport_faa`
      ON `travel-sample`.`inventory`.`airport`(`faa`) WITH { "defer_build":true }

      Here, three different indexes need to be created to help different queries whose WHERE clause predicates may refer to different fields. For instance, the following queries Q1, Q2, and Q3 will use the indexes created in C1, C2, and C3, respectively:

      Q1
      SELECT * FROM `travel-sample`.inventory.airport WHERE airportname LIKE "San Francisco%";
      Q2
      SELECT * FROM `travel-sample`.inventory.airport WHERE city = "San Francisco";
      Q3
      SELECT * FROM `travel-sample`.inventory.airport WHERE faa = "SFO";

      However, the following single adaptive index C4 can serve all three of the following queries Q1A, Q2A, and Q3A:

      C4
      CREATE INDEX `ai_airport_day_faa`
      ON `travel-sample`.inventory.airport(DISTINCT PAIRS({airportname, city, faa, type}));
      Q1A
      SELECT * FROM `travel-sample`.inventory.airport
      USE INDEX (ai_airport_day_faa)
      WHERE airportname LIKE "San Francisco%";
      Q2A
      SELECT * FROM `travel-sample`.inventory.airport
      USE INDEX (ai_airport_day_faa)
      WHERE city = "San Francisco";
      Q3A
      SELECT * FROM `travel-sample`.inventory.airport
      USE INDEX (ai_airport_day_faa)
      WHERE faa = "SFO";

      Similarly, the following adaptive index over SELF in C5 is also qualified for these queries. In fact, an adaptive index that includes all fields in the documents can serve any query on the keyspace, though it might have different performance characteristics when compared to specific indexes created for a particular query. See the section Performance Implications for details. For example, the following queries Q5 and Q5A show how the generic adaptive index C5 is used to query predicates on different fields of the "airport" documents.

      C5
      CREATE INDEX `ai_self`
      ON `travel-sample`.inventory.airport(DISTINCT PAIRS(self));
      Q5
      EXPLAIN SELECT * FROM `travel-sample`.inventory.airport
      USE INDEX (ai_self)
      WHERE faa = "SFO";
      Result
      [
        {
          "plan": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "DistinctScan",
                "scan": {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "index": "ai_self",
                  "index_id": "1243095ed73061b5",
                  "index_projection": {
                    "primary_key": true
                  },
                  "keyspace": "airport",
                  "namespace": "default",
                  "scope": "inventory",
                  "spans": [
                    {
                      "exact": true,
                      "range": [
                        {
                          "high": "[\"faa\", \"SFO\"]",
                          "inclusion": 3,
                          "low": "[\"faa\", \"SFO\"]"
                        }
                      ]
                    }
                  ],
                  "using": "gsi"
                }
      // ...
      Q5A
      EXPLAIN SELECT *
      FROM `travel-sample`.inventory.airport
      USE INDEX (ai_self)
      WHERE tz = "Europe/Paris";
      Result
      [
        {
          "plan": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "DistinctScan",
                "scan": {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "index": "ai_self",
                  "index_id": "1243095ed73061b5",
                  "index_projection": {
                    "primary_key": true
                  },
                  "keyspace": "airport",
                  "namespace": "default",
                  "scope": "inventory",
                  "spans": [
                    {
                      "exact": true,
                      "range": [
                        {
                          "high": "[\"tz\", \"Europe/Paris\"]",
                          "inclusion": 3,
                          "low": "[\"tz\", \"Europe/Paris\"]"
                        }
                      ]
                    }
                  ],
                  "using": "gsi"
                }
      // ...

      Contrast with Composite Indexes

      Traditionally, composite secondary indexes are used to create indexes with multiple index keys. For example, consider the following index in C6:

      C6
      CREATE INDEX `idx_city_faa_airport`
      ON `travel-sample`.inventory.airport(city, faa, airportname);

      Such composite indexes are very different from the adaptive index in C4 in many ways:

      1. Order of index keys is vital for composite indexes. When an index key is used in the WHERE clause, all prefixing index keys in the index definition must also be specified in the WHERE clause. For example, to use the index C6, a query to "find details of airports with FAA code SFO", must specify the prefixing index key city also in the WHERE clause just to qualify the index C6. Contrast the following query Q6 with Q3 above that uses the adaptive index in C3.

        Q6
        SELECT * FROM `travel-sample`.inventory.airport
        WHERE faa = "SFO"
        AND city IS NOT MISSING;

        The problem is not just the addition of an extraneous predicate, but the performance. The predicate on the first index key city IS NOT MISSING is highly selective (i.e. most of the index entries in the index will match it) and hence, it will result in almost a full index scan.

      2. Complication in Queries. If a document has many fields to index, then the composite index will end up with all those fields as index keys. Subsequently, queries that only need to use index keys farther in the index key order will need many unnecessary predicates referring to all the preceding index keys. For example, if the index is:

        CREATE INDEX idx_name ON `travel-sample`(field1, field2, ..., field9);

        A query that has a predicate on field9 will get unnecessarily complicated, as it needs to use all preceding index keys from field1 to field8.

      3. Explosion of number of indexes for ad hoc queries. At some point, it becomes highly unnatural and overly complicated to write ad hoc queries using composite indexes. For instance, consider a user profile or inventory search use case where a person or item may need to be searched based on many criteria.

        One approach is to create indexes on all possible attributes. If that query can include any of the attributes, then it may require creation of innumerable indexes. For example, a modest 20 attributes will result in 20 factorial (2.43×1018) indexes in order to consider all combinations of sort orders of the 20 attributes.

      Partial Adaptive Indexes

      An adaptive index may also be a partial index. For a partial adaptive index, you must ensure that any fields filtered by the WHERE clause in the index definition are also referenced by the PAIRS() function.

      For example, the following query Q7 cannot select the index defined in C7A.

      C7A
      CREATE INDEX ai_geo ON `travel-sample`.inventory.landmark
      (DISTINCT PAIRS({geo.alt, geo.lat, geo.lon}))
      WHERE activity = "see"; (1)
      Q7
      EXPLAIN SELECT META(t).id FROM `travel-sample`.inventory.landmark t
      WHERE t.geo.alt > 1000 AND t.activity = "see";
      Result
      [
        {
          "plan": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "PrimaryScan3",
                "as": "t",
                "bucket": "travel-sample",
                "index": "def_inventory_landmark_primary", (2)
      // ...
      1 The WHERE clause filters on activity, but the PAIRS() function does not include the activity field.
      2 The query does not use the incorrectly-defined partial adaptive index.

      However, the same query Q7 does select the partial adaptive index defined in C7B.

      C7B
      CREATE INDEX ai_geo_activity ON `travel-sample`.inventory.landmark
      (DISTINCT PAIRS({geo.alt, geo.lat, geo.lon, activity}))
      WHERE activity = "see"; (1)
      Result
      [
        {
          "plan": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "IntersectScan",
                "scans": [
                  {
                    "#operator": "DistinctScan",
                    "scan": {
                      "#operator": "IndexScan3",
                      "as": "t",
                      "bucket": "travel-sample",
                      "index": "ai_geo_activity", (2)
                      "index_id": "29640ebd837e32fb",
                      "index_projection": {
                        "primary_key": true
                      },
                      "keyspace": "landmark",
                      "namespace": "default",
                      "scope": "inventory",
                      "spans": [
                        {
                          "exact": true,
                          "range": [
                            {
                              "high": "[successor(\"alt\")]",
                              "inclusion": 0,
                              "low": "[\"alt\", 1000]"
                            }
                          ]
                        }
                      ],
                      "using": "gsi"
                    }
                  },
                  {
                    "#operator": "DistinctScan",
                    "scan": {
                      "#operator": "IndexScan3",
                      "as": "t",
                      "bucket": "travel-sample",
                      "index": "ai_geo_activity",
                      "index_id": "29640ebd837e32fb",
                      "index_projection": {
                        "primary_key": true
                      },
                      "keyspace": "landmark",
                      "namespace": "default",
                      "scope": "inventory",
                      "spans": [
                        {
                          "exact": true,
                          "range": [
                            {
                              "high": "[\"activity\", \"see\"]",
                              "inclusion": 3,
                              "low": "[\"activity\", \"see\"]"
                            }
      // ...
      1 The WHERE clause filters on activity, and the PAIRS() function includes the activity field.
      2 The query does an IntersectScan, including the correct partial adaptive index.

      Alternatively, you can use the SELF keyword to ensure that the fields used in the WHERE clause are included in the PAIRS() function. Refer to C5 for an example.

      An IntersectScan does not eliminate redundant queries, and this may impact performance. Refer to Performance Implications for details.

      Performance Implications

      While Adaptive Indexes are very useful, there are performance implications you need to keep in mind:

      1. If a query is not covered by a regular index, then an unnested index will not have any elimination of redundant indexes; and it will instead do an IntersectScan on all the indexes, which can impact performance.

        CREATE INDEX idx_name ON `travel-sample`.inventory.hotel(name); (1)
        CREATE INDEX idx_self ON `travel-sample`.inventory.hotel(DISTINCT PAIRS(self)); (2)
        EXPLAIN SELECT * FROM `travel-sample`.inventory.hotel WHERE name IS NOT NULL;
        Results
        [
          {
            "plan": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IntersectScan", (3)
                  "scans": [
        // ...
        1 Index on the name field.
        2 Adaptive index on the whole document.
        3 IntersectScan of idx_name AND idx_self.

        Here’s another example with a partial Adaptive Index that uses IntersectScan on the index conditions:

        CREATE INDEX idx_adpt ON `travel-sample`.inventory.landmark(DISTINCT PAIRS(self))
        WHERE city="Paris";
        
        CREATE INDEX idx_reg1 ON `travel-sample`.inventory.landmark(name) WHERE city="Paris";
        
        CREATE INDEX idx_reg2 ON `travel-sample`.inventory.landmark(city);
        
        SELECT * FROM `travel-sample`.inventory.landmark
        WHERE city="Paris" AND name IS NOT NULL;

        The above query requires only a regular index, so it uses index idx_reg1 and ignores index idx_reg2. When the adaptive index idx_adpt has only the clause city="Paris" and is used with the above query, then index idx_adpt will still use IntersectScan. Here, we have only a single adaptive index instead of a reduction in the number of indexes. To fix this, you may need to remove the index condition from the predicate while spanning generations.

      Functional Limitations

      It is important to understand that adaptive indexes are not a panacea and that they have trade-offs compared to traditional composite indexes:

      1. Adaptive Indexes are bound to the limitations of Array Indexes because they are built over Array Indexing technology. Index Joins can’t use Adaptive Indexes because Index Joins can’t use array indexes, and Adaptive Index is basically an array index.

      2. Indexed entries of the Adaptive Index are typically larger in size compared to the simple index on respective fields because the indexed items are elements of the PAIRS() array, which are basically name-value pairs of the document fields. So, it may be relatively slower when compared with equivalent simple index. For example, in the following equivalent queries, C8/Q8 may perform better than C9/Q9.

        This example uses the def_inventory_hotel_city index, which is installed with the travel-sample bucket.

        C8
        CREATE INDEX `def_inventory_hotel_city`
        ON `travel-sample`.`inventory`.`hotel`(`city`) WITH { "defer_build":true };
        Q8
        EXPLAIN SELECT city FROM `travel-sample`.inventory.hotel
        USE INDEX (def_inventory_hotel_city)
        WHERE city = "San Francisco";
        Result
        {
          "plan": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "IndexScan3",
                "bucket": "travel-sample",
                "covers": [
                  "cover ((`hotel`.`city`))",
                  "cover ((meta(`hotel`).`id`))"
                ],
                "filter": "(cover ((`hotel`.`city`)) = \"San Francisco\")",
                "index": "def_inventory_hotel_city",
                "index_id": "581febfa2f2a8923",
                "index_projection": {
                  "entry_keys": [
                    0
                  ]
                },
                "keyspace": "hotel",
                "namespace": "default",
                "scope": "inventory",
                "spans": [
                  {
                    "exact": true,
                    "range": [
                      {
                        "high": "\"San Francisco\"",
                        "inclusion": 3,
                        "low": "\"San Francisco\""
                      }
                    ]
                  }
                ],
                "using": "gsi"
              },
        // ...
        C9
        CREATE INDEX `ai_city` ON `travel-sample`.inventory.hotel(DISTINCT PAIRS({city}));
        Q9
        EXPLAIN SELECT city FROM `travel-sample`.inventory.hotel
        USE INDEX (ai_city)
        WHERE city = "San Francisco";
        Result
        {
          "plan": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "DistinctScan",
                "scan": {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "index": "ai_city",
                  "index_id": "64e238e4686486d2",
                  "index_projection": {
                    "primary_key": true
                  },
                  "keyspace": "hotel",
                  "namespace": "default",
                  "scope": "inventory",
                  "spans": [
                    {
                      "exact": true,
                      "range": [
                        {
                          "high": "[\"city\", \"San Francisco\"]",
                          "inclusion": 3,
                          "low": "[\"city\", \"San Francisco\"]" (1)
                        }
                      ]
                    }
                  ],
                  "using": "gsi"
                }
              },
        // ...
        1 Note how the index key values are represented in the spans.
      3. Adaptive index requires more storage and memory, especially in case of Memory Optimized Indexes.

        1. The size of the index and the number of indexed items in an Adaptive Index grow rapidly with the number of fields in the documents, as well as, with the number of different values for various fields in the documents or keyspace.

        2. Moreover, if the documents have nested sub-objects, then the adaptive index will index the sub-documents and related fields at each level of nesting.

        3. Similarly, if the documents have array fields, then each of array elements are explored and indexed.

        4. For example, the following queries show that a single route document in travel-sample generates 103 index items and that all route documents produce ~2.3 million items.

          SELECT array_length(PAIRS(self)) FROM `travel-sample`.inventory.route
          LIMIT 1;
          Result
          [
            {
              "$1": 103
            }
          ]
          SELECT sum(array_length(PAIRS(self))) FROM `travel-sample`.inventory.route
          LIMIT 1;
          Result
          [
            {
              "$1": 2285464
            }
          ]

        So, the generic adaptive indexes (with SELF) should be employed carefully. Whenever applicable, it is recommended to use the following techniques to minimize the size and scope of the adaptive index:

        • Instead of SELF, use selective adaptive indexes by specifying the field names of interest to the PAIRS() function. For examples, refer to C4, Q1, Q2, and Q3 above.

        • Use partial adaptive indexes with a WHERE clause that will filter the number of documents that will be indexed. For examples, refer to C5, Q5, and Q5A above.

      4. A generic adaptive index (on SELF) will be qualified for all queries on the keyspace. So, when using with other GSI indexes, this will result in more IntersectScan operations for queries that qualify other non-adaptive indexes. This may impact query performance and overall load on query and indexer nodes. To alleviate the negative effects, you may want to specify the USE INDEX clause in SELECT queries whenever possible.

      5. Adaptive Indexes cannot be used as Covered Indexes for any queries. See example Q9 above.

      6. Adaptive Indexes can be created only on document field identifiers, not on functional expressions on the fields. For example, the following query uses a default index, such as def_inventory_hotel_city, instead of the specified adaptive index ai_city1:

        CREATE INDEX `ai_city1`
        ON `travel-sample`.inventory.hotel(DISTINCT PAIRS({"city" : LOWER(city)}));
        EXPLAIN SELECT city FROM `travel-sample`.inventory.hotel
        USE INDEX (ai_city1)
        WHERE LOWER(city) = "san francisco";
        Result
        [
          {
            "plan": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "covers": [
                    "cover ((`hotel`.`city`))",
                    "cover ((meta(`hotel`).`id`))"
                  ],
                  "filter": "(lower(cover ((`hotel`.`city`))) = \"san francisco\")",
                  "index": "def_inventory_hotel_city", (1)
                  "index_id": "581febfa2f2a8923",
        // ...
        1 This does not use our specified ai_city1 index because it’s a functional index expression on the field city.
      7. Adaptive Indexes do not work with NOT LIKE predicates with a leading wildcard (see MB-23981). For example, the following query also uses a default index, such as def_city, instead of the specified adaptive index ai_city. However, it works fine for LIKE predicates with a leading wildcard.

        EXPLAIN SELECT city FROM `travel-sample`.inventory.hotel
        USE INDEX (ai_city)
        WHERE city NOT LIKE "%Francisco";
        Result
        [
          {
            "plan": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "covers": [
                    "cover ((`hotel`.`city`))",
                    "cover ((meta(`hotel`).`id`))"
                  ],
                  "filter": "(not (cover ((`hotel`.`city`)) like \"%Francisco\"))",
                  "index": "def_inventory_hotel_city", (1)
                  "index_id": "581febfa2f2a8923",
        // ...
        EXPLAIN SELECT city FROM `travel-sample`.inventory.hotel
        USE INDEX (ai_city)
        WHERE city LIKE "%Francisco";
        Result
        [
          {
            "plan": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "DistinctScan",
                  "scan": {
                    "#operator": "IndexScan3",
                    "bucket": "travel-sample",
                    "index": "ai_city", (2)
                    "index_id": "64e238e4686486d2",
        // ...
        1 Doesn’t use ai_city with NOT LIKE and leading wildcard.
        2 Uses ai_city with LIKE and leading wildcard.
      8. Adaptive indexes can’t use Covered Scans. An adaptive index can’t be a covering index, as seen in the following example:

        CREATE INDEX `ai_city2`
        ON `travel-sample`.inventory.hotel(DISTINCT PAIRS({"city" : city}));
        EXPLAIN SELECT city FROM `travel-sample`.inventory.hotel
        WHERE city = "San Francisco"; (1)
        Result
        [
          {
            "plan": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "IndexScan3",
                  "bucket": "travel-sample",
                  "covers": [
                    "cover ((`hotel`.`city`))",
                    "cover ((meta(`hotel`).`id`))"
                  ],
                  "filter": "(cover ((`hotel`.`city`)) = \"San Francisco\")",
                  "index": "def_inventory_hotel_city", (2)
                  "index_id": "581febfa2f2a8923",
        // ...
        1 No index specified in query.
        2 Doesn’t use ai_city2 as a covering index.