A newer version of this documentation is available.

View Latest

Global Secondary Indexes for N1QL

Global Secondary Index (GSI) supports a variety of OLTP like use-cases for N1QL including basic, ad-hoc, and short-running reporting queries that require filtering. For example, if you have a WHERE clause in a N1QL statement that selects a subset of your data on which the secondary index is defined, you can see a significant speedup in the N1QL query performance by using GSI.
Definition

You can define a primary or secondary index using GSIs in N1QL using the CREATE INDEX statement.

Distribution

GSIs reside on the index nodes in the cluster. Every index has an index key (used for lookup), and a partition key (used for sharding). Based on the partition key, the index is partitioned across all the index nodes. Index metadata stored on the index node knows about the distribution of the index. GSI does not use scatter-gather. Instead, based on the index metadata, it only touches the nodes that have the relevant data.

Latency

GSIs provide a lower latency compared to view indexes. For the storage layer, GSI uses ForestDB. ForestDB is Couchbase’s state-of-the-art storage engine with a modified data structure to increase read and write performance. ForestDB is written with multi-core processors and solid state drives in mind. GSIs leverage ForestDB’s buffer cache to provide low latency read-write operations.

Scaling

Couchbase Server scales indexes independent of data and queries. With multidimensional scaling, you can allocate separate hardware resources for separate services, and avoid resource contention by performing queries, maintaining indexes, and writing data to different nodes. If your application needs more indexing resources, you can either scale out your infrastructure to add more index nodes, or scale up the index services to handle more workload.

Load-balancing

To load-balance GSIs, you must manually specify the nodes on which indexes should be built on and alternatively partition based on some range. If there are replica indexes (or indexes with the same definition) available, the N1QL query engine uses the round-robin algorithm between the replicas to distribute the indexing load.

For example, create two indexes productName_index1 and productName_index2, using the following commands:

      CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type="product" USING GSI WITH {"nodes":["node1:8091"]};

      CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) WHERE type="product" USING GSI WITH {"nodes":["node2:8091"]};

The indexing load will be distributed equally between the indexes productName_index1 and productName_index2.

You can also create indexes by partitioning your indexes as shown in the following example.

     CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type="product" AND productName BETWEEN "A" AND "K" USING GSI WITH {"nodes":["node1:8091"]};

     CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) WHERE type="product" AND productName BETWEEN "K" AND "Z" USING GSI WITH {"nodes":["node2:8091"]};
Consistency

You can specify a query consistency flag per request, similar to the view API. The query consistency flag can be one of the following:

  • not_bounded (stale=ok)

    This value has the same characteristics as stale=ok in the view API.

  • request_plus (stale=false)

    This value has the same characteristics as stale=false in the view API.

When using GSI, the default consistency setting is not_bounded.

Replication

GSIs are not automatically replicated. They must be defined on other indexing nodes for high-availability.

To create a replica of a GSI, you can create an identical index definition with unique index names under 2 nodes. Queries will load balance across the indexes and if one of the indexes become unavailable, all requests are rerouted to the available remaining index. You can create more than 2 copies of the index for better redundancy and load balancing.

CREATE INDEX productName_index1 ON bucket_name(productName, ProductID)
     WHERE type="product"
     USING GSI
     WITH {"nodes":["node1:8091"]};

CREATE INDEX productName_index2 ON bucket_name(productName, ProductID)
     WHERE type="product"
     USING GSI
     WITH {"nodes":["node2:8091"]};