You are viewing the documentation for a prerelease version.


CREATE PRIMARY INDEX


    The CREATE PRIMARY INDEX statement allows you to create a primary index. Primary indexes contain a full set of keys in a given keyspace. Primary indexes are optional. They are only required for running ad hoc queries on a keyspace when no secondary index supports the query.

    CREATE PRIMARY INDEX is by default a synchronous operation, which means that every CREATE PRIMARY INDEX statement blocks until the operation finishes. Index building starts by creating a task that is queued for index build. After this phase, if you lose connectivity, the index build operation continues in the background. You can also run index creation asynchronously by using the defer_build option of the WITH clause. In asynchronous mode, CREATE PRIMARY INDEX starts a task to create the primary index definition and returns as soon as that task finishes. You can then build the index using the BUILD INDEX statement.

    The GSI indexer provides a status field and marks the index status as pending. You can query this status field, along with other index metadata, using system:indexes.

    Indexes cannot be built concurrently on a given keyspace unless the defer_build option of the CREATE PRIMARY INDEX statement is used in combination with the BUILD INDEX statement. The following error is reported if a second index creation operation is started before the ongoing index creation completes.

    [
      {
        "code": 5000,
        "msg": "BuildIndexes - cause: Build index fails.  Some index will be retried building in the background.  For more details, please check index status.\n",
        "query_from_user": "BUILD INDEX ON ..."
      }
    ]

    You can create multiple identical secondary indexes on a bucket and place them on separate nodes for better index availability. In Couchbase Server Enterprise Edition, the recommended way to do this is to use the num_replica option. In Couchbase Server Community Edition, you need to create multiple identical indexes and place them using the nodes option. Refer to WITH Clause below for more details.

    Prerequisites

    RBAC Privileges

    Users executing the CREATE PRIMARY INDEX statement must have the Query Manage Index privilege granted on the keyspace. For more details about user roles, see Authorization.

    Syntax

    create-primary-index ::= CREATE PRIMARY INDEX [ index-name ] ON keyspace-ref [ index-using ] [ index-with ]
    index-name

    [Optional] A unique name that identifies the index. If a name is not specified, the default name of #primary is applied.

    Valid GSI index names can contain any of the following characters: A-Z a-z 0-9 # _, and must start with a letter, [A-Z a-z]. The minimum length of an index name is 1 character and there is no maximum length set for an index name. When querying, if the index name contains a # or _ character, you must enclose the index name within backticks.

    Unnamed primary indexes are dropped by using the DROP PRIMARY INDEX statement, and named primary indexes are dropped by using the DROP INDEX statement.
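    The naming and dropping rules above can be checked mechanically. The following Python sketch is illustrative only and is not part of Couchbase Server. The helpers is_valid_index_name, quote_identifier, and drop_statement are hypothetical names. The regular expression simply encodes the documented character set and first-letter rule; the system-assigned default name #primary falls outside these rules and is not checked. drop_statement chooses DROP PRIMARY INDEX for an unnamed primary index and DROP INDEX for a named one, assuming the `DROP INDEX index-name ON keyspace-ref` form.

```python
import re
from typing import Optional

# Documented rule (as encoded here): the first character is a letter (A-Z, a-z);
# the remaining characters may be letters, digits, '#' or '_'. No maximum length.
_INDEX_NAME = re.compile(r"[A-Za-z][A-Za-z0-9#_]*")


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` follows the documented GSI naming rules."""
    return _INDEX_NAME.fullmatch(name) is not None


def quote_identifier(name: str) -> str:
    """Wrap the name in backticks if it contains '#' or '_'."""
    return f"`{name}`" if ("#" in name or "_" in name) else name


def drop_statement(keyspace: str, index_name: Optional[str] = None) -> str:
    """Unnamed primary index -> DROP PRIMARY INDEX; named -> DROP INDEX ... ON."""
    if index_name is None:
        return f"DROP PRIMARY INDEX ON {keyspace};"
    return f"DROP INDEX {quote_identifier(index_name)} ON {keyspace};"


# Example usage with a name taken from the examples on this page.
print(is_valid_index_name("idx_airport_primary"))  # True
print(drop_statement("airport", "idx_airport_primary"))
```

    A client-side check like this only catches obvious mistakes early. The server still decides whether a name is accepted.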

    Keyspace Reference

    keyspace-ref ::= keyspace-path | keyspace-partial

    Specifies the keyspace for which the index needs to be created.

    If there is a hyphen (-) inside any part of the keyspace reference, you must wrap that part of the keyspace reference in backticks (` `). Refer to the examples below.

    Keyspace Path

    keyspace-path ::= [ namespace ':' ] bucket [ '.' scope '.' collection ]

    If the keyspace is a named collection, or the default collection in the default scope within a bucket, the keyspace reference may be a keyspace path. In this case, the query context should not be set.

    namespace

    (Optional) An identifier that refers to the namespace of the keyspace. Currently, only the default namespace is available. If the namespace name is omitted, the default namespace in the current session is used.

    bucket

    (Required) An identifier that refers to the bucket name of the keyspace.

    scope

    (Optional) An identifier that refers to the scope name of the keyspace. If omitted, the bucket’s default scope is used.

    collection

    (Optional) An identifier that refers to the collection name of the keyspace. If omitted, the default collection in the bucket’s default scope is used.

    For example, default:`travel-sample` indicates the default collection in the default scope in the travel-sample bucket in the default namespace.

    Similarly, default:`travel-sample`.inventory.airline indicates the airline collection in the inventory scope in the travel-sample bucket in the default namespace.
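    The keyspace-path grammar and the backtick rule for hyphens can be sketched as a small string builder. This is a hypothetical Python helper; keyspace_path and quote_part are illustrative names, not a Couchbase API. It backticks any part that contains a hyphen and requires scope and collection to be given together, as the grammar does. It reproduces the two examples above.

```python
from typing import Optional


def quote_part(part: str) -> str:
    """Backtick one part of a keyspace reference if it contains a hyphen."""
    return f"`{part}`" if "-" in part else part


def keyspace_path(bucket: str,
                  scope: Optional[str] = None,
                  collection: Optional[str] = None,
                  namespace: Optional[str] = None) -> str:
    """Build [namespace:]bucket[.scope.collection] per the keyspace-path grammar."""
    # In the grammar, scope and collection are optional only as a pair.
    if (scope is None) != (collection is None):
        raise ValueError("scope and collection must be given together")
    path = quote_part(bucket)
    if scope is not None and collection is not None:
        path += f".{quote_part(scope)}.{quote_part(collection)}"
    if namespace is not None:
        path = f"{quote_part(namespace)}:{path}"
    return path


print(keyspace_path("travel-sample", namespace="default"))
# default:`travel-sample`
print(keyspace_path("travel-sample", "inventory", "airline", namespace="default"))
# default:`travel-sample`.inventory.airline
```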

    Keyspace Partial

    keyspace-partial ::= collection

    Alternatively, if the keyspace is a named collection, the keyspace reference may be just the collection name with no path. In this case, you must set the query context to indicate the required namespace, bucket, and scope.

    collection

    (Required) An identifier that refers to the collection name of the keyspace.

    For example, airline indicates the airline collection, assuming the query context is set.

    USING Clause

    index-using ::= USING GSI

    In Couchbase Server 6.5 and later, the index type for a primary index must be Global Secondary Index (GSI). The USING GSI keywords are optional and may be omitted.

    WITH Clause

    index-with ::= WITH expr

    Use the WITH clause to specify additional options.

    expr

    An object with the following properties:

    nodes

    [Optional] An array of strings, each of which represents a node name.

    In Couchbase Server Community Edition, a single primary index of type GSI can be placed on a single node that runs the indexing service. The nodes property allows you to specify the node that the index is placed on. If nodes is not specified, one of the nodes running the indexing service is randomly picked for the index.

    In Couchbase Server Enterprise Edition, you can specify multiple nodes to distribute replicas of an index across nodes running the indexing service, for example:

    CREATE PRIMARY INDEX ON keyspace_name USING GSI
    WITH {"nodes":["node1:8091", "node2:8091", "node3:8091"]};

    If you specify both nodes and num_replica, the number of nodes in the array must be one greater than the specified number of replicas; otherwise, index creation fails.

    If nodes is not specified, then the system chooses nodes on which to place the new index and any replicas, in order to achieve the best resource utilization across nodes running the indexing service. This is done by taking into account the current resource usage statistics of index nodes.

    A node name passed to the nodes property must include the cluster administration port, by default 8091. For example, use WITH {"nodes": ["192.0.2.0:8091"]} rather than WITH {"nodes": ["192.0.2.0"]}.

    defer_build

    [Optional] Boolean.

    true

    When set to true, the CREATE PRIMARY INDEX operation queues the task for building the index but immediately pauses the building of the index of type GSI. Index building requires an expensive scan operation. Deferring the build of multiple indexes lets that scan be shared. Admins can defer building several indexes and then use the BUILD INDEX statement to build them together with one efficient scan of the keyspace data.

    false

    When set to false, the CREATE PRIMARY INDEX operation queues the task for building the index and immediately kicks off the building of the index of type GSI.

    num_replica

    This property is only available in Couchbase Server Enterprise Edition.

    [Optional] Integer that specifies the number of replicas of the index to create.

    The indexer will automatically distribute these replicas amongst index nodes in the cluster for load-balancing and high availability purposes. The indexer will attempt to distribute the replicas based on the server groups in use in the cluster where possible.

    The value of this property must be less than the number of index nodes in the cluster; otherwise, index creation fails.
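    The WITH options and their documented constraints can be summarized as a client-side pre-check. The following Python sketch is hypothetical: with_clause is not a Couchbase API. index_node_count stands for a cluster fact that the caller must supply. The node-name check only tests for a trailing numeric port. The sketch does not model the differences between Community and Enterprise Edition; the server remains the authority on whether index creation succeeds.

```python
import json
from typing import List, Optional


def with_clause(nodes: Optional[List[str]] = None,
                defer_build: Optional[bool] = None,
                num_replica: Optional[int] = None,
                index_node_count: Optional[int] = None) -> str:
    """Assemble a WITH object for CREATE PRIMARY INDEX after applying the
    documented checks. index_node_count is a hypothetical caller-supplied input."""
    opts: dict = {}
    if nodes is not None:
        for node in nodes:
            # Each node name must carry the admin port, e.g. "192.0.2.0:8091".
            host, sep, port = node.rpartition(":")
            if not sep or not host or not port.isdigit():
                raise ValueError(f"node {node!r} must include a port, e.g. host:8091")
        opts["nodes"] = nodes
    if num_replica is not None:
        # With both options, the node count must be num_replica + 1.
        if nodes is not None and len(nodes) != num_replica + 1:
            raise ValueError("number of nodes must be num_replica + 1")
        # num_replica must be less than the number of index nodes.
        if index_node_count is not None and num_replica >= index_node_count:
            raise ValueError("num_replica must be less than the number of index nodes")
        opts["num_replica"] = num_replica
    if defer_build is not None:
        opts["defer_build"] = defer_build
    return "WITH " + json.dumps(opts)


print(with_clause(nodes=["node1:8091", "node2:8091", "node3:8091"], num_replica=2))
# WITH {"nodes": ["node1:8091", "node2:8091", "node3:8091"], "num_replica": 2}
```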

    Usage

    Primary Scan Timeout

    For a primary index scan on a keyspace of any size, the query engine guarantees that the client is not exposed to a scan timeout if the indexer throws a scan timeout after it has returned a non-empty subset of the primary keys. To complete the scan, the query engine performs successive scans of the primary index until all the primary keys have been returned. However, the indexer may throw a scan timeout without returning any primary keys. In this event, the query engine returns the scan timeout to the client.
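    The resume-on-timeout behavior described above can be modeled as a loop. This Python sketch is a simplified illustration, not the query engine's actual code. The scan callback is a hypothetical stand-in for the indexer: it returns the keys strictly after a given key, together with a timed-out flag. Any round that times out with no keys is treated as an error returned to the client.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical indexer interface: given the last key already received (None on
# the first call), return the next batch of keys strictly after it, plus a flag
# saying whether this scan round ended in a timeout.
ScanFn = Callable[[Optional[str]], Tuple[List[str], bool]]


class ScanTimeout(Exception):
    """Raised when a scan round times out without returning any keys."""


def primary_scan(scan: ScanFn) -> List[str]:
    """Rescan from the last received key until a round completes without timing out."""
    keys: List[str] = []
    last: Optional[str] = None
    while True:
        batch, timed_out = scan(last)
        if timed_out and not batch:
            # No keys came back, so there is nothing to resume from.
            raise ScanTimeout("index scan timeout")
        keys.extend(batch)
        if batch:
            last = batch[-1]
        if not timed_out:
            return keys


def fake_indexer(all_keys: List[str], per_call: int) -> ScanFn:
    """Test double that 'times out' after returning per_call keys."""
    def scan(after: Optional[str]) -> Tuple[List[str], bool]:
        remaining = [k for k in sorted(all_keys) if after is None or k > after]
        return remaining[:per_call], len(remaining) > per_call
    return scan


print(primary_scan(fake_indexer(["a", "b", "c", "d", "e"], 2)))
# ['a', 'b', 'c', 'd', 'e']
```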

    For example, if the indexer cannot find a snapshot that satisfies the consistency guarantee of the query within the timeout limit, it times out without returning any primary keys.

    For secondary index scans, the query engine does not handle scan timeouts and returns an index scan timeout error to the client. You can handle a scan timeout on a secondary index by increasing the indexer timeout setting (see Query Settings) or, preferably, by defining and using a more selective index.

    Examples

    Default Collection

    The following example creates a primary index on the default collection in the default scope within the travel-sample bucket. First make sure the query context is not set.

    • Query Workbench: set the query context drop-down menu to 'bucket.scope', so that no query context is set.

    • CBQ Shell:

      cbq> \UNSET -query_context;

    Example 1. Create a primary index

    Create a named primary index on the travel-sample keyspace.

    CREATE PRIMARY INDEX idx_default_primary ON `travel-sample` USING GSI;

    Query Context

    The following example is similar to Example 1, but creates a primary index on the airport collection. First set the query context to `travel-sample`.inventory.

    • Query Workbench: set the query context drop-down menu to 'travel-sample.inventory'.

    • CBQ Shell:

      cbq> \SET -query_context 'travel-sample.inventory';

    Example 2. Create a primary index on a collection with query context

    Create a named primary index on the airport collection.

    CREATE PRIMARY INDEX idx_airport_primary ON airport USING GSI;

    Named Collection

    In each of the examples that follow, the path to the required keyspace is specified by the query, so you do not need to set the query context.

    Example 3. Create a deferred primary index

    Create a named primary index using the defer_build option.

    CREATE PRIMARY INDEX idx_hotel_primary
      ON `travel-sample`.inventory.hotel
      USING GSI
      WITH {"defer_build":true};

    Query system:indexes for the status of the index.

    SELECT * FROM system:indexes WHERE name="idx_hotel_primary";

    The output from system:indexes shows the idx_hotel_primary in the pending state ("state": "deferred").

    Example 4. Build a deferred primary index

    Kick off the deferred build on the named primary index.

    BUILD INDEX ON `travel-sample`.inventory.hotel(idx_hotel_primary) USING GSI;

    Query system:indexes for the status of the index.

    SELECT * FROM system:indexes WHERE name="idx_hotel_primary";

    The output from system:indexes shows that the index has now been built.