Rebalance

Rebalance redistributes data, indexes, event processing, and query processing among available nodes.

Understanding Rebalance

When one or more nodes have been brought into a cluster (either by adding or joining), or have been taken out of a cluster (either through Removal or Failover), rebalance redistributes data, indexes, event processing, and query processing among available nodes. The cluster map is correspondingly updated and distributed to clients. The process occurs while the cluster continues to service requests for data.

See Cluster Manager, for information on the cluster map. See Manage Nodes and Clusters, for practical examples of using rebalance.

Bucket Rank

In Couchbase Server Version 7.6 and later, each bucket on the cluster (Couchbase or Ephemeral) can be assigned a rank. The value is an integer from 0 (the default) to 1000, inclusive. Buckets with higher ranks are handled by the rebalance process before buckets with lower ranks. For example, if a cluster hosts four buckets, which are named A, B, C, and D; and bucket A is explicitly assigned a rank of 10, while buckets B, C, and D are left with the default rank of 0; when rebalance occurs, the vBuckets for bucket A are addressed first; then, vBuckets for the other buckets are addressed, with the Cluster Manager making determinations as to the appropriate handling-order for those other buckets.

This assignment of rank allows a cluster’s most mission-critical data to be rebalanced with top priority.

Bucket rank can be established with either the CLI (see bucket-create and bucket-edit) or the REST API (see Creating and Editing Buckets).

Rebalance Stages

Each rebalance proceeds in sequential stages. Each stage corresponds to a Couchbase Service, deployed on the cluster. Therefore, if all services have been deployed, there are seven stages in all — one each for the Data, Query, Index, Search, Eventing, Backup, and Analytics services. When all stages have been completed, the rebalance process itself is complete.

Rebalance and the Data Service

On rebalance, vBuckets are redistributed evenly among currently available Data Service nodes. After rebalance, operations are directed to active vBuckets in their updated locations. Rebalance does not interrupt applications' data-access. vBucket data-transfer occurs sequentially: therefore, if rebalance stops for any reason, it can be restarted from the point at which it was stopped.

Note the special case provided by Swap Rebalance, where the number of nodes coming into the cluster is equal to the number of nodes leaving the cluster, ensuring that data is only moved between these nodes.

If nodes have been removed such that the desired number of replicas can no longer be supported, rebalance provides as many replicas as possible. For example, if four Data Service nodes previously supported one bucket with three replicas, and the Data Service node-count is reduced to three, rebalance provides two replicas only. If and when the missing Data Service node is restored or replaced, rebalance will provide three replicas again.

See Intra-Cluster Replication, for information on how data is distributed across nodes.

Data-Service Rebalance Phases

During the Data Service rebalance stage, vBuckets are moved in phases. The phases — which differ, depending on whether the vBucket is an active or a replica vBucket — are described below.

Rebalance Phases for Replica vBuckets

The phases through which rebalance moves a replica vBucket are shown by the following illustration.

The move has two principal phases. Phase 1 is Backfill. Phase 2 is Book-keeping.

Phase 1, Backfill, itself consists of two subphases. The first subphase comprises the movement of the replica vBucket data from its node of origin to the memory of the destination node. The second subphase comprises the writing of the replica vBucket data from the memory to the disk of the destination node. The time required for this second subphase, which only applies to Couchbase Buckets, is termed Persistence Time. The time required for the entire Backfill process, including Persistence Time, is termed Backfill Time.

Phase 2, Book-keeping, comprises various ancillary tasks required for move-completion.

The total time required for the move is calculated by adding Backfill Time to the time required for Phase 2, Book-keeping; and is termed Move Time.

Rebalance Phases for Active vBuckets

The phases in which rebalance moves an active vBucket are shown by the following illustration.

The move has four principal phases. Phase 1, Backfill, and Phase 2, Book-keeping, are identical to those required for replica vBuckets; except that the Book-keeping phase includes additional Persistence Time.

Phase 3, Active Takeover, comprises the operations required to establish the relocated vBucket as the new active copy. The time required for Phase 3 is termed Takeover Time.

Phase 4, Book-keeping, comprises a final set of ancillary tasks, required for move-completion.

The total time for the move is termed Move Time.

Limiting Concurrent vBucket Moves

Since vBucket moves are highly resource-intensive, Couchbase Server allows the concurrency of such moves to be limited: a setting is provided that determines the maximum number of concurrent vBucket moves permitted on any node. The minimum value for the setting is 1, the maximum 64, the default 4.

A move counts toward this restriction only when in the backfill phase, as described above, in Data Service Rebalance Phases. The move may be of either an active or a replica vBucket. A node’s participation in the move may be as either a source or a target.

For example, if a node is at a given time the source for two moves in backfill phase, and is the target for two additional moves in backfill phase, and the setting stands at 4, the node may participate in the backfill phase of no additional moves, until at least one of its current moves has completed its backfill phase.

The setting may be established by means of the Couchbase Web Console, the Couchbase CLI, or the REST API.

A higher setting may improve rebalance performance, at the cost of higher resource consumption; in terms of CPU, memory, disk, and bandwidth. Conversely, a lower setting may degrade rebalance performance, while freeing up such resources. Note, however, that rebalance performance can be affected by many additional factors; and that in consequence, changing this parameter may not always have the expected effects. Note also that a higher setting, due to its additional consumption of resources, may degrade the performance of other systems, including the Data Service.

Accessing Rebalance Reports

Couchbase Server creates a report on every rebalance that occurs. The report contains a JSON document, which can be inspected in any browser or editor. The document provides summaries of the concluded rebalance activity, as well as details for each of the vBuckets affected: in consequence, the report may be of considerable length.

On conclusion of a rebalance, its report can be accessed in any of the following ways:

By means of Couchbase Web Console, as described in Add a Node and Rebalance.
By means of the REST API, as described in Getting Cluster Tasks.
By accessing the directory /opt/couchbase/var/lib/couchbase/logs/rebalance on any of the cluster nodes. A rebalance report is maintained here for (up to) the last five rebalances performed. Each report is provided as a *.json file, whose name indicates the time at which the report was run — for example, rebalance_report_2020-03-17T11:10:17Z.json.

A complete account of the report-content is provided in the Rebalance Reference.

Rebalance and Other Services

Rebalance affects different services differently. The effects on services other than the Data Service are described below.

Index Service

The Index Service maintains a cluster-wide set of index definitions and metadata, which allows the redistribution of indexes and index replicas during a rebalance.

The rebalance process takes into account the nodes' CPU, RAM, and disk bandwidth to limit its effect on database performance.

Index Rebalance Methods

By default, Couchbase Server rebuilds the indexes in their new locations during a rebalance. If your cluster uses Standard Index Storage, you can choose to enable file-based rebalance which moves index files between nodes instead of rebuilding them. Copying the index files is faster than having the target node rebuild the index from scratch. However, if you enable file-based rebalancing, you lose the ability to choose which Index Service node handles a specific index.

The first rebalance after you enable file-based index rebalance does not use the file-based rebalancing method. Instead, the rebalance process rebuilds the indexes. During this rebuild process, Couchbase Server adds metadata to the index files that enables later rebalances to move them between nodes. After this initial rebalance, all later rebalances move the index files as long you as you keep file-base rebalance enabled.

If a file-base rebalance fails, you can try starting a new rebalance. The second rebalance does not re-transfer any index that the failed rebalance finished moving. Instead, it moves just the indexes that had not moved or were in the process of moving when the error occurred.

You enable file-based rebalance using settings located under the under the Index Storage Mode on the Couchbase Server Web Console’s Settings page. You can also enable it using the REST API’s settings/indexes endpoint.

Index Redistribution

Couchbase Server can redistribute indexes during rebalance. Redistributing indexes can improve performance by offloading heavily loaded nodes. A rebalance automatically redistributes indexes in the following situations:

Rebalance when you add an index node: Rebalance always moves indexes off of nodes that you’re removing from the cluster to the remaining nodes. A rebalance does not affect indexes that reside on nodes that you’re not removing.
Rebalance when you add and remove index nodes: A swap rebalance moves indexes from nodes you’re removing and places them on the nodes you’re adding.

You can change a setting to have a rebalance redistribute indexes in more cases. This setting is named Optimize Index Placement On Rebalance on the Couchbase Server Web Console’s Settings page. You can also change this setting using the Settings REST API redistributeIndexes argument.

In Couchbase Server 7.2 and later, the redistribution setting affects both partitioned and non-partitioned indexes.

Enabling the index redistribution setting causes a rebalance to redistribute indexes in the following situations:

Rebalance after you add an index node: Rebalance optimizes index placement across all index nodes in the cluster, including on the new index nodes.
Rebalance after you add or remove a non-index node: Rebalance moves indexes from heavily loaded nodes to nodes with free resources to balance distribution.
Rebalance during an index server group repair: A group failure in a multiple server group database can force all replicas into a single group. In this case, rebalance redistributes the replicas to support high availability across server groups after the server group repair.

If after you drop Index Service nodes, the remaining nodes cannot handle all of the index replicas, Couchbase Server drops some of the replicas. If you later add additional Index Service nodes to the cluster, Couchbase Server replaces the dropped replicas.

Index Rebuild Batch Size

When Couchbase Server rebalances indexes by rebuilding them, it groups the rebuilds in batches. This batching limits the overhead of rebuilding the indexes on the cluster. The default batch size is 3, which means that a rebalance rebuilds up to three indexes at the same time. This setting has no effect if you have enabled file-based index rebalancing.

Users with Full Admin or Cluster Admin roles can change the batch size using the REST API. See Modify Index Batch Size.

Search Service

The Search Service automatically partitions its indexes across all Search nodes in the cluster, ensuring optimal distribution, following rebalance.

To achieve this, in versions of Couchbase Server prior to 7.1, by default, partitions needing to be newly created were entirely built, on their newly assigned nodes. In 7.1+, by default, new partitions are instead created by the transfer of partition files from old nodes to new nodes: this significantly enhances performance. This is an Enterprise-only feature, which requires all Search Service nodes either to be running 7.1 or later; or to be running 7.0.2, with the feature explicitly switched on.

Community Edition clusters that are upgraded to Enterprise Edition 7.1+ thus gain this feature in its default setting. Community Edition clusters that are upgraded to Enterprise Edition 7.0.2 can have this feature switched on, subsequent to upgrade.

During file transfer, should an unresolvable error occur, file transfer is automatically abandoned, and partition build is used instead.

The file-transfer feature can be enabled and disabled by means of the REST API. See Rebalance Based on File Transfer.

Query Service

When a node is removed and rebalanced, the Query Service will allow existing queries and transactions to complete before shutting down, which may result in the rebalancing operation taking longer to complete. The Query Service diagnostic log on the node(s) being removed will contain messages indicating how many transactions and queries are still running. Any new connection attempts to nodes that are shutting down will receive error 1180 (E_SERVICE_SHUTTING_DOWN), and may receive error 1181 (E_SERVICE_SHUT_DOWN) in the brief period between the completion of the last statement or transaction and the service exiting. Such rejected requests will have HTTP status code 503 (service unavailable) set.

Eventing Service

When an Eventing Service node has been added or removed, rebalance causes the mutation (vBucket processing ownership) and timer event processing workload to be redistributed among available Eventing Service nodes. The Eventing Service continues to process mutations both during and after rebalance. Checkpoint information ensures that no mutations are lost.

Analytics Service

The Analytics Service uses shadow data, which is a copy of all or some of the data maintained by the Data Service. By default, the shadow data is not replicated; however, it may be partitioned across all cluster nodes that run the Analytics Service. Starting with Couchbase Server 7.1, the shadow data and its partitions may be replicated up to 3 times. Each replica resides on an Analytics node: a given Analytics node can host a replica partition, or the active partition on which replicas are based.

If there are no Analytics replicas, and an Analytics node fails over, the Analytics Service stops working cluster-wide: ingestion of shadow data stops and no Analytics operations can be run. In this case:

If the Analytics node is recovered, the Analytics Service is resumed and ingestion of shadow data resumes from the point before the node failed over.
If the Analytics node is removed, the Analytics Service becomes active again after rebalance, but ingestion of shadow data must begin again from scratch.

If there are Analytics replicas, and an Analytics node fails over, the Analytics Service continues to work: one of the replicas is promoted to serve the shadow data that was stored on the failed over node. The Analytics Service only needs to rebuild any shadow data that isn’t already ingested from the Data Service, depending on the state of the promoted replica. In this case:

If the Analytics node is recovered, the shadow data on the recovered node is updated from the promoted replica, and it becomes the active partition again.
If the Analytics node is removed, the shadow data is redistributed among the remaining Analytics nodes in the cluster.

If no Analytics Service node has been removed or replaced, shadow data is not affected by rebalance. In consequence of rebalance, the Analytics Service receives an updated cluster map, and continues to work with the modified vBucket-topology.

Backup Service

A rebalance causes the scheduler for the Backup Service to stop running. This means that no new backup tasks are triggered until the rebalance has concluded; at which point, the scheduler restarts, and reconstructs the task schedule. Then, the triggering of Backup Service tasks is resumed.

Note that a rebalance has the effect of restarting the Backup Service whenever the service has previously been stopped, due to loss of its leader: for information, see Backup-Service Architecture.

Rebalance Failure-Handling

Rebalance failures can optionally be responded to automatically, with up to 3 retries. The number of seconds required to elapse between retries can also be configured. For information on configuration options, see General Settings. For information on failure-notifications, and options for cancelling rebalance-retries, see Automated Rebalance Failure Handling.