Rebalance redistributes data and indexes among available nodes.
When one or more nodes have been brought into a cluster (either by adding or joining), or have been taken out of a cluster (either through Removal or Failover), rebalance redistributes data and indexes among available nodes. The cluster map is correspondingly updated and distributed to clients. The process occurs while the cluster continues to service requests for data.
On rebalance, vBuckets are redistributed evenly among currently available Data Service nodes. After rebalance, operations are directed to active vBuckets in their updated locations. Rebalance does not interrupt applications' data-access. vBucket data-transfer occurs sequentially: therefore, if rebalance stops for any reason, it can be restarted from the point at which it was stopped.
Note the special case provided by Swap Rebalance, where the number of nodes coming into the cluster is equal to the number of nodes leaving the cluster, ensuring that data is only moved between these nodes.
If nodes have been removed such that the desired number of replicas can no longer be supported, rebalance provides as many replicas as possible. For example, if four Data Service nodes previously supported one bucket with three replicas, and the Data Service node-count is reduced to three, rebalance provides two replicas only. If and when the missing Data Service node is restored or replaced, rebalance will provide three replicas again.
See Intra-Cluster Replication, for information on how data is distributed across nodes.
vBuckets are moved in stages. The stages — which differ, depending on whether the vBucket is an active or a replica vBucket — are described below.
The stages through which rebalance moves a replica vBucket are shown by the following illustration.
The move has two principal phases. Phase 1 is Backfill. Phase 2 is Book-keeping.
Phase 1, Backfill, itself consists of two phases. The first comprises the movement of the replica vBucket data from its node of origin to the memory of the destination node. The second comprises the writing of the replica vBucket data from the memory to the disk of the destination node. The time required for this second phase, which only applies to Couchbase Buckets, is termed Persistence Time. The time required for the entire Backfill process, including Persistence Time, is termed Backfill Time.
Phase 2, Book-keeping, comprises various ancillary tasks required for move-completion. This includes additional Persistence Time, which is required as part of Book-keeping for a vBucket move.
The total time required for the move is calculated by adding Backfill Time to the time required for Phase 2, Book-keeping; and is termed Move Time.
The stages in which rebalance moves an active vBucket are shown by the following illustration.
The move has four principal phases. Phase 1, Backfill, and Phase 2, Book-keeping, are identical to those required for replica vBuckets.
Phase 3, Active Takeover, comprises the operations required to establish the relocated vBucket as the new active copy. The time required for Phase 3 is termed Takeover Time.
Phase 4, Book-keeping, comprises a final set of ancillary tasks, required for move-completion, including persistence.
The total time for the move is termed Move Time.
Rebalance affects different services differently. The effects on services other than the Data Service are described below.
The Index Service maintains a cluster-wide set of index definitions and metadata, which allows the redistribution of indexes and index replicas from removed nodes to nodes that continue as part of the cluster. Indexes that reside on non-removed nodes are unaffected by rebalance.
The rebalance process takes account of nodes' CPU and RAM utilization, and achieves the best resource-balance possible. Note that rebalance does not move indexes or replicas: instead, it rebuilds them in their new locations, using the latest data from the Data Service. If more index replicas exist than can be handled by the number of existing nodes, replicas are dropped: the numbers are automatically made up subsequently, if additional Index Service nodes are added to the cluster.
During rebalance, no index node is removed until index-building has completed on alternative nodes. This ensures uninterrupted access to indexes.
The Search Service automatically partitions its indexes across all Search nodes in the cluster, ensuring that during rebalance, the distribution across all nodes is balanced.
The addition or removal of Query Service nodes during rebalance is immediately effective: an added node is immediately available to serve queries; while a removed node is immediately unavailable, such that ongoing queries are interrupted, requiring the handling of errors or timeouts at application-level.
When an Eventing Service node has been added or removed, rebalance causes vBucket processing ownership to be redistributed among available Eventing Service nodes. After rebalance, the service continues to process mutations: checkpoint information ensures that no mutations are lost.
The Analytics Service uses shadow data, which is a single copy of a subset of the data maintained by the Data Service. The shadow data is not replicated; however, its single copy is partitioned across all cluster nodes that run the Analytics Service. If an Analytics node is permanently removed or replaced, all shadow data must be rebuilt, if and when the Analytics Service is restarted.
If no Analytics Service node has been removed or replaced, shadow data is not affected by rebalance. In consequence of rebalance, the Analytics Service receives an updated cluster map, and continues to work with the modified vBucket-topology.
Rebalance failures can optionally be responded to automatically, with up to 3 retries. The number of seconds required to elapse between retries can also be configured. For information on configuration options, see General Settings. For information on failure-notifications, and options for cancelling rebalance-retries, see Automated Rebalance Failure Handling.