Fail a Node Over and Rebalance

March 9, 2025

A node can be failed over, and thereby removed safely from a cluster, when downtime is unavoidable, without any break in the serving of data to applications.

Understanding Failover

A complete conceptual description of failover is provided in Failover. There are two basic types: graceful and hard.

Graceful: The ability to remove a Data Service node from the cluster proactively, in an orderly and controlled fashion (for example, for system maintenance). It is manually initiated when the entire cluster is in a healthy state, and all active and replica vBuckets on all nodes are available.
Hard: The ability to drop a node from the cluster reactively, because the node has become unavailable or unstable. It is manually or automatically initiated, and should be applied after the point at which active vBuckets have been lost. If applied to an available node running the Data Service, ongoing writes and replications may be interrupted.
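
The difference between the two types can be sketched as a choice of REST endpoint. The following Python fragment is a minimal illustration, not a definitive client: it assumes the endpoints `/controller/startGracefulFailover` and `/controller/failOver`, each taking an `otpNode` parameter, and it uses hypothetical addresses. The helper name `build_failover_request` is invented for this example. The request is built but not sent, and authentication is omitted; check the REST reference for your server version before use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint paths; verify against the REST reference for your version.
FAILOVER_PATHS = {
    "graceful": "/controller/startGracefulFailover",
    "hard": "/controller/failOver",
}


def build_failover_request(base_url, otp_node, kind):
    """Return an unsent POST request for a graceful or hard failover.

    base_url and otp_node are supplied by the caller; the values used
    below are hypothetical. No credentials are attached here.
    """
    if kind not in FAILOVER_PATHS:
        raise ValueError(f"unknown failover kind: {kind!r}")
    body = urlencode({"otpNode": otp_node}).encode("ascii")
    url = base_url.rstrip("/") + FAILOVER_PATHS[kind]
    return Request(url, data=body, method="POST")


# Hypothetical cluster address and node name, for illustration only.
req = build_failover_request("http://10.0.0.1:8091", "ns_1@10.0.0.2", "graceful")
print(req.get_method(), req.full_url)
```

Keeping the two paths in one table makes the choice between graceful and hard failover explicit at the call site, which helps avoid applying a hard failover to a healthy node by accident.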

The automatic initiation of hard failover is known as automatic failover. It is configured on the General settings screen, in the Settings area of Couchbase Web Console, or with the equivalent CLI and REST API commands.
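
As a rough illustration of the REST route, the sketch below prepares the path and form body for an automatic-failover settings change. It assumes the endpoint `/settings/autoFailover` with `enabled` and `timeout` (in seconds) parameters; the function name `autofailover_settings` and the default of 120 seconds are choices made for this example. The permitted timeout range depends on the server version and is not checked here.

```python
from urllib.parse import urlencode


def autofailover_settings(enabled, timeout_seconds=120):
    """Return (path, form_body) for an automatic-failover settings request.

    The endpoint path and parameter names are assumptions; confirm them
    in the REST reference for your server version.
    """
    if enabled and timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    params = {"enabled": "true" if enabled else "false"}
    if enabled:
        # The timeout is only meaningful when automatic failover is on.
        params["timeout"] = str(timeout_seconds)
    return "/settings/autoFailover", urlencode(params)


path, body = autofailover_settings(True, timeout_seconds=120)
print(path, body)
```

The returned pair would be sent as an authenticated POST to the cluster's administration port.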

Note that when a node is flagged for failover (as opposed to removal), replica vBuckets are lost when the subsequent rebalance removes the node. By contrast, removal creates new copies of the replica vBuckets that would otherwise be lost, which increases competition for the memory resources of the remaining nodes.
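
The rebalance that follows a failover can be sketched as a single form body that names every node in the cluster and the nodes to be taken out. This fragment assumes the endpoint `/controller/rebalance` with `knownNodes` and `ejectedNodes` parameters, each a comma-separated list of node names; the helper `rebalance_body` and the node names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the REST reference for your version.
REBALANCE_PATH = "/controller/rebalance"


def rebalance_body(known_nodes, ejected_nodes):
    """Return the form body for a rebalance that ejects the given nodes.

    known_nodes lists every node currently in the cluster, including the
    ones being ejected; ejected_nodes lists the failed-over nodes.
    """
    missing = sorted(set(ejected_nodes) - set(known_nodes))
    if missing:
        raise ValueError(f"ejected nodes not in known nodes: {missing}")
    return urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


# Hypothetical three-node cluster in which the second node was failed over.
nodes = ["ns_1@10.0.0.1", "ns_1@10.0.0.2", "ns_1@10.0.0.3"]
body = rebalance_body(nodes, ["ns_1@10.0.0.2"])
print(REBALANCE_PATH, body)
```

Checking that every ejected node also appears in the known list catches a common mistake before any request reaches the cluster.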

This section shows how graceful and hard failover can be initiated.