Recovering a Node

After a node has been failed over, it can be recovered: that is, added back into the cluster from which it was failed over, by means of rebalancing.

Performing Node Recovery

After a node has been failed over, there are two principal options:

  • The node can be removed from the cluster, by means of the rebalance operation.

  • The node can be recovered, and thereby added back into the cluster, again by means of the rebalance operation.

These options are displayed by Couchbase Web Console as follows:

[Figure: recoveryOptions]

Removal of a failed-over node is appropriate when you have no intention or ability to recover it: for example, when full hardware replacement is required. Because the REMOVAL pending rebalance notification is already displayed for the node, removing it only requires a left-click on the Rebalance button, at the upper right.

Alternatively, recovery of the node may be more appropriate: for example, when the node’s availability has been restored, and no other key aspect of the node’s integrity has been impaired. To recover a failed-over node, you must select a recovery option before performing a rebalance. Two recovery options are provided by the buttons at the lower right: Add Back: Full Recovery and Add Back: Delta Recovery. After you have selected the appropriate option, left-click on Rebalance.

For example, if you left-click on Add Back: Full Recovery, the display changes as follows:

[Figure: fullRecoveryPending]

The display now indicates that a Full Recovery will occur when you left-click the Rebalance button.

The Full and Delta recovery-options are described below.

Full Recovery

Full recovery involves removing all pre-existing data from, and assigning new data to, the node that is being recovered. Therefore, when this option has been specified, left-clicking on Rebalance causes the following:

  • All existing vBuckets and documents are removed from the node.

  • If GSI Indexes reside on the node, they are left unmodified during the rebalance process.

  • A new set of vBuckets and documents is assigned to the node.

  • When the node’s vBuckets are all up to date, and the rebalance process concludes, the node recommences the serving of data. If GSI Indexes reside on the node, they become active, and are updated by the Index Service as appropriate.

Delta Recovery

Delta recovery maintains and resynchronizes a node’s pre-existing data. Therefore, when this option has been specified, left-clicking on Rebalance causes the following:

  • No existing vBucket or document is removed from the node.

  • All existing data is loaded into memory.

  • The point at which mutations to node-data most recently stopped is determined. vBuckets are then updated from that point, based on the data-changes that have since occurred elsewhere on the cluster.

  • If GSI Indexes reside on the node, they are left unmodified during the rebalance process.

  • When the node’s vBuckets are all up to date, and the rebalance process concludes, the node recommences the serving of data. If GSI Indexes reside on the node, they become active, and are updated by the Index Service as appropriate (this includes updates that correspond to whatever mutations were made by the Data Service while the node was in a failed over state).
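The difference between the two options can be illustrated with a toy model. The sketch below is only an illustration, not a description of how Couchbase Server implements recovery: `ToyNode`, `full_recovery`, `delta_recovery`, and the `(seqno, key, value)` mutation log are all hypothetical names introduced here. It assumes that each mutation carries an increasing sequence number and that the node remembers the last one it received.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a failed-over node; not a Couchbase type."""
    docs: dict = field(default_factory=dict)
    last_seqno: int = 0  # sequence number of the last mutation the node saw


def _apply(node, mutations):
    """Apply (seqno, key, value) mutations; return how many were transferred."""
    for seqno, key, value in mutations:
        node.docs[key] = value
        node.last_seqno = seqno
    return len(mutations)


def full_recovery(node, cluster_log):
    """Discard everything on the node, then copy every mutation to it."""
    node.docs.clear()
    node.last_seqno = 0
    return _apply(node, cluster_log)


def delta_recovery(node, cluster_log):
    """Keep existing data and apply only mutations newer than last_seqno."""
    newer = [m for m in cluster_log if m[0] > node.last_seqno]
    return _apply(node, newer)


# Cluster-wide history; in this example the node failed over after seqno 3.
log = [(1, "a", 1), (2, "b", 2), (3, "a", 3), (4, "c", 4), (5, "b", 5)]
stale = {"a": 3, "b": 2}

delta_node = ToyNode(docs=dict(stale), last_seqno=3)
full_node = ToyNode(docs=dict(stale), last_seqno=3)

delta_sent = delta_recovery(delta_node, log)
full_sent = full_recovery(full_node, log)
print(delta_sent, full_sent)  # prints "2 5"
```

Both nodes end with the same documents, but in this model the delta path transfers only the two mutations made after the failover, whereas the full path transfers all five.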

Delta-Recovery Requirements

Delta recovery can only be performed when:

  • The node to be recovered is healthy; likewise the cluster.

  • The node to be recovered is in a failed-over state.

  • Delta recovery is possible for all Buckets on the node. Note that the correspondence of Buckets, vBuckets, and documents on the node remains entirely unchanged by Delta recovery.
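The requirements above can be expressed as a simple pre-check. The following is a minimal sketch under stated assumptions: `RecoveryContext` and `unmet_delta_requirements` are hypothetical names, and the boolean fields are assumed to have been gathered by whatever monitoring you already use. Nothing here calls a Couchbase API.

```python
from dataclasses import dataclass


@dataclass
class RecoveryContext:
    """Hypothetical snapshot of the facts listed above; not a Couchbase API."""
    node_healthy: bool
    cluster_healthy: bool
    node_failed_over: bool
    buckets_delta_capable: dict  # bucket name -> True if delta recovery is possible


def unmet_delta_requirements(ctx):
    """Return a list of unmet requirements; an empty list means all are met."""
    problems = []
    if not ctx.node_healthy:
        problems.append("node is not healthy")
    if not ctx.cluster_healthy:
        problems.append("cluster is not healthy")
    if not ctx.node_failed_over:
        problems.append("node is not in a failed-over state")
    for name, ok in sorted(ctx.buckets_delta_capable.items()):
        if not ok:
            problems.append(f"delta recovery is not possible for bucket {name!r}")
    return problems


ctx = RecoveryContext(
    node_healthy=True,
    cluster_healthy=True,
    node_failed_over=True,
    buckets_delta_capable={"travel-sample": True, "cache": False},
)
print(unmet_delta_requirements(ctx))
```

Returning the list of unmet requirements, rather than a single boolean, makes it easier to report why Delta recovery should not be attempted.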

Delta-Recovery Failures

In some cases, Delta recovery is attempted with all the requirements listed immediately above met, and the operation nevertheless either fails or defaults to Full recovery. This indicates that one or more of the following conditions exist:

  • Cluster-topology has changed since the node was last available within the cluster.

  • The node was hard failed over, and is marked for removal.

  • The rebalance was configured to perform the Delta recovery while simultaneously moving other nodes in or out of the cluster, and the numbers of nodes intended respectively to leave and join the cluster were unequal. (Note that in this case, a Swap Rebalance can be performed instead.)

  • Bucket-operations were performed while Delta recovery was pending: these changed configurations, and thereby made Delta recovery impossible.

  • Either the node being recovered or another node in the cluster has crashed, or become otherwise unavailable, at some point during the process of recovery and rebalance.

Recovery Performance

In many cases, Delta recovery is faster than Full recovery, since a significant quantity of usable data already resides on the node and therefore does not require network-transfer: only updates made since the node’s last-recorded mutation need to be accessed from other nodes in the cluster.

However, in cases where the node’s memory-footprint is extremely large, and data-size exceeds bucket memory-quotas, the memory-management overhead potentially entailed by Delta recovery may mean that Full recovery takes less time overall.

Performing Recovery with CLI

To select a recovery option with the Couchbase CLI, see the recovery command. To perform the subsequent rebalance, see the rebalance command.
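As a rough sketch of the two-step sequence, the following Python helpers build the argument lists for `couchbase-cli recovery` and `couchbase-cli rebalance`. The addresses and credentials are placeholders, the helper names are hypothetical, and the flags shown (`-c`, `-u`, `-p`, `--server-recovery`, `--recovery-type`) should be checked against the CLI reference for your server version. The commands are only printed here, not executed.

```python
import shlex


def recovery_command(cluster, user, password, node, recovery_type):
    """Build the argv that selects a recovery option for a failed-over node."""
    if recovery_type not in ("delta", "full"):
        raise ValueError("recovery_type must be 'delta' or 'full'")
    return [
        "couchbase-cli", "recovery",
        "-c", cluster, "-u", user, "-p", password,
        "--server-recovery", node,
        "--recovery-type", recovery_type,
    ]


def rebalance_command(cluster, user, password):
    """Build the argv for the rebalance that completes the recovery."""
    return ["couchbase-cli", "rebalance", "-c", cluster, "-u", user, "-p", password]


# Placeholder values; replace with your own cluster address and credentials.
select = recovery_command("10.0.0.1:8091", "Administrator", "password",
                          "10.0.0.2:8091", "delta")
rebalance = rebalance_command("10.0.0.1:8091", "Administrator", "password")

# Print the commands in the order they would be run.
print(shlex.join(select))
print(shlex.join(rebalance))
```

To run the commands, each list could be passed to `subprocess.run(..., check=True)`, selecting the recovery option first and rebalancing second.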

Performing Recovery with REST

To select a recovery option with the Couchbase REST API, see Setting Recovery Type. To perform the subsequent rebalance, see Rebalancing Nodes.
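The same sequence can be sketched in Python with the standard library. This example assumes the `POST /controller/setRecoveryType` endpoint (with `otpNode` and `recoveryType` parameters) and the `POST /controller/rebalance` endpoint (with `knownNodes` and `ejectedNodes`); confirm both against the REST reference for your server version. The helper names, addresses, and credentials are hypothetical placeholders. The requests are built but not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def _post(base_url, path, fields, user, password):
    """Build a form-encoded POST request with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        base_url + path,
        data=urlencode(fields).encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def set_recovery_type_request(base_url, otp_node, recovery_type, user, password):
    """Request that selects 'delta' or 'full' recovery for a failed-over node."""
    if recovery_type not in ("delta", "full"):
        raise ValueError("recovery_type must be 'delta' or 'full'")
    fields = {"otpNode": otp_node, "recoveryType": recovery_type}
    return _post(base_url, "/controller/setRecoveryType", fields, user, password)


def rebalance_request(base_url, known_nodes, user, password, ejected_nodes=()):
    """Request that starts the rebalance completing the recovery."""
    fields = {
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    }
    return _post(base_url, "/controller/rebalance", fields, user, password)


# Placeholder cluster address, node names, and credentials.
base = "http://10.0.0.1:8091"
select_req = set_recovery_type_request(base, "ns_1@10.0.0.2", "delta",
                                       "Administrator", "password")
rebalance_req = rebalance_request(base, ["ns_1@10.0.0.1", "ns_1@10.0.0.2"],
                                  "Administrator", "password")
print(select_req.get_method(), select_req.full_url)
print(rebalance_req.get_method(), rebalance_req.full_url)
```

Each request could then be sent with `urllib.request.urlopen`, first the recovery-type selection and then the rebalance.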