Clusters and Availability

One or more instances of Couchbase Server constitute a cluster, which replicates data across server-instances, and across clusters; and so ensures high availability.

Clusters

A Couchbase cluster consists of one or more instances of Couchbase Server, each running on an independent node. Data and services are shared across the cluster.

When Couchbase Server is being configured on a node, it can be specified either as its own, new cluster, or as a participant in an existing cluster. Thus, once a cluster exists, successive nodes can be added to it; each node running Couchbase Server. When a cluster has multiple nodes, the Couchbase Cluster Manager runs on each node: this manages communications between nodes, and ensures that all nodes are healthy. The Cluster Manager provides information on the cluster to the user interface of Couchbase Web Console.

Services can be configured to run on all or some nodes in a cluster. For example, given a cluster of five nodes, a small dataset might require the Data Service on only one of the nodes; a large on four or five. Alternatively, a heavy query workload might require the Query Service to run on multiple nodes, rather than just one. This ability to scale services individually promotes optimal hardware-resource utilization.

Availability

Data is automatically distributed across a cluster by Couchbase Server: applications are not involved. Each defined bucket is stored by the Data Service as 1024 vBuckets (virtual buckets), which are spread evenly across all available Data Service nodes. Documents are stored intact within vBuckets. vBuckets can also be replicated, across the cluster, by means of the Database Change Protocol; with each replica always existing on a node different from that of its original.

Couchbase Server automatically handles the addition and removal of nodes, and the failure of nodes; such that no data-loss occurs. vBuckets and their replicas are redistributed across available nodes whenever a change of configuration is detected.

High Availability is achieved by means of Cross Datacenter Replication (XDCR); whereby the contents of a bucket can be selectively replicated to a bucket maintained on a remote cluster. Additionally, Server Group Awareness allows groups of nodes to be defined, within the cluster, to protect the cluster against significant infrastructure-failure.