Cross Data Center Replication (XDCR)
Cross Data Center Replication (XDCR) allows data to be replicated across clusters that are potentially located in different data centers.
Cross Data Center Replication (XDCR) replicates data between clusters: this provides protection against data center failure, and also provides high-performance data-access for globally distributed, mission-critical applications.
XDCR replicates data from a specific bucket on the source cluster to a specific bucket on the target cluster. Data from the source bucket is pushed to the target bucket by means of an XDCR agent, running on the source cluster, using the Database Change Protocol. Any bucket (Couchbase or Ephemeral) on any cluster can be specified as a source or a target for one or more XDCR definitions. Note, however, that if an Ephemeral bucket configured to eject data when its RAM-quota is exceeded is used as a source for XDCR, not all data written to the bucket is guaranteed to be replicated by XDCR. (See Buckets, for information on ejection.)
Cross Data Center Replication differs from intra-cluster replication in the following, principal ways:
As indicated by their respective names, intra-cluster replication replicates data across the nodes of a single cluster; while Cross Data Center Replication replicates data across multiple clusters, each potentially in a different data center.
Whereas intra-cluster replication is configured and performed with reference to only a single bucket (to which all active and replica vBuckets will correspond), XDCR requires two buckets to be administrator-specified, for a replication to occur: one is the bucket on the source cluster, which provides the data to be replicated; the other is the bucket on the target cluster, which receives the replicated data.
Whereas intra-cluster replication is configured at bucket-creation, XDCR is configured following the creation of both the source and target buckets.
The starting, stopping, and pausing of XDCR all occur independently of whatever intra-cluster replication is in progress on either the source or target cluster. While running, XDCR continuously propagates mutations from the source to the target bucket.
Prior to XDCR management, source and target clusters should be appropriately prepared, as described in Prepare for XDCR. Then, XDCR is managed in three stages:
Define a reference to a remote cluster, which will be the target for Cross Data Center Replication. See Create a Reference.
Define and start a replication, which continuously transfers mutations from a specified source bucket to a specified target bucket. See Create a Replication.
Couchbase provides three options for managing these stages, which are by means of:
Couchbase Web Console, which provides a graphical user interface for interactive configuration and management of replications.
CLI, which provides commands and flags that allow replications to be managed from the command line.
REST API, which underlies both the Web Console and CLI, and can be expressed either as a
curlcommand on the command line, or within a program or script.
For procedures that cover all main XDCR management tasks, performed with all three of the principal tools, see XDCR Management Overview.
XDCR allows replication to occur between source and target clusters in either of the following ways:
Unidirectionally: The data contained in a specified source bucket is replicated to a specified target bucket. Although the replicated data on the source could be used for the routine serving of data, it is in fact intended principally as a backup, to support disaster recovery.
Bidirectionally: The data contained in a specified source bucket is replicated to a specified target bucket; and the data contained in the target bucket is, in turn, replicated back to the source bucket. This allows both buckets to be used for the serving of data, which may provide faster data-access for users and applications in remote geographies.
Note that XDCR provides only a single basic mechanism from which replications are built: this is the unidirectional replication. A bidirectional topology is created by implementing two unidirectional replications, in opposite directions, between two clusters; such that a bucket on each cluster functions as both source and target.
Used in different combinations, unidirectional and bidirectional replication can support complex topologies; an example being the ring topology, where multiple clusters each connect to exactly two peers, so that a complete ring of connections is formed:
Filtering Expressions can be used in XDCR replications. Each is a regular expression that is applied to the document keys on the source cluster: those document keys returned by the filtering process correspond to the documents that will be replicated to the target. For information, See XDCR Advanced Filtering.
Optionally, deletion filters can be applied to a replication: these control whether the deletion of a document at source causes deletion of a replica that has been created. Each filter covers a specific deletion-context. For information, see Deletion Filters.
XDCR only replicates data: it does not replicate views or indexes. Views and indexes can only be replicated manually, or by administrator-provided automation: when the definitions are pushed to the target server, the views and indexes are regenerated there.
When encountered on the source cluster, non-UTF-8 encoded document IDs are automatically filtered out of replication: they are therefore not transferred to the target cluster.
For each such ID, the warning output
xdcr_error.* is written to the log files of the source cluster.
When a replication starts, it examines the specified source bucket, and determines which documents to replicate:
If XDCR Advanced Filtering is not applied, each document in the source bucket is replicated to the target.
If XDCR Advanced Filtering is applied, each document in the source bucket is examined; but only those documents that meet the specified filtering-criteria are replicated.
Once this initial process is complete, only mutated documents will be considered for replication. Mutated means one of the following:
Deleted or expired
Documents that are not mutated are never re-examined by the ongoing replication. For such re-examination to occur, either the current replication must be restarted, or a new replication must be configured. For more information, see Filter-Expression Editing.
Replication of a deleted or expired document means that the document will be correspondingly deleted or expired on the target. Note that this is the default behavior; although options are provided for not replicating deletion or expiration mutations — so that the replicated documents are not removed. See the reference information for the CLI xdcr-replicate command.
When throughput is high, multiple simultaneous XDCR replications are likely to compete with one another for system resources. In particular, when a replication starts, its initial process may be highly consumptive of memory and bandwidth, since all documents in the source bucket are being handled.
To manage system resources in these circumstances, each replication can be assigned a priority of High, Medium, or Low:
High. No resource constraints are applied to the replication. This is the default setting.
Medium. Resource constraints are applied to the replication while its initial process is underway, if the replication is in competition with one or more High priority replications. Subsequently, it is treated as a High priority replication.
Low. Resource constraints are applied to the replication whenever it is in competition with one or more High priority replications.
In some cases, especially when bidirectionally replicated data is being modified by applications in different locations, conflicts may arise: meaning that the data of one or more documents has been differently modified more or less simultaneously, requiring resolution. XDCR provides options for conflict resolution, based on either sequence number or timestamp, whereby conflicted data can be saved consistently on source and target. For more information, See XDCR Conflict Resolution.
In the event of data-loss, the cbrecovery tool can be used to restore data. The tool accesses remotely replicated buckets, previously created with XDCR, and copies appropriate subsets of their data back onto the original source cluster.
By means of intra-cluster replication, Couchbase Server allows one or more replicas to be created for each vBucket on the cluster. This helps to ensure continued data-availability in the event of node-failure.
However, if multiple nodes within a single cluster fail simultaneously, one or more active vBuckets and all their replicas may be affected; meaning that lost data cannot be recovered locally.
In such cases, provided that a bucket affected by such failure has already been established as a source bucket for XDCR, the lost data may be retrieved from the bucket defined on the remote server as the corresponding replication-target. This retrieval is achieved from the command-line, by means of the Couchbase cbrecovery tool.
For a sample step-by-step procedure, see Recover Data with XDCR.
XDCR configuration requires that the administrator provide a username and password appropriate for access to the target cluster. When replication occurs, the password is automatically supplied, along with the data. By default, XDCR transmits both password and data in non-secure form. Optionally however, a secure connection can be enabled between clusters, in order to secure either password alone, or both password and data. The password received by the destination cluster can be authenticated either locally or externally, as described in Authentication.
A secure XDCR connection is enabled either by SCRAM-SHA or by TLS — depending on the administrator-specified connection-type, and the server-version of the destination cluster. Use of TLS involves certificate management: for information on preparing and using certificates, see Manage Certificates.
Two administrator-specified connection-types are possible:
Half Secure: Secures the specified password only: it does not secure data. The password is secured by hashing with SCRAM-SHA, when the destination cluster is running Couchbase Enterprise Server 5.5 or later; and by TLS encryption, when the destination cluster is running a pre-5.5 Couchbase Enterprise Server. The root certificate of the destination cluster must be provided, for a successful TLS connection to be achieved.
Before attempting to enable half-secure replications, see the important information provided in SCRAM SHA and XDCR.
Full Secure: Handles both authentication and data-transfer via TLS.
For step-by-step procedures, see Secure a Replication.
The performance of XDCR can be fine-tuned, by means of configuration-settings, specified when a replication is defined. These settings modify compression, source and target nozzles (worker threads), checkpoints, counts, sizes, network usage limits, and more. For detailed information, see XDCR Advanced Settings.
The flush operation deletes data on a local bucket: this operation is disabled if the bucket is currently the source for an ongoing replication. If the target bucket is flushed during replication, the bucket becomes temporarily inaccessible, and replication is suspended.
If either a source or a target bucket needs to be flushed after a replication has been started, the replication must be deleted, the bucket flushed, and the replication then recreated.
Couchbase Server provides the ability to monitor ongoing XDCR replications, by means of the Couchbase Web Console. Detailed information is provided in Monitor a Replication.