Inter-Sync Gateway Replication


      Use inter-Sync Gateway replication to keep clusters in different mobile data centers in sync.
      Inter-Sync Gateway replication supports resilient, secure, scalable bidirectional synchronization of data cloud-to-edge.

      Related topics: Overview | Run | Manage | Monitor | Conflict

      Context Clarification

      This content relates only to inter-Sync Gateway replication in Sync Gateway 2.8+. For documentation on pre-2.8 inter-Sync Gateway replication (also known as SG Replicate) — see the documentation for the appropriate release.

      Introduction

      Couchbase Sync Gateway’s Inter-Sync Gateway Replication feature supports cloud-to-edge synchronization use cases, where data changes must be synchronized between a centralized cloud cluster and a large number of edge clusters whilst still enforcing fine-grained access control. This is an increasingly important enterprise-level requirement.

      In the architecture diagram (Figure 1), the replicator on the active Sync Gateway node ensures that any changes made to documents in either Sync Gateway database instance are replicated to the other Sync Gateway instance, in accordance with the replication’s configuration — see replications for configuration details.

      Figure 1. Inter-Sync Gateway Replication architecture

      Use Cases

      Cloud-to-edge synchronization

      In this multi-cloud deployment mode, large numbers of edge clusters sync with one or more clusters in cloud data centers. Each edge can operate autonomously without network connectivity to the cloud data centers [1].

      A typical architecture for this use case is shown in Figure 2.

      Figure 2. Cloud-to-edge synchronization

      Active-to-active Mobile Synchronization

      Here, edge clusters containing active Sync Gateway nodes replicate with geographically separate, cloud-based Sync Gateway deployments. This is an ideal use case for inter-Sync Gateway replication, which was designed to keep clusters in different data centers in sync.

      The Couchbase Server Cross Data Center Replication API (XDCR) is similarly used to replicate between Couchbase Server clusters. However, the inter-Sync Gateway replication feature is specifically designed for Sync Gateway deployments and it must be used for replication between mobile clusters. [2]

      A typical architecture for this use case is shown in Figure 3.

      Figure 3. Active-to-active mobile synchronization

      Replication Characteristics

      Context

      Sync Gateway supports running replications between Sync Gateway clusters using a websockets-based protocol, with the active replicator synchronizing changes between two Sync Gateway databases.

      All replications are based on a replication definition provided either to the Admin REST API or in Sync Gateway’s configuration file (JSON).
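
      For illustration, a minimal persistent replication definition might look like the following in the JSON configuration file. This is a sketch only: it assumes replications are nested under a database definition, and the database name, replication ID, and remote URL are hypothetical; the remote, direction, and continuous properties are described in the related configuration elements referenced on this page.

        "databases": {
          "db": {
            "replications": {
              "cloud-to-edge-1": {
                "remote": "wss://remote-sgw.example.com:4984/db",
                "direction": "pushAndPull",
                "continuous": true
              }
            }
          }
        }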

      Replications always involve at least one local database. Sync Gateway does not enable replication between two remote nodes, because replications are defined at database level and so at least one database will be local.

      All replications take place at the document level (but see also, Delta Sync).

      Sync Gateway nodes can opt out of participating in the replication process using the database-level parameter sgreplicate_enabled.
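
      For example, a node that should not take part in running replications might carry the following setting in its database definition (a sketch; only the sgreplicate_enabled parameter itself is taken from this page, the surrounding structure is illustrative):

        "databases": {
          "db": {
            "sgreplicate_enabled": false
          }
        }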

      Related configuration elements: Database Configuration | replications | remote | sgreplicate_enabled
      For legacy versions see: Legacy Pre-3.0 Configuration

      Protocol

      Inter-Sync Gateway replications are based on websockets. This is the same protocol that is used for replication with Couchbase Lite 2.x clients.

      For users on releases prior to 2.8, SG Replicate provides an HTTP-based replication — see the appropriate release documentation.

      The bi-directional, persistent nature of websocket connections is ideal for applications such as continuous Sync Gateway replication, which is constantly waiting for and synchronizing change events.

      Types of Replication

      Replications are either ad hoc replications (REST API only) or persistent replications. They can also be configured to run in one of two modes: continuous or one-shot.

      • Persistent

        Persistent replications survive Sync Gateway node restarts and continue running automatically unless configured not to. They can be configured to run in either continuous or one-shot mode.

      • Ad hoc

        Ad hoc replications are transient, existing only for the period of the replication. They provide a convenient way to:

        • Run one-off replications (for example, when troubleshooting)

        • Run on-demand replications after Sync Gateway is started. For instance, a replication that needs to be run only periodically can be configured as an ad hoc replication by an automated script scheduled to run when needed.
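
      For illustration, an ad hoc one-shot pull replication might be created through the Admin REST API along the following lines. This is a sketch only: the _replication endpoint path, the admin port 4985, and the host, database, and replication names are assumptions rather than details taken from this page.

        curl -X PUT "http://localhost:4985/db/_replication/adhoc-pull-1" \
          -H "Content-Type: application/json" \
          -d '{
                "remote": "wss://remote-sgw.example.com:4984/db",
                "direction": "pull",
                "continuous": false,
                "adhoc": true
              }'

      Because the definition is marked adhoc, the replication exists only for the duration of the run and is not persisted across restarts.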

      Related configuration elements: replications | continuous | adhoc

      Delta Sync

      This content relates only to ENTERPRISE EDITION

      With delta sync enabled on the replication and on both of the databases involved, only the changed data items are transferred.

      You can configure replications to use delta-sync by:

      • Setting "enable_delta_sync": true in the replication definition

      • Setting "delta-sync": { "enabled": true} on both databases in their respective database definitions.

      Push replications to pre-2.8 targets do not use Delta Sync
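
      For example, the replication-side setting might appear in a replication definition as follows (a sketch; only enable_delta_sync is taken from this page, the surrounding properties are illustrative). Remember that both databases involved must also have delta sync enabled, as described above.

        "delta-repl-1": {
          "remote": "wss://remote-sgw.example.com:4984/db",
          "direction": "pushAndPull",
          "continuous": true,
          "enable_delta_sync": true
        }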

      Directionality

      Replications can run in either or both directions: you can push, pull, or push and pull between the two database endpoints.

      Related configuration elements: replications | direction
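
      For example, three replication definitions differing only in direction might look like this (a sketch; pushAndPull is the value referenced later under Conflict Resolution, the push and pull value strings are assumed to follow the same naming, and the remote URLs are hypothetical):

        "push-to-cloud":   { "remote": "wss://cloud-sgw.example.com:4984/db", "direction": "push" },
        "pull-from-cloud": { "remote": "wss://cloud-sgw.example.com:4984/db", "direction": "pull" },
        "two-way-sync":    { "remote": "wss://cloud-sgw.example.com:4984/db", "direction": "pushAndPull" }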

      Security

      Transport-level security is supported. Use the appropriate prefix in the remote URL (wss for secure websockets).

      Authentication

      Support for Basic Authentication using username and password credentials is provided.
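
      One way to supply these credentials is to embed them in the remote URL of the replication definition. This is a sketch only; embedding credentials in the URL is an assumption about your deployment, and the account, host, and database shown are hypothetical:

        "remote": "wss://sync-user:sync-password@remote-sgw.example.com:4984/db"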

      Access Control

      Data access control is provided by Sync Gateway’s sync function and the username/password credentials. All replicated documents pass through this function, ensuring that access permissions are adhered to.

      Related configuration elements: Database Configuration | database.sync

      Network Resilience

      Inter-Sync Gateway replications will automatically attempt to restart whenever the node restarts.

      Network resiliency is built in for continuous replications. They respond to network issues, such as lost connections, by applying a persistent exponential backoff policy to attempt reconnection.

      The max_backoff_time setting determines the maximum wait time between retries. Once this limit is reached, retries continue every max_backoff_time minutes. To prevent indefinite retries, set "max_backoff_time": 0; exponential backoff retries are then attempted for up to five minutes and stop if the connection has not been re-established.
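
      For example, to cap the interval between reconnection attempts at ten minutes (a sketch; only max_backoff_time is described on this page, and the rest of the definition is illustrative):

        "edge-sync-1": {
          "remote": "wss://remote-sgw.example.com:4984/db",
          "direction": "pushAndPull",
          "continuous": true,
          "max_backoff_time": 10
        }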


      Related replication definition elements: max_backoff_time

      High Availability

      Overview

      Built-in High Availability with automatic node distribution is available only in ENTERPRISE EDITION; the COMMUNITY EDITION behavior is described at the end of this overview.

      Inter-Sync Gateway Replication provides built-in High Availability (HA) support. It uses node distribution to ensure all running replications are uniformly distributed across all available nodes, regardless of their originating node.

      A replication runs on only one node at any given time. When a node fails, the system automatically distributes that node’s replications across any available alternative nodes (provided the replication has been configured on multiple nodes).

      To use high-availability, configure the same replication on at least two Sync Gateway nodes.

      Even though automatic node-distribution is not available in COMMUNITY EDITION, you can make your replications more highly available.

      Simply define the same replication on multiple nodes. It will then run on each of those nodes.

      This redundancy provides some resiliency if a node fails: although no automatic distribution of replications takes place, a replication that is running on multiple nodes will continue running on any surviving nodes.

      Node Distribution

      The goal of node distribution is to maintain an optimal balance of replications across the cluster, with any given replication running on only one node at any given time.

      To achieve this, Sync Gateway automatically balances, as equally as possible, the number of replications running on each node.

      Where multiple replications are configured on multiple nodes, Sync Gateway automatically distributes these replications across all the available nodes for which the replications are configured. It continually monitors and redistributes replications as the number of available nodes and the number of running replications in a cluster changes.

      The nodes’ processing load and bandwidth usage are minimized by ensuring that a replication runs on only one node at any given time — even where it has been configured to run on multiple nodes. This avoids the redundant exchange of data arising from duplicate replication.

      Configuration Requirements

      To configure a replication to be highly available, include its database and replication definition in the sync gateway configuration on each node in the cluster that you want to be able to run it. At least two nodes are required.

      Node distribution will automatically elect an appropriate node to run the replication on, and take care of redistributing it if a node fails.

      Related configuration elements: Legacy Pre-3.0 Configuration | Admin REST API

      Expected Failure Behavior

      If a node fails, then Sync Gateway will take any replications configured on multiple nodes, redistribute them across all available remaining nodes and restart them. Node distribution will continually seek to maintain an optimal distribution of replications across available nodes.

      Examples of Expected behavior

      This section provides examples of expected behavior in differing scenarios. It provides a comparison of how behavior differs between ENTERPRISE EDITION and COMMUNITY EDITION.

      The following scenarios are covered; each involves a sync gateway cluster with multiple nodes:

      • Homogenous configuration — see Example 1

      • Homogenous configuration with non-replicating node — see Example 2

      • Heterogenous configuration — see Example 3

      • Adding more nodes — see Example 4

      • Failing node — see Example 5

      Example 1. Homogenous configuration
      Scenario
      • The cluster comprises three Sync Gateway nodes

      • The same sync gateway configuration is applied across all nodes

      • All nodes are configured to run Replication Id 1

      Enterprise Edition

      • Sync Gateway automatically designates one of the three nodes to run Replication Id 1.

      • If a node goes down, Sync Gateway elects one of the remaining nodes to continue Replication Id 1.

      Community Edition

      Sync Gateway runs Replication Id 1 on all nodes in the cluster.

      Example 2. Homogenous configuration with non-replicating node
      Scenario
      • The cluster comprises three Sync Gateway nodes.

      • Each node has the same sync gateway configuration, with one exception. The configuration on Node 3 has opted out of replication (sgreplicate_enabled=false)

      • All three nodes are configured to run Replication Id 1.

      Enterprise Edition

      • Sync Gateway automatically designates either Node 1 or Node 2 to run Replication Id 1.

      • If either Node 1 or Node 2 fails, Sync Gateway elects the non-failing node.

      Community Edition

      • Sync Gateway runs Replication Id 1 on all nodes in the cluster.

      • The system ignores the opt-out flag (sgreplicate_enabled).

      Example 3. Heterogenous configuration
      Scenario
      • The cluster comprises three Sync Gateway nodes

      • Both Node 1 and Node 2 are configured to run Replication Id 1

      • Node 3 is configured to run Replication Id 2 but not Replication Id 1

      Enterprise Edition

      • Sync Gateway automatically distributes Replication Id 1 and Replication Id 2 so that each runs on one of Node 1, Node 2 or Node 3, with no node running both replications simultaneously.

      • If any node fails whilst running either replication, Sync Gateway elects a non-failing node to continue that replication on. Where two nodes remain, the node not running a replication will be chosen.

      Community Edition

      • Sync Gateway runs Replication Id 1 on Node 1 and Node 2 in the cluster

      • Sync Gateway runs Replication Id 2 on Node 3

      Note:

      • If Node 3 fails, then Replication Id 2 will not be continued on either of the remaining nodes as it is not configured on them

      • Similarly, if either or both of the other nodes (Node 1 and Node 2) fails, Node 3 will not be a candidate to run the corresponding replication.

      Example 4. Adding more nodes
      Scenario
      • The cluster comprises a single Sync Gateway node

      • Node 1 is configured to run Replication Id 1 and Replication Id 2

      • LATER . . . Node 2 is added to the cluster to run Replication Id 1 and Replication Id 2.

      Enterprise Edition

      • Sync Gateway designates Node 1 to run both Replication Id 1 and Replication Id 2

      • LATER . . . when Node 2 is added . . .

        • Sync Gateway selects one of the Node 1 replications to run on Node 2; let’s say it chooses Replication Id 2

        • Sync Gateway stops Replication Id 2 on Node 1

        • Sync Gateway starts Replication Id 2 on Node 2.

      Community Edition

      • Sync Gateway designates Node 1 to run both Replication Id 1 and Replication Id 2

      • WHEN . . . Node 2 is added . . . Sync Gateway designates it to run both Replication Id 1 and Replication Id 2

      Example 5. Failing node
      Scenario
      • The cluster comprises three Sync Gateway nodes with a homogeneous configuration

      • All three nodes are configured to run Replication Id 1, Replication Id 2 and Replication Id 3

      • LATER . . . Node 3 goes down

      Enterprise Edition

      Sync Gateway automatically distributes the replications, one to each of the nodes.

      • Let’s assume the following distribution:

        • Node 1 runs Replication Id 1

        • Node 2 runs Replication Id 2

        • Node 3 runs Replication Id 3

      WHEN . . . Node 3 goes down . . . Sync Gateway elects either Node 1 or Node 2 to continue running Replication Id 3

      Community Edition

      Sync Gateway runs all three replications (Replication Id 1, Replication Id 2 and Replication Id 3) on all three nodes in the cluster (Node 1, Node 2 and Node 3)

      WHEN . . . Node 3 goes down . . . Node 1 and Node 2 continue to run Replication Id 1, Replication Id 2 and Replication Id 3

      Monitoring Node Distribution

      Use the _replicationStatus endpoint to access information about which replications are running on which nodes — see: _replicationStatus(replicationID) | _replicationStatus(replicationID)?action={action}
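
      For example (a sketch; 4985 is the conventional Admin REST API port, while the host, database name, replication ID, HTTP methods and the stop action shown are assumptions):

        curl -X GET "http://localhost:4985/db/_replicationStatus/cloud-to-edge-1"

        curl -X PUT "http://localhost:4985/db/_replicationStatus/cloud-to-edge-1?action=stop"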

      This information is also collected and available in the log files.

      Conflict Resolution

      Inter-Sync Gateway pull replications support automatic conflict resolution by default (no conflict mode).

      In pull replications, the active Sync Gateway detects and resolves conflicts based on its configured conflict_resolution_type and conflict resolver policy. This policy will determine the winner or return an error if it cannot.

      Conflicts are not resolved in push replications, though. The passive end of the push simply detects and rejects any conflicting revisions (409 Conflict response).

      Both approaches reflect the way conflicts are handled by Couchbase Lite clients. This is not surprising, since in both instances Couchbase Lite acts like the active node in an inter-Sync Gateway exchange.

      Conflicts are only resolved during a pull replication. If conflicts are likely, you should configure a pushAndPull replication when using Conflict Free mode.

      Alternatively, run the replicator from the other side, flipping the direction (to pull) and the resolution policy (for example, localWins becomes remoteWins).
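
      A sketch of how such a policy might appear in a pull replication definition (conflict_resolution_type is named above; the remoteWins value string and the rest of the definition are illustrative assumptions):

        "pull-from-cloud": {
          "remote": "wss://cloud-sgw.example.com:4984/db",
          "direction": "pull",
          "continuous": true,
          "conflict_resolution_type": "remoteWins"
        }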

      See also — our blog post: Document Conflicts & Resolution in Couchbase Mobile

      For ENTERPRISE EDITION, a custom conflict resolver policy is available, providing additional flexibility by allowing users to provide their own conflict resolution logic — see: Inter Sync Gateway Sync - Custom Conflict Resolution



      1. This architecture is also known as ship-to-shore or hub-and-spoke.
      2. If one-directional XDCR is used alongside Sync Gateway, then the Sync Gateway cluster must be in read-only mode.