Inter-Sync Gateway Replication

    +

    Use inter-Sync Gateway replication to keep clusters in different mobile data centers in sync.
    Inter-Sync Gateway replication supports resilient, secure, scalable bidirectional synchronization of data cloud-to-edge.

    Related inter-syncgateway topics: Overview | Run | Manage | Monitor | Conflict

    Other related topics: Configuration Properties | Admin REST API

    Context Clarification

    This content relates only to inter-Sync Gateway replication in Sync Gateway 2.8+. For documentation on pre-2.8 inter-Sync Gateway replication (also known as SG Replicate) — see SG-Replicate

    Introduction

    Couchbase Sync Gateway’s Inter-Sync Gateway Replicationglossary icon feature supports cloud-to-edgeglossary icon synchronization use cases, where data changes must be synchronized between a centralized cloud cluster and a large number of edge clusters whilst still enforcing fine grained access control. This is an increasingly important enterprise-level requirement.

    In the architecture diagram (Figure 1), the replicator on the active Sync Gateway node ensures that any database changes made to documents in either Sync Gateway databaseglossary icon instance are replicated to the other Sync Gateway instance, in accordance with the replication’s configuration — see replications for configuration details.

    icr replication overview
    Figure 1. Inter-Sync Gateway Replication architecture

    Use Cases

    Cloud-to-edge synchronization

    In this multi-cloud deployment mode large numbers of multiple edge clusters sync with one or more clusters in cloud data centers. Each edge can operate autonomously without network connectivity to the cloud data centers [1].

    A typical architecture for this use cases is shown in Figure 2

    icr cloud to edge200712
    Figure 2. Cloud-to-edge synchronization

    Active-to-active Mobile Synchronization

    Here edge clusters containing active Sync Gateway nodes are replicated between geographically separate cloud-based Sync Gateway deployments. An ideal use-case for inter-Sync Gateway replication, which was designed to keep clusters in different data centers in sync.

    The Couchbase Server Cross Data Center Replication API (XDCR) is similarly used to replicate between Couchbase Server clusters. However, the inter-Sync Gateway replication feature is specifically designed for Sync Gateway deployments and it must be used for replication between mobile clusters. [2]

    A typical architecture for this use case is shown in Figure 3

    icr active mobile sync200713
    Figure 3. Active-to-active mobile synchronization

    Replication Characteristics

    Context

    Sync Gateway supports the ability to run replications between Sync Gateway clusters using a new websockets based protocol, with the active replicatorglossary icon synchronizingglossary icon changes between two Sync Gateway databasesglossary icon.

    All replications are based on a replication definitionglossary icon provided either to the Admin REST API or in Sync Gateway’s configuration file (JSON).

    Replications always involve at least one local database. Sync Gateway does not enable replication between two remote nodes, because replications are defined at database level and so at least one database will be local.

    All replications take place at the document level (but see also, Delta Sync).

    Sync Gateway nodes can opt-out of participating in the replication process using the database-level parameter sgreplicate_enabled.

    Related configuration elements: databases | replications | remote | sgreplicate_enabled

    Protocol

    Inter-Sync Gateway replications are based on websockets. This is the exact same protocol that is used for replication with Couchbase Lite 2.x clients.

    For users on releases prior to 2.8, SG Replicate provides a HTTP-based replication — see the appropriate documentation here — SG-Replicate (deprecated).

    The bi-directional, persistent, nature of websocket connections is ideal for applications such as a continuous Sync Gateway replication, which is constantly waiting-for and synchronizing change events.

    Types of Replication

    Replications are either: adhoc replicationglossary icon (REST API only) or persistent replicationglossary icon. They can also be configured to run in one of two-modes: continuousglossary icon or one-shotglossary icon.

    • Persistent

      Persistent replications survive Sync Gateway node restarts and continue running automatically unless configured not to. They can be configured to run in either continuousglossary icon or one-shotglossary icon mode.

    • Ad hoc

      Ad hoc replications are transient, existing only for the period of the replication. They provide a convenient way to:

      • Run one off replications (for example, when troubleshooting)

      • Run on-demand replications after Sync Gateway is started. For instance a replication that needs to be to be run only periodically can be configured as an ad hoc replication by an automated script scheduled to run when needed.

    Related configuration elements: replications | continuous | adhoc

    Delta Sync

    This content relates only to ENTERPRISE EDITION

    With delta-sync enabled on the replication and both databases involved, only the changed data items are transferred.

    You can configure replications to use delta-sync by:

    • Setting "enable_delta_sync": true in the replication definition

    • Setting "delta-sync": { "enabled": true} on both databases in their respective database definitions.

    Push replications to pre-2.8 targets do not use Delta Sync

    Related configuration elements: databases | replications | this_db.delta_sync | enable_delta_sync

    Directionality

    Replications are bi-directional. You can push, pull or push and pull between the two database endpoints.

    Related configuration elements: replications | direction

    Security

    Transport level security is provided for. Use the appropriate prefix in URL (WSS for websockets).

    Authentication

    Support for Basic Authentication using username and password credentials is provided

    Access Control

    Data access control is provided by Sync Gateway’s sync functionglossary icon and the username/password credentials. All replicated documents pass through this function ensuring that access permissions are adhered to.

    Related configuration elements: databases | sync
    Related how-to: Sync Function | Sync Function

    Network Resilience

    Inter-Sync Gateway replications will automatically attempt to restart whenever the node restarts.

    Network resiliency is built-in for continuous replications. They respond to network issues such as lost connections, by applying a persistent exponential backoffglossary icon policy to attempt reconnection.

    The max_backoff_time determines the maximum wait time between retries. When the limit is reached retries are made every max_backoff_time minutes. Set "max_backoff_time": 0 to prevent indefinite retries. Exponential backoff retries will be attempted for up to 5 minutes and then stop if the connection has not been re-established


    Related replication definition elements: max_backoff_time

    High Availability

    Overview

    This content relates only to ENTERPRISE EDITION
    • Enterprise

    • Community

    Inter-Sync Gateway Replication provides built-in High Availability (HA) support. It uses node distribution to ensure all running replications are uniformly distributed across all available nodes, regardless of their originating node.

    A replication runs on only one node at any given time. When a node fails, the system automatically distributes that node’s replications across any available alternative nodes (providing the replication has been configured on multiple nodes).

    To use high-availability, configure the same replication on at least two Sync Gateway nodes.

    Even though automatic node-distribution is not available in COMMUNITY EDITION, you can make your replications more highly-available.

    Simply define the same replication on multiple nodes. They will then run on each of those nodes.

    This redundancy provides some resiliency if a node fails. As, although no automatic distribution of replications is done,if the replication is running on multiple nodes then it will continue running on any surviving nodes.

    Node Distribution

    The goal of node distribution is to maintain an optimal balance of replications across the cluster.,with any given replication runs on only one node at any give time.

    To achieve this Sync Gateway automatically balances, as equally as possible, the number of replications running on each node.

    Where multiple replications are configured on multiple nodes, Sync Gateway automatically distributes these replications across all the available nodes for which the replications are configured. It continually monitors and redistributes replications as the number of available nodes and the number of running replications in a cluster changes.

    The nodes' processing load and bandwidth usage is minimized by ensuring that a replicator runs on only one node at any given time — even where it has been configured to run on multiple nodes. This avoids the redundant exchange of data arising from duplicate replication.

    Configuration Requirements

    To configure a replication to be highly available, include its database and replication definition in the sync gateway configuration on each node in the cluster that you want to be able to run it. At least two nodes are required.

    Node distribution will automatically elect an appropriate node to run them on and take care of redistributing them if a node fails.

    Related configuration elements`: Configuration Properties | Admin REST API

    Expected Failure Behavior

    If a node fails, then Sync Gateway will take any replications configured on multiple nodes, redistribute them across all available remaining nodes and restart them. Node distribution will continually seek to maintain an optimal distribution of replications across available nodes.

    Examples of Expected behavior

    This section provides examples of expected behavior in differing scenarios. It provides a comparison of how behavior differs between ENTERPRISE EDITION and COMMUNITY EDITION.

    The following scenarios are covered, each involves a sync gateway cluster with multiple nodes:

    • Homogenous configuration — see Example 1

    • Homogenous configuration with non-replicating node — see Example 2

    • Heterogenous configuration — see Example 3

    • Adding more nodes — see Example 4

    • Failing node — see Example 5

    Example 1. Homogenous configuration
    Scenario
    • The cluster comprises three Sync Gateway nodes

    • The same sync gateway configuration is applied across all nodes

    • All nodes are configured to run Replication Id 1

    • Sync Gateway automatically designates one of the three nodes to run Replication Id 1.

    • If a node goes down, Sync Gateway elects one of the remaining nodes to continue Replication Id 1.

    Sync Gateway runs Replication Id 1 on all nodes in the cluster.

    Example 2. Homogenous configuration with non-replicating node
    Scenario
    • The cluster comprises three Sync Gateway nodes.

    • Each node has the same sync gateway configuration, with one exception. The configuration on Node 3 has opted out of replication (sgreplicate_enabled=false)

    • All Three nodes are configured to run Replication Id 1.

    • Sync Gateway automatically designates either Node 1 or Node 2 to run the Replication Id 1.

    • If either Node 1 or Node 2 fails, Sync Gateway elects the non-failing node.

    • Sync Gateway runs Replication Id 1 on all nodes in cluster.

    • The system ignores the opt-out flag (sgreplicate_enabled).

    Example 3. Heterogenous configuration
    Scenario
    • The cluster comprises three Sync Gateway nodes

    • Both Node 1 and Node 2 are configured to run Replication Id 1

    • Node 3 is configured to run Replication Id 2 but not Replication Id 1

    • Sync Gateway automatically distributes Replication Id 1 and Replication Id 2 so that each runs on one of Node 1, Node 2 or Node 3, with no node running both replications simultaneously.

    • If any node fails whilst running either replication, Sync Gateway elects a non-failing node to continue that replication on. Where two nodes remain the node not running a replication will be chosen.

    • Sync Gateway runs Replication Id1 on Node 1 and Node 2 in the cluster

    • Sync Gateway runs Replication Id 2 on Node 3

    Note:

    • If Node 3 fails, then Replication Id 2 will not be continued on either of the remaining nodes as it is not configured on them

    • Similarly, if either or both of the other nodes (Node 1 and Node 2) fails, Node 3 will not be a candidate to run the corresponding replication.

    Example 4. Adding more nodes
    Scenario
    • The cluster comprises a single Sync Gateway node

    • Node 1 is configured to run Replication Id 1 and Replication Id 2

    • LATER . . . Node 2 is added to the cluster to run Replication Id 1 and Replication Id 2.

    • Sync Gateway designates Node 1 run both Replication Id 1 and Replication Id 2

    • LATER . . . when Node 2 is added . . .

      • Sync Gateway select one of the Node 1 replications to run on Node 2; let’s say it chooses Replication Id 2

      • Sync Gateway stops Replication Id 2 on Node 1

      • Sync Gateway starts Replication Id 2 on Node 2.

    • Sync Gateway designates Node 1 to run both Replication Id 1 and Replication Id 2

    • WHEN . . . Node 2 is added . . . Sync Gateway designates it to run both Replication Id 1 and Replication Id 2

    Example 5. Failing node
    Scenario
    • The cluster comprises three Sync Gateway nodes with a homogeneous configuration

    • All three nodes are configured to run Replication Id 1, Replication Id 2 and Replication Id 3

    • LATER . . . Node 3 goes down

    Sync Gateway automatically distributes the replications, one to each of the nodes

    • Lets assume the following distribution:

      • Node 1 runs Replication Id 1

      • Node 2 runs Replication Id 2

      • Node 3 runs Replication Id 3

    WHEN . . . Node 3 goes down . . . Sync Gateway elects either Node 1 or Node 2 to continue running Replication Id 3

    Sync Gateway runs all three replications (Replication Id 1 , Replication Id 2 and Replication Id 3) on all three nodes in the cluster (Node 1, Node 2 and Node 3)

    WHEN . . . Node 3 goes down . . . Node 1 and Node 2 continue to run Replication Id 1, Replication Id 2 and Replication Id 3

    Monitoring Node Distribution

    Use the _replicationStatus endpoint to access information about which replications are running on which nodes — see: _replicationStatus(replicationID) | _replicationStatus(replicationID)?action={action}

    This information is also collected and available in the log files.

    Conflict Resolution

    Inter-Sync Gateway pull replications support automatic conflict resolution by default (no conflict mode).

    In Pull replications the active Sync Gateway detects and resolves conflicts based on its configured conflict_resolution_type and conflict resolver policy. This policy will determine the winner or return an error if it cannot.

    Conflicts are not resolved in push replications though. The passive end of the push simply detects and rejects any conflicting revisions (409 Conflict response).

    Both approaches reflect the way conflicts are handled by Couchbase Lite clients. Not surprising since in both instances Couchbase Lite is acting like the active node in an inter-sync gateway exchange.

    Conflicts are only resolved during a pull replication. If conflicts are likely, you should configure a pushAndPull replication when using Conflict Free mode.

    Alternatively: Run the replicator from the other side; flipping the direction (to pull) and the resolution policy (for example localWins becomes remoteWins).

    See also — our blog post: Document Conflicts & Resolution in Couchbase Mobile

    For ENTERPRISE EDITION, a custom conflict resolver policy is available, providing additional flexibility by allowing users to provide their own conflict resolution logic — see: Custom Conflict Resolution Policy.

    For more on conflict resolution — see: Conflict Resolution


    1. This architecture is also known as ship-to-shore or hub-and-spoke.
    2. If one-directional XDCR is used alongside Sync Gateway, then Sync Gateway cluster must be in read-only mode.