Inter-Sync Gateway Replication

    Description — Use inter-Sync Gateway replication to keep clusters in different mobile data centers in sync.
    Abstract — Inter-Sync Gateway replication supports resilient, secure, scalable bidirectional synchronization of data between cloud and edge data centers. It incorporates content presented under Manage>Replicate>Inter-Sync Gateway to give an integrated view.
    Related Content — Configuration File | Admin REST API | Initialize Inter-Sync Gateway Replications

    Context Clarification

    This content relates only to inter-Sync Gateway replication in Sync Gateway 2.8+. For documentation on pre-2.8 inter-Sync Gateway replication (also known as SG Replicate) — see Inter-Sync Gateway Replication pre-2.8.

    Introduction

    Couchbase Sync Gateway’s Inter-Sync Gateway Replication feature supports cloud-to-edge synchronization use cases, where data changes must be synchronized between a centralized cloud cluster and a large number of edge clusters whilst still enforcing fine-grained access control. This is an increasingly important enterprise-level requirement.

    In the architecture diagram (Figure 1), the replicator on the active Sync Gateway node ensures that any changes made to documents in either Sync Gateway database instance are replicated to the other Sync Gateway instance, in accordance with the replication’s configuration — see replications for configuration details.

    Figure 1. Inter-Sync Gateway Replication architecture

    Use Cases

    Cloud-to-edge synchronization

    In this multi-cloud deployment mode, a large number of edge clusters sync with one or more clusters in cloud data centers. Each edge can operate autonomously without network connectivity to the cloud data centers [1].

    A typical architecture for this use case is shown in Figure 2.

    Figure 2. Cloud-to-edge synchronization

    Active-to-active Mobile Synchronization

    Here, edge clusters containing active Sync Gateway nodes replicate with geographically separate, cloud-based Sync Gateway deployments. This is an ideal use case for inter-Sync Gateway replication, which was designed to keep clusters in different data centers in sync.

    The Couchbase Server Cross Data Center Replication (XDCR) API is similarly used to replicate between Couchbase Server clusters. However, inter-Sync Gateway replication is specifically designed for Sync Gateway deployments and must be used for replication between mobile clusters.

    A typical architecture for this use case is shown in Figure 3.

    Figure 3. Active-to-active mobile synchronization

    Replication Characteristics

    Context

    Sync Gateway supports the ability to run replications between Sync Gateway clusters using a websocket-based protocol, with the active replicator synchronizing changes between two Sync Gateway databases.

    All replications are based on a replication definition provided either to the Admin REST API or in Sync Gateway’s configuration file (JSON).

    Replications always involve at least one local database. Sync Gateway does not enable replication between two remote nodes: replications are defined at the database level, so at least one database will always be local.

    All replications take place at the document level (but see also Delta Sync).

    Sync Gateway nodes can opt out of participating in the replication process using the database-level parameter sgreplicate_enabled.
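
    For orientation, the following sketch shows a minimal persistent replication definition as it might appear in the configuration file. The database name, replication id and remote URL are illustrative only:

    "databases": {
      "this_db": {
        "sgreplicate_enabled": true,  // set to false to opt this node out of running replications
        "replications": [
            {
              "replication_id": "replication1",
              "remote": "wss://example.com:4984/remote_db",  // illustrative remote endpoint
              "direction": "push_and_pull",
              "continuous": true
            }
        ]
        // other config as necessary
      }
    }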

    Related configuration elements: databases | replications | remote | sgreplicate_enabled

    Protocol

    Inter-Sync Gateway replications are based on websockets. This is the same protocol used for replication with Couchbase Lite 2.x clients.

    For users on releases prior to 2.8, SG Replicate provides an HTTP-based replication — see the appropriate documentation here — Inter-Sync Gateway Replication pre-2.8 (deprecated).

    The bidirectional, persistent nature of websocket connections is ideal for applications such as continuous Sync Gateway replication, which constantly waits for and synchronizes change events.

    Types of Replication

    Replications are either adhoc replications (REST API only) or persistent replications. They can also be configured to run in one of two modes: continuous or one-shot.

    • Persistent

      Persistent replications survive Sync Gateway node restarts and continue running automatically unless configured not to. They can be configured to run in either continuous or one-shot mode.

    • Ad hoc

      Ad hoc replications are transient, existing only for the period of the replication. They provide a convenient way to:

      • Run one-off replications (for example, when troubleshooting)

      • Run on-demand replications after Sync Gateway is started. For instance, a replication that needs to be run only periodically can be configured as an ad hoc replication by an automated script scheduled to run when needed (see the sketch below).
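
      As an illustrative sketch, an ad hoc replication definition submitted to the Admin REST API might look like the following; the replication id and remote URL are made up, and the adhoc flag marks the replication as transient:

      {
        "replication_id": "adhoc-troubleshoot-1",
        "remote": "wss://example.com:4984/remote_db",
        "direction": "pull",
        "continuous": false,
        "adhoc": true
      }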

    Related configuration elements: replications | continuous | adhoc

    Delta Sync

    This content relates only to ENTERPRISE EDITION

    With delta sync enabled on both the replication and the two databases involved, only the changed data items are transferred.

    You can configure replications to use delta-sync by:

    • Setting "enable_delta_sync": true in the replication definition

    • Setting "delta-sync": { "enabled": true} on both databases in their respective database definitions.

    Push replications to pre-2.8 targets do not use Delta Sync.

    Related configuration elements: databases | replications | this_db.delta_sync | enable_delta_sync

    Directionality

    Replications are bi-directional. You can push, pull, or push and pull between the two database endpoints.
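
    In the replication definition this is controlled by the direction property. For example (the one-way values push and pull are inferred from the prose above; the combined form appears in Example 6):

    "direction": "push_and_pull"  // one-way variants: "push" or "pull"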

    Related configuration elements: replications | direction

    Security

    Transport-level security is supported. Use the appropriate URL prefix when specifying the remote database (wss: for secure websockets).

    Authentication

    Support for Basic Authentication using username and password credentials is provided.
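
    For example, one common pattern is to supply the basic-auth credentials in the remote URL of the replication definition; the host, database and credentials below are illustrative:

    "replications": [
        {
          "replication_id": "replication1",
          "remote": "wss://syncuser:passw0rd@example.com:4984/remote_db",  // wss: for TLS, user:password for basic auth
          // other config as necessary
        }
    ]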

    Access Control

    Data access control is provided by Sync Gateway’s sync function and the username/password credentials. All replicated documents pass through this function, ensuring that access permissions are adhered to.

    Related configuration elements: databases | sync
    Related how-to: Sync Function | Use Sync functions?

    Network Resilience

    Inter-Sync Gateway replications automatically attempt to restart whenever the node restarts.

    Network resiliency is built in for continuous replications. They respond to network issues, such as lost connections, by applying a persistent exponential backoff policy to attempt reconnection.

    The max_backoff_time determines the maximum wait time between retries. Once this limit is reached, retries continue every max_backoff_time minutes indefinitely. To prevent indefinite retries, set "max_backoff_time": 0; exponential backoff retries are then attempted for up to five minutes and stop if the connection has not been re-established.
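
    For example, to cap the retry interval at ten minutes, or to stop retrying altogether after the initial backoff window (values illustrative):

    "replications": [
        {
          "replication_id": "replication1",
          "continuous": true,
          "max_backoff_time": 10  // retry at most every 10 minutes, indefinitely
          // alternatively, "max_backoff_time": 0 stops retries after ~5 minutes of exponential backoff
        }
    ]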


    Related replication definition elements: max_backoff_time

    High Availability

    Overview

    Enterprise Edition

    Inter-Sync Gateway Replication provides built-in High Availability (HA) support. It uses node distribution to ensure all running replications are uniformly distributed across all available nodes, regardless of their originating node.

    A replication runs on only one node at any given time. When a node fails, the system automatically distributes that node’s replications across any available alternative nodes (provided the replication has been configured on multiple nodes).

    To use high availability, configure the same replication on at least two Sync Gateway nodes.

    Community Edition

    Although automatic node distribution is not available in COMMUNITY EDITION, you can make your replications more highly available: simply define the same replication on multiple nodes, and it will then run on each of those nodes.

    This redundancy provides some resilience if a node fails. Although no automatic distribution of replications is done, a replication that is running on multiple nodes will continue running on any surviving nodes.

    Node Distribution

    The goal of node distribution is to maintain an optimal balance of replications across the cluster, with any given replication running on only one node at any given time.

    To achieve this, Sync Gateway automatically balances, as equally as possible, the number of replications running on each node.

    Where multiple replications are configured on multiple nodes, Sync Gateway automatically distributes these replications across all the available nodes for which the replications are configured. It continually monitors and redistributes replications as the number of available nodes and the number of running replications in a cluster changes.

    The nodes’ processing load and bandwidth usage are minimized by ensuring that a replicator runs on only one node at any given time — even where it has been configured to run on multiple nodes. This avoids the redundant exchange of data arising from duplicate replication.

    Configuration Requirements

    To configure a replication to be highly available, include its database and replication definition in the Sync Gateway configuration on each node in the cluster that you want to be able to run it. At least two nodes are required.

    Node distribution will automatically elect an appropriate node to run each replication and take care of redistributing replications if a node fails, as sketched below.
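
    As a sketch, making replication1 highly available simply means deploying the identical definition (illustrative values again) in the configuration of every node that should be able to run it:

    // Identical fragment on Node 1 and Node 2
    "replications": [
        {
          "replication_id": "replication1",
          "remote": "wss://example.com:4984/remote_db",
          "direction": "push_and_pull",
          "continuous": true
        }
    ]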

    Related configuration elements: Configuration File | Admin REST API

    Expected Failure Behavior

    If a node fails, then Sync Gateway will take any replications configured on multiple nodes, redistribute them across all available remaining nodes and restart them. Node distribution will continually seek to maintain an optimal distribution of replications across available nodes.

    Examples of Expected behavior

    This section provides examples of expected behavior in differing scenarios. It provides a comparison of how behavior differs between ENTERPRISE EDITION and COMMUNITY EDITION.

    The following scenarios are covered; each involves a Sync Gateway cluster with multiple nodes:

    • Homogeneous configuration — see Example 1

    • Homogeneous configuration with non-replicating node — see Example 2

    • Heterogeneous configuration — see Example 3

    • Adding more nodes — see Example 4

    • Failing node — see Example 5

    Example 1. Homogeneous configuration
    Scenario
    • The cluster comprises three Sync Gateway nodes

    • The same Sync Gateway configuration is applied across all nodes

    • All nodes are configured to run Replication Id 1

    Enterprise Edition behavior

    • Sync Gateway automatically designates one of the three nodes to run Replication Id 1.

    • If a node goes down, Sync Gateway elects one of the remaining nodes to continue Replication Id 1.

    Community Edition behavior

    • Sync Gateway runs Replication Id 1 on all nodes in the cluster.

    Example 2. Homogeneous configuration with non-replicating node
    Scenario
    • The cluster comprises three Sync Gateway nodes.

    • Each node has the same Sync Gateway configuration, with one exception: the configuration on Node 3 opts out of replication (sgreplicate_enabled=false)

    • All three nodes are configured to run Replication Id 1.

    Enterprise Edition behavior

    • Sync Gateway automatically designates either Node 1 or Node 2 to run Replication Id 1.

    • If either Node 1 or Node 2 fails, Sync Gateway elects the non-failing node.

    Community Edition behavior

    • Sync Gateway runs Replication Id 1 on all nodes in the cluster.

    • The system ignores the opt-out flag (sgreplicate_enabled).

    Example 3. Heterogeneous configuration
    Scenario
    • The cluster comprises three Sync Gateway nodes

    • Both Node 1 and Node 2 are configured to run Replication Id 1

    • Node 3 is configured to run Replication Id 2 but not Replication Id 1

    Enterprise Edition behavior

    • Sync Gateway automatically distributes Replication Id 1 and Replication Id 2 so that each runs on one of the nodes on which it is configured, with no node running both replications simultaneously.

    • If any node fails whilst running either replication, Sync Gateway elects a non-failing node to continue that replication. Where two nodes remain, the node not currently running a replication will be chosen.

    Community Edition behavior

    • Sync Gateway runs Replication Id 1 on Node 1 and Node 2 in the cluster.

    • Sync Gateway runs Replication Id 2 on Node 3.

    Note:

    • If Node 3 fails, Replication Id 2 will not be continued on either of the remaining nodes, as it is not configured on them.

    • Similarly, if either or both of the other nodes (Node 1 and Node 2) fail, Node 3 will not be a candidate to run the corresponding replication.

    Example 4. Adding more nodes
    Scenario
    • The cluster comprises a single Sync Gateway node

    • Node 1 is configured to run Replication Id 1 and Replication Id 2

    • LATER . . . Node 2 is added to the cluster to run Replication Id 1 and Replication Id 2.

    Enterprise Edition behavior

    • Sync Gateway designates Node 1 to run both Replication Id 1 and Replication Id 2.

    • LATER . . . when Node 2 is added . . .

      • Sync Gateway selects one of the Node 1 replications to run on Node 2; let’s say it chooses Replication Id 2

      • Sync Gateway stops Replication Id 2 on Node 1

      • Sync Gateway starts Replication Id 2 on Node 2.

    Community Edition behavior

    • Sync Gateway designates Node 1 to run both Replication Id 1 and Replication Id 2.

    • WHEN . . . Node 2 is added . . . Sync Gateway designates it to run both Replication Id 1 and Replication Id 2.

    Example 5. Failing node
    Scenario
    • The cluster comprises three Sync Gateway nodes with a homogeneous configuration

    • All three nodes are configured to run Replication Id 1, Replication Id 2 and Replication Id 3

    • LATER . . . Node 3 goes down

    Enterprise Edition behavior

    • Sync Gateway automatically distributes the replications, one to each of the nodes.

    • Let’s assume the following distribution:

      • Node 1 runs Replication Id 1

      • Node 2 runs Replication Id 2

      • Node 3 runs Replication Id 3

    • WHEN . . . Node 3 goes down . . . Sync Gateway elects either Node 1 or Node 2 to continue running Replication Id 3.

    Community Edition behavior

    • Sync Gateway runs all three replications (Replication Id 1, Replication Id 2 and Replication Id 3) on all three nodes in the cluster (Node 1, Node 2 and Node 3).

    • WHEN . . . Node 3 goes down . . . Node 1 and Node 2 continue to run Replication Id 1, Replication Id 2 and Replication Id 3.

    Monitoring Node Distribution

    Use the _replicationStatus endpoint to access information about which replications are running on which nodes — see: _replicationStatus(replicationID) | _replicationStatus(replicationID)?action={action}
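
    For example (database name and replication id are illustrative):

    GET  /this_db/_replicationStatus/replication1               // report status, including the assigned node
    POST /this_db/_replicationStatus/replication1?action=stop   // apply an action to the running replication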

    This information is also collected and available in the log files.

    Conflict Resolution

    Automatic Conflict Resolution

    Inter-Sync Gateway replication supports automatic conflict resolution to resolve conflicting document changes.

    It delivers this by applying one of its built-in conflict resolver policies, which can be easily included in your own replications.

    The goal of automatic conflict resolution is to return a winning revision based on the consistent application of the configured conflict resolver policy.

    The default conflict resolver policy always returns a winner, determined by the consistent application of its built-in resolution rules.

    For ENTERPRISE EDITION, a Custom Conflict Resolver policy is available, providing additional flexibility by allowing users to provide their own conflict resolution logic.

    Conflict Response on Active Replicator

    As soon as the active Sync Gateway database detects a conflict in a replicated document revision, it applies its configured conflict resolver policy to determine a winning revision. The policy assesses the conflicting revisions and either determines the winning revision or returns an error if it fails to do so.

    Conflict Response by Passive Replicator

    When a passive Sync Gateway database detects a conflict it responds to the active with a 409 response and rejects the revision. It is expected that the active Sync Gateway will pull the conflicting revision from the passive, resolve it on the active, and then subsequently push the resolved conflict back up.

    How Resolution Works

    Pull Replications

    For pull replications, the active Sync Gateway is responsible for detecting and resolving conflicts based on the configured conflict_resolution_type — see configuration item: conflict_resolution_type.

    This is also how conflicts are handled when Couchbase Lite clients pull documents from Sync Gateway.

    Note: Resolved conflicts are only transferred from active to passive Sync Gateways if a replication is set up between them.

    Push Replications
    Passive Sync Gateway

    The passive Sync Gateway will automatically detect and reject conflicting revisions being pushed to it.

    Note that conflicts are not resolved: the revision is rejected and the document returned to the active Sync Gateway with a 409 Conflict response.

    Active Sync Gateway

    It is the responsibility of the active Sync Gateway to address rejected revisions in accordance with its specified conflict_resolution_type.

    This approach is the same as that adopted when Couchbase Lite clients push documents to Sync Gateway.

    Configure Conflict Resolution

    Invoke automatic conflict resolution by specifying the required conflict resolver policy in the replication definition. The specified policy is applied whenever a conflict is detected.

    Example 6. Using automatic conflict resolution

    "databases:"
      // other config as necessary
      "this_db:"
        // other config as necessary
        "sgreplicate_enabled": "true",
        "replications": [
            {
              "replication_id": "replication1",
              "direction": "push_and_pull",
              "continuous": true,
              "filter": "sync_gateway/bychannel",
              "query_params": [
                  "channel1",
                  "channel2"
              ],
              "conflict_resolution_type": "default",
              // other config as necessary
            }
        ]
    // other config as necessary
    "databases:"
      // other config as necessary
      "this_db:"
        // other config as necessary
        "sgreplicate_enabled": "true",
        "replications": [
            {
              "replication_id": "replication1",
              "direction": "push_and_pull",
              "continuous": true,
              "filter": "sync_gateway/bychannel",
              "query_params": [
                  "channel1",
                  "channel2"
              ],
              "conflict_resolution_type": "localWins",
              // other config as necessary
            }
        ]
    // other config as necessary
    "databases:"
      // other config as necessary
      "this_db:"
        // other config as necessary
        "sgreplicate_enabled": "true",
        "replications": [
            {
              "replication_id": "replication1",
              "direction": "push_and_pull",
              "continuous": true,
              "filter": "sync_gateway/bychannel",
              "query_params": [
                  "channel1",
                  "channel2"
              ],
              "conflict_resolution_type": "remoteWins",
              // other config as necessary
            }
        ]
    // other config as necessary

    Build a Conflict Resolution Policy [EE]

    This content relates only to ENTERPRISE EDITION

    Overview

    Custom conflict resolution is handled by the active Sync Gateway using a user-provided custom conflict resolver. This JavaScript function is embedded in the replication configuration.

    The predefined conflict resolver policies are also available as JavaScript functions that you can call from within your custom_conflict_resolver function. This is useful when you want to apply greater selectivity to the automatic conflict resolution process; for example, to apply a 'remote wins' policy only for a specific type of document — see the 'Use Built-in Policies' tab in Example 9.

    Conflict Resolution Approaches

    There are two ways to handle conflicts in your custom_conflict_resolver. You can either:

    • Choose a winning revision from among the conflicting revisions (see Example 9), or

    • Merge conflicting revisions to create a new winning revision; losing revisions are tombstoned.

      However, you should avoid overly complex resolver logic that may impact performance.

    Approaches to Error Handling

    Your custom conflict resolver function should not terminate the replication when it encounters exceptions or errors. Instead, you should log sufficient information to aid troubleshooting and recovery.

    For example, your custom conflict resolver function should:

    • Skip the document causing the issue

    • Log a suitable warning level message. Include at least the skipped document’s Id and the sequence Id of the revision in error.

    Refer to log files when troubleshooting conflict resolution errors, to identify the document id and revision sequence in error.
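
    As a sketch only, one way to meet both recommendations is to wrap the resolver logic and fall back to the built-in default policy on error; the _id and _rev properties used for logging are assumptions about the document metadata:

    "custom_conflict_resolver": `function(conflict) {
      try {
        // . . . application logic to determine winner . . .
        return conflict.RemoteDocument;
      } catch (err) {
        // Log a warning with enough detail to troubleshoot the skipped document
        console.log("resolver error: " + err +
                    " doc: " + conflict.LocalDocument._id +
                    " rev: " + conflict.LocalDocument._rev);
        // Fall back to the built-in default policy rather than terminating the replication
        return defaultPolicy(conflict);
      }
    }`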

    Example 7. Some Error Scenarios and Recommended Resolutions
    Unexpected data in the remote document

    You should update the remote document to fix the issue. Doing so will cause replication of the update.

    Unexpected data in the local document

    You should update the local document to fix the issue. This will not trigger a pull replication, so do a no-op update [2] of the remote document, which will trigger replication and conflict resolution.

    Fault in conflict resolution javascript function

    Fix the JavaScript logic and then either:

    • Do a no-op update [2] of the remote document. This triggers a pull replication and subsequent conflict resolution.

    • Reset the replication (using the _replicationStatus/reset endpoint). This is not recommended, as it introduces significant duplicate processing by resyncing previously synced documents.
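
    As an illustrative sketch of a no-op update [2], you might bump a dedicated counter property on the remote document via the REST API; the document id, revision and property name are hypothetical:

    PUT /remote_db/doc-1234?rev=4-cafebabe
    {
      // . . . existing document properties, unchanged . . .
      "resolver_nonce": 5  // hypothetical counter, incremented only to trigger replication
    }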

    Conflict Resolver Structure

    This example shows the basic structure of the conflict resolver function as it would be defined in the configuration file.

    Example 8. Conflict resolver structure
    
    "custom_conflict_resolver": "`function(conflict) { (1)
      //  . . .
      //  . . . application logic to determine winner
      //  . . .
      return conflict.LocalDocument;  (2)
    }`" (3)
    1 The conflict structure comprises both conflicting documents.
    type Conflict struct {
    	LocalDocument  Body `json:"LocalDocument"`
    	RemoteDocument Body `json:"RemoteDocument"`
    }
    LocalDocument

    This LocalDocument object encapsulates the body and metadata of the local conflicting document revision being replicated. Its content matches the JSON stored at the local Sync Gateway.

    RemoteDocument

    The RemoteDocument object encapsulates the body and metadata of the remote conflicting document revision being replicated. Its content matches the JSON stored at the remote Sync Gateway.

    2 You should return one of:
    • conflict.LocalDocument

    • conflict.RemoteDocument

    • a new document body comprising the merged local and remote documents

    • a nil body, which will be resolved as a delete

    3 The conflict resolver function is enclosed by backticks (``)

    Sample Conflict Resolvers

    Example 9. Simple conflict resolvers

    Use Built-in Policies

    This example uses the built-in resolver functions to resolve the conflict based on the document type.

    So, when both the local and remote documents are of type a-doctype-1, the conflict is resolved in accordance with the default resolver policy; documents of all other types are resolved in favor of the remote revision.

    "replications": [
      {
      "replication_id": "replication1",
      // other config as required
      "conflict_resolution_type": "custom",
      "custom_conflict_resolver": `
        function(conflict) {
          if  (conflict.LocalDocument.type == "a-doctype-1") &&
              (conflict.RemoteDocument.type == "a-doctype-1")
           {
             // Invoke the built in default resolver logic
             return defaultPolicy(conflict);
           }
          else {
            // Otherwise resolve in favor of remote document
              return conflict.RemoteDocument;
            }
        }
        `
      // other config as required
      }
    ]

    Nominate a Winner

    This example selects a winner based on relative priorities and builds a return response of its own rather than using either the localWins or remoteWins policy, although it does rely on the default resolver policy as a backstop.

    "replications": [
      {
        // . . . preceding replication details as required
      },
      {
        "replication_id": "replication2",
        // . . .   other config as required
        "conflict_resolution_type": "custom",
        "custom_conflict_resolver": `
          function(conflict) {
            // Custom conflict resolution policy based on priority
            if (conflict.LocalDocument.body.priority > conflict.RemoteDocument.body.priority) {
              // Choose a local winner
              // Optionally apply application logic to manipulate
              // the local object before returning it as the winner
              return conflict.LocalDocument;
            } else if (local.body.priority < remote.body.priority) {
                // Choose a remote winner
                // Optionally apply application logic to manipulate
                // the remote object before returning it as the winner
              return conflict.RemoteDocument;
              }; //end if
            } //end func()
            // Apply the default policy as a catch all
            return defaultPolicy(conflict);
        }` // end resolver property
      }, // end replication2
      {
        // . . . further replication details as required
      }
    ]
    // . . .   other config as required

    Merge a Winner

    This example creates a winner by merging changes from the local and remote documents into a new document object, which is returned as the winner.

    If both documents’ type properties are non-null and the local document’s type is usedefault, the merge path is overridden and the default resolver policy is applied.

    "custom_conflict_resolver":`
      function(conflict) {
          if (  (conflict.LocalDocument.type != null) &&
                (conflict.RemoteDocument.type != null) &&
                (conflict.LocalDocument.type == "usedefault"))
          {
              console.log("Will use default policy");
              // Resolve using built-in policy
              return defaultPolicy(conflict);
          }
          else
          {
            // Merge local and remote docs
            var remoteDoc = conflict.RemoteDocument;
            console.log("full remoteDoc doc: "+JSON.stringify(remoteDoc));
            var localDoc = conflict.LocalDocument;
            console.log("full localDoc doc: "+JSON.stringify(localDoc));
            var mergedDoc = extend({}, localDoc, remoteDoc);
    
            console.log("full mergedDoc doc: "+JSON.stringify(mergedDoc));
            // Resolve using this merged doc as the winner
            return mergedDoc;
    
            function extend(target) {
                var sources = [].slice.call(arguments, 1);
                sources.forEach(function (source) {
                    for (var prop in source) {
                        target[prop] = source[prop];
                    }
                });
                return target;
            } // end function extend()
          } // end if
      }` // end function()

    1. This architecture is also known as ship-to-shore or hub-and-spoke.
    2. No-op update — refers to a change to the document body that has no impact on the app logic but will trigger an import by the Sync Gateway. One option could be to include a property used specifically for this purpose (e.g. a counter that can be incremented in response to conflict resolver errors).