Initialize Inter-Sync Gateway Replications
Initializing and running inter-Sync Gateway replication
Context Clarification
This content relates only to inter-Sync Gateway replication in Sync Gateway 2.8+. For pre-2.8 inter-Sync Gateway replication (also known as SG Replicate), see the documentation for the appropriate release.
Introduction
Replications are initialized by submitting a replication definition using either:
- A JSON configuration file (sync-gateway-config.json)
- The Admin REST API, using a utility such as curl, or an application such as Postman
Wherever they are defined, the elements of a replication definition are the same, with the exception of the adhoc parameter, which is available only through the Admin REST API and specifies that the replication is ad hoc [1].
Replication highlights
- There are two types of replication: persistent and ad hoc (REST API only).
- Replications of both types can run in one-shot or continuous mode.
- All replications involve at least one local database.
- Replications can be configured to purge documents when channel access is revoked (a removal notification is received).
- Persistent continuous replications can be:
  - Reset: a checkpoint can be reset to zero.
  - Updated: only the parameter values provided in the PUT request body will be updated.
- Persistent and ad hoc replications can be:
  - Removed: only the replication_id is needed to delete ongoing continuous or one-shot replications.
- ENTERPRISE EDITION only:
  - Replications can use delta-sync mode, whereby only the changed data items are replicated.
Running highlights
- Multiple identical replicators can be initiated on a Sync Gateway node provided each has a unique replication_id.
- Inter-Sync Gateway replications introduced in Sync Gateway 2.8 and SG Replicate replications can run on the same node, but you must ensure that they each have a different replication_id.
- The user under which the replication runs must have read and write access to the data being replicated.
- Sync Gateway applies exponential backoff when a connection is lost; this can be customized using the max_backoff_time configuration setting.
- Replications will continue trying to connect for 30 minutes following an authentication failure (including an invalid or nonexistent user).
- Running replications can be stopped. Stopped replications can be (re)started.
- If ALL the Sync Gateway nodes in a source or target Sync Gateway cluster go down in the middle of a continuous replication then, by default, the system should pick up from the last document that was successfully processed by both sides when the replication or cluster is restarted.
- REST ONLY:
  - POST databases/{tkn-db}/_replication creates a replication using the {rep-id} specified in the body or, if none is specified, a random UUID.
  - PUT databases/{tkn-db}/_replication/{rep-id} upserts replication {rep-id}.
- ENTERPRISE EDITION only:
  - Replications are distributed evenly across all available Sync Gateway nodes and so are not guaranteed to run on their originating node.
  - If a multi-node Sync Gateway cluster loses a subset of Sync Gateway nodes, the remaining nodes continue replication uninterrupted IF they have been configured to handle the replication (continuous and one-shot replications).
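The two REST-only rules above (POST takes the ID from the body, PUT takes it from the URL) can be sketched as a small helper that chooses the method and path for a request. This is an illustrative Python sketch only: the function name replication_request is hypothetical, and the path layout is assumed from the curl examples later on this page (/{db}/_replication/) rather than taken from the API reference.

```python
def replication_request(db, definition, rep_id=None):
    """Return (method, path, body) for creating or upserting a replication.

    Hypothetical helper: it only builds the request parts, it sends nothing.
    """
    body = dict(definition)  # copy so the caller's definition is untouched
    if rep_id is None:
        # POST: the ID is taken from the body; if the body has none,
        # Sync Gateway is described as assigning a random UUID.
        return "POST", f"/{db}/_replication/", body
    # PUT: upsert the replication named in the URL. An ID in the body
    # must be the same value as the one in the URL.
    if body.setdefault("replication_id", rep_id) != rep_id:
        raise ValueError("replication_id in body must match the request URL")
    return "PUT", f"/{db}/_replication/{rep_id}", body


print(replication_request("db1-local", {"direction": "pull"}, "db1-rep-id1"))
```

Keeping the ID check in one place mirrors the replication_id constraint described in the parameter table: a body ID that disagrees with the URL is rejected before anything is sent.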
Replication Definition
All replications are 'initialized' by a replication definition in the configuration file or Admin REST API and operate within the context of a local database.
- Configured replications use the database.{db-name}.replications property to add a replication definition to a local database.
- REST API replications specify the local database and replication identity in the API POST/PUT request, providing the replication definition parameters in the request body as a JSON string.
Both scenarios are covered in Example 2. It summarizes the replication definition elements [2], which are covered in more detail in Database Configuration.
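To make the two placements concrete, the following Python sketch takes one definition and shapes it both ways: nested under the database's replications property for the configuration file, and serialized as a JSON string for a REST request body. The helper names as_config and as_rest_body are hypothetical, and the property values simply mirror the examples on this page.

```python
import json

# Illustrative definition; the values mirror the examples on this page.
definition = {
    "direction": "pull",
    "remote": "https://example.com:4984/remote_db1",
    "continuous": False,
}


def as_config(db_name, rep_id, definition):
    """Nest the definition under databases.{db-name}.replications.{rep-id}."""
    return {"databases": {db_name: {"replications": {rep_id: dict(definition)}}}}


def as_rest_body(rep_id, definition):
    """Build the JSON string sent as the body of a POST/PUT request."""
    return json.dumps({"replication_id": rep_id, **definition})


config = as_config("db1", "db1-rep-id1-pull-oneshot", definition)
body = as_rest_body("db1-rep-id1-pull-oneshot", definition)
print(json.dumps(config, indent=2))
print(body)
```

The definition elements are identical in both outputs; only the wrapper differs.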
Database-level Settings
A number of database-level options are also especially relevant to inter-Sync Gateway replication, including:
- sgreplicate_enabled: use this ENTERPRISE EDITION setting to allow the database to participate in inter-Sync Gateway replications.
- database.delta_sync: use this setting to enable delta-sync replication on the database. It must be set if you want to use delta-sync in your replication definition.
- sgreplicate_websocket_heartbeat_secs: use this setting to override the default (5 minute) heartbeat interval for websocket ping frames for this database.
- database.sync: use this setting to specify the sync function logic; this is an essential part of access control.
- unsupported.sgr_tls_skip_verify: use this unsupported option to make development and testing easier by skipping verification of TLS certificates.
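The dependency between the database-level delta-sync setting and a replication's enable_delta_sync parameter can be checked before a configuration is deployed. The Python sketch below is one way to do that. It assumes the database-level switch is stored as a nested "delta_sync": {"enabled": ...} object, which should be confirmed against the Database Configuration reference; the function name delta_sync_problems is hypothetical.

```python
def delta_sync_problems(db_config):
    """List replications that request delta-sync on a database without it.

    Assumption: the database-level switch is {"delta_sync": {"enabled": bool}}.
    """
    db_enabled = db_config.get("delta_sync", {}).get("enabled", False)
    problems = []
    for rep_id, rep in db_config.get("replications", {}).items():
        # enable_delta_sync is treated as false when omitted.
        if rep.get("enable_delta_sync", False) and not db_enabled:
            problems.append(rep_id)
    return problems


db1 = {
    "sgreplicate_enabled": True,
    "delta_sync": {"enabled": False},
    "replications": {
        "rep-a": {"direction": "pull", "enable_delta_sync": True},
        "rep-b": {"direction": "push"},
    },
}
print(delta_sync_problems(db1))  # prints ['rep-a']
```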
Replication-level Settings
- Summary of Parameters
- Configured Example
- REST API Example
This table summarizes all the available configurable items.
Data schema for the replication model
Name | Description | Schema |
---|---|---|
adhoc | About: Use the Admin REST API's adhoc parameter to specify that a replication is ad hoc. Behavior: Ad hoc replications behave the same as normal replications, but they are automatically removed when their status changes to stopped. This will usually be on completion, but may also be as a result of user action. Constraints: This parameter is NOT available to configured replications; only to those initialized using the Admin REST API. | boolean |
batch_size | About: Use the optional batch_size property to set the number of revisions batched together at a time. | integer |
cancel | About: Use this parameter only when you want to cancel an existing active replication. Constraints: This parameter is NOT available in configured replications; only in those initialized using the Admin REST API. NOTE that the body of the request must be the same as the replication's replication definition for the cancellation request to be honoured. For example, if you requested continuous replication, the cancellation request must also contain the continuous field. | boolean |
conflict_resolution_type | About: The conflict_resolution_type property specifies the conflict resolution policy. The default behavior is that the automatic conflict resolution policy is applied. Valid options include default, remoteWins and custom. Example: "conflict_resolution_type":"remoteWins". Constraints: Replications created prior to version 2.8 will default to [...] | string |
continuous | About: The continuous property specifies whether the replication runs in continuous or one-shot mode. It defaults to false (one-shot). Constraints: Optional for stops and removes. | boolean |
custom_conflict_resolver | About: The optional custom_conflict_resolver property provides the conflict resolution logic. The property is mandatory when conflict_resolution_type is custom. Provide the required logic in a JavaScript function, as a string within backticks (see also the description for conflict_resolution_type). The function takes one parameter and returns a document. Example: "custom_conflict_resolver":` function(conflict) { console.log("full remoteDoc doc: "+JSON.stringify(conflict.RemoteDocument)); return conflict.RemoteDocument; }`. Constraints: Using complex [...] | string |
direction | About: The mandatory direction property specifies the direction of the replication: push, pull or pushAndPull. The property value is referenced by the remote property. Constraints: Replications created prior to version 2.8 derive their direction from the source/target URL of the replication. | string |
enable_delta_sync | About: The optional enable_delta_sync property specifies whether the replication uses delta-sync. It defaults to false. Constraints: Applies ONLY to Enterprise Edition deployments. Depends upon the setting of the database-level parameter database.delta_sync. | boolean |
filter | About: Use the optional filter property to limit the documents that are replicated. Options: A common value used when replicating from Sync Gateway is sync_gateway/bychannel. This option limits the pull replication to a specific set of channels. You can specify the required channels using query_params. Behavior: Works in conjunction with query_params. Example: "filter":"sync_gateway/bychannel". Constraints: Optional for stops and removes (even if defined during creation). | string |
initial_state | About: The optional initial_state property specifies the state in which the replication is launched (running or stopped). Behavior: All replications are configured to start on Sync Gateway launch, so, if omitted, the state defaults to 'Running'. Constraints: Replications created prior to version 2.8 will all default to a state of 'Running'. | string |
max_backoff_time | About: The max_backoff_time property specifies the time period (in minutes) during which Sync Gateway will attempt to reconnect to lost or unreachable remote targets. Behavior: On disconnection, Sync Gateway will do an exponential backoff up to the specified value, after which it will attempt to reconnect indefinitely every max_backoff_time minutes. If a zero value is specified, then Sync Gateway will do an exponential backoff up to an interval of five minutes before stopping the replication. NOTE: this value defaults to five minutes for replications created prior to version 2.8. | integer |
password | About: Use password, together with username, to provide the credentials of the replication user. Behavior: These details are used to authenticate credentials and approve access to data. Once provided and recorded, the password data is redacted and will not be displayed in either the configuration file or Admin REST API. A string of [...] | string |
purge_on_removal | About: The optional purge_on_removal property specifies whether documents are purged when channel access is revoked (a removal notification is received). It defaults to false. Constraints: Applies only to pull replications, including the pull portion of a pushAndPull replication. Replications created prior to version 2.8 must be run with [...] | boolean |
query_params | About: The query_params property specifies the arguments passed to the filter. Behavior: This property works in conjunction with filter. When using the sync_gateway/bychannel filter, you can use query_params to list the required channels. Example: "filter":"sync_gateway/bychannel", "query_params": { "channels":["channel.user1"] }. Constraints: Optional for stops and removes (even if defined during creation). | < string > array |
remote | About: The remote property represents the endpoint of a database for the remote Sync Gateway. That is, it identifies the remote Sync Gateway database that is the subject of this replication's push, pull or pushAndPull action. Typically the endpoint will include URI, port and database name elements. You can also include user credentials in the URL, as in this example: "remote": "http://user:password@example.com:4985/db1-remote". Format: either a string containing a valid URL for a (remote) Sync Gateway database, or an object whose url property contains the Sync Gateway database URL. Behavior: Dependent upon the setting of direction. If direction is pull, remote defines the remote cluster from which data is pulled; if push, remote defines the remote cluster to which data is pushed; if pushAndPull, remote defines the push configuration. Example: "remote": "http://www.example.com:4984/sample-database". | string |
replication_id | About: The replication_id property specifies either: for NEW replications, the ID to be assigned to the replication (if no replication_id is specified, Sync Gateway will assign a random UUID to new replications); for existing replications, the ID of the required replication; if cancel=true, the ID of the active replication task to be cancelled. Constraints: If this is specified in the body of a POST or PUT request then it must be the same value as specified in the request URL. | string |
username | About: Use username, together with password, to provide the credentials of the replication user. Behavior: These details are used to authenticate credentials and approve access to data. Once provided and recorded, the username data is redacted and will not be displayed in either the configuration file or Admin REST API. A string of [...] | string |
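The max_backoff_time behavior described in the table can be pictured as a capped retry schedule. The Python sketch below prints the waits before successive reconnection attempts. Only the cap (max_backoff_time, or five minutes when the setting is zero) comes from the description above; the one-second starting interval and the doubling factor are illustrative assumptions, and backoff_intervals is a hypothetical name rather than anything in Sync Gateway.

```python
def backoff_intervals(max_backoff_minutes, attempts, initial_seconds=1.0, factor=2.0):
    """Return the wait (in seconds) before each of the first `attempts` retries.

    initial_seconds and factor are assumptions made for illustration;
    only the cap is taken from the max_backoff_time description.
    """
    # A zero setting backs off up to a five-minute interval and then stops.
    stops_at_cap = max_backoff_minutes == 0
    cap = (5 if stops_at_cap else max_backoff_minutes) * 60.0
    waits, wait = [], initial_seconds
    for _ in range(attempts):
        waits.append(min(wait, cap))
        if stops_at_cap and wait >= cap:
            break  # the replication stops instead of retrying indefinitely
        wait *= factor
    return waits


print(backoff_intervals(max_backoff_minutes=1, attempts=9))
# prints [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0, 60.0]
```

With a non-zero setting the schedule settles at the cap and repeats; with zero, the list simply ends once the five-minute interval has been reached.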
Configured Example
This is an example of a replication definition. Its purpose is to illustrate all configurable items in use and so should not be considered a working example.
It creates a replication with the replication_id db1-rep-id1-pull-oneshot on a local database, db1-local, pulling data from a remote database, db1-remote.
"databases": {
  "db1": { (1)
    "bucket": "db1",
    "server": "couchbase://cb-server",
    // ... other DB config ..
    "sgreplicate_enabled": true, (2)
    "replications": {
      "db1-rep-id1-pull-oneshot": { (3)
        "direction": "pull", (4)
        "remote": "https://example.com:4984/remote_db1",
        "username": "user1", (5)
        "password": "password",
        "batch_size": 1000, (6)
        "conflict_resolution_type": "custom", (7)
        "custom_conflict_resolver": "", (8)
        "continuous": false, (9)
        "enable_delta_sync": false, (10)
        "filter": "sync_gateway/bychannel", (11)
        "query_params": { "channels": ["channel.user1"] }, (12)
        "max_backoff_time": 5, (13)
        "purge_on_removal": false, (14)
        "initial_state": "running" (15)
      }
    }
  }
}
1 | All replications are defined at database level within the context of a local database (e.g. db1). |
2 | Opt in to replication. |
3 | Define the replication_id. |
4 | Pull changes from the remote database at the specified URL. |
5 | Authenticate with the provided credentials. This user must have read and write access to both the local and remote databases. |
6 | Batch together up to 1000 revisions at a time. This improves replication performance but consumes more memory resources. |
7 | Apply a custom conflict resolution policy. |
8 | Provide a working JavaScript function to apply the required resolution policy. |
9 | By setting continuous=false, we are creating a one-shot replication. We could also have omitted this parameter as it defaults to false. |
10 | Don't use delta-sync; the default behavior. |
11 | Filter documents by channel. |
12 | Replicate only those documents tagged with the channel name "channel.user1". |
13 | Wait no more than 5 minutes between retries after network failure; the default behavior. |
14 | Don't purge following removal of a channel; the default behavior. |
15 | Start the replicator immediately and on Sync Gateway node (re)start. We could also have omitted this parameter as this is the default behavior. |
REST API Example
This is an example of a replication definition as you might submit it to the Admin REST API using curl.
Its purpose is to illustrate all configurable items in use and so should not be considered a working example.
It creates a replication with the replication_id db1-rep-id1-pull-oneshot on a local database, db1-local, pulling data from a remote database, db1-remote.
curl --location --request POST 'http://localhost:4985/db1-local/_replication/db1-rep-id1-pull-oneshot' \ (1)
--header 'Content-Type: application/json' \
--data-raw '{
  "replication_id": "db1-rep-id1-pull-oneshot", (2)
  "direction": "pull", (3)
  "remote": "https://example.com:4984/remote_db1",
  "username": "user1", (4)
  "password": "password",
  "batch_size": 1000, (5)
  "conflict_resolution_type": "custom", (6)
  "custom_conflict_resolver": "", (7)
  "continuous": false, (8)
  "enable_delta_sync": false, (9)
  "filter": "sync_gateway/bychannel", (10)
  "query_params": { "channels": ["channel.user1"] }, (11)
  "max_backoff_time": 5, (12)
  "purge_on_removal": false, (13)
  "initial_state": "running", (14)
  "adhoc": false, (15)
  "cancel": false (16)
}'
1 | All replications take place at database level and in the context of a local database. Here we are setting the replication in the context of db1-local. |
2 | Define the replication_id. |
3 | Pull changes from the remote database at the specified URL. |
4 | Authenticate with the provided credentials. This user must have read and write access to both the local and remote databases. |
5 | Batch together up to 1000 revisions at a time. This improves replication performance but consumes more memory resources. |
6 | Apply a custom conflict resolution policy. |
7 | Provide a working JavaScript function to apply the required resolution policy. |
8 | By setting continuous=false, we are creating a one-shot replication. We could also have omitted this parameter as it defaults to false. |
9 | Don't use delta-sync; the default behavior. |
10 | Filter documents by channel. |
11 | Replicate only those documents tagged with the channel name "channel.user1". |
12 | Wait no more than 5 minutes between retries after network failure; the default behavior. |
13 | Don't purge following removal of a channel; the default behavior. |
14 | Start the replicator immediately and on Sync Gateway node (re)start. We could also have omitted this parameter as this is the default behavior. |
15 | Setting adhoc=false marks this as a persistent replication. The definition will survive Sync Gateway node restarts. This is the default behavior if this parameter is omitted. |
16 | Set cancel=true to cancel an initialized replication; otherwise you can omit this parameter. |
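Because a cancellation request must repeat the replication's own definition (see the cancel entry in the parameter table), one way to avoid mismatches is to derive the cancel body from the definition that was originally submitted. The Python sketch below illustrates that idea; cancel_body is a hypothetical helper and the definition values are taken from the examples on this page.

```python
import json


def cancel_body(definition):
    """Return the JSON body for cancelling the replication built from `definition`.

    The body repeats the original definition and adds "cancel": true.
    """
    body = dict(definition)  # copy so the caller's definition is not modified
    body["cancel"] = True
    return json.dumps(body)


original = {
    "replication_id": "db1-rep-id1-pull-cont",
    "direction": "pull",
    "remote": "http://example.com:4985/db1-remote",
    "continuous": True,
}
print(cancel_body(original))
```

Since the original requested continuous replication, the cancel body still carries the continuous field, as the constraint requires.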
Generic Constraints
Replication
All active nodes in an active cluster must be running Sync Gateway version 2.8+.
- ENTERPRISE EDITION: All replications are distributed evenly across available nodes. This means they cannot be guaranteed to run on the node from which they originate.
- Access rights: The user running the replication must have read and write access to the data being replicated. This is not enforced by the system. Use your sync function to ensure a consistent approach is applied across all clusters.
- Mixing Inter-Sync Gateway Replication Versions: Versions of inter-Sync Gateway replications pre- and post-2.8 can legitimately be in use at the same time, especially during transition. However, you should avoid initializing identical pre-2.8 (SG Replicate) and 2.8+ replications.
Running Configured Replications
Replications in the configuration file start automatically whenever Sync Gateway is (re)started, unless you inhibit this by adding an "initial_state": "stopped" parameter to the replication definition; see initial_state.
You can manually start a 'stopped' replication using Starting a replication.
- Continuous
- One-shot
Continuous
// . . . other configuration entries
"db1-rep-id1-pull-cont": {
  "replication_id": "db1-rep-id1-pull-cont",
  "direction": "pull",
  "continuous": true, (1)
  "purge_on_removal": true,
  "remote": "http://user:password@example.com:4985/db1-remote", (2)
  "filter": "sync_gateway/bychannel",
  "query_params": {
    "channels": ["channel1.user1"]
  }
}
// . . . other configuration entries
1 | Make this a continuous replication that remains running, listening for changes to process. Because it is also persistent, it will start automatically following Sync Gateway node restarts (state defaults to running). |
2 | The remote URL can also include the credentials for an existing Sync Gateway user on the remote server. |
One-shot
// . . . other configuration entries
"db1-rep-id3-pull-oneshot": {
  "replication_id": "db1-rep-id3-pull-oneshot", (1)
  "direction": "pull",
  "remote": "http://user1:password@example.com:4985/db1-remote", (2)
  "filter": "sync_gateway/bychannel",
  "query_params": { "channels": ["channel1.user1"] }
}
// . . . other configuration entries
1 | This is a one-shot replication because the continuous parameter defaults to false. |
2 | The remote URL can also include the credentials for an existing Sync Gateway user on the remote server. |
Running Admin REST API Replications
Replications initialized by sending a POST
, or PUT
, request to the _replication
endpoint will start running automatically, unless the "initial_state": "stopped"
parameter is specified. with a JSON object defining the replication parameters — as shown in Example 4.
You can run multiple replications simultaneously with different replication topologies, provided both databases being synchronized have the same sync function.
You can submit requests using the curl
utility (as in these examples) or an application such as Postman.
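For readers scripting these requests rather than using curl or Postman, the following Python sketch builds an equivalent POST with the standard library. It only constructs the request object and does not send it; the admin URL, database name and definition values are the illustrative ones used on this page, and build_replication_post is a hypothetical helper name.

```python
import json
import urllib.request


def build_replication_post(admin_url, db, definition):
    """Build (but do not send) a POST to the _replication endpoint."""
    return urllib.request.Request(
        url=f"{admin_url}/{db}/_replication/",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_replication_post(
    "http://localhost:4985",
    "db1-local",
    {
        "replication_id": "db1-rep-id1-pull-cont",
        "direction": "pull",
        "continuous": True,
        "remote": "http://example.com:4985/db1-remote",
    },
)
print(req.get_method(), req.full_url)
# To submit it to a running Sync Gateway: urllib.request.urlopen(req)
```

Separating construction from sending makes it easy to inspect or log the exact body before anything reaches the Admin REST API.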
- Continuous Pull Replication
- One-shot
- Ad-hoc
Continuous Pull Replication
This example initializes a persistent, continuous replication between a local database and one on a remote Sync Gateway instance.
curl --location --request POST 'http://localhost:4985/db1-local/_replication/' \
--header 'Content-Type: application/json' \
--data-raw '{
  "replication_id": "db1-rep-id1-pull-cont",
  "direction": "pull",
  "continuous": true,
  "purge_on_removal": true,
  "remote": "http://user:password@example.com:4985/db1-remote",
  "filter": "sync_gateway/bychannel",
  "query_params": {
    "channels": ["channel1.user1"]
  }
}'
One-shot
This example initializes a persistent, one-shot replication between a local database and one on a remote Sync Gateway instance.
The replication will run once, after a short delay to allow the REST API to start.
It will then run once after each Sync Gateway restart and/or when manually initiated using the _replicationStatus endpoint; see Inter Sync Gateway Sync - Manage.
curl --location --request POST 'http://localhost:4985/db1-local/_replication/' \
--header 'Content-Type: application/json' \
--data-raw '{
  "replication_id": "db1-rep-id3-pull-oneshot", (1)
  "direction": "pull",
  "remote": "http://user1:password@example.com:4985/db1-remote", (2)
  "filter": "sync_gateway/bychannel",
  "query_params": { "channels": ["channel1.user1"] }
}'
1 | This is a one-shot replication because the continuous parameter defaults to false. |
2 | The remote URL can also include the credentials for an existing Sync Gateway user on the remote server. |
Ad-hoc
curl --location --request POST 'http://localhost:4985/db1-local/_replication/' \
--header 'Content-Type: application/json' \
--data-raw '{
  "replication_id": "db1-rep-id1-pull-adhoc",
  "adhoc": true, (1)
  "direction": "pull",
  "purge_on_removal": true,
  "remote": "http://user:password@example.com:4985/db1-remote",
  "filter": "sync_gateway/bychannel",
  "query_params": {
    "channels": ["channel1.user1"]
  }
}'
1 | Run this as an ad hoc replication. It will run once only and process all changes, but it will not survive Sync Gateway restarts. |