Revisions
About Sync Gateway’s use of Revisions, Revision Trees and Revision Caches.
Revisions are at the heart of Couchbase Mobile’s ability to respond flexibly and securely to changing data from server to edge.
Introduction
Generation
Documents and buckets (collections of documents, usually related ones) are the basic units of data within Couchbase.
Remember that within Couchbase Mobile, each document comprises:
- A Document ID
- A current revision ID
- A JSON body
- Metadata
Binary data such as images, audio and other multimedia objects are stored separately from the document in an entity known as a blob (or attachment).
Each change to a document (even its creation and deletion) is recorded as a revision. Changes to blobs do not generate revisions.
Format
Couchbase creates a revision whenever a document is created, updated or deleted. Each revision is given a unique Revision ID in addition to the Document ID.
The revisions are contained within a document’s metadata, as a revision tree.
Sync Gateway uses revision IDs to resolve conflicts that arise from concurrent changes to replicated copies of distributed data. A revision ID comprises two parts:
- A generation ID
  This is a sequential, auto-incrementing number. It is specific to the database on which the document resides. Couchbase Lite generates simple integers; Sync Gateway generates longer, more complex base64 values.
  The contents of remote revision IDs are implementation dependent. Do not base any processing logic on their contents.
- A hash derived from the document contents
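To make the two parts concrete, here is a toy Python sketch that builds an ID of the form `<generation>-<digest>`. The scheme (SHA-1 over the parent revision ID plus the key-sorted JSON body) and the function name `toy_revision_id` are illustrative assumptions, not Couchbase's algorithm; real revision IDs are implementation dependent and should never be parsed.

```python
import hashlib
import json
from typing import Optional


def toy_revision_id(generation: int, parent_rev: Optional[str], body: dict) -> str:
    """Return a hypothetical '<generation>-<digest>' revision ID.

    Illustration only: real Couchbase revision IDs are implementation
    dependent, so never base processing logic on their contents.
    """
    if generation < 1:
        raise ValueError("generation starts at 1")
    hasher = hashlib.sha1()
    # Mix in the parent ID so the same body reached from different parents differs.
    hasher.update((parent_rev or "").encode("utf-8"))
    # Sort keys so the same content always yields the same digest.
    hasher.update(json.dumps(body, sort_keys=True).encode("utf-8"))
    return f"{generation}-{hasher.hexdigest()}"


rev1 = toy_revision_id(1, None, {"name": "Alice"})
rev2 = toy_revision_id(2, rev1, {"name": "Alice", "age": 30})
print(rev1.split("-")[0], rev2.split("-")[0])  # 1 2
```

The generation grows by one per edit, while the digest changes whenever the content (or its parent) changes.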
Structure
The revisions for each document form a revision tree within its metadata.
This revision tree comprises all revisions made to the document throughout its lifetime to date, in sequence. The current revision (the most recent version of the document) is the tip of the tree: its leaf node.
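A minimal Python model of that structure, assuming each revision records only its parent's ID (the `RevisionTree` class and its methods are hypothetical): leaves are revisions that no other revision extends, and walking parent links back from a leaf recovers its history.

```python
from typing import Dict, List, Optional


class RevisionTree:
    """Toy revision tree: each revision points at its parent (None for the root)."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}

    def add(self, rev_id: str, parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent {parent}")
        self.parent[rev_id] = parent

    def leaves(self) -> List[str]:
        # A leaf is any revision that no other revision names as its parent.
        parents = {p for p in self.parent.values() if p is not None}
        return [rev for rev in self.parent if rev not in parents]

    def history(self, rev_id: str) -> List[str]:
        # Walk from a revision back to the root, newest first.
        chain: List[str] = []
        current: Optional[str] = rev_id
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain


tree = RevisionTree()
tree.add("1-a")
tree.add("2-b", "1-a")
tree.add("3-c", "2-b")
print(tree.leaves())        # ['3-c']
print(tree.history("3-c"))  # ['3-c', '2-b', '1-a']

# A concurrent edit of 2-b creates a second leaf, that is, a conflict.
tree.add("3-x", "2-b")
print(tree.leaves())        # ['3-c', '3-x']
```

With a single linear history there is exactly one leaf; each unresolved conflict adds another.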
A revision tree can grow without limit, so Couchbase removes obsolete revisions to maintain performance levels. This process is known as Revision Pruning.
Revision Pruning
Pruning is the process of removing obsolete revisions. It automatically runs whenever a new revision is generated.
Use the Admin REST API endpoint for Database Configuration to provision configuration changes to any of the properties described in this content.
Algorithm
Although fundamentally the same, the pruning algorithm works slightly differently between Sync Gateway and Couchbase Lite.
On Sync Gateway, the pruning algorithm is applied to the shortest, non-tombstoned branch in the revision tree.
The algorithm allows the branch to retain a configurable number of revisions (revs_limit) and removes all older revisions.
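The rule above can be sketched in Python as follows. For simplicity it models each branch as a separate list of revision IDs, oldest first, rather than as paths through a shared tree; the `Branch` type and `prune` function are hypothetical, and Sync Gateway's actual implementation may differ in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    revs: List[str]           # revision IDs, oldest first
    tombstoned: bool = False  # True if the branch ends in a deletion


def prune(branches: List[Branch], revs_limit: int) -> List[str]:
    """Trim the shortest non-tombstoned branch to its newest revs_limit revisions.

    Returns the revision IDs removed from that branch.
    """
    if revs_limit < 1:
        raise ValueError("revs_limit must be at least 1")
    live = [b for b in branches if not b.tombstoned]
    if not live:
        return []
    target = min(live, key=lambda b: len(b.revs))
    excess = max(0, len(target.revs) - revs_limit)
    removed, target.revs = target.revs[:excess], target.revs[excess:]
    return removed


main = Branch([f"{g}-m" for g in range(1, 8)])  # 7 revisions
side = Branch(["3-s", "4-s", "5-s"])           # shortest live branch
dead = Branch(["2-d"], tombstoned=True)         # ignored by the rule
print(prune([main, side, dead], revs_limit=2))  # ['3-s']
```

Only the selected branch is trimmed; the longer `main` branch and the tombstoned branch are left untouched in this sketch.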
Controls
You can vary the number of retained revisions using the Configuration File’s revs_limit setting.
For example, with a revs_limit of 1,000, the algorithm keeps the last 1,000 revisions in the shortest non-tombstoned branch and removes any others from that branch.
The default and minimum values of revs_limit depend on whether allow_conflicts is set to True or False; see Table 1.
If there are conflicting revisions, the document may end up with disconnected branches after the pruning process.
In the animation below, the document has a conflicting branch (revisions 4' to 1001').
When the shortest branch (in this case, the conflicting branch) reaches the 1003rd update, it is cut off.
The revision tree is not corrupted, and the logic that chooses the winning revision still applies.
However, the disconnected branch may make certain merges (such as an n-way merge) impossible when resolving conflicts, and it occupies disk space that could have been freed had the conflict been resolved early on.
If the revision tree gets into this state, the only way to resolve the conflict is to pick a winning branch and tombstone all the non-winning conflicting branches.
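The resolution step can be sketched as a small planning function. Which branch wins is application logic, so it is passed in as a callback; the function name `plan_resolution` and the shape of the leaf map are assumptions for illustration. Each returned loser would then be tombstoned through your normal write path.

```python
from typing import Callable, Dict, List, Tuple


def plan_resolution(
    leaves: Dict[str, bool],
    choose_winner: Callable[[List[str]], str],
) -> Tuple[str, List[str]]:
    """Pick a winning leaf and list the live conflicting leaves to tombstone.

    leaves maps each leaf revision ID to True if that leaf is already a tombstone.
    """
    live = [rev for rev, tombstoned in leaves.items() if not tombstoned]
    if not live:
        raise ValueError("no live leaves to choose from")
    winner = choose_winner(live)
    if winner not in live:
        raise ValueError("winner must be one of the live leaves")
    losers = [rev for rev in live if rev != winner]
    return winner, losers


leaves = {"1003-a": False, "1001-b": False, "6-c": True}
winner, losers = plan_resolution(leaves, lambda live: live[0])
print(winner, losers)  # 1003-a ['1001-b']
```

Leaves that are already tombstones are skipped, since they no longer compete for the winning revision.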
Setting the revs_limit to a value below 100 when allow_conflicts = true may adversely affect the conflict resolution process, as there may be insufficient revision history to resolve a given conflict.
Table 1. Default and minimum revs_limit values
| Release | Revs Limit | Allow Conflicts: True | Allow Conflicts: False |
|---|---|---|---|
| 2.6+ | default | 100 | 50 |
| 2.6+ | minimum | 20 | 1 |
| 2.0-2.5 | default | 100 | 1000 |
| 2.0-2.5 | minimum | 50 | 1 |
| 1.x | default | 1000 | 1000 |
| 1.x | minimum | 20 | 20 |
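Table 1 can be encoded as a lookup for tooling that validates a configuration before it is applied. `effective_revs_limit` is a hypothetical helper: it returns the default when no value is given and rejects values below the minimum. How Sync Gateway itself treats a below-minimum setting is not described here, so treat the rejection as this sketch's own policy.

```python
from typing import Dict, Optional, Tuple

# (release, allow_conflicts) -> (default revs_limit, minimum revs_limit), from Table 1
REVS_LIMITS: Dict[Tuple[str, bool], Tuple[int, int]] = {
    ("2.6+", True): (100, 20),
    ("2.6+", False): (50, 1),
    ("2.0-2.5", True): (100, 50),
    ("2.0-2.5", False): (1000, 1),
    ("1.x", True): (1000, 20),
    ("1.x", False): (1000, 20),
}


def effective_revs_limit(release: str, allow_conflicts: bool,
                         requested: Optional[int] = None) -> int:
    """Return the revs_limit to use, applying Table 1's default and minimum."""
    default, minimum = REVS_LIMITS[(release, allow_conflicts)]
    if requested is None:
        return default
    if requested < minimum:
        raise ValueError(
            f"revs_limit {requested} is below the minimum of {minimum} "
            f"for release {release} with allow_conflicts={allow_conflicts}"
        )
    return requested


print(effective_revs_limit("2.6+", allow_conflicts=False))  # 50
```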
Constraints
The presence of multiple unresolved conflicts in a revision tree can impair the pruning process. It may result in obsolete revisions not being pruned or in the premature pruning of revisions.
Learn More
To learn more about revision pruning and database size management in general see our blog: Pruning — Managing DB Sizes in Couchbase Mobile.
Caching
Whenever a document is accessed, its revision tree (or at least a portion of it) is also cached.
Control
You can control the size of the revision cache using the database.cache.rev_cache settings within the configuration file, specifically:
Size
Use the rev_cache.size setting to specify the total number of document revisions to be cached in memory for all (recently accessed) documents.
When the revision cache is full, Sync Gateway will remove older document revisions to make room for newer ones.
Adjusting this setting lets you fine-tune Sync Gateway’s memory consumption. This is useful on servers with limited memory, and when Sync Gateway creates or updates many documents relative to the number of read operations.
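The size-bounded behavior can be pictured with a toy cache whose capacity stands in for rev_cache.size. The text only says that older revisions are removed to make room; this sketch assumes least-recently-used eviction, and the `RevisionCache` class is hypothetical rather than Sync Gateway's code.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class RevisionCache:
    """Toy LRU cache of revision bodies keyed by (doc_id, rev_id).

    capacity plays the role of rev_cache.size: the total number of revisions
    held across all documents. A capacity of 0 caches nothing.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 0:
            raise ValueError("capacity must be >= 0")
        self.capacity = capacity
        self._entries: "OrderedDict[Tuple[str, str], bytes]" = OrderedDict()

    def get(self, doc_id: str, rev_id: str) -> Optional[bytes]:
        key = (doc_id, rev_id)
        if key not in self._entries:
            return None  # miss: a real server would load the revision from the bucket
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, doc_id: str, rev_id: str, body: bytes) -> None:
        if self.capacity == 0:
            return
        key = (doc_id, rev_id)
        self._entries[key] = body
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._entries)


cache = RevisionCache(capacity=2)
cache.put("doc1", "1-a", b'{"v":1}')
cache.put("doc2", "1-b", b'{"v":2}')
cache.get("doc1", "1-a")              # touch doc1 so doc2 becomes least recently used
cache.put("doc3", "1-c", b'{"v":3}')  # evicts doc2's revision
print(cache.get("doc2", "1-b"))       # None
```

A smaller capacity trades memory for more misses, which in a real deployment means more reads from the bucket.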
Sharding
This content relates only to ENTERPRISE EDITION
The Community Edition is configured with the default value and ignores any rev_cache.shard_count value in the configuration file.
You can control the number of shards into which Sync Gateway splits its revision cache by using the rev_cache.shard_count setting.
More shards mean lower cache contention when accessing distinct revisions, at the cost of some memory overhead per shard.
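The contention trade-off can be sketched with a cache split into independently locked shards, so that two threads touching revisions in different shards never wait on each other. Hashing the document ID with CRC-32 to pick a shard is an assumption made for this illustration, `ShardedRevisionCache` is hypothetical, and eviction is omitted to keep the focus on sharding.

```python
import threading
import zlib
from typing import Dict, List, Optional, Tuple


class ShardedRevisionCache:
    """Toy revision cache split into shard_count independently locked shards."""

    def __init__(self, shard_count: int) -> None:
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        self._shards: List[Dict[Tuple[str, str], bytes]] = [{} for _ in range(shard_count)]
        # One lock and one dict per shard: this is the per-shard memory overhead.
        self._locks = [threading.Lock() for _ in range(shard_count)]

    def shard_for(self, doc_id: str) -> int:
        # A stable hash keeps every revision of a document in the same shard.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self._shards)

    def put(self, doc_id: str, rev_id: str, body: bytes) -> None:
        i = self.shard_for(doc_id)
        with self._locks[i]:  # only this shard is locked; the others stay free
            self._shards[i][(doc_id, rev_id)] = body

    def get(self, doc_id: str, rev_id: str) -> Optional[bytes]:
        i = self.shard_for(doc_id)
        with self._locks[i]:
            return self._shards[i].get((doc_id, rev_id))


cache = ShardedRevisionCache(shard_count=8)
cache.put("doc1", "1-a", b'{"v":1}')
print(cache.get("doc1", "1-a"))  # b'{"v":1}'
```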
Do not change the default database.cache.rev_cache.shard_count unless advised to do so by Couchbase Support; see Couchbase Support Policy.
Delta Sync
This content relates only to ENTERPRISE EDITION
When a write operation is executed with delta_sync enabled, the revision body is backed up in the bucket and retained for database.delta_sync.rev_max_age_seconds, during which time it is available for calculating future revision deltas.
As a result, new deltas can only be generated for read requests that come in within the database.delta_sync.rev_max_age_seconds time window.
Storing backed-up revision bodies for delta sync incurs additional bucket storage, the size of which equates to:
(doc_size * updates_per_day * rev_max_age_seconds) / 86400
See Example 1.
Example 1
Enabling delta sync would take up an additional 400 KB of storage on Couchbase Server, assuming:
- An average document size of 4 KB
- 100 writes/day
- The default rev_max_age_seconds value (86400 seconds)
This equates to: (4 * 100 * 86400) / 86400 = 400 KB
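The storage formula is simple enough to check in code. This hypothetical helper reproduces Example 1; the result is in whatever unit `doc_size` uses (KB here).

```python
SECONDS_PER_DAY = 86400


def delta_sync_extra_storage(doc_size: float, updates_per_day: float,
                             rev_max_age_seconds: float) -> float:
    """Extra bucket storage for backed-up revision bodies, in doc_size's unit.

    Each update keeps one backed-up body for rev_max_age_seconds, so the
    total is the number of updates made within that window multiplied by
    the document size.
    """
    return (doc_size * updates_per_day * rev_max_age_seconds) / SECONDS_PER_DAY


print(delta_sync_extra_storage(4, 100, 86400))  # 400.0
print(delta_sync_extra_storage(4, 100, 0))      # 0.0
```

Halving rev_max_age_seconds halves the extra storage, at the cost of a shorter window in which deltas can be generated.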
Setting database.delta_sync.rev_max_age_seconds to zero will generate deltas opportunistically on pull replications, with no additional storage requirements.
Disabling the Cache
This content relates only to ENTERPRISE EDITION
Disabling the revision cache can be useful when documents are very large or when you expect a very low cache hit rate. In other cases, disabling it can increase replication latency.
Do not disable the revision cache unless advised to do so by Couchbase Support; see Couchbase Support Policy.
To disable the revision cache entirely, set rev_cache.size to zero. Community Edition ignores a zero setting.
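For reference, the cache settings named on this page nest as shown below when written as JSON, following the database.cache.rev_cache.* paths used in the text. The helper `rev_cache_config` is hypothetical and emits only the `cache` fragment of a database configuration; check the property names against the configuration reference for your release before applying them through the Admin REST API.

```python
import json
from typing import Optional


def rev_cache_config(size: int, shard_count: Optional[int] = None) -> str:
    """Build the cache.rev_cache fragment of a database configuration as JSON."""
    if size < 0:
        raise ValueError("size must be >= 0")
    rev_cache: dict = {"size": size}  # 0 disables the cache (Enterprise Edition)
    if shard_count is not None:
        # Enterprise Edition only; leave at the default unless advised otherwise.
        rev_cache["shard_count"] = shard_count
    return json.dumps({"cache": {"rev_cache": rev_cache}})


print(rev_cache_config(0))  # {"cache": {"rev_cache": {"size": 0}}}
```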
Compacting
Attachments added in 3.0 and later are automatically removed from the bucket when their reference is removed or when the document is deleted or purged. By contrast, legacy attachments can remain in the bucket even after their reference is removed or the document is deleted or purged.
The compaction garbage collection process (/{db}/_compact) can be used to remove these legacy attachments and reclaim the underlying storage.
You can run the garbage collection process in one of two modes:
- tombstone: purges the JSON bodies of non-leaf revisions.
- attachment: removes redundant legacy attachments.
The legacy attachment compaction process scans all documents in the bucket, removing unreferenced attachments.
See the REST API endpoint /{db}/_compact.
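A compaction run can be requested with any HTTP client. The sketch below only builds the request so that it can be inspected before sending; the Admin REST API address `http://localhost:4985`, the use of POST, and the `type` query parameter for choosing the mode are assumptions to verify against the /{db}/_compact reference, and authentication is omitted.

```python
import urllib.parse
import urllib.request


def compact_request(admin_url: str, db: str, mode: str) -> urllib.request.Request:
    """Build (but do not send) a compaction request for the given mode."""
    if mode not in ("tombstone", "attachment"):
        raise ValueError("mode must be 'tombstone' or 'attachment'")
    query = urllib.parse.urlencode({"type": mode})  # assumed parameter name
    url = f"{admin_url.rstrip('/')}/{urllib.parse.quote(db, safe='')}/_compact?{query}"
    return urllib.request.Request(url, method="POST")


req = compact_request("http://localhost:4985", "travel-sample", "attachment")
print(req.get_method(), req.full_url)
# POST http://localhost:4985/travel-sample/_compact?type=attachment
# To send it (with credentials added): urllib.request.urlopen(req)
```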