Monitoring
Sync Gateway 2.5 includes a number of new metrics available through the Admin REST API (/_expvar).
These metrics cover performance, resource utilization, and health checks of Sync Gateway nodes, which become increasingly important as deployments scale to support a large number of connected Couchbase Lite clients.
The following curl command requests the monitoring stats on the Admin REST API.
curl -X GET "http://localhost:4985/_expvar" -H "accept: application/json"
The response is a JSON object with the following schema.
{
  "syncgateway": {
    "global": {
      "resource_utilization": {
        "admin_net_bytes_recv": 0,
        "admin_net_bytes_sent": 0,
        "error_count": 0,
        "go_memstats_heapalloc": 0,
        "go_memstats_heapidle": 0,
        "go_memstats_heapinuse": 0,
        "go_memstats_heapreleased": 0,
        "go_memstats_pausetotalns": 0,
        "go_memstats_stackinuse": 0,
        "go_memstats_stacksys": 0,
        "go_memstats_sys": 0,
        "goroutines_high_watermark": 0,
        "num_goroutines": 0,
        "process_cpu_percent_utilization": 0,
        "process_memory_resident": 0,
        "pub_net_bytes_recv": 0,
        "pub_net_bytes_sent": 0,
        "system_memory_total": 0,
        "warn_count": 0
      }
    },
    "per_db": {
      "$dbname": {
        "cache": {
          "chan_cache_active_revs": 0,
          "chan_cache_hits": 0,
          "chan_cache_max_entries": 0,
          "chan_cache_misses": 0,
          "chan_cache_num_channels": 0,
          "chan_cache_removal_revs": 0,
          "chan_cache_tombstone_revs": 0,
          "num_skipped_seqs": 0,
          "rev_cache_hits": 0,
          "rev_cache_misses": 0
        },
        "cbl_replication_pull": {
          "attachment_pull_bytes": 0,
          "attachment_pull_count": 0,
          "dcp_caching_count": 0,
          "dcp_caching_time": 0,
          "max_pending": 0,
          "num_pull_repl_active_continuous": 0,
          "num_pull_repl_active_one_shot": 0,
          "num_pull_repl_caught_up": 0,
          "num_pull_repl_since_zero": 0,
          "num_pull_repl_total_continuous": 0,
          "num_pull_repl_total_one_shot": 0,
          "request_changes_count": 0,
          "request_changes_time": 0,
          "rev_processing_time": 0,
          "rev_send_count": 0,
          "rev_send_latency": 0
        },
        "database": {
          "crc32c_match_count": 0,
          "dcp_caching_count": 0,
          "dcp_caching_time": 0,
          "dcp_received_count": 0,
          "dcp_received_time": 0,
          "doc_reads_bytes_blip": 0,
          "doc_writes_bytes": 0,
          "doc_writes_bytes_blip": 0,
          "num_doc_reads_blip": 0,
          "num_doc_reads_rest": 0,
          "num_doc_writes": 0,
          "num_replications_active": 0,
          "num_replications_total": 0,
          "sequence_get_count": 0,
          "sequence_released_count": 0,
          "sequence_reserved_count": 0,
          "warn_channels_per_doc_count": 0,
          "warn_grants_per_doc_count": 0,
          "warn_xattr_size_count": 0
        },
        "security": {
          "auth_failed_count": 0,
          "auth_success_count": 0,
          "num_access_errors": 0,
          "num_docs_rejected": 0,
          "total_auth_time": 0
        }
      }
    },
    "per_replication": {
      "$replname": {
        "sgr_docs_checked_sent": 0,
        "sgr_num_attachments_transferred": 0,
        "sgr_num_attachment_bytes_transferred": 0,
        "sgr_num_docs_failed_to_push": 0,
        "sgr_num_docs_pushed": 0
      }
    }
  }
}
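As a rough illustration of consuming this response from a script, the sketch below fetches the endpoint with Python's standard library and walks down to one stats group. The helper names fetch_expvars and get_stats, and the database name mydb in the usage comment, are assumptions made for this example and are not part of Sync Gateway. The URL is the one used by the curl command above.

```python
import json
import urllib.request

# URL taken from the curl example; adjust host and port for your deployment.
EXPVAR_URL = "http://localhost:4985/_expvar"


def fetch_expvars(url=EXPVAR_URL, timeout=5.0):
    """Fetch the stats document and decode it as JSON (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def get_stats(expvars, *path):
    """Walk nested keys under "syncgateway"; return {} if a key is missing."""
    node = expvars.get("syncgateway", {})
    for key in path:
        if not isinstance(node, dict):
            return {}
        node = node.get(key, {})
    return node


# Example usage against a running node ("mydb" is a placeholder name):
#   cache_stats = get_stats(fetch_expvars(), "per_db", "mydb", "cache")
```

Returning an empty dictionary for a missing group keeps callers simple when a database has not reported a given group yet.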
syncgateway
Monitoring stats
global
Global Sync Gateway stats
resource_utilization
Resource utilization stats
admin_net_bytes_recv
Total number of bytes received on the network interface that the Sync Gateway admin interface is bound to, since start-up.
By default, that is the number of bytes received on 127.0.0.1 since start-up.
Throughput on admin interface = admin_net_bytes_recv / admin_net_bytes_sent
admin_net_bytes_sent
Total number of bytes sent on the network interface that the Sync Gateway admin interface is bound to, since start-up.
By default, that is the number of bytes sent on 127.0.0.1 since start-up.
Throughput on admin interface = admin_net_bytes_recv / admin_net_bytes_sent
process_cpu_percent_utilization
CPU utilization (%).
The CPU usage calculation is based on user and system CPU time and doesn’t include components such as iowait.
Therefore, process_cpu_percent_utilization differs from the %Cpu value returned when running the top command.
pub_net_bytes_recv
Total number of bytes received on the network interface that the Sync Gateway public interface is bound to, since start-up.
By default, that is the number of bytes received on 0.0.0.0 since start-up.
Throughput on public interface = pub_net_bytes_recv / pub_net_bytes_sent
pub_net_bytes_sent
Total number of bytes sent on the network interface that the Sync Gateway public interface is bound to, since start-up.
By default, that is the number of bytes sent on 0.0.0.0 since start-up.
Throughput on public interface = pub_net_bytes_recv / pub_net_bytes_sent
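Because the four network counters above are cumulative since start-up, one possible way to turn them into a rate is to sample the same counter twice and divide the difference by the elapsed time. The sketch below shows that calculation. The function name bytes_per_second and the restart handling are assumptions for illustration, not behavior documented by Sync Gateway.

```python
def bytes_per_second(earlier, later, elapsed_seconds):
    """Average byte rate between two samples of a cumulative byte counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = later - earlier
    if delta < 0:
        # The counter went backwards, which suggests a restart; the later
        # sample then counts bytes since the new start-up.
        delta = later
    return delta / elapsed_seconds
```

For example, two samples of pub_net_bytes_sent taken 10 seconds apart with values 1000 and 4000 give an average of 300 bytes per second over that interval.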
$dbname
Stats relative to a database declared in the config file.
cache
Stats relative to caching
abandoned_seqs
The number of skipped sequences that were not found after 60 minutes and were abandoned.
chan_cache_hits
Channel cache requests fully served by the cache.
Channel Cache Hit Ratio = chan_cache_hits / (chan_cache_hits + chan_cache_misses)
chan_cache_max_entries
Size of the largest channel cache.
Helps with channel cache tuning, and as a hint on cache size variation (when compared to average cache size).
chan_cache_misses
Channel cache requests not fully served by the cache.
Channel Cache Hit Ratio = chan_cache_hits / (chan_cache_hits + chan_cache_misses)
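The hit ratio formula can be computed directly from the cache stats group of one database. The following is a minimal sketch. The function name channel_cache_hit_ratio is hypothetical, and returning None when no requests have been counted is a choice made here to avoid a division by zero.

```python
def channel_cache_hit_ratio(cache_stats):
    """Return chan_cache_hits / (chan_cache_hits + chan_cache_misses).

    cache_stats is the "cache" object of one database. Returns None when
    neither hits nor misses have been recorded yet.
    """
    hits = cache_stats.get("chan_cache_hits", 0)
    misses = cache_stats.get("chan_cache_misses", 0)
    total = hits + misses
    if total == 0:
        return None
    return hits / total
```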
chan_cache_num_channels
Number of channels being cached.
Provides insight into the potential maximum cache size (number of channels * max_cache_size), as well as node usage.
chan_cache_removal_revs
The number of removal revisions in the channel cache.
Acts as a reminder that removals must be considered when tuning the channel cache size. Also helps users understand whether they should be tuning tombstone retention policy (metadata purge interval), and running compact.
chan_cache_tombstone_revs
The number of tombstone revisions in the channel cache.
Acts as a reminder that tombstones and removals must be considered when tuning the channel cache size. Also helps users understand whether they should be tuning tombstone retention policy (metadata purge interval), and running compact.
num_skipped_seqs
Number of skipped sequences.
Helps with channel cache tuning, and as a hint on cache size variation (when compared to average cache size).
cbl_replication_pull
dcp_caching_count
This metric can be used to calculate the time between seeing a change on the DCP feed and when it’s available in the channel cache.
DCP cache latency = dcp_caching_time / dcp_caching_count
dcp_caching_time
This metric can be used to calculate the time between seeing a change on the DCP feed and when it’s available in the channel cache.
DCP cache latency = dcp_caching_time / dcp_caching_count
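Dividing the two counters as they stand gives an average over the whole lifetime of the process. To look at a recent interval instead, one option is to take two snapshots of the same stats group and divide the differences. The sketch below assumes that approach. The function name average_latency is hypothetical, and the result is expressed in whatever unit the time counter uses, which is not stated here.

```python
def average_latency(before, after, time_key, count_key):
    """Average time per operation between two snapshots of one stats group.

    Returns None when no operations were counted in the interval.
    """
    count = after[count_key] - before[count_key]
    if count <= 0:
        return None
    return (after[time_key] - before[time_key]) / count


# DCP cache latency over the interval between two snapshots:
#   average_latency(first, second, "dcp_caching_time", "dcp_caching_count")
```

The same helper applies to any time and count pair, such as request_changes_time and request_changes_count.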
max_pending
High watermark for number of documents buffered during feed processing, waiting on a missing earlier sequence.
num_pull_repl_active_continuous
Gauge representing the number of continuous pull replications in the active state.
num_pull_repl_active_one_shot
Gauge representing the number of one-shot pull replications in the active state.
num_pull_repl_caught_up
Gauge representing the number of replications which have caught up to the latest changes.
request_changes_count
This metric can be used to calculate the latency of _changes requests.
_changes request latency = request_changes_time / request_changes_count
request_changes_time
This metric can be used to calculate the latency of _changes requests.
_changes request latency = request_changes_time / request_changes_count
rev_processing_time
The total amount of time processing revisions.
This metric can be used with rev_send_count to calculate the average processing time per revision.
average processing time per revision = rev_processing_time / rev_send_count.
rev_send_count
The total number of revisions sent.
This metric can be used with rev_processing_time to calculate the average processing time per revision.
average processing time per revision = rev_processing_time / rev_send_count.
rev_send_latency
In a pull replication, Sync Gateway sends a /_changes request to the client.
The client responds with the list of revisions that it wants to receive.
rev_send_latency measures the time between the client asking for revisions via the /_changes response and Sync Gateway sending those revisions to the client.
Measuring time from the /_changes response means that this stat will vary significantly depending on the changes batch size.
A larger batch size will result in a spike of this stat, even if the processing time per revision is unchanged.
A more useful stat might be the average processing time per revision (rev_processing_time / rev_send_count).
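The average processing time per revision mentioned above can be derived for every database in a single pass over the response. This is a sketch only. The function name avg_rev_processing_time is hypothetical, and databases that have not sent any revisions are left out of the result rather than reported as zero.

```python
def avg_rev_processing_time(expvars):
    """Map each database name to rev_processing_time / rev_send_count."""
    result = {}
    per_db = expvars.get("syncgateway", {}).get("per_db", {})
    for db_name, groups in per_db.items():
        pull = groups.get("cbl_replication_pull", {})
        sent = pull.get("rev_send_count", 0)
        if sent > 0:
            # Skip databases with no sent revisions to avoid dividing by zero.
            result[db_name] = pull.get("rev_processing_time", 0) / sent
    return result
```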
database
Stats relative to the database
crc32c_match_count
Count of instances during import when the document CAS had changed but the document body had not.
dcp_caching_count
Count of DCP mutations added to Sync Gateway’s channel cache. Can be used with dcp_caching_time to monitor cache processing latency.
dcp_caching_time
Time between DCP mutation arriving at Sync Gateway and being added to channel cache (aggregate).
dcp_received_time
Time between document write and document being received by Sync Gateway over DCP. If the document was written prior to Sync Gateway starting the feed, it is measured as the time since the feed was started. Can be used to monitor DCP feed processing latency.
doc_reads_bytes_blip
Total number of bytes read via Couchbase Lite 2.x replication since Sync Gateway startup.
doc_writes_bytes
Total number of bytes written as part of document writes since Sync Gateway startup.
doc_writes_bytes_blip
Total number of bytes written as part of Couchbase Lite 2.x document writes since Sync Gateway startup.
num_doc_reads_blip
Number of documents read via Couchbase Lite 2.x replication since Sync Gateway startup.
num_doc_reads_rest
Number of documents read via the REST API since Sync Gateway startup. Includes Couchbase Lite 1.x replication.
security
Stats relative to security
auth_failed_count
Number of unsuccessful authentications. Useful to monitor the number of authentication errors.
auth_success_count
Number of successful authentications. Useful to monitor the number of authenticated requests.
num_access_errors
Count of documents rejected by write access functions (requireAccess/requireRole/requireUser).
per_replication
Stats for each replication between Sync Gateway instances declared in the config file.