Monitoring
The connector exposes several performance metrics. Here’s a summary of the most important metrics and how to interpret them.
In its default configuration, the connector runs an embedded web server.
Metrics are exposed in Dropwizard JSON format at http://localhost:31415/metrics?pretty.
Omit the ?pretty for a more compact response.
A simple health check might fetch that URL and assert the response has an HTTP status code of 200 (OK). A more sophisticated check should parse the JSON response and inspect the values of whichever metrics you consider important.
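The health check described above could be sketched as follows. This is a minimal example, not an official client: the function names are hypothetical, and the layout of the JSON (a top-level "gauges" object whose entries carry a "value" field) is an assumption based on the usual Dropwizard JSON output, so verify it against your connector's actual response.

```python
import json
import urllib.request

# Default metrics URL from this page; adjust if you changed httpPort.
METRICS_URL = "http://localhost:31415/metrics"


def parse_metrics(body):
    """Parse a metrics response body into a dict."""
    doc = json.loads(body)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the top level")
    return doc


def fetch_metrics(url=METRICS_URL, timeout_seconds=5.0):
    """Fetch the metrics document, failing unless the status is 200 (OK)."""
    # urlopen raises HTTPError for 4xx/5xx; the explicit check covers the rest.
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        if response.status != 200:
            raise RuntimeError(f"unexpected HTTP status {response.status}")
        return parse_metrics(response.read().decode("utf-8"))


def gauge_value(metrics, name):
    """Read one gauge, assuming the layout {"gauges": {name: {"value": ...}}}."""
    return metrics["gauges"][name]["value"]
```

A monitoring script might call `fetch_metrics()` on a schedule and pass the result to `gauge_value(metrics, "cbes.backlog")`, treating any raised exception as a failed check.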
Metrics are also exposed in Prometheus format at http://localhost:31415/metrics/prometheus.
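If you are not running a Prometheus server, the text format can still be read directly. The sketch below is a simplified parser for the Prometheus text exposition format; it ignores optional timestamps and comment lines. The function name is hypothetical, and the metric names in the usage note are made up for illustration, since this page does not list the names used in the Prometheus output.

```python
def parse_prometheus_text(text):
    """Map each series (metric name plus any labels) to its float value."""
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # The last '}' closes the label set; the value follows it.
            end = line.rfind("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        if not rest:
            continue  # malformed line with no value
        samples[series] = float(rest[0])  # any trailing timestamp is ignored
    return samples
```

For example, given a line such as `example_backlog 42.0`, the returned dict contains the key `"example_backlog"` with the value `42.0`.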
HTTP Server Configuration
To disable the embedded HTTP server or to use an alternate port, edit the connector config and search for the httpPort property in the [metrics] section. Setting the port to -1 disables the HTTP server.
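As an illustration, the relevant part of the config might look like the fragment below. This assumes the TOML-style syntax suggested by the [metrics] section name; the surrounding properties and layout in your own config file may differ.

```toml
[metrics]
  # Port used in the URLs on this page; set to -1 to disable the HTTP server.
  httpPort = 31415
```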
Logging Configuration
In addition to exposing metrics over HTTP, by default the connector periodically logs the metrics to $CBES_HOME/logs/cbes-metrics.log.
In the [metrics] section of the connector config file, you can configure the interval at which metrics are written to the log by modifying the logInterval property.
Metrics logging can be disabled by setting the interval to '0m'.
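A fragment along these lines would set the logging interval. The value '1m' is only an illustrative choice, and the TOML-style syntax is an assumption based on the section and value formats shown on this page.

```toml
[metrics]
  # Illustrative interval; use '0m' to disable metrics logging.
  logInterval = '1m'
```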
The location of the metrics log file is configurable in $CBES_HOME/config/log4j2.xml.
Metrics Reference
This section describes the metrics exposed by the connector and how to interpret them.
Gauges
A gauge is a single value that represents the current state of something.
cbes.esWaitMs
-
Perhaps the most important metric for implementing a health check, this gauge reports the duration in milliseconds of the current in-flight Elasticsearch bulk request. This includes the time it takes to retry the request, if necessary. An exceptionally long duration might indicate the connector is stalled in a retry loop.
cbes.backlog
-
An estimate of the number of Couchbase document changes yet to be processed. This is a general indication of how well the connector is keeping up with changes in Couchbase. Note that the count only includes changes in the Couchbase partitions handled by this connector instance. This value is dynamic; it goes up when changes happen in Couchbase, and goes down as the changes are processed by the connector.
The cbes.backlog metric may be significantly less accurate when the connector is configured to replicate only a specific scope or specific collections.
cbes.writeQueue
-
Reports the number of document events currently buffered in memory. (The write queue is implicitly bounded by the flowControlBuffer config property, which determines the buffer size.)
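The gauges above lend themselves to two simple checks: a ceiling on cbes.esWaitMs to catch a stalled retry loop, and a trend test on cbes.backlog to see whether the connector is falling behind. The sketch below assumes the metrics JSON has already been parsed into a dict with a "gauges" object whose entries carry a "value" field. The function names and the 60-second threshold are hypothetical choices, not recommendations from the connector.

```python
def es_wait_ok(metrics, max_wait_ms=60_000):
    """True if the in-flight bulk request has not exceeded the limit.

    The 60-second default is an arbitrary example threshold.
    """
    wait_ms = metrics["gauges"]["cbes.esWaitMs"]["value"]
    return wait_ms <= max_wait_ms


def backlog_is_growing(samples):
    """True if backlog readings (oldest first) never drop and end higher.

    A backlog that rises and falls is normal; one that only rises across
    several polls suggests the connector is not keeping up.
    """
    if len(samples) < 2:
        return False
    never_drops = all(
        later >= earlier for earlier, later in zip(samples, samples[1:])
    )
    return never_drops and samples[-1] > samples[0]
```

A check would typically collect one cbes.backlog reading per poll, keep the last few in a list, and pass that list to `backlog_is_growing`.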
Meters
A meter records the rate at which an event occurs, and also the total number of occurrences. It reports a mean rate since the connector started, as well as 1-, 5-, and 15-minute exponentially weighted moving averages.
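To make the moving averages concrete, here is a small sketch of an exponentially weighted moving average of an event rate. It follows the general technique used by Dropwizard-style meters and assumes a 5-second tick interval; it is an illustration, not the connector's own code, and the class name Ewma is hypothetical.

```python
import math


class Ewma:
    """Exponentially weighted moving average of an event rate (events/sec)."""

    def __init__(self, window_minutes, tick_seconds=5.0):
        # Smoothing factor: shorter windows react faster to recent ticks.
        self.alpha = 1.0 - math.exp(-tick_seconds / (window_minutes * 60.0))
        self.tick_seconds = tick_seconds
        self.rate = None      # no rate until the first tick
        self.uncounted = 0    # events seen since the last tick

    def mark(self, count=1):
        """Record that `count` events occurred."""
        self.uncounted += count

    def tick(self):
        """Fold the events since the last tick into the average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate
```

With a steady stream of events the average settles at the true rate. When events stop, a 1-minute average decays quickly while a 15-minute average falls slowly, which is why comparing the three windows shows whether a spike is recent or sustained.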
cbes.throughputBytes
-
An estimate of the number of bytes the connector has written to Elasticsearch.
cbes.bulkRetry
-
Recorded whenever an Elasticsearch bulk request is retried due to a temporary failure.
cbes.docWriteRetry
-
Recorded for each document being retried. (For each bulkRetry event, one or more docWriteRetry events are recorded, indicating how many failures there were in the bulk request.)
cbes.docRejected
-
Recorded when there is a permanent indexing failure. These failures usually result in an entry being added to the rejection log Elasticsearch index.
cbes.rejectionLogFail
-
Recorded when the connector is unable to add a record to the rejection log Elasticsearch index.
cbes.esConnFail
-
Recorded when the connector fails to establish a connection to Elasticsearch.
cbes.saveStateFail
-
Recorded when the connector fails to persist a replication checkpoint document to Couchbase.
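Because a meter also tracks the total number of occurrences, one way to alert on the failure meters is to compare totals between two polls. The sketch below assumes parsed metrics JSON with a "meters" object whose entries carry a "count" field, as is usual for Dropwizard JSON output. The function names and the choice of which meters to watch are hypothetical.

```python
# Meters from this page that indicate a failure of some kind.
FAILURE_METERS = (
    "cbes.docRejected",
    "cbes.rejectionLogFail",
    "cbes.esConnFail",
    "cbes.saveStateFail",
)


def meter_count(metrics, name):
    """Total occurrences for a meter, or 0 if the meter is absent."""
    return metrics.get("meters", {}).get(name, {}).get("count", 0)


def new_failures(previous, current, names=FAILURE_METERS):
    """Return {meter name: increase} for meters whose count rose.

    Counts reset when the connector restarts, so negative differences
    are ignored rather than reported.
    """
    increases = {}
    for name in names:
        delta = meter_count(current, name) - meter_count(previous, name)
        if delta > 0:
            increases[name] = delta
    return increases
```

An empty result means none of the watched failures occurred between the two snapshots.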
Timers
A timer combines a meter with a histogram of event durations, providing insight into the percentiles. The histogram is backed by an exponentially decaying reservoir, representing roughly the past 5 minutes of data.
cbes.latency
-
The time between when the connector is notified of a database change and when the change is written to Elasticsearch. Bear in mind the connector will not receive the event until there is room in its flow control buffer. Although this metric is not an absolute measurement of end-to-end latency, it is still useful as an indicator of connector performance.
cbes.bulkIndexPerDoc
-
The duration of an Elasticsearch bulk request (including retries), divided by the number of items in the bulk request.
cbes.retryDelay
-
Time spent waiting after a temporary indexing failure before the request is retried.
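Timer percentiles can be read from the parsed metrics JSON in much the same way as gauges. The sketch below assumes a "timers" object whose entries expose fields such as "p99" and "duration_units", which is the usual Dropwizard JSON layout; confirm the field names against your connector's actual output. The function name is hypothetical.

```python
def timer_percentile(metrics, name, percentile="p99"):
    """Return (value, units) for one percentile of a timer.

    Assumes the layout {"timers": {name: {"p99": ..., "duration_units": ...}}}.
    """
    timer = metrics["timers"][name]
    return timer[percentile], timer.get("duration_units", "unknown")
```

For instance, `timer_percentile(metrics, "cbes.latency")` returns the 99th-percentile latency over roughly the past 5 minutes, together with the units reported by the server.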
Undocumented Metrics
The connector exposes several other metrics that are useful for troubleshooting. However, only the metrics described in this document are considered part of the connector’s public API. Undocumented metrics should be considered "uncommitted", meaning they may be modified or removed in a patch release without advance notice.