Monitoring

The connector exposes several performance metrics. Here’s a summary of the most important metrics and how to interpret them.

In its default configuration, the connector runs an embedded web server. Metrics are exposed at http://localhost:31415/metrics?pretty.

Omit the ?pretty for a more compact response.

A simple health check might fetch that URL and assert the response has an HTTP status code of 200 (OK). A more sophisticated check should parse the JSON response and inspect the values of whichever metrics you consider important.

HTTP Server Configuration

To disable the embedded HTTP server or to use an alternate port, edit the connector config and search for the httpPort property in the [metrics] section. Setting the port to -1 will disable the HTTP server.

Logging Configuration

In addition to exposing metrics over HTTP, by default the connector periodically logs the metrics to $CBES_HOME/logs/cbes-metrics.log.

In the [metrics] section of the connector config file you can configure the interval at which metrics written to the log by modifying the logInterval property. Metrics logging can be disabled by setting the interval to '0m'.

The location of the metrics log file is configurable in $CBES_HOME/config/log4j2.xml.

Metrics Reference

The connector uses the Dropwizard Metrics library to capture and report performance information. This section describes the metrics exposed by the connector and how to interpret them.

Gauges

A gauge is a single value that represents the current state of something.

cbes.esWaitMs

Perhaps the most important metric for implementing a health check, this gauge reports the duration in milliseconds of the current in-flight Elasticsearch bulk request. This includes the time it takes to retry the request, if necessary. An exceptionally long duration might indicate the connector is stalled in a retry loop.

cbes.backfill

This is the number of Couchbase documents not yet replicated to Elasticsearch at the time the connector started. The value reported by this gauge never changes.

cbes.backfillRemaining

Shows how many documents the connector still needs to process in order to catch up with the state of the bucket when the connector started. The value only decreases, and never goes below zero.

cbes.backfillEstTimeLeft

A rough estimate of the time it will take for the connector to finish processing the remaining backfill.

cbes.writeQueue

Reports the number of document events currently buffered in memory. (The write queue is implicitly bounded by the flowControlBuffer config property which determines the buffer size.)

Meters

A meter records the rate at which an event occurs, and also the total number of occurrences. It reports a mean rate since the connector started, as well as 1-, 5-, and 15-minute exponentially weighted moving averages.

cbes.throughputBytes

An estimate of the number of bytes the connector has written to Elasticsearch.

cbes.bulkRetry

Recorded whenever an Elasticsearch bulk request is retried due to a temporary failure.

cbes.docWriteRetry

Recorded for each document being retried. (For each bulkRetry event, one or more docWriteRetry events are recorded, indicating how many failures there were in the bulk request.)

cbes.docRejected

Recorded when there is a permanent indexing failure. These failures usually result in an entry being added to the rejection log Elasticsearch index.

cbes.rejectionLogFail

Recorded when the connector is unable to add a record to the rejection log Elasticsearch index.

cbes.esConnFail

Recorded when the connector fails to establish a connection to Elasticsearch.

cbes.saveStateFail

Recorded when the connector fails to persist a replication checkpoint document to Couchbase.

Timers

A timer combines a meter with a histogram of event durations, providing insight into the percentiles. The histogram is backed by an exponentially decaying reservoir, representing roughly the past 5 minutes of data.

cbes.latency

The time between when the connector is notified of a database change and when the change is written to Elasticsearch. Bear in mind the connector will not receive the event until there is room in its flow control buffer. Although this metric is not an absolute measurement of end-to-end latency, it is still useful as an indicator of connector performance.

cbes.bulkIndexPerDoc

The duration of an Elasticsearch bulk request (including retries), divided by the number of items in the bulk request.

cbes.retryDelay

Time spent waiting after a temporary indexing failure before the request is retried.

Undocumented Metrics

The connector exposes several other metrics that are useful for troubleshooting. However, only the metrics described in this document are considered part of the connector’s public API. Undocumented metrics should be considered "uncommitted", meaning they may be modified or removed in a patch release without advance notice.