Monitoring Reference

Capella Operational

This reference lists the metric graphs displayed in the Capella UI Monitoring dashboards.

In the Capella UI, the Monitoring dashboards display a set of metric graphs, enabling users to monitor system performance in real time.

For more information about Capella’s monitoring dashboards, see View Monitoring Dashboards. For more information about App Service’s monitoring dashboards, see Monitor through the UI.

This monitoring reference lists:

The Graph Name as displayed in the Capella UI.
A Description of what the metric graph shows.
The Metric calculation method for this metric. For more information about the metrics used, see Metrics Reference.
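Each entry in the Metric column is a PromQL expression. In these expressions, `<databaseId>` and `<tenantId>` (and `<bucket>` for some serverless bucket graphs) are placeholders for your own identifiers. You may want to reuse a query outside the Capella UI, for example against a Prometheus-compatible endpoint that exposes the same metrics (this is an assumption; this reference does not describe such an endpoint). In that case, you must fill in the placeholders first. The Python sketch below does this with a hypothetical `fill_template` helper. It also escapes backslashes and double quotes, as PromQL double-quoted strings require.

```python
import re
from typing import Dict

# Matches placeholders such as <databaseId>, <tenantId>, or <bucket>.
PLACEHOLDER = re.compile(r"<(\w+)>")


def escape_label_value(value: str) -> str:
    """Escape backslashes and double quotes for a double-quoted PromQL label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace each <name> placeholder with values[name] (hypothetical helper)."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder <{name}>")
        return escape_label_value(values[name])

    return PLACEHOLDER.sub(substitute, template)


template = 'sum(sgw_security_auth_failed_count{databaseId="<databaseId>",tenantId="<tenantId>"})'
print(fill_template(template, {"databaseId": "my-database-id", "tenantId": "my-tenant-id"}))
# prints: sum(sgw_security_auth_failed_count{databaseId="my-database-id",tenantId="my-tenant-id"})
```

The regular expression only matches `<word>` tokens, so comparison operators and label values such as `latency=">0ms"` pass through unchanged.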

The monitoring dashboards show the following metrics:

Graph Name	Description	Metric
App Endpoint: Total Auth Failures	Total Auth Failures is the total number of authentication failures per app endpoint.	`sum(sgw_security_auth_failed_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Total Auth Successes	Total Auth Successes is the total number of successful authentications per app endpoint.	`sum(sgw_security_auth_success_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Total Requested Deltas	Total Requested Deltas is the total number of deltas requested per app endpoint.	`sum(sgw_delta_sync_deltas_requested{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Total Deltas Sent	Total Deltas Sent is the total number of deltas sent per app endpoint.	`sum(sgw_delta_sync_deltas_sent{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Total Documents Imported	Total Documents Imported is the total number of documents imported per app endpoint.	`sum(sgw_shared_bucket_import_import_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Total Documents Read	Total Documents Read is the total number of documents read per app endpoint.	`sum(sgw_database_num_doc_reads_blip{databaseId="<databaseId>",tenantId="<tenantId>"} + sgw_database_num_doc_reads_rest{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Total Documents Rejected	Total Documents Rejected is the total number of documents rejected per app endpoint.	`sum(sgw_security_num_docs_rejected{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Total Documents Written	Total Documents Written is the total number of documents written per app endpoint.	`sum(sgw_database_num_doc_writes{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Endpoint: Active Pull Only Replications	Active Pull Only Replications is the number of active pull-only replications, one-shot and continuous combined, per app endpoint.	`sum(sgw_replication_pull_num_pull_repl_active_one_shot{databaseId="<databaseId>",tenantId="<tenantId>"} + sgw_replication_pull_num_pull_repl_active_continuous{databaseId="<databaseId>",tenantId="<tenantId>"})`
App Service: Bytes Received by Node	Total bytes received on the primary network interface by node.	`sum by (couchbaseNode) (node_network_receive_bytes_total{databaseId="<databaseId>",device="eth0",syncgatewayId!="",tenantId="<tenantId>"})`
App Service: Bytes Sent by Node	Total bytes sent on the primary network interface by node.	`sum by (couchbaseNode) (node_network_transmit_bytes_total{databaseId="<databaseId>",device="eth0",syncgatewayId!="",tenantId="<tenantId>"})`
App Service: CPU Utilization by Node	CPU utilization percentage of the Sync Gateway process by node.	`sum by (couchbaseNode) (sgw_resource_utilization_process_cpu_percent_utilization{databaseId="<databaseId>",tenantId="<tenantId>"} / 10)`
App Service: Memory Utilization by Node	Memory utilization percentage of each Sync Gateway node.	`sum by (couchbaseNode) (node_memory_MemTotal_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"} - node_memory_MemAvailable_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"}) / sum by (couchbaseNode) (node_memory_MemTotal_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"} > 0) * 100`
Data: Current Active Items by Bucket	Number of current active items by bucket.	`sum by (bucket) (bucket_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})`
Data: Disk Reads per Second by Bucket	Average disk reads per second by bucket.	`sum by (bucket) (bucket:kv_bg_load:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Disk Used by Bucket	Total disk space consumed by each bucket.	`sum by (bucket) (bucket:kv_ep_db_file_size_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Total Disk Write Queue Size by Bucket	Total disk write queue size by bucket.	`sum by (bucket) (bucket:kv_vb_queue_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: GSI Items Remaining to Index	Number of items remaining to be indexed by Global Secondary Indexes (GSI), by bucket.	`sum by (bucket) (bucket_connection:kv_dcp_items_remaining:sum{connection_type="secidx",databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Bucket GET Ops per Second	Bucket GET Ops per Second is the average number of GET operations per second over the last 5 minutes by bucket.	`sum by (bucket) (bucket_op:kv_ops:rate5m{databaseId="<databaseId>",op="get",tenantId="<tenantId>"})`
Data: Bucket Ops per Second	Bucket Ops per Second is the average number of operations per second over the last 5 minutes by bucket.	`sum by (bucket) (bucket:kv_ops:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Bucket SET Ops per Second	Bucket SET Ops per Second is the average number of SET operations per second over the last 5 minutes by bucket.	`sum by (bucket) (bucket_op:kv_ops:rate5m{databaseId="<databaseId>",op="set",tenantId="<tenantId>"})`
Data: Quota Memory Used Percent By Bucket/Node	Quota memory usage percent by bucket and node.	`sum by (bucket, couchbaseNode) (kv_mem_quota_usage_ratio{databaseId="<databaseId>",tenantId="<tenantId>"}) * 100`
Data: Memory Used by Bucket	Memory usage per bucket.	`sum by (bucket) (bucket:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Out Of Memory (OOM) Errors by Bucket	Number of Out of Memory (OOM) errors by bucket.	`sum by (bucket) (bucket:kv_ep_oom_errors:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Temporary Out Of Memory Errors by Bucket	Number of temporary Out of Memory (OOM) errors by bucket.	`sum by (bucket) (bucket:kv_ep_tmp_oom_errors:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: vBuckets (Active)	Count of active vBuckets used to distribute data across nodes.	`sum by (bucket) (bucket_state:kv_num_vbuckets:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})`
Data: Active Item Resident Ratio by Bucket	Proportion of active items held in memory, as opposed to only on disk, per bucket.	`avg by (bucket) (bucket_state:kv_vb_perc_mem_resident_ratio:avg{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})`
Data: vBuckets (Replica)	Count of replica vBuckets used to distribute data across nodes.	`sum by (bucket) (bucket_state:kv_num_vbuckets:sum{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})`
Data: Replica Item Resident Ratio by Bucket	Proportion of replica items held in memory, as opposed to only on disk, per bucket.	`avg by (bucket) (bucket_state:kv_vb_perc_mem_resident_ratio:avg{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})`
Analytics: Storage Used	The total size of remote S3 storage used.	`sum(cbas_remote_storage_size_bytes{databaseId="<databaseId>",tenantId="<tenantId>"})`
Analytics: Total Requests per Second	The per-second rate of analytics requests received, averaged over the last 5 minutes, for the entire cluster.	`sum(rate(cbas_requests_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Data: Connections	Count of connections to the cluster.	`sum(database:kv_curr_connections:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Cluster GET Ops per Second	Cluster GET Ops per Second is the average total number of GET operations per second over the last 5 minutes across all buckets.	`sum(op:kv_ops:rate5m{databaseId="<databaseId>",op="get",tenantId="<tenantId>"})`
Data: Cluster Ops per Second	Cluster Ops per Second is the average total number of operations per second over the last 5 minutes across all buckets.	`sum(database:couchbase_bucket_basicstats_opspersec:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Cluster SET Ops per Second	Cluster SET Ops per Second is the average total number of SET operations per second over the last 5 minutes across all buckets.	`sum(op:kv_ops:rate5m{databaseId="<databaseId>",op="set",tenantId="<tenantId>"})`
Data: Cluster Total Memory Used	Total memory used by the cluster.	`sum(database:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: Total Requests per Second	The per-second rate of SQL++ query requests over the last 5 minutes for the entire cluster, by latency.	`sum by (latency) (database:n1ql_requests:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Data Encryption Status by Node	Status of encryption at rest by node and data type, enabling monitoring of encryption coverage and operational health.	`sum by (couchbaseNode, data_type) (cm_encr_at_rest_data_status{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Data Encryption Keys In Use by Node	Number of Data Encryption Keys currently in use, by node and type, providing insight into key distribution and resource consumption.	`sum by (couchbaseNode, type) (cm_encr_at_rest_deks_in_use{databaseId="<databaseId>",tenantId="<tenantId>"})`
Columnar: Connections	Count of connections to the cluster.	`sum(database:kv_curr_connections:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Columnar: CPU Utilization by Node	Real-time CPU utilization percentage per Columnar node.	`sum by (couchbaseNode) (node_cpu_utilization_rate{databaseId="<databaseId>",tenantId="<tenantId>"})`
Node: Disk Read IOPS by Node	Disk read input/output operations per second (IOPS) for each node, providing insight into storage read activity.	`sum by (couchbaseNode) (rate(disk_iops_reads{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Node: Disk Total IOPS by Node	Combined disk read and write IOPS for each node, giving a view of total storage I/O activity.	`sum by (couchbaseNode) (rate(disk_iops_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Node: Disk Write IOPS by Node	Disk write IOPS for each node, measuring storage write activity.	`sum by (couchbaseNode) (rate(disk_iops_writes{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Node: Disk Read Throughput by Node	Rate at which each node reads data from disk, in bytes per second, for monitoring bandwidth utilization.	`sum by (couchbaseNode) (rate(disk_bytes_read{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Node: Disk Total Throughput by Node	Combined disk read and write throughput for each node, giving a view of total storage bandwidth utilization.	`sum by (couchbaseNode) (rate(disk_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Node: Disk Write Throughput by Node	Rate at which each node writes data to disk, including document persistence and compaction operations.	`sum by (couchbaseNode) (rate(disk_bytes_written{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Query: Query Request Time Average in Milliseconds	Average end-to-end time to process a query request, in milliseconds, over the last 10 minutes.	`sum(rate(n1ql_request_time{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]) > 0) / 1e+06 / sum(rate(n1ql_requests{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]))`
Query: Query Execution Time Average in Milliseconds	Average time to execute a query request, in milliseconds, over the last 10 minutes.	`sum(rate(n1ql_service_time{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]) > 0) / 1e+06 / sum(rate(n1ql_requests{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]))`
Data: Active Items by Node	Number of active items (documents) stored on each node.	`sum by (couchbaseNode) (node_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})`
Analytics: Parse Failure Rate by Link/Bucket	The per-second rate of record parsing failures from linked items, averaged over the last 5 minutes, by link and bucket.	`sum by (link, bucket) (rate(cbas_failed_to_parse_records_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Analytics: Total Parse Failures by Link/Bucket	The total number of record parsing failures from linked items, by link and bucket.	`sum by (link, bucket) (cbas_failed_to_parse_records_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Analytics: Ingested Bytes Rate by Link/Bucket	The per-second rate of incoming bytes ingested by Analytics, averaged over the last 5 minutes, by link and bucket.	`sum by (link, bucket) (rate(cbas_incoming_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Analytics: Total Ingested Bytes by Link/Bucket	The total number of incoming bytes ingested by Analytics, by link and bucket.	`sum by (link, bucket) (cbas_incoming_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Analytics: Linked Ops Rate by Link/Bucket	The per-second rate of linked record operations processed by Analytics, averaged over the last 5 minutes, by link and bucket.	`sum by (link, bucket) (rate(cbas_incoming_records_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Analytics: Total Linked Ops by Link/Bucket	The total number of linked record operations processed by Analytics, by link and bucket.	`sum by (link, bucket) (cbas_incoming_records_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Analytics: Read Rate by Node	The per-second rate at which the Analytics Service reads bytes from disk, averaged over the last 5 minutes, by node.	`sum by (couchbaseNode) (rate(cbas_io_reads_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Analytics: Write Rate by Node	The per-second rate at which the Analytics Service writes bytes to disk, averaged over the last 5 minutes, by node.	`sum by (couchbaseNode) (rate(cbas_io_writes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Analytics: Total Requests per Second by Node	The per-second rate of analytics requests received, averaged over the last 5 minutes, by node.	`sum by (couchbaseNode) (rate(cbas_requests_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))`
Analytics: System Load per Node	The Analytics system load average over the last minute, by node.	`sum by (couchbaseNode) (cbas_system_load_average{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Data Encryption Key Fetch Failures by Node	Percentage of attempts to retrieve Data Encryption Keys from the key management service that failed, for example because of network or authentication issues.	`sum by (couchbaseNode, type) (cm_encr_at_rest_generate_dek_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"}) / sum by (couchbaseNode, type) (cm_encr_at_rest_generate_dek_total{databaseId="<databaseId>",tenantId="<tenantId>"} > 0) * 100`
Data: Data Encryption Key Fetch Frequency by Node	Measures how frequently Data Encryption Keys are retrieved from the key management service, reflecting encryption activity levels.	`sum by (couchbaseNode, type) (cm_encr_at_rest_generate_dek_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Data Encryption Key Rotation Failures by Node	Percentage of encryption key rotation attempts that failed, per node. Rotation failures can create security and compliance risks.	`sum by (couchbaseNode, key_name) (cm_encryption_key_rotation_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"}) / sum by (couchbaseNode, key_name) (cm_encryption_key_rotations_total{databaseId="<databaseId>",tenantId="<tenantId>"} > 0) * 100`
Data: Data Encryption Key Rotation Frequency by Node	Tracks how frequently encryption keys are rotated per node, providing insight into security policy compliance and key management activity.	`sum by (couchbaseNode, key_name) (cm_encryption_key_rotations_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Data Encryption Service Failures by Node	Measures failures within the data encryption service infrastructure that can impact data protection capabilities and compliance.	`sum by (couchbaseNode, failure_type) (cm_encryption_service_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Node: CPU Utilization by Node	Real-time CPU utilization percentage per node.	`sum by (couchbaseNode) (node_cpu_utilization_rate{databaseId="<databaseId>",tenantId="<tenantId>"})`
Node: Disk Used Percent by Node	Percentage of each node's disk space that is in use.	`sum by (couchbaseNode) (node_disk_usage_ratio{databaseId="<databaseId>",tenantId="<tenantId>"}) * 100`
Data: Disk Used Bytes by Node	Total disk space currently consumed on each node.	`sum by (couchbaseNode) (node_disk_used{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Average Item Size	Average size of the indexed keys by node.	`avg by (couchbaseNode) (node:index_avg_item_size:avg{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Cache Hits per Second	The per-second rate of cache hits, averaged over the last 5 minutes, by node.	`sum by (couchbaseNode) (node:index_cache_hits:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Cache Misses per Second	The per-second rate of cache misses, averaged over the last 5 minutes, by node.	`sum by (couchbaseNode) (node:index_cache_misses:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Indexer Codebook Memory Usage by Node	Memory used by the vector index codebook, by node.	`sum by (couchbaseNode) (index_codebook_mem_usage{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Indexer Codebook Train Duration By Node	Training duration of the vector index codebook, in seconds, by node.	`sum by (couchbaseNode) (index_codebook_train_duration{databaseId="<databaseId>",tenantId="<tenantId>"} / 1e+09)`
Index: Process CPU (System) Usage by Node	The system-space process CPU utilization of the Index service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",mode="system",tenantId="<tenantId>"}) * 100`
Index: Process CPU Total Usage by Node	The total (user and system) process CPU utilization of the Index service by node.	`sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",tenantId="<tenantId>"}) * 100`
Index: Process CPU (User) Usage by Node	The user-space process CPU utilization of the Index service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",mode="user",tenantId="<tenantId>"}) * 100`
Index: Indexable Data Size	The size of indexable data that is maintained by the indexer by node.	`sum by (couchbaseNode) (node:index_raw_data_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Disk Size	Total disk file size consumed by all indexes by node.	`sum by (couchbaseNode) (node:index_disk_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Documents Indexed per Second	The per-second rate of documents indexed, averaged over the last 5 minutes, by node.	`sum by (couchbaseNode) (node:index_num_docs_indexed:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Documents Pending	Number of documents pending to be indexed by node.	`sum by (couchbaseNode) (node:index_num_docs_pending:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Documents Queued	Number of documents queued to be indexed by node.	`sum by (couchbaseNode) (node:index_num_docs_queued:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Item Count	The number of items currently indexed by node.	`sum by (couchbaseNode) (node:index_items_count:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Total Process Memory Usage by Node	The total process memory usage of the Index service by node.	`sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="indexer",tenantId="<tenantId>"})`
Index: Diverging Replica Indexes	Number of diverging replica indexes, by node.	`sum by (couchbaseNode) (index_num_diverging_replica_indexes{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Requests per Second	The per-second rate of requests to the indexer, averaged over the last 5 minutes, by node.	`sum by (couchbaseNode) (node:index_num_requests:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})`
Index: Resident Percent	Percentage of the data held in memory by the indexer by node.	`avg by (couchbaseNode) (node:index_resident_percent:avg{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Process CPU (System) Usage by Node	The system-space process CPU utilization of the Data service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",mode="system",tenantId="<tenantId>"}) * 100`
Data: Process CPU Total Usage by Node	The total (user and system) process CPU utilization of the Data service by node.	`sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",tenantId="<tenantId>"}) * 100`
Data: Process CPU (User) Usage by Node	The user-space process CPU utilization of the Data service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",mode="user",tenantId="<tenantId>"}) * 100`
Data: Total Process Memory Usage by Node	The total process memory usage of the Data service by node.	`sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="memcached",tenantId="<tenantId>"})`
Node: Memory Used by Node	Memory usage per node.	`sum by (couchbaseNode) (node:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: Process CPU (System) Usage by Node	The system-space process CPU utilization of the Query service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",mode="system",tenantId="<tenantId>"}) * 100`
Query: Process CPU Total Usage by Node	The total (user and system) process CPU utilization of the Query service by node.	`sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",tenantId="<tenantId>"}) * 100`
Query: Process CPU (User) Usage by Node	The user-space process CPU utilization of the Query service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",mode="user",tenantId="<tenantId>"}) * 100`
Query: Total Process Memory Usage by Node	The total process memory usage of the Query service by node.	`sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="cbq-engine",tenantId="<tenantId>"})`
Query: Requests per Second by Node	The per-second rate of SQL++ (formerly N1QL) query requests over the last 5 minutes, by node.	`sum by (couchbaseNode) (node:n1ql_requests:rate5m{databaseId="<databaseId>",latency=">0ms",tenantId="<tenantId>"})`
Data: Replica Items by Node	Number of replica items (documents) stored on each node.	`sum by (couchbaseNode) (node_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})`
Data: SDK Data Service Mutation Durable Duration Count by Node	Counts durable Key-Value mutations that include full persistence and replication guarantees for data safety.	`sum by (couchbaseNode) (sdk_kv_mutation_durable_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Mutation Durable Duration Sum by Node	Accumulates total time spent on durable Key-Value mutations including persistence and replication overhead.	`sum by (couchbaseNode) (sdk_kv_mutation_durable_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Mutation NonDurable Duration Count by Node	Counts non-durable Key-Value mutation operations, which prioritize performance by completing once the data reaches memory.	`sum by (couchbaseNode) (sdk_kv_mutation_nondurable_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Mutation NonDurable Duration Sum by Node	Accumulates total time spent on non-durable Key-Value mutations that prioritize performance over durability guarantees.	`sum by (couchbaseNode) (sdk_kv_mutation_nondurable_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Cancelled Requests by Node	Key-Value operations cancelled before completion due to client-side request cancellation or application shutdown.	`sum by (couchbaseNode) (sdk_kv_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK KV Service Timed Out Requests by Node	Key-Value operations that exceeded their timeout threshold before completion due to network latency or server overload.	`sum by (couchbaseNode) (sdk_kv_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Total Requests by Node	Comprehensive count of all Key-Value operations including GET, SET, DELETE and other CRUD operations.	`sum by (couchbaseNode) (sdk_kv_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Retrieval Duration Count by Node	Counts the total number of completed Key-Value retrieval operations, providing data for throughput and performance analysis.	`sum by (couchbaseNode) (sdk_kv_retrieval_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Retrieval Duration Sum by Node	Accumulates the total time spent on all Key-Value retrieval operations, enabling calculation of average retrieval times.	`sum by (couchbaseNode) (sdk_kv_retrieval_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: SDK Query Service Duration Count by Node	Counts the total number of completed SQL++ query operations, providing data for calculating average query durations.	`sum by (couchbaseNode) (sdk_query_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: SDK Query Service Duration Sum by Node	Accumulates the total time spent executing all SQL++ queries, enabling calculation of average query durations.	`sum by (couchbaseNode) (sdk_query_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: SDK Query Service Cancelled Requests by Node	SQL++ query operations cancelled before completion due to client-side cancellation or application timeouts.	`sum by (couchbaseNode) (sdk_query_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: SDK Query Service Timed Out Requests by Node	SQL++ query operations that exceeded their timeout threshold due to complexity, resource constraints, or network issues.	`sum by (couchbaseNode) (sdk_query_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: SDK Query Service Total Requests by Node	Total number of SQL++ query operations executed through the SDK encompassing all query types.	`sum by (couchbaseNode) (sdk_query_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
Search: SDK Search Service Duration Count by Node	Counts the total number of completed Full Text Search operations, providing data for calculating average search durations.	`sum by (couchbaseNode) (sdk_search_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})`
Search: SDK Search Service Duration Sum by Node	Accumulates the total time spent executing all Full Text Search operations, enabling calculation of average search durations.	`sum by (couchbaseNode) (sdk_search_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})`
Search: SDK Search Service Cancelled Requests by Node	Full Text Search operations cancelled before completion due to client-side request cancellation or timeout handling.	`sum by (couchbaseNode) (sdk_search_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})`
Search: SDK Search Service Timed Out Requests by Node	Full Text Search operations that exceeded their timeout threshold due to complex queries or resource constraints.	`sum by (couchbaseNode) (sdk_search_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})`
Search: SDK Search Service Total Requests by Node	Total number of Full Text Search operations performed through the SDK including all search query types.	`sum by (couchbaseNode) (sdk_search_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})`
XDCR: Process CPU (System) Usage by Node	The system-space process CPU utilization of the XDCR service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",mode="system",tenantId="<tenantId>"}) * 100`
XDCR: Process CPU Total Usage by Node	The total (user and system) process CPU utilization of the XDCR service by node.	`sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",tenantId="<tenantId>"}) * 100`
XDCR: Process CPU (User) Usage by Node	The user-space process CPU utilization of the XDCR service by node.	`sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",mode="user",tenantId="<tenantId>"}) * 100`
XDCR: Total Process Memory Usage by Node	The total process memory usage of the XDCR service by node.	`sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="goxdcr",tenantId="<tenantId>"})`
Data: SDK Data Service Mutation Durable Duration by Upper Bound of Bucket(le)	Histogram of durable Key-Value mutation durations that ensure data persistence and replication before completion.	`sum by (couchbaseNode, le) (sdk_kv_mutation_durable_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Mutation NonDurable Duration by Upper Bound of Bucket(le)	Histogram of non-durable Key-Value mutation durations that prioritize performance by completing when data reaches memory.	`sum by (couchbaseNode, le) (sdk_kv_mutation_nondurable_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: SDK Data Service Retrieval Duration by Upper Bound of Bucket(le)	Histogram of Key-Value retrieval operation durations organized by latency buckets for performance analysis.	`sum by (couchbaseNode, le) (sdk_kv_retrieval_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})`
Query: SDK Query Service Duration by Upper Bound of Bucket(le)	Histogram of SQL++ query operation durations organized by latency buckets for performance analysis.	`sum by (couchbaseNode, le) (sdk_query_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})`
Search: SDK Search Service Duration by Upper Bound of Bucket(le)	Histogram of Full Text Search operation durations organized by latency buckets for performance analysis.	`sum by (couchbaseNode, le) (sdk_search_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Compute Units by Bucket	Number of compute units by bucket.	`sum by (bucket) (meter_cu_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Current Active Items by Bucket	Number of current active items by bucket.	`sum by (bucket) (bucket_state:kv_vb_curr_items:sum{bucket="<bucket>",databaseId="<databaseId>",state="active",tenantId="<tenantId>"})`
Data: Disk Used by Bucket	Total disk storage consumed by each serverless bucket.	`sum by (bucket) (bucket:kv_ep_db_file_size_bytes:sum{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Bucket GET Ops per Second	Bucket GET Ops per Second is the average number of GET operations per second over the last 5 minutes by bucket.	`sum by (bucket, result) (bucket_op:kv_ops:rate5m{bucket="<bucket>",databaseId="<databaseId>",op="get",tenantId="<tenantId>"})`
Data: Bucket SET Ops per Second	Bucket SET Ops per Second is the average number of SET operations per second over the last 5 minutes by bucket.	`sum by (bucket, result) (bucket_op:kv_ops:rate5m{bucket="<bucket>",databaseId="<databaseId>",op="set",tenantId="<tenantId>"})`
Data: Read Units by Bucket	Number of units read by bucket.	`sum by (bucket) (meter_ru_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})`
Data: Write Units by Bucket	Number of units written by bucket.	`sum by (bucket) (meter_wu_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})`
XDCR: xdcr_resp_wait_time_seconds (avg)	The rolling average time from when a MemcachedRequest is created and ready to be routed to an outnozzle until the response to a successful write is received from the target node.	`avg by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_resp_wait_time_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})`
XDCR: xdcr_resp_wait_time_seconds (max)	The maximum, per node, pipeline type, and source and target bucket, of the rolling average time from when a MemcachedRequest is created and ready to be routed to an outnozzle until the response to a successful write is received from the target node.	`max by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_resp_wait_time_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})`
XDCR: xdcr_wtavg_docs_latency_seconds	The rolling average time for the source cluster to receive the acknowledgement of a SET_WITH_META response after the Memcached request has been composed for processing by the XDCR Target Nozzle.	`max by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_wtavg_docs_latency_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})`

App Endpoint: Total Auth Failures

Total Auth Failures is the total number of authentication failures per app endpoint.

sum(sgw_security_auth_failed_count{databaseId="<databaseId>",tenantId="<tenantId>"})
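The metric expressions in this reference are templates. The placeholders `<databaseId>`, `<tenantId>`, and, for some graphs, `<bucket>` stand for your own identifiers. Below is a minimal Python sketch of filling them in. The helper `fill_placeholders` and the sample IDs are hypothetical, and this reference does not describe how Capella itself performs the substitution.

```python
import re

# Matches placeholders such as <databaseId>, <tenantId>, or <bucket>.
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each <name> placeholder in a metric template with values[name].

    Raises KeyError for a placeholder with no supplied value, so a
    half-filled expression is never returned.
    """

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder <{name}>")
        # Escape backslashes and double quotes so the value remains a valid
        # double-quoted PromQL label value.
        return values[name].replace("\\", "\\\\").replace('"', '\\"')

    return PLACEHOLDER.sub(substitute, template)


template = 'sum(sgw_security_auth_failed_count{databaseId="<databaseId>",tenantId="<tenantId>"})'

# Hypothetical identifiers, for illustration only.
query = fill_placeholders(template, {"databaseId": "db-123", "tenantId": "t-456"})
print(query)
```

Raising an error on a missing value is a deliberate choice. A query with a leftover `<bucket>` would otherwise silently match nothing.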

App Endpoint: Total Auth Successes

Total Auth Successes is the total number of successful authentications per app endpoint.

sum(sgw_security_auth_success_count{databaseId="<databaseId>",tenantId="<tenantId>"})

App Endpoint: Total Requested Deltas

Total Requested Deltas is the total number of deltas requested per app endpoint.

sum(sgw_delta_sync_deltas_requested{databaseId="<databaseId>",tenantId="<tenantId>"})

App Endpoint: Total Deltas Sent

Total Deltas Sent is the total number of deltas sent per app endpoint.

sum(sgw_delta_sync_deltas_sent{databaseId="<databaseId>",tenantId="<tenantId>"})

App Endpoint: Total Documents Imported

Total Documents Imported is the total number of documents imported per app endpoint.

sum(sgw_shared_bucket_import_import_count{databaseId="<databaseId>",tenantId="<tenantId>"})

App Endpoint: Total Documents Read

Total Documents Read is the total number of documents read per app endpoint.

sum(sgw_database_num_doc_reads_blip{databaseId="<databaseId>",tenantId="<tenantId>"} + sgw_database_num_doc_reads_rest{databaseId="<databaseId>",tenantId="<tenantId>"})

App Endpoint: Total Documents Rejected

Total Documents Rejected is the total number of documents rejected per app endpoint.

sum(sgw_security_num_docs_rejected{databaseId="<databaseId>",tenantId="<tenantId>"})

App Endpoint: Total Documents Written

Total Documents Written is the total number of documents written per app endpoint.

sum(sgw_database_num_doc_writes{databaseId="<databaseId>",tenantId="<tenantId>"})

App Endpoint: Active Pull Only Replications

Active Pull Only Replications is the total number of active pull-only replications, both one-shot and continuous, per app endpoint.

sum(sgw_replication_pull_num_pull_repl_active_one_shot{databaseId="<databaseId>",tenantId="<tenantId>"} + sgw_replication_pull_num_pull_repl_active_continuous{databaseId="<databaseId>",tenantId="<tenantId>"})

App Service: Bytes Received by Node

Total bytes received on the primary network interface by node.

sum by (couchbaseNode) (node_network_receive_bytes_total{databaseId="<databaseId>",device="eth0",syncgatewayId!="",tenantId="<tenantId>"})

App Service: Bytes Sent by Node

Total bytes sent on the primary network interface by node.

sum by (couchbaseNode) (node_network_transmit_bytes_total{databaseId="<databaseId>",device="eth0",syncgatewayId!="",tenantId="<tenantId>"})

App Service: CPU Utilization by Node

CPU utilization percentage of the Sync Gateway process by node.

sum by (couchbaseNode) (sgw_resource_utilization_process_cpu_percent_utilization{databaseId="<databaseId>",tenantId="<tenantId>"} / 10)

App Service: Memory Utilization by Node

Memory utilization percentage of the Sync Gateway node.

sum by (couchbaseNode) ((node_memory_MemTotal_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"} - node_memory_MemAvailable_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"}) / (node_memory_MemTotal_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"} > 0) * 100)
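The Memory Utilization by Node graph computes used memory as total minus available memory and divides it by the total. The `> 0` comparison keeps series that report a zero total out of the division, and the result is expressed as a percentage. Here is the same arithmetic for a single node, as a Python sketch with made-up byte counts.

```python
from typing import Optional

GIB = 1024 ** 3


def memory_utilization_percent(mem_total_bytes: float,
                               mem_available_bytes: float) -> Optional[float]:
    """(MemTotal - MemAvailable) / MemTotal * 100 for one node.

    Returns None when MemTotal is not positive, mirroring the "> 0" filter,
    which drops such series instead of dividing by zero.
    """
    if mem_total_bytes <= 0:
        return None
    used_bytes = mem_total_bytes - mem_available_bytes
    return used_bytes / mem_total_bytes * 100


# Hypothetical node with 16 GiB of memory, 4 GiB of it still available.
print(memory_utilization_percent(16 * GIB, 4 * GIB))  # 75.0
```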

Data: Current Active Items by Bucket

Number of current active items by bucket.

sum by (bucket) (bucket_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})
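Most expressions in this reference aggregate with `sum by (...)`, `avg by (...)`, or `max by (...)`. Series that share the values of the listed labels are combined into one output series, and all other labels are dropped. The following is a small Python model of that grouping, using invented series for two hypothetical buckets. It illustrates the semantics only and is not how Prometheus or Capella evaluates queries.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A series is modelled as (labels, value).
Series = Tuple[Dict[str, str], float]


def aggregate_by(series: Iterable[Series], by: List[str],
                 combine: Callable[[List[float]], float]) -> Dict[Tuple[str, ...], float]:
    """Group series on the values of the `by` labels and combine each group.

    Every label not listed in `by` is discarded, as with PromQL's
    `sum by (...)`; a missing label is treated as an empty string.
    """
    groups: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for labels, value in series:
        key = tuple(labels.get(name, "") for name in by)
        groups[key].append(value)
    return {key: combine(values) for key, values in groups.items()}


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical active item counts, one series per bucket and node.
samples: List[Series] = [
    ({"bucket": "travel", "couchbaseNode": "node1", "state": "active"}, 600.0),
    ({"bucket": "travel", "couchbaseNode": "node2", "state": "active"}, 400.0),
    ({"bucket": "users", "couchbaseNode": "node1", "state": "active"}, 250.0),
]

print(aggregate_by(samples, ["bucket"], sum))   # {('travel',): 1000.0, ('users',): 250.0}
print(aggregate_by(samples, ["bucket"], mean))  # {('travel',): 500.0, ('users',): 250.0}
print(aggregate_by(samples, ["bucket"], max))   # {('travel',): 600.0, ('users',): 250.0}
```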

Data: Disk Reads per Second by Bucket

Average disk reads per second by bucket.

sum by (bucket) (bucket:kv_bg_load:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Disk Used by Bucket

Total disk space consumed by each bucket.

sum by (bucket) (bucket:kv_ep_db_file_size_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Total Disk Write Queue Size by Bucket

Total disk write queue size by bucket.

sum by (bucket) (bucket:kv_vb_queue_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: GSI Items Remaining to Index

Number of items, by bucket, that the Data Service has yet to send to the Index Service for indexing with Global Secondary Indexes.

sum by (bucket) (bucket_connection:kv_dcp_items_remaining:sum{connection_type="secidx",databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Bucket GET Ops per Second

Bucket GET Ops per Second is the average number of GET operations per second over the last 5 minutes by bucket.

sum by (bucket) (bucket_op:kv_ops:rate5m{databaseId="<databaseId>",op="get",tenantId="<tenantId>"})

Data: Bucket Ops per Second

Bucket Ops per Second is the average number of operations per second over the last 5 minutes by bucket.

sum by (bucket) (bucket:kv_ops:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Bucket SET Ops per Second

Bucket SET Ops per Second is the average number of SET operations per second over the last 5 minutes by bucket.

sum by (bucket) (bucket_op:kv_ops:rate5m{databaseId="<databaseId>",op="set",tenantId="<tenantId>"})

Data: Quota Memory Used Percent By Bucket/Node

Percentage of the memory quota in use, by bucket and node.

sum by (bucket, couchbaseNode) (kv_mem_quota_usage_ratio{databaseId="<databaseId>",tenantId="<tenantId>"}) * 100

Data: Memory Used by Bucket

Memory usage per bucket.

sum by (bucket) (bucket:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Out Of Memory (OOM) Errors by Bucket

Number of Out of Memory (OOM) errors by bucket.

sum by (bucket) (bucket:kv_ep_oom_errors:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Temporary Out Of Memory Errors by Bucket

Number of temporary Out of Memory (OOM) errors by bucket.

sum by (bucket) (bucket:kv_ep_tmp_oom_errors:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: vBuckets (Active)

Count of active vBuckets used to distribute data across nodes.

sum by (bucket) (bucket_state:kv_num_vbuckets:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

Data: Active Item Resident Ratio by Bucket

Ratio of active items held in memory to all active items stored on disk, per bucket.

avg by (bucket) (bucket_state:kv_vb_perc_mem_resident_ratio:avg{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

Data: vBuckets (Replica)

Count of replica vBuckets used to distribute data across nodes.

sum by (bucket) (bucket_state:kv_num_vbuckets:sum{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})

Data: Replica Item Resident Ratio by Bucket

Ratio of replica items held in memory to all replica items stored on disk, per bucket.

avg by (bucket) (bucket_state:kv_vb_perc_mem_resident_ratio:avg{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})

Analytics: Storage Used

The total size of remote S3 storage used.

sum(cbas_remote_storage_size_bytes{databaseId="<databaseId>",tenantId="<tenantId>"})

Analytics: Total Requests per Second

The per second rate of received analytics requests, averaged over the last 5 minutes, for the entire cluster.

sum(rate(cbas_requests_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))
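Many graphs in this reference, such as Node: Disk Read IOPS by Node, apply `rate(...[5m])` to a counter. The result is the per second increase over a 5-minute window. A drop in the counter, for example after a process restart, is treated as a reset to zero. Below is a simplified Python sketch of that calculation from raw samples. Unlike Prometheus's `rate()`, it does not extrapolate to the window edges, so its values can differ slightly.

```python
from typing import List, Tuple

# (timestamp_seconds, counter_value) samples, oldest first.
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per second increase of a counter across the given samples.

    A decrease is treated as a counter reset: the counter is assumed to have
    restarted from zero, so the post-reset value counts as the increase.
    No extrapolation to the window boundaries is done.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += current - previous if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes 60 s apart over 5 minutes; the counter resets after t=180.
scrapes = [(0, 1000), (60, 1300), (120, 1600), (180, 1900), (240, 300), (300, 600)]
print(simple_rate(scrapes))  # 5.0
```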

Data: Connections

Count of connections to the cluster.

sum(database:kv_curr_connections:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Cluster GET Ops per Second

Cluster GET Ops per Second is the average total number of GET operations per second over the last 5 minutes across all buckets.

sum(op:kv_ops:rate5m{databaseId="<databaseId>",op="get",tenantId="<tenantId>"})

Data: Cluster Ops per Second

Cluster Ops per Second is the average total number of operations per second over the last 5 minutes across all buckets.

sum(database:couchbase_bucket_basicstats_opspersec:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Cluster SET Ops per Second

Cluster SET Ops per Second is the average total number of SET operations per second over the last 5 minutes across all buckets.

sum(op:kv_ops:rate5m{databaseId="<databaseId>",op="set",tenantId="<tenantId>"})

Data: Cluster Total Memory Used

Total memory used by the cluster.

sum(database:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Query: Total Requests per Second

The per second rate of SQL++ query requests over the last 5 minutes for the entire cluster, grouped by latency.

sum by (latency) (database:n1ql_requests:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Data Encryption Status by Node

Provides status information about encryption at rest for each data type, by node, enabling monitoring of encryption coverage and operational health.

sum by (couchbaseNode, data_type) (cm_encr_at_rest_data_status{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Data Encryption Keys In Use by Node

Number of Data Encryption Keys currently in use, by node and key type, providing insight into encryption key distribution and resource consumption.

sum by (couchbaseNode, type) (cm_encr_at_rest_deks_in_use{databaseId="<databaseId>",tenantId="<tenantId>"})

Columnar: Connections

Count of connections to the cluster.

sum(database:kv_curr_connections:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Columnar: CPU Utilization by Node

Real-time CPU utilization percentage per Columnar node.

sum by (couchbaseNode) (node_cpu_utilization_rate{databaseId="<databaseId>",tenantId="<tenantId>"})

Node: Disk Read IOPS by Node

Disk read input/output operations per second (IOPS) for each node, providing insight into storage read activity patterns.

sum by (couchbaseNode) (rate(disk_iops_reads{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Node: Disk Total IOPS by Node

Combined disk read and write input/output operations per second (IOPS) for each node, providing a view of total storage I/O activity.

sum by (couchbaseNode) (rate(disk_iops_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Node: Disk Write IOPS by Node

Disk write input/output operations per second (IOPS) for each node, measuring storage write activity.

sum by (couchbaseNode) (rate(disk_iops_writes{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Node: Disk Read Throughput by Node

Rate at which each node reads data from disk, in bytes per second, for monitoring storage bandwidth utilization.

sum by (couchbaseNode) (rate(disk_bytes_read{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Node: Disk Total Throughput by Node

Combined disk read and write throughput for each node, in bytes per second, providing a view of total storage bandwidth utilization.

sum by (couchbaseNode) (rate(disk_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Node: Disk Write Throughput by Node

Rate at which each node writes data to disk, in bytes per second, including writes from document persistence and compaction.

sum by (couchbaseNode) (rate(disk_bytes_written{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Query: Query Request Time Average in Milliseconds

Average end-to-end time, in milliseconds, to process a query, calculated over the last 10 minutes.

sum(rate(n1ql_request_time{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]) > 0) / 1e+06 / sum(rate(n1ql_requests{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]))

Query: Query Execution Time Average in Milliseconds

Average time, in milliseconds, spent executing a query, calculated over the last 10 minutes.

sum(rate(n1ql_service_time{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]) > 0) / 1e+06 / sum(rate(n1ql_requests{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]))
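The two Query graphs above divide the rate of a cumulative time counter by the rate of the request counter over the same 10-minute window. Because both rates share that window, the ratio reduces to the increase in time divided by the increase in requests. The `/ 1e+06` implies the time counters accumulate nanoseconds, which it converts to milliseconds. This reference does not state the unit, so treat that as an assumption. Below is a Python sketch of the arithmetic from two invented pairs of counter readings taken 10 minutes apart.

```python
from typing import Optional

NS_PER_MS = 1e6  # the "/ 1e+06" in the query


def average_ms_per_request(time_ns_before: float, time_ns_after: float,
                           requests_before: float, requests_after: float) -> Optional[float]:
    """Average milliseconds per request between two counter readings.

    rate(time[10m]) / rate(requests[10m]) equals delta_time / delta_requests,
    because the shared window length cancels out.
    Returns None when no requests completed between the readings.
    """
    delta_requests = requests_after - requests_before
    if delta_requests <= 0:
        return None
    delta_time_ns = time_ns_after - time_ns_before
    return delta_time_ns / NS_PER_MS / delta_requests


# Hypothetical readings: 1,200 requests accumulated 30 s (3e10 ns) of request time.
print(average_ms_per_request(5.0e11, 5.3e11, 40_000, 41_200))  # 25.0
```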

Data: Active Items by Node

Number of active items (documents) stored on each node.

sum by (couchbaseNode) (node_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

Analytics: Parse Failure Rate by Link/Bucket

The per second rate of record parsing failures from linked items, averaged over the last 5 minutes by link and bucket.

sum by (link, bucket) (rate(cbas_failed_to_parse_records_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Analytics: Total Parse Failures by Link/Bucket

The total number of record parsing failures from linked items, by link and bucket.

sum by (link, bucket) (cbas_failed_to_parse_records_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Analytics: Ingested Bytes Rate by Link/Bucket

The per second rate of incoming bytes ingested by analytics, averaged over the last 5 minutes by link and bucket.

sum by (link, bucket) (rate(cbas_incoming_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Analytics: Total Ingested Bytes by Link/Bucket

The total incoming bytes ingested by analytics by link and bucket.

sum by (link, bucket) (cbas_incoming_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Analytics: Linked Ops Rate by Link/Bucket

The per second rate of linked record operations processed by analytics, averaged over the last 5 minutes by link and bucket.

sum by (link, bucket) (rate(cbas_incoming_records_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Analytics: Total Linked Ops by Link/Bucket

The total number of linked record operations processed by analytics by link and bucket.

sum by (link, bucket) (cbas_incoming_records_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Analytics: Read Rate by Node

The per second rate at which disk bytes are read for the Analytics Service, averaged over the last 5 minutes, by node.

sum by (couchbaseNode) (rate(cbas_io_reads_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Analytics: Write Rate by Node

The per second rate at which disk bytes are written for the Analytics Service, averaged over the last 5 minutes, by node.

sum by (couchbaseNode) (rate(cbas_io_writes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Analytics: Total Requests per Second by Node

The per second rate of received analytics requests, averaged over the last 5 minutes, by node.

sum by (couchbaseNode) (rate(cbas_requests_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

Analytics: System Load per Node

The system load average over the last minute, as reported by the Analytics Service, by node.

sum by (couchbaseNode) (cbas_system_load_average{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Data Encryption Key Fetch Failures by Node

Percentage of failed attempts to retrieve Data Encryption Keys from the key management service due to network or authentication issues.

sum by (couchbaseNode, type) (cm_encr_at_rest_generate_dek_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"} / (cm_encr_at_rest_generate_dek_total{databaseId="<databaseId>",tenantId="<tenantId>"} > 0) * 100)

Data: Data Encryption Key Fetch Frequency by Node

Measures how frequently Data Encryption Keys are retrieved from the key management service, reflecting encryption activity levels.

sum by (couchbaseNode, type) (cm_encr_at_rest_generate_dek_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Data Encryption Key Rotation Failures by Node

Percentage of encryption key rotations that failed, by node and key. Rotation failures can pose security risks and lead to compliance violations.

sum by (couchbaseNode, key_name) (cm_encryption_key_rotation_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"} / (cm_encryption_key_rotations_total{databaseId="<databaseId>",tenantId="<tenantId>"} > 0) * 100)

Data: Data Encryption Key Rotation Frequency by Node

Tracks how frequently encryption keys are rotated per node, providing insight into security policy compliance and key management activity.

sum by (couchbaseNode, key_name) (cm_encryption_key_rotations_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Data Encryption Service Failures by Node

Measures failures within the data encryption service infrastructure that can impact data protection capabilities and compliance.

sum by (couchbaseNode, failure_type) (cm_encryption_service_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Node: CPU Utilization by Node

Real-time CPU utilization percentage per node.

sum by (couchbaseNode) (node_cpu_utilization_rate{databaseId="<databaseId>",tenantId="<tenantId>"})

Node: Disk Used Percent by Node

The percentage of each node's disk space that is in use.

sum by (couchbaseNode) (node_disk_usage_ratio{databaseId="<databaseId>",tenantId="<tenantId>"}) * 100

Data: Disk Used Bytes by Node

Total disk space currently consumed on each node.

sum by (couchbaseNode) (node_disk_used{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Average Item Size

Average size of the indexed keys by node.

avg by (couchbaseNode) (node:index_avg_item_size:avg{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Cache Hits per Second

The per second rate of cache hits averaged over the last 5 minutes by node.

sum by (couchbaseNode) (node:index_cache_hits:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Cache Misses per Second

The per second rate of cache misses averaged over the last 5 minutes by node.

sum by (couchbaseNode) (node:index_cache_misses:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Indexer Codebook Memory Usage by Node

Memory used by vector index codebooks, by node.

sum by (couchbaseNode) (index_codebook_mem_usage{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Indexer Codebook Train Duration By Node

Time spent training vector index codebooks, in seconds, by node.

sum by (couchbaseNode) (index_codebook_train_duration{databaseId="<databaseId>",tenantId="<tenantId>"} / 1e+09)

Index: Process CPU (System) Usage by Node

The system-space process CPU utilization of the Index service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",mode="system",tenantId="<tenantId>"}) * 100

Index: Process CPU Total Usage by Node

The total (user and system) process CPU utilization of the Index service by node.

sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",tenantId="<tenantId>"}) * 100

Index: Process CPU (User) Usage by Node

The user-space process CPU utilization of the Index service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",mode="user",tenantId="<tenantId>"}) * 100
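The Process CPU graphs multiply the per second rate of `process_cpu_seconds_total` by 100. That rate is CPU-seconds consumed per wall-clock second. One fully busy core therefore shows as 100, and a multi-threaded process such as the indexer can exceed 100 on a multi-core node. The descriptions state that the Total graph covers user and system time. Assuming that, Total equals User plus System. Below is a Python sketch with hypothetical counter readings.

```python
def cpu_percent(cpu_seconds_before: float, cpu_seconds_after: float,
                window_seconds: float) -> float:
    """rate(process_cpu_seconds_total) * 100 between two readings.

    100 means one core fully busy over the window; a multi-threaded
    process can exceed 100 on a multi-core node.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Multiply before dividing; mathematically the same as rate * 100.
    return (cpu_seconds_after - cpu_seconds_before) * 100 / window_seconds


# Hypothetical indexer readings taken 300 s (5 minutes) apart.
user = cpu_percent(1200.0, 1650.0, 300.0)   # 450 CPU-seconds of user time
system = cpu_percent(400.0, 460.0, 300.0)   # 60 CPU-seconds of system time
total = user + system                       # what the Total graph would show
print(user, system, total)  # 150.0 20.0 170.0
```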

Index: Indexable Data Size

The size of indexable data that is maintained by the indexer by node.

sum by (couchbaseNode) (node:index_raw_data_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Disk Size

Total disk file size consumed by all indexes by node.

sum by (couchbaseNode) (node:index_disk_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Documents Indexed per Second

The per second rate of documents indexed averaged over the last 5 minutes by node.

sum by (couchbaseNode) (node:index_num_docs_indexed:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Documents Pending

Number of documents pending to be indexed by node.

sum by (couchbaseNode) (node:index_num_docs_pending:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Documents Queued

Number of documents queued to be indexed by node.

sum by (couchbaseNode) (node:index_num_docs_queued:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Item Count

The number of items currently indexed by node.

sum by (couchbaseNode) (node:index_items_count:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Total Process Memory Usage by Node

The total process memory usage of the Index service by node.

sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="indexer",tenantId="<tenantId>"})

Index: Diverging Replica Indexes

Number of diverging replica indexes by node.

sum by (couchbaseNode) (index_num_diverging_replica_indexes{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Requests per Second

The per second rate of requests to the indexer averaged over the last 5 minutes by node.

sum by (couchbaseNode) (node:index_num_requests:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

Index: Resident Percent

Percentage of the data held in memory by the indexer by node.

avg by (couchbaseNode) (node:index_resident_percent:avg{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Process CPU (System) Usage by Node

The system-space process CPU utilization of the Data service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",mode="system",tenantId="<tenantId>"}) * 100

Data: Process CPU Total Usage by Node

The total (user and system) process CPU utilization of the Data service by node.

sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",tenantId="<tenantId>"}) * 100

Data: Process CPU (User) Usage by Node

The user-space process CPU utilization of the Data service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",mode="user",tenantId="<tenantId>"}) * 100

Data: Total Process Memory Usage by Node

The total process memory usage of the Data service by node.

sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="memcached",tenantId="<tenantId>"})

Node: Memory Used by Node

Memory usage per node.

sum by (couchbaseNode) (node:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Query: Process CPU (System) Usage by Node

The system-space process CPU utilization of the Query service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",mode="system",tenantId="<tenantId>"}) * 100

Query: Process CPU Total Usage by Node

The total (user and system) process CPU utilization of the Query service by node.

sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",tenantId="<tenantId>"}) * 100

Query: Process CPU (User) Usage by Node

The user-space process CPU utilization of the Query service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",mode="user",tenantId="<tenantId>"}) * 100

Query: Total Process Memory Usage by Node

The total process memory usage of the Query service by node.

sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="cbq-engine",tenantId="<tenantId>"})

Query: Requests per Second by Node

The per second rate of SQL++ (N1QL) query requests over the last 5 minutes by node.

sum by (couchbaseNode) (node:n1ql_requests:rate5m{databaseId="<databaseId>",latency=">0ms",tenantId="<tenantId>"})

Data: Replica Items by Node

Number of replica items (documents) stored on each node.

sum by (couchbaseNode) (node_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})

Data: SDK Data Service Mutation Durable Duration Count by Node

Counts durable Key-Value mutations that include full persistence and replication guarantees for data safety.

sum by (couchbaseNode) (sdk_kv_mutation_durable_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Mutation Durable Duration Sum by Node

Accumulates total time spent on durable Key-Value mutations including persistence and replication overhead.

sum by (couchbaseNode) (sdk_kv_mutation_durable_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Mutation NonDurable Duration Count by Node

Counts non-durable Key-Value mutation operations, which prioritize performance by completing when the data reaches memory.

sum by (couchbaseNode) (sdk_kv_mutation_nondurable_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Mutation NonDurable Duration Sum by Node

Accumulates total time spent on non-durable Key-Value mutations that prioritize performance over durability guarantees.

sum by (couchbaseNode) (sdk_kv_mutation_nondurable_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Cancelled Requests by Node

Key-Value operations cancelled before completion due to client-side request cancellation or application shutdown.

sum by (couchbaseNode) (sdk_kv_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK KV Service Timed Out Requests by Node

Key-Value operations that exceeded their timeout threshold before completion due to network latency or server overload.

sum by (couchbaseNode) (sdk_kv_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Total Requests by Node

Comprehensive count of all Key-Value operations including GET, SET, DELETE and other CRUD operations.

sum by (couchbaseNode) (sdk_kv_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Retrieval Duration Count by Node

Total number of completed Key-Value retrieval operations, providing data for throughput and performance analysis.

sum by (couchbaseNode) (sdk_kv_retrieval_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Retrieval Duration Sum by Node

Accumulates the total time spent on all Key-Value retrieval operations, enabling calculation of average retrieval times.

sum by (couchbaseNode) (sdk_kv_retrieval_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})
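The Duration Sum and Duration Count graphs are the `_sum` and `_count` series of the same histogram. Over a given window, the increase in the sum divided by the increase in the count is the mean duration per operation. To get one figure across nodes, add the per-node sums and counts before dividing. Averaging the per-node means instead would overweight lightly loaded nodes. Below is a Python sketch with invented per-node increases. The same reasoning applies to the mutation, query, and search Sum and Count pairs.

```python
from typing import Dict, Tuple

# Per-node increases over the same window: (duration _sum in ms, _count).
# Hypothetical numbers chosen so the two nodes differ sharply in load.
per_node: Dict[str, Tuple[float, float]] = {
    "node1": (1800.0, 900.0),  # 900 retrievals averaging 2 ms
    "node2": (2000.0, 100.0),  # 100 retrievals averaging 20 ms
}


def mean_by_node(data: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Mean milliseconds per operation for each node (sum / count)."""
    return {node: total_ms / count
            for node, (total_ms, count) in data.items() if count > 0}


def overall_mean(data: Dict[str, Tuple[float, float]]) -> float:
    """Mean across all nodes: add the sums and counts first, then divide."""
    total_ms = sum(total for total, _ in data.values())
    total_count = sum(count for _, count in data.values())
    if total_count <= 0:
        raise ValueError("no operations in the window")
    return total_ms / total_count


by_node = mean_by_node(per_node)
print(by_node)                               # {'node1': 2.0, 'node2': 20.0}
print(overall_mean(per_node))                # 3.8
print(sum(by_node.values()) / len(by_node))  # 11.0, the misleading unweighted mean
```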

Query: SDK Query Service Duration Count by Node

Total number of completed SQL++ query operations, providing data for calculating average query durations.

sum by (couchbaseNode) (sdk_query_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

Query: SDK Query Service Duration Sum by Node

Accumulates the total time spent executing all SQL++ queries, enabling calculation of average query durations.

sum by (couchbaseNode) (sdk_query_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Query: SDK Query Service Cancelled Requests by Node

SQL++ query operations cancelled before completion due to client-side cancellation or application timeouts.

sum by (couchbaseNode) (sdk_query_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})

Query: SDK Query Service Timed Out Requests by Node

SQL++ query operations that exceeded their timeout threshold due to complexity, resource constraints, or network issues.

sum by (couchbaseNode) (sdk_query_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})

Query: SDK Query Service Total Requests by Node

Total number of SQL++ query operations executed through the SDK encompassing all query types.

sum by (couchbaseNode) (sdk_query_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})

Search: SDK Search Service Duration Count by Node

Total number of completed Full Text Search operations, providing data for calculating average search durations.

sum by (couchbaseNode) (sdk_search_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

Search: SDK Search Service Duration Sum by Node

Accumulates the total time spent executing all Full Text Search operations, enabling calculation of average search durations.

sum by (couchbaseNode) (sdk_search_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

Search: SDK Search Service Cancelled Requests by Node

Full Text Search operations cancelled before completion due to client-side request cancellation or timeout handling.

sum by (couchbaseNode) (sdk_search_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})

Search: SDK Search Service Timed Out Requests by Node

Full Text Search operations that exceeded their timeout threshold due to complex queries or resource constraints.

sum by (couchbaseNode) (sdk_search_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})

Search: SDK Search Service Total Requests by Node

Total number of Full Text Search operations performed through the SDK including all search query types.

sum by (couchbaseNode) (sdk_search_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})

XDCR: Process CPU (System) Usage by Node

The system-space process CPU utilization of the XDCR service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",mode="system",tenantId="<tenantId>"}) * 100

XDCR: Process CPU Total Usage by Node

The total (user and system) process CPU utilization of the XDCR service by node.

sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",tenantId="<tenantId>"}) * 100

XDCR: Process CPU (User) Usage by Node

The user-space process CPU utilization of the XDCR service by node.

sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",mode="user",tenantId="<tenantId>"}) * 100

XDCR: Total Process Memory Usage by Node

The total process memory usage of the XDCR service by node.

sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="goxdcr",tenantId="<tenantId>"})

Data: SDK Data Service Mutation Durable Duration by Upper Bound of Bucket(le)

Histogram of durable Key-Value mutation durations that ensure data persistence and replication before completion.

sum by (couchbaseNode, le) (sdk_kv_mutation_durable_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Mutation NonDurable Duration by Upper Bound of Bucket(le)

Histogram of non-durable Key-Value mutation durations that prioritize performance by completing when data reaches memory.

sum by (couchbaseNode, le) (sdk_kv_mutation_nondurable_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: SDK Data Service Retrieval Duration by Upper Bound of Bucket(le)

Histogram of Key-Value retrieval operation durations organized by latency buckets for performance analysis.

sum by (couchbaseNode, le) (sdk_kv_retrieval_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})
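The "by Upper Bound of Bucket(le)" graphs show cumulative histogram buckets. Each `le` series counts the operations that took at most that many milliseconds. A percentile can be estimated by finding the first bucket whose cumulative count reaches the target rank and interpolating linearly inside it. This is the approach PromQL's `histogram_quantile()` documents. Below is a simplified Python sketch with invented bucket counts. The expressions listed here sum raw cumulative counts rather than a `rate()` of them. A percentile computed from them would therefore describe all operations since the counters started, not only recent ones.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative (le, count) buckets.

    Buckets must be sorted by le and end with le = +Inf. The target rank is
    located in the first bucket whose cumulative count reaches it, and the
    value is interpolated linearly inside that bucket; the lowest bucket is
    assumed to start at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have le = +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Rank is in the open-ended bucket: report the highest finite bound.
                return prev_le
            in_bucket = count - prev_count
            if in_bucket == 0:
                return le
            return prev_le + (le - prev_le) * (rank - prev_count) / in_bucket
        prev_le, prev_count = le, count
    return prev_le  # only reached if the counts are not cumulative


# Hypothetical retrieval latencies in ms: cumulative counts per upper bound (le).
buckets = [(1.0, 400.0), (5.0, 900.0), (10.0, 980.0), (50.0, 1000.0), (math.inf, 1000.0)]
for q in (0.5, 0.9, 0.99):
    print(q, estimate_quantile(q, buckets))  # 0.5 1.8, then 0.9 5.0, then 0.99 30.0
```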

Query: SDK Query Service Duration by Upper Bound of Bucket(le)

Histogram of SQL++ query operation durations organized by latency buckets for performance analysis.

sum by (couchbaseNode, le) (sdk_query_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

Search: SDK Search Service Duration by Upper Bound of Bucket(le)

Histogram of Full Text Search operation durations organized by latency buckets for performance analysis.

sum by (couchbaseNode, le) (sdk_search_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Compute Units by Bucket

Number of compute units by bucket.

sum by (bucket) (meter_cu_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Current Active Items by Bucket

Number of current active items by bucket.

sum by (bucket) (bucket_state:kv_vb_curr_items:sum{bucket="<bucket>",databaseId="<databaseId>",state="active",tenantId="<tenantId>"})
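The queries in this reference first select series with label matchers and then aggregate them with `sum by`, `avg by`, or `max by`. The Python sketch below models the query above. It keeps only series whose labels match `state="active"`, then adds up the remaining values per `bucket` and discards every other label. The bucket names, the `instance` label, the `replica` state value, and the numbers are hypothetical. The function is a simplified model of PromQL aggregation, not how Capella evaluates queries.

```python
from collections import defaultdict


def sum_by(series, by, matchers):
    """Simplified model of PromQL `sum by (<by>) (metric{<matchers>})`.

    series:   list of (labels, value) pairs sampled at one instant.
    by:       label names to keep; every other label is aggregated away.
    matchers: exact-match label filters, for example {"state": "active"}.
    """
    totals = defaultdict(float)
    for labels, value in series:
        # Label matchers: drop any series whose labels do not match exactly.
        if all(labels.get(name) == want for name, want in matchers.items()):
            # Group key: the values of the `by` labels only.
            key = tuple(labels.get(name, "") for name in by)
            totals[key] += value
    return dict(totals)


# Hypothetical series: bucket names, the `instance` label, and values are made up.
series = [
    ({"bucket": "travel", "state": "active", "instance": "a"}, 600.0),
    ({"bucket": "travel", "state": "active", "instance": "b"}, 400.0),
    ({"bucket": "travel", "state": "replica", "instance": "a"}, 1000.0),
    ({"bucket": "users", "state": "active", "instance": "a"}, 250.0),
]
print(sum_by(series, ["bucket"], {"state": "active"}))
# {('travel',): 1000.0, ('users',): 250.0}
```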

Data: Disk Used by Bucket

Total disk storage consumed by each serverless bucket.

sum by (bucket) (bucket:kv_ep_db_file_size_bytes:sum{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Bucket GET Ops per Second

Bucket GET Ops per Second is the average number of GET operations per second over the last 5 minutes, by bucket and result.

sum by (bucket, result) (bucket_op:kv_ops:rate5m{bucket="<bucket>",databaseId="<databaseId>",op="get",tenantId="<tenantId>"})
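The `rate5m` suffix in `bucket_op:kv_ops:rate5m` follows the Prometheus recording-rule naming convention for a 5-minute `rate()`. The rule's exact definition is an assumption here. `rate()` reports the average per-second increase of a counter over the window, and it treats any drop in value as a counter reset. The Python sketch below shows that calculation on hypothetical samples. Unlike Prometheus, it does not extrapolate to the window boundaries, so it divides by the time between the first and last samples.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter, in the spirit of PromQL rate().

    samples: (timestamp_s, counter_value) pairs, oldest first.
    A decrease between consecutive samples is treated as a counter reset
    (for example, a process restart), so the post-reset value counts as the
    increase for that step. No extrapolation to the window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    span = samples[-1][0] - samples[0][0]
    if span <= 0:
        raise ValueError("samples must span a positive time interval")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    return increase / span


# Hypothetical GET counter over 5 minutes, with a reset between t=100 and t=200.
samples = [(0, 1000), (100, 4000), (200, 1500), (300, 4500)]
print(per_second_rate(samples))  # 25.0
```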

Data: Bucket SET Ops per Second

Bucket SET Ops per Second is the average number of SET operations per second over the last 5 minutes, by bucket and result.

sum by (bucket, result) (bucket_op:kv_ops:rate5m{bucket="<bucket>",databaseId="<databaseId>",op="set",tenantId="<tenantId>"})

Data: Read Units by Bucket

Number of read units consumed, by bucket.

sum by (bucket) (meter_ru_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

Data: Write Units by Bucket

Number of write units consumed, by bucket.

sum by (bucket) (meter_wu_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

XDCR: xdcr_resp_wait_time_seconds (avg)

The rolling average time from when a MemcachedRequest is created and ready to be routed to an outnozzle until the target node's response to a successful write is received.

avg by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_resp_wait_time_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})

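XDCR: xdcr_resp_wait_time_seconds (max)

The maximum of xdcr_resp_wait_time_seconds in each group of node, pipeline type, source bucket, and target bucket. It uses the same measurement as the (avg) graph but shows the slowest series instead of the average.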

max by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_resp_wait_time_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})
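The (avg) and (max) graphs apply different aggregations to the same series. When one series in a group is much slower than the rest, the average dilutes it, but the maximum reports it directly. The Python sketch below uses four hypothetical values of `xdcr_resp_wait_time_seconds` that share the same `couchbaseNode`, `pipelineType`, `sourceBucketName`, and `targetBucketName`, so both queries place them in one group.

```python
from statistics import mean

# Hypothetical values (seconds) of four series in the same aggregation group.
wait_times = [0.004, 0.005, 0.006, 0.185]

avg_wait = mean(wait_times)  # what an `avg by (...)` graph would plot
max_wait = max(wait_times)   # what a `max by (...)` graph would plot

print(f"avg={avg_wait:.3f}s max={max_wait:.3f}s")  # avg=0.050s max=0.185s
```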

XDCR: xdcr_wtavg_docs_latency_seconds

The rolling average time from when a Memcached request has been composed for processing by the XDCR Target Nozzle until the source cluster receives the acknowledgement of the SET_WITH_META response. The graph shows the maximum value in each group of node, pipeline type, source bucket, and target bucket.

max by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_wtavg_docs_latency_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})

See Also