Monitoring Reference

    This reference lists the metric graphs displayed in the Capella UI Monitoring dashboards.

    In the Capella UI, the Monitoring dashboards display a set of metric graphs, enabling users to monitor system performance in real time.

    For more information about Capella’s monitoring dashboards, see View Monitoring Dashboards. For more information about App Service’s monitoring dashboards, see Monitor through the UI.

    This monitoring reference lists:

    • The Graph Name as displayed in the Capella UI.

    • A Description of what this metric graph entails.

    • The Metric calculation method for this metric. For more information about the metrics used, see Metrics Reference.
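
The Metric expressions in this reference contain placeholders such as `<databaseId>` and `<tenantId>` (and `<bucket>` in some entries). As a minimal sketch, assuming you want to fill in those placeholders before reusing an expression outside the Capella UI, a helper like the hypothetical `fill_placeholders` below could do the substitution. The function name and the ID values are illustrative assumptions, not part of Capella.

```python
import re


def fill_placeholders(expression: str, values: dict[str, str]) -> str:
    """Replace each <name> placeholder in a metric expression with values[name].

    Raises KeyError if the expression uses a placeholder that has no value.
    """
    def substitute(match: re.Match) -> str:
        return values[match.group(1)]

    # Placeholders in this reference look like <databaseId>, <tenantId>, <bucket>.
    return re.sub(r"<(\w+)>", substitute, expression)


template = 'sum(sgw_security_auth_failed_count{databaseId="<databaseId>",tenantId="<tenantId>"})'
# Hypothetical IDs, for illustration only.
filled = fill_placeholders(template, {"databaseId": "db-123", "tenantId": "tenant-456"})
print(filled)
```

Label values that merely contain an angle bracket, such as `latency=">0ms"`, are left untouched because they do not match the `<name>` pattern.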

    The monitoring dashboards show the following metrics:

    Each entry below gives, in order: Graph Name, Description, Metric.

    App Endpoint: Total Auth Failures

    Total Auth Failures is the total number of authentication failures per app endpoint.

    sum(sgw_security_auth_failed_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Total Auth Successes

    Total Auth Successes is the total number of successful authentications per app endpoint.

    sum(sgw_security_auth_success_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Total Requested Deltas

    Total Requested Deltas is the total number of deltas requested per app endpoint.

    sum(sgw_delta_sync_deltas_requested{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Total Deltas Sent

    Total Deltas Sent is the total number of deltas sent per app endpoint.

    sum(sgw_delta_sync_deltas_sent{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Total Documents Imported

    Total Documents Imported is the total number of documents imported per app endpoint.

    sum(sgw_shared_bucket_import_import_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Total Documents Read

    Total Documents Read is the total number of documents read per app endpoint.

    sum(sgw_database_num_doc_reads_blip{databaseId="<databaseId>",tenantId="<tenantId>"} + sgw_database_num_doc_reads_rest{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Total Documents Rejected

    Total Documents Rejected is the total number of documents rejected per app endpoint.

    sum(sgw_security_num_docs_rejected{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Total Documents Written

    Total Documents Written is the total number of documents written per app endpoint.

    sum(sgw_database_num_doc_writes{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Endpoint: Active Pull Only Replications

    Active Pull Only Replications is the total number of active pull only replication operations performed per app endpoint.

    sum(sgw_replication_pull_num_pull_repl_active_one_shot{databaseId="<databaseId>",tenantId="<tenantId>"} + sgw_replication_pull_num_pull_repl_active_continuous{databaseId="<databaseId>",tenantId="<tenantId>"})

    App Service: Bytes Received by Node

    Total bytes received on the primary network interface by node.

    sum by (couchbaseNode) (node_network_receive_bytes_total{databaseId="<databaseId>",device="eth0",syncgatewayId!="",tenantId="<tenantId>"})

    App Service: Bytes Sent by Node

    Total bytes sent on the primary network interface by node.

    sum by (couchbaseNode) (node_network_transmit_bytes_total{databaseId="<databaseId>",device="eth0",syncgatewayId!="",tenantId="<tenantId>"})

    App Service: CPU Utilization by Node

    CPU utilization percentage of the Sync Gateway process by node.

    sum by (couchbaseNode) (sgw_resource_utilization_process_cpu_percent_utilization{databaseId="<databaseId>",tenantId="<tenantId>"} / 10)

    App Service: Memory Utilization by Node

    Memory utilization percentage of the Sync Gateway node.

    sum by (couchbaseNode) (node_memory_MemTotal_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"} - node_memory_MemAvailable_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"}) / (node_memory_MemTotal_bytes{databaseId="<databaseId>",syncgatewayId!="",tenantId="<tenantId>"} > 0 * 100)

    Data: Current Active Items by Bucket

    Number of current active items by bucket.

    sum by (bucket) (bucket_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

    Data: Disk Reads per Second by Bucket

    Average disk reads per second by bucket.

    sum by (bucket) (bucket:kv_bg_load:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Disk Used by Bucket

    Total disk space consumed by each bucket.

    sum by (bucket) (bucket:kv_ep_db_file_size_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Total Disk Write Queue Size by Bucket

    Total disk write queue size by bucket.

    sum by (bucket) (bucket:kv_vb_queue_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: GSI Items Remaining to Index

    A count of items to be indexed using Global Secondary Indexes.

    sum by (bucket) (bucket_connection:kv_dcp_items_remaining:sum{connection_type="secidx",databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Bucket GET Ops per Second

    Bucket GET Ops per Second is the average number of GET operations per second over the last 5 minutes by bucket.

    sum by (bucket) (bucket_op:kv_ops:rate5m{databaseId="<databaseId>",op="get",tenantId="<tenantId>"})

    Data: Bucket Ops per Second

    Bucket Ops per Second is the average number of operations per second over the last 5 minutes by bucket.

    sum by (bucket) (bucket:kv_ops:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Bucket SET Ops per Second

    Bucket SET Ops per Second is the average number of SET operations per second over the last 5 minutes by bucket.

    sum by (bucket) (bucket_op:kv_ops:rate5m{databaseId="<databaseId>",op="set",tenantId="<tenantId>"})

    Data: Quota Memory Used Percent By Bucket/Node

    Quota memory usage percent by bucket and node.

    sum by (bucket, couchbaseNode) (kv_mem_quota_usage_ratio{databaseId="<databaseId>",tenantId="<tenantId>"}) * 100

    Data: Memory Used by Bucket

    Memory usage per bucket.

    sum by (bucket) (bucket:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Out Of Memory (OOM) Errors by Bucket

    Count of Out of Memory (OOM) errors by bucket.

    sum by (bucket) (bucket:kv_ep_oom_errors:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Temporary Out Of Memory Errors by Bucket

    Count of temporary Out of Memory (OOM) errors by bucket.

    sum by (bucket) (bucket:kv_ep_tmp_oom_errors:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: vBuckets (Active)

    Count of active vBuckets used to distribute data across nodes.

    sum by (bucket) (bucket_state:kv_num_vbuckets:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

    Data: Active Item Resident Ratio by Bucket

    Ratio of unique items in memory compared to on disk, per bucket.

    avg by (bucket) (bucket_state:kv_vb_perc_mem_resident_ratio:avg{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

    Data: vBuckets (Replica)

    Count of replica vBuckets used to distribute data across nodes.

    sum by (bucket) (bucket_state:kv_num_vbuckets:sum{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})

    Data: Replica Item Resident Ratio by Bucket

    Ratio of replica items in memory compared to on disk, per bucket.

    avg by (bucket) (bucket_state:kv_vb_perc_mem_resident_ratio:avg{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})

    Analytics: Storage Used

    The total size of remote S3 storage used.

    sum(cbas_remote_storage_size_bytes{databaseId="<databaseId>",tenantId="<tenantId>"})

    Analytics: Total Requests per Second

    The per second rate of received Analytics requests, averaged over the last 5 minutes, for the entire cluster.

    sum(rate(cbas_requests_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Data: Connections

    Count of connections to the cluster.

    sum(database:kv_curr_connections:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Cluster GET Ops per Second

    Cluster GET Ops per Second is the average total number of GET operations per second over the last 5 minutes across all buckets.

    sum(op:kv_ops:rate5m{databaseId="<databaseId>",op="get",tenantId="<tenantId>"})

    Data: Cluster Ops per Second

    Cluster Ops per Second is the average total number of operations per second over the last 5 minutes across all buckets.

    sum(database:couchbase_bucket_basicstats_opspersec:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Cluster SET Ops per Second

    Cluster SET Ops per Second is the average total number of SET operations per second over the last 5 minutes across all buckets.

    sum(op:kv_ops:rate5m{databaseId="<databaseId>",op="set",tenantId="<tenantId>"})

    Data: Cluster Total Memory Used

    Total memory used by the cluster.

    sum(database:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Query: Total Requests per Second

    The per second rate of SQL++ query requests over the last 5 minutes for the entire cluster.

    sum by (latency) (database:n1ql_requests:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Data Encryption Status by Node

    Provides status information about encryption-at-rest functionality, enabling monitoring of encryption coverage and operational health.

    sum by (couchbaseNode, data_type) (cm_encr_at_rest_data_status{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Data Encryption Keys In Use by Node

    Counts the Data Encryption Keys currently in use, providing insight into encryption key distribution and resource consumption.

    sum by (couchbaseNode, type) (cm_encr_at_rest_deks_in_use{databaseId="<databaseId>",tenantId="<tenantId>"})

    Columnar: Connections

    Count of connections to the cluster.

    sum(database:kv_curr_connections:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Columnar: CPU Utilization by Node

    Real-time CPU utilization percentage per Columnar node.

    sum by (couchbaseNode) (node_cpu_utilization_rate{databaseId="<databaseId>",tenantId="<tenantId>"})

    Node: Disk Read IOPS by Node

    Disk read Input/Output Operations Per Second (IOPS) for each node, providing insight into storage read activity patterns.

    sum by (couchbaseNode) (rate(disk_iops_reads{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))
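
Several graphs in this reference wrap a counter in `rate(...[5m])`. As a rough sketch of what a per-second rate over a window means, the hypothetical `simple_rate` function below divides the counter's increase by the elapsed time and treats any decrease as a counter reset. This is a simplification for illustration, not Prometheus's exact algorithm (which also extrapolates to the window boundaries), and the sample values are made up.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Approximate per-second rate of a counter.

    samples: (timestamp_seconds, counter_value) pairs sorted by timestamp.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")

    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A counter that goes down is assumed to have reset to zero,
        # so the whole current value counts as new increase.
        increase += curr - prev if curr >= prev else curr
    return increase / elapsed


# Hypothetical read-operation counter sampled once a minute for 5 minutes.
samples = [(0, 1000), (60, 1600), (120, 2200), (180, 2800), (240, 3400), (300, 4000)]
print(simple_rate(samples))  # 10.0 reads per second
```

The dashboards then apply `sum by (couchbaseNode)` to these per-series rates, which adds up the rates of all series that share a node.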

    Node: Disk Total IOPS by Node

    Combined disk Input/Output Operations Per Second (read + write) for each node, providing a comprehensive view of total storage I/O activity.

    sum by (couchbaseNode) (rate(disk_iops_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Node: Disk Write IOPS by Node

    Disk write Input/Output Operations Per Second (IOPS) for each node, measuring storage write activity intensity.

    sum by (couchbaseNode) (rate(disk_iops_writes{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Node: Disk Read Throughput by Node

    Rate of data read from disk storage by each node, measured in bytes per second, for bandwidth utilization monitoring.

    sum by (couchbaseNode) (rate(disk_bytes_read{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Node: Disk Total Throughput by Node

    Combined disk throughput (read + write) for each node, providing a comprehensive view of total storage bandwidth utilization.

    sum by (couchbaseNode) (rate(disk_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Node: Disk Write Throughput by Node

    Rate of data written to disk storage by each node, including document persistence and compaction operations.

    sum by (couchbaseNode) (rate(disk_bytes_written{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Query: Query Request Time Average in Milliseconds

    Average end-to-end time, in milliseconds, to process a query, computed over the last 10 minutes.

    sum(rate(n1ql_request_time{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]) > 0) / 1e+06 / sum(rate(n1ql_requests{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]))
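
The expression above divides the rate of accumulated request time by the rate of requests, which yields an average time per request. The division by `1e+06` suggests that the time counter is in nanoseconds and the result is in milliseconds; that unit is an assumption here. A minimal worked sketch with made-up rates, using a hypothetical `average_request_time_ms` helper:

```python
from typing import Optional


def average_request_time_ms(time_rate_ns_per_s: float,
                            request_rate_per_s: float) -> Optional[float]:
    """Average time per request in milliseconds.

    time_rate_ns_per_s: nanoseconds of request time accumulated per second
                        (assumed unit of the time counter).
    request_rate_per_s: requests per second over the same window.
    Returns None when there is no positive activity to average over.
    """
    if time_rate_ns_per_s <= 0 or request_rate_per_s <= 0:
        return None
    # ns -> ms, then divide by the number of requests per second.
    return time_rate_ns_per_s / 1e6 / request_rate_per_s


# Hypothetical: 0.25 s of query time accrues per second, at 50 requests per second.
print(average_request_time_ms(250_000_000, 50))  # 5.0 ms per request
```

Returning `None` for an idle window loosely mirrors the `> 0` filter in the expression, which drops series with no positive rate.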

    Query: Query Execution Time Average in Milliseconds

    Average time, in milliseconds, to execute a query, computed over the last 10 minutes.

    sum(rate(n1ql_service_time{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]) > 0) / 1e+06 / sum(rate(n1ql_requests{databaseId="<databaseId>",tenantId="<tenantId>"}[10m]))

    Data: Active Items by Node

    Number of active items (documents) stored on each node.

    sum by (couchbaseNode) (node_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

    Analytics: Parse Failure Rate by Link/Bucket

    The per second rate of record parsing failures from linked items, averaged over the last 5 minutes by link and bucket.

    sum by (link, bucket) (rate(cbas_failed_to_parse_records_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Analytics: Total Parse Failures by Link/Bucket

    The total number of record parsing failures from linked items by link and bucket.

    sum by (link, bucket) (cbas_failed_to_parse_records_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Analytics: Ingested Bytes Rate by Link/Bucket

    The per second rate of incoming bytes ingested by analytics, averaged over the last 5 minutes by link and bucket.

    sum by (link, bucket) (rate(cbas_incoming_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Analytics: Total Ingested Bytes by Link/Bucket

    The total incoming bytes ingested by analytics by link and bucket.

    sum by (link, bucket) (cbas_incoming_bytes_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Analytics: Linked Ops Rate by Link/Bucket

    The per second rate of linked record operations processed by analytics, averaged over the last 5 minutes by link and bucket.

    sum by (link, bucket) (rate(cbas_incoming_records_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Analytics: Total Linked Ops by Link/Bucket

    The total number of linked record operations processed by analytics by link and bucket.

    sum by (link, bucket) (cbas_incoming_records_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Analytics: Read Rate by Node

    The per second rate at which disk bytes are read for the Analytics Service averaged over the last 5 minutes by node.

    sum by (couchbaseNode) (rate(cbas_io_reads_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Analytics: Write Rate by Node

    The per second rate at which disk bytes are written for the Analytics Service averaged over the last 5 minutes by node.

    sum by (couchbaseNode) (rate(cbas_io_writes_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Analytics: Total Requests per Second by Node

    The per second rate of received Analytics requests, averaged over the last 5 minutes, by node.

    sum by (couchbaseNode) (rate(cbas_requests_total{databaseId="<databaseId>",tenantId="<tenantId>"}[5m]))

    Analytics: System Load per Node

    The Analytics System load average for the last minute by node.

    sum by (couchbaseNode) (cbas_system_load_average{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Data Encryption Key Fetch Failures by Node

    Percentage of failed attempts to retrieve Data Encryption Keys from the key management service due to network or authentication issues.

    sum by (couchbaseNode, type) (cm_encr_at_rest_generate_dek_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"} / (cm_encr_at_rest_generate_dek_total{databaseId="<databaseId>",tenantId="<tenantId>"} > 0) * 100)
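
A failure percentage of this kind is the number of failures divided by the total number of attempts, times 100, and it is only defined when the total is greater than zero. The sketch below illustrates that calculation with made-up counts; the function name `failure_percent` is hypothetical.

```python
from typing import Optional


def failure_percent(failures: float, total: float) -> Optional[float]:
    """Failures as a percentage of total attempts.

    Returns None when there were no attempts, so the caller never
    divides by zero.
    """
    if total <= 0:
        return None
    return failures / total * 100


# Hypothetical counts: 5 failed key fetches out of 40 attempts.
print(failure_percent(5, 40))  # 12.5
print(failure_percent(0, 0))   # None: no attempts, so no percentage
```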

    Data: Data Encryption Key Fetch Frequency by Node

    Measures how frequently Data Encryption Keys are retrieved from the key management service reflecting encryption activity levels.

    sum by (couchbaseNode, type) (cm_encr_at_rest_generate_dek_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Data Encryption Key Rotation Failures by Node

    Tracks encryption key rotation process failures per node, which can pose security risks and compliance violations.

    sum by (couchbaseNode, key_name) (cm_encryption_key_rotation_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"} / (cm_encryption_key_rotations_total{databaseId="<databaseId>",tenantId="<tenantId>"} > 0) * 100)

    Data: Data Encryption Key Rotation Frequency by Node

    Tracks how frequently encryption keys are rotated per node, providing insight into security policy compliance and key management activity.

    sum by (couchbaseNode, key_name) (cm_encryption_key_rotations_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Data Encryption Service Failures by Node

    Measures failures within the data encryption service infrastructure that can impact data protection capabilities and compliance.

    sum by (couchbaseNode, failure_type) (cm_encryption_service_failures_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Node: CPU Utilization by Node

    Real-time CPU utilization percentage per node.

    sum by (couchbaseNode) (node_cpu_utilization_rate{databaseId="<databaseId>",tenantId="<tenantId>"})

    Node: Disk Used Percent by Node

    The used percentage of each node's disk space.

    sum by (couchbaseNode) (node_disk_usage_ratio{databaseId="<databaseId>",tenantId="<tenantId>"}) * 100

    Data: Disk Used Bytes by Node

    Total disk space currently consumed on each node.

    sum by (couchbaseNode) (node_disk_used{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Average Item Size

    Average size of the indexed keys by node.

    avg by (couchbaseNode) (node:index_avg_item_size:avg{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Cache Hits per Second

    The per second rate of cache hits averaged over the last 5 minutes by node.

    sum by (couchbaseNode) (node:index_cache_hits:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Cache Misses per Second

    The per second rate of cache misses averaged over the last 5 minutes by node.

    sum by (couchbaseNode) (node:index_cache_misses:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Indexer Codebook Memory Usage by Node

    Memory usage of the vector index codebook by node.

    sum by (couchbaseNode) (index_codebook_mem_usage{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Indexer Codebook Train Duration By Node

    Training duration of the vector index codebook by node, in seconds.

    sum by (couchbaseNode) (index_codebook_train_duration{databaseId="<databaseId>",tenantId="<tenantId>"} / 1e+09)

    Index: Process CPU (System) Usage by Node

    The system-space process CPU utilization of the Index service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",mode="system",tenantId="<tenantId>"}) * 100

    Index: Process CPU Total Usage by Node

    The total (user and system) process CPU utilization of the Index service by node.

    sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",tenantId="<tenantId>"}) * 100
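
The Process CPU graphs multiply a per-second rate of CPU seconds by 100. As a hedged illustration, assuming the underlying counter is a conventional cumulative count of CPU seconds that is not normalized by core count, the hypothetical `cpu_percent` function below shows how such a rate reads as a percentage of one core. Whether the dashboard's recorded series is normalized is not stated in this reference.

```python
def cpu_percent(cpu_seconds_start: float,
                cpu_seconds_end: float,
                window_seconds: float) -> float:
    """CPU usage over a window, as a percentage of a single core.

    Assumes a cumulative CPU-seconds counter sampled at the start and
    end of the window, with no counter reset in between.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (cpu_seconds_end - cpu_seconds_start) / window_seconds * 100


# Hypothetical: the process used 75 CPU seconds during a 300 second window.
print(cpu_percent(1000.0, 1075.0, 300.0))  # 25.0

# Under the same assumption, a process busy on two cores reads as 200.
print(cpu_percent(0.0, 600.0, 300.0))  # 200.0
```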

    Index: Process CPU (User) Usage by Node

    The user-space process CPU utilization of the Index service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="indexer",mode="user",tenantId="<tenantId>"}) * 100

    Index: Indexable Data Size

    The size of indexable data that is maintained by the indexer by node.

    sum by (couchbaseNode) (node:index_raw_data_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Disk Size

    Total disk file size consumed by all indexes by node.

    sum by (couchbaseNode) (node:index_disk_size:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Documents Indexed per Second

    The per second rate of documents indexed averaged over the last 5 minutes by node.

    sum by (couchbaseNode) (node:index_num_docs_indexed:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Documents Pending

    Number of documents pending to be indexed by node.

    sum by (couchbaseNode) (node:index_num_docs_pending:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Documents Queued

    Number of documents queued to be indexed by node.

    sum by (couchbaseNode) (node:index_num_docs_queued:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Item Count

    The number of items currently indexed by node.

    sum by (couchbaseNode) (node:index_items_count:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Total Process Memory Usage by Node

    The total process memory usage of the Index service by node.

    sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="indexer",tenantId="<tenantId>"})

    Index: Diverging Replica Indexes

    Number of diverging replica indexes by node.

    sum by (couchbaseNode) (index_num_diverging_replica_indexes{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Requests per Second

    The per second rate of requests to the indexer averaged over the last 5 minutes by node.

    sum by (couchbaseNode) (node:index_num_requests:rate5m{databaseId="<databaseId>",tenantId="<tenantId>"})

    Index: Resident Percent

    Percentage of the data held in memory by the indexer by node.

    avg by (couchbaseNode) (node:index_resident_percent:avg{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Process CPU (System) Usage by Node

    The system-space process CPU utilization of the Data service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",mode="system",tenantId="<tenantId>"}) * 100

    Data: Process CPU Total Usage by Node

    The total (user and system) process CPU utilization of the Data service by node.

    sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",tenantId="<tenantId>"}) * 100

    Data: Process CPU (User) Usage by Node

    The user-space process CPU utilization of the Data service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="memcached",mode="user",tenantId="<tenantId>"}) * 100

    Data: Total Process Memory Usage by Node

    The total process memory usage of the Data service by node.

    sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="memcached",tenantId="<tenantId>"})

    Node: Memory Used by Node

    Memory usage per node.

    sum by (couchbaseNode) (node:kv_mem_used_bytes:sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Query: Process CPU (System) Usage by Node

    The system-space process CPU utilization of the Query service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",mode="system",tenantId="<tenantId>"}) * 100

    Query: Process CPU Total Usage by Node

    The total (user and system) process CPU utilization of the Query service by node.

    sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",tenantId="<tenantId>"}) * 100

    Query: Process CPU (User) Usage by Node

    The user-space process CPU utilization of the Query service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="cbq-engine",mode="user",tenantId="<tenantId>"}) * 100

    Query: Total Process Memory Usage by Node

    The total process memory usage of the Query service by node.

    sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="cbq-engine",tenantId="<tenantId>"})

    Query: Requests per Second by Node

    The per second rate of SQL++ query requests over the last 5 minutes by node.

    sum by (couchbaseNode) (node:n1ql_requests:rate5m{databaseId="<databaseId>",latency=">0ms",tenantId="<tenantId>"})

    Data: Replica Items by Node

    Number of replica items (documents) stored on each node.

    sum by (couchbaseNode) (node_state:kv_vb_curr_items:sum{databaseId="<databaseId>",state="replica",tenantId="<tenantId>"})

    Data: SDK Data Service Mutation Durable Duration Count by Node

    Counts durable Key-Value mutations that include full persistence and replication guarantees for data safety.

    sum by (couchbaseNode) (sdk_kv_mutation_durable_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Mutation Durable Duration Sum by Node

    Accumulates total time spent on durable Key-Value mutations including persistence and replication overhead.

    sum by (couchbaseNode) (sdk_kv_mutation_durable_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Mutation NonDurable Duration Count by Node

    Counts non-durable Key-Value mutation operations that prioritize performance by completing upon memory persistence.

    sum by (couchbaseNode) (sdk_kv_mutation_nondurable_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Mutation NonDurable Duration Sum by Node

    Accumulates total time spent on non-durable Key-Value mutations that prioritize performance over durability guarantees.

    sum by (couchbaseNode) (sdk_kv_mutation_nondurable_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Cancelled Requests by Node

    Key-Value operations cancelled before completion due to client-side request cancellation or application shutdown.

    sum by (couchbaseNode) (sdk_kv_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK KV Service Timed Out Requests by Node

    Key-Value operations that exceeded their timeout threshold before completion due to network latency or server overload.

    sum by (couchbaseNode) (sdk_kv_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Total Requests by Node

    Comprehensive count of all Key-Value operations including GET, SET, DELETE and other CRUD operations.

    sum by (couchbaseNode) (sdk_kv_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Retrieval Duration Count by Node

    Counts the total number of Key-Value retrieval operations completed, providing data for throughput and performance analysis.

    sum by (couchbaseNode) (sdk_kv_retrieval_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Retrieval Duration Sum by Node

    Accumulates the total time spent on all Key-Value retrieval operations, enabling calculation of average retrieval times.

    sum by (couchbaseNode) (sdk_kv_retrieval_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})
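
The Duration Sum and Duration Count graphs can be read together: dividing the change in the sum by the change in the count over the same interval gives an average duration per operation. The sketch below assumes both series are cumulative and uses made-up sample values; `average_duration_ms` is a hypothetical helper name.

```python
from typing import Optional


def average_duration_ms(sum_start: float, sum_end: float,
                        count_start: float, count_end: float) -> Optional[float]:
    """Average operation duration in milliseconds over an interval.

    Takes the duration sum (ms) and the operation count sampled at the
    start and end of the interval. Returns None if no operations
    completed during the interval.
    """
    operations = count_end - count_start
    if operations <= 0:
        return None
    return (sum_end - sum_start) / operations


# Hypothetical samples: 1,000 retrievals took 3,000 ms in total.
print(average_duration_ms(12_000, 15_000, 4_000, 5_000))  # 3.0
```

The same calculation applies to the mutation, query, and search Sum and Count pairs listed in this reference.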

    Query: SDK Query Service Duration Count by Node

    Counts the total number of SQL++ query operations completed, providing data for calculating average query durations.

    sum by (couchbaseNode) (sdk_query_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    Query: SDK Query Service Duration Sum by Node

    Accumulates the total time spent executing all SQL++ queries, enabling calculation of average query durations.

    sum by (couchbaseNode) (sdk_query_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Query: SDK Query Service Cancelled Requests by Node

    SQL++ query operations cancelled before completion due to client-side cancellation or application timeouts.

    sum by (couchbaseNode) (sdk_query_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})

    Query: SDK Query Service Timed Out Requests by Node

    SQL++ query operations that exceeded their timeout threshold due to complexity, resource constraints, or network issues.

    sum by (couchbaseNode) (sdk_query_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})

    Query: SDK Query Service Total Requests by Node

    Total number of SQL++ query operations executed through the SDK encompassing all query types.

    sum by (couchbaseNode) (sdk_query_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    Search: SDK Search Service Duration Count by Node

    Counts the total number of Full Text Search operations completed, providing data for calculating average search durations.

    sum by (couchbaseNode) (sdk_search_duration_milliseconds_count{databaseId="<databaseId>",tenantId="<tenantId>"})

    Search: SDK Search Service Duration Sum by Node

    Accumulates the total time spent executing all Full Text Search operations, enabling calculation of average search durations.

    sum by (couchbaseNode) (sdk_search_duration_milliseconds_sum{databaseId="<databaseId>",tenantId="<tenantId>"})

    Search: SDK Search Service Cancelled Requests by Node

    Full Text Search operations cancelled before completion due to client-side request cancellation or timeout handling.

    sum by (couchbaseNode) (sdk_search_r_canceled{databaseId="<databaseId>",tenantId="<tenantId>"})

    Search: SDK Search Service Timed Out Requests by Node

    Full Text Search operations that exceeded their timeout threshold due to complex queries or resource constraints.

    sum by (couchbaseNode) (sdk_search_r_timedout{databaseId="<databaseId>",tenantId="<tenantId>"})

    Search: SDK Search Service Total Requests by Node

    Total number of Full Text Search operations performed through the SDK including all search query types.

    sum by (couchbaseNode) (sdk_search_r_total{databaseId="<databaseId>",tenantId="<tenantId>"})

    XDCR: Process CPU (System) Usage by Node

    The system-space process CPU utilization of the XDCR service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",mode="system",tenantId="<tenantId>"}) * 100

    XDCR: Process CPU Total Usage by Node

    The total (user and system) process CPU utilization of the XDCR service by node.

    sum by (couchbaseNode) (group_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",tenantId="<tenantId>"}) * 100

    XDCR: Process CPU (User) Usage by Node

    The user-space process CPU utilization of the XDCR service by node.

    sum by (couchbaseNode) (group_mode_node:process_cpu_seconds_total:rate5m{databaseId="<databaseId>",groupname="goxdcr",mode="user",tenantId="<tenantId>"}) * 100

    XDCR: Total Process Memory Usage by Node

    The total process memory usage of the XDCR service by node.

    sum by (couchbaseNode) (sysproc_mem_resident{databaseId="<databaseId>",proc="goxdcr",tenantId="<tenantId>"})

    Data: SDK Data Service Mutation Durable Duration by Upper Bound of Bucket(le)

    Histogram of durable Key-Value mutation durations that ensure data persistence and replication before completion.

    sum by (couchbaseNode, le) (sdk_kv_mutation_durable_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Mutation NonDurable Duration by Upper Bound of Bucket(le)

    Histogram of non-durable Key-Value mutation durations that prioritize performance by completing when data reaches memory.

    sum by (couchbaseNode, le) (sdk_kv_mutation_nondurable_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: SDK Data Service Retrieval Duration by Upper Bound of Bucket(le)

    Histogram of Key-Value retrieval operation durations organized by latency buckets for performance analysis.

    sum by (couchbaseNode, le) (sdk_kv_retrieval_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

    Query: SDK Query Service Duration by Upper Bound of Bucket(le)

    Histogram of SQL++ query operation durations organized by latency buckets for performance analysis.

    sum by (couchbaseNode, le) (sdk_query_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})

    Search: SDK Search Service Duration by Upper Bound of Bucket(le)

    Histogram of Full Text Search operation durations organized by latency buckets for performance analysis.

    sum by (couchbaseNode, le) (sdk_search_duration_milliseconds_bucket{databaseId="<databaseId>",tenantId="<tenantId>"})
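
The histogram graphs above group counts by the upper bound (`le`) of each latency bucket. As a sketch of how such data can be turned into a percentile estimate, the hypothetical `estimate_quantile` function below interpolates linearly inside the bucket that contains the requested rank. It assumes the bucket counts are cumulative (each bucket includes all faster operations) and that an open-ended `+Inf` bucket is present; it is written in the spirit of PromQL's `histogram_quantile` but is a simplified illustration, not that function's exact behavior. The bucket values are made up.

```python
import math


def estimate_quantile(q: float, buckets: dict[float, float]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets maps each upper bound (le) to the cumulative count of
    observations at or below it, and must include math.inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    bounds = sorted(buckets)
    if not bounds or bounds[-1] != math.inf:
        raise ValueError("buckets must include an math.inf upper bound")

    total = buckets[math.inf]
    if total == 0:
        return math.nan

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper in bounds:
        count = buckets[upper]
        if count >= rank:
            if upper == math.inf or count == lower_count:
                # Open-ended or empty bucket: fall back to its lower bound.
                return lower_bound
            # Linear interpolation between the bucket's bounds.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, count
    return lower_bound


# Hypothetical cumulative counts per upper bound, in milliseconds.
durations = {1: 10, 5: 90, 10: 98, math.inf: 100}
print(estimate_quantile(0.5, durations))   # 3.0
print(estimate_quantile(0.99, durations))  # 10, the highest finite bound
```

The estimate is only as fine-grained as the bucket boundaries, so a quantile that lands in the open-ended bucket can only be reported as "at least the highest finite bound".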

    Data: Compute Units by Bucket

    Number of compute units by bucket.

    sum by (bucket) (meter_cu_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Current Active Items by Bucket

    Number of current active items by bucket.

    sum by (bucket) (bucket_state:kv_vb_curr_items:sum{bucket="<bucket>",databaseId="<databaseId>",state="active",tenantId="<tenantId>"})

    Data: Disk Used by Bucket

    Total disk storage consumed by each serverless bucket.

    sum by (bucket) (bucket:kv_ep_db_file_size_bytes:sum{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Bucket GET Ops per Second

    Bucket GET Ops per Second is the average number of GET operations per second over the last 5 minutes by bucket.

    sum by (bucket, result) (bucket_op:kv_ops:rate5m{bucket="<bucket>",databaseId="<databaseId>",op="get",tenantId="<tenantId>"})

    Data: Bucket SET Ops per Second

    Bucket SET Ops per Second is the average number of SET operations per second over the last 5 minutes by bucket.

    sum by (bucket, result) (bucket_op:kv_ops:rate5m{bucket="<bucket>",databaseId="<databaseId>",op="set",tenantId="<tenantId>"})

    Data: Read Units by Bucket

    Number of units read by bucket.

    sum by (bucket) (meter_ru_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

    Data: Write Units by Bucket

    Number of units written by bucket.

    sum by (bucket) (meter_wu_total{bucket="<bucket>",databaseId="<databaseId>",tenantId="<tenantId>"})

    XDCR: xdcr_resp_wait_time_seconds (avg)

    The rolling average amount of time it takes from when a MemcachedRequest is created to be ready to route to an outnozzle to the time that the response has been heard back from the target node after a successful write.

    avg by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_resp_wait_time_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})

    XDCR: xdcr_resp_wait_time_seconds (max)

    The rolling average amount of time it takes from when a MemcachedRequest is created to be ready to route to an outnozzle to the time that the response has been heard back from the target node after a successful write.

    max by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_resp_wait_time_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})

    XDCR: xdcr_wtavg_docs_latency_seconds

    The rolling average amount of time it takes for the source cluster to receive the acknowledgement of a SET_WITH_META response after the Memcached request has been composed to be processed by the XDCR Target Nozzle.

    max by (couchbaseNode, pipelineType, sourceBucketName, targetBucketName) (xdcr_wtavg_docs_latency_seconds{databaseId="<databaseId>",tenantId="<tenantId>"})