Alert Reference
This reference lists the alerts that Capella can emit, the conditions that trigger them, and a description of each.
Metric-Based Alerts
For alerts caused by measurable changes to Capella resource use, notification messages include information about potential causes and remedial actions to investigate.
Capella delivers notifications for these alerts to you by:
- Displaying a message banner in the Capella UI.
- Sending email to users who enable email notifications for their accounts.
- Delivering messages to a third-party notification system through an alert integration if configured for the project.
Display Name | Resource | Conditions | Description | Related Documentation |
---|---|---|---|---|
High CPU Usage Warning | Cluster | Critical: during a one-minute interval, the five-minute average CPU usage of one or more cluster nodes exceeded 90%. Warning: during a five-minute interval, the five-minute average CPU usage of one or more cluster nodes exceeded 85%. | High CPU usage events can impact the throughput of your cluster. This issue could be due to recent changes in the downstream application or dataset, such as changes to the data type sent, the amount of data sent, or natural data/transaction growth. Consider scaling your service nodes to address the issue. If new queries were recently introduced, validate and add the required indexes. | |
Low Node Disk Storage | Cluster | Critical: disk usage has been more than 90% for the last 5 minutes. Warning: disk usage has been more than 80% for the last 5 minutes. | This issue could be due to spikes in data usage or natural data growth. Consider expanding your cluster storage immediately to resolve the issue. Inaction could result in service disruption. | |
Runaway Disk Queue | Cluster | Critical: the disk queue exceeded 800,000 requests during a five-minute period. Warning: the disk queue exceeded 500,000 requests during a five-minute period. | A bucket is experiencing a runaway disk queue. This is when data is added to the write queue faster than the node can write to the bucket. This issue can be caused by a sudden spike of incoming transactions or an undersized cluster configuration that cannot keep up with its workload. Consider validating incoming data before scaling node capacity. | |
Bucket Hard Out of Memory | Cluster | Critical: one or more out-of-memory errors occurred in the past five minutes. | A bucket exceeded its available memory and requires immediate attention. This issue can be caused by changes to the incoming data, undersized service nodes, undersized service quotas, or a long time to live (TTL) setting on documents. Consider immediately adding memory or nodes to resolve the issue. | |
Node High IOPS Usage | Cluster | Warning: A node in your cluster has been consistently utilizing over 90% of its allocated IOPS for the past 30 minutes. Critical: A node in your cluster has been consistently utilizing over 95% of its allocated IOPS for the past 30 minutes. | High IOPS usage may slow down document reads and writes, query execution, indexing, and maintenance tasks such as rebalance and compaction. To maintain optimal performance, consider increasing the allocated IOPS value in your cluster configuration. Review your cluster’s disk performance in the Monitoring UI and check the Health Advisor reports for Disk IOPS scaling recommendations. | |
Node Low Index Resident Ratio | Cluster | Warning: The index resident ratio on 1 or more Index Service nodes is at 10% or less in the past 5 minutes. Critical: The index resident ratio on 1 or more Index Service nodes is at 5% or less in the past 5 minutes. | A low index resident ratio may degrade query performance, and even affect your cluster’s ability to run routine maintenance. To maintain optimal performance, we recommend scaling your cluster to increase the memory resources on any nodes running the Index Service. If your indexes are partitioned, add the Index Service to additional nodes in your cluster. You can also consider reviewing and removing unused indexes, merging 2 or more indexes into a single index, or using the Index Advisor to build more efficient indexes. Create a support ticket if you need more assistance. | |
Index Has Diverging Replicas | Cluster | Critical: The Couchbase Index Service observed 2 or more diverging replicas, breaching the threshold of 1 within 30 minutes. | Divergence in item counts between index partition replicas was detected. This can cause index scans to return inconsistent data. Please identify the index with | |
Node High Throughput Usage | Cluster | Warning: A node in your cluster has been consistently utilizing over 90% of its allocated disk throughput for the past 30 minutes. Critical: A node in your cluster has been consistently utilizing over 95% of its allocated disk throughput for the past 30 minutes. | High throughput usage may slow down document reads and writes, query execution, indexing, and maintenance tasks such as rebalance and compaction. To maintain optimal performance, consider increasing the allocated IOPS value in your cluster configuration. Disk throughput scales automatically when allocated IOPS are increased. Review your cluster’s disk performance in the Monitoring UI and check the Health Advisor reports for Disk IOPS scaling recommendations. | |
App Service High Data Sync Errors Warning | App Services | Warning: In a 5-minute interval, more than 10 documents were rejected by the App Endpoint’s sync function. | These documents will not be accessible via the App Endpoint. If this is unexpected, troubleshoot the Sync Function or contact Customer Support with details of the intended use of the Sync Function to help troubleshoot the errors. This only corresponds to sync function rejections and not sync function exceptions. Rejections are logged in Sync Gateway logs at info and debug levels. | |
App Service High Import Errors Warning | App Services | Warning: In a 5-minute interval, more than 10 documents failed to import due to an error. | These documents will not be accessible through the App Endpoint. The documents may have been rejected by the App Endpoint’s Sync Function or encountered an error during processing. The failure may be caused by an error in the Sync Function’s logic, a writer error, or a timeout. This alert is not caused by a CAS failure, a canceled import, or an already imported document. Import errors are logged in the Sync Gateway at the info level. | |
App Service High Access Errors Warning | App Services | Warning: In the last 5 minutes, more than 50 requests made to the App Endpoint failed to authenticate. | A high volume of unsuccessful authentications may indicate malicious clients attempting to access the system. It can be caused by: | |
App Service High CPU Usage | App Services | Warning: During a 5-minute interval, the 5-minute average CPU usage of one or more App Service nodes exceeded 90%. Critical: During a 1-minute interval, the 5-minute average CPU usage of one or more App Service nodes exceeded 95%. | High CPU usage events can affect the throughput and latency of App Endpoints. This issue could be due to changes in the downstream application or dataset, such as changes to the number of requests or connections, the amount of data sent, or natural data/request growth. It may also be related to changes to the endpoint’s access control function or dataset. If these changes are expected, you may consider scaling your App Service deployment to address the issue. | |
App Service High Memory Usage | App Services | Warning: During a 5-minute interval, the 5-minute average memory usage of one or more App Services exceeded 85%. Critical: During a 1-minute interval, the 5-minute average memory usage of one or more App Services exceeded 90%. | High memory usage events can impact the throughput of your service. This issue could be due to recent changes in the downstream application or dataset, such as changes to the data type sent, the amount of data sent, or natural data/transaction growth. Consider scaling your service nodes to address the issue. | |
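As a rough illustration of how the Warning and Critical levels relate, the following Python sketch classifies a single percentage reading against some of the thresholds listed in the table. The metric keys, the `Threshold` class, and the `classify` function are hypothetical names invented for this example and are not Capella identifiers. The sketch also ignores the time windows (5 or 30 minutes) that the real conditions require, so treat it as an aid to reading the table rather than a description of how Capella evaluates alerts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    warning: float                 # percent that triggers a Warning
    critical: float                # percent that triggers a Critical
    higher_is_worse: bool = True   # False for ratios where low values are bad


# Percent values copied from the table; the keys are hypothetical names
# chosen for this sketch, not identifiers used by Capella.
THRESHOLDS = {
    "disk_usage": Threshold(warning=80, critical=90),
    "iops_usage": Threshold(warning=90, critical=95),
    "disk_throughput_usage": Threshold(warning=90, critical=95),
    "index_resident_ratio": Threshold(warning=10, critical=5,
                                      higher_is_worse=False),
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for one percentage reading."""
    t = THRESHOLDS[metric]
    if t.higher_is_worse:
        # The table words these conditions as "more than" or "over".
        if value > t.critical:
            return "critical"
        if value > t.warning:
            return "warning"
    else:
        # The index resident ratio condition is worded as "at N% or less".
        if value <= t.critical:
            return "critical"
        if value <= t.warning:
            return "warning"
    return None
```

For example, `classify("disk_usage", 85)` falls in the Warning band, while `classify("index_resident_ratio", 5)` is Critical because that alert fires on low values.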
Operational Alerts
When an alert results from operational disruption, Capella proactively notifies Couchbase Support.
Capella delivers notifications for these alerts to you by:
- Displaying a message banner in the Capella UI.
- Sending email to users who enable email notifications for their accounts.
Display Name | Resource | Conditions | Description | Related Documentation |
---|---|---|---|---|
Backup Failed | Cluster | Warning: a cluster backup did not complete. | A backup for a bucket in this cluster failed to complete. Retry the backup. If you continue to experience issues, please contact Capella Support. | |
Backup Deletion Failed | Cluster | Warning: a cluster backup deletion did not complete. | A backup for a bucket in this cluster has failed to delete. Retry the backup deletion. If you continue to experience issues, please contact Capella Support. | |
Backup Create Download Failed | Cluster | Warning: the creation of a downloadable backup file has failed. | The process to create a downloadable backup file from a backup cycle has failed for a bucket in this cluster. Please retry the operation. If you continue to experience issues, please contact Couchbase Capella Support. | |
Restore Failed | Cluster | Warning: a bucket restoration operation did not complete. | The process to restore data from a backup has failed for a bucket in this cluster. Some contents from the backup may have been successfully restored. Please retry the operation. If you continue to experience issues, please contact Couchbase Capella Support. | |
Cluster Deployment Failed | Cluster | Warning: a cluster deployment did not complete. | A cluster failed to deploy. Clusters that fail deployment cannot guarantee service functionality. This issue could be due to an underlying service or limit issue. Retry the deployment or contact Capella Support for assistance. | |
Cluster Peering Failed | Cluster | Warning: a cluster peering operation did not complete. | The process to peer a cluster has failed. Please contact Couchbase Capella Support for assistance. | |
App Services Cert Expiration | App Services | Warning: The Public CA-signed certificate for one or more App Services will soon expire. | The certificates are updated automatically on expiration. If you have pinned the certificate within your application, you should plan to update your application with the new certificate to avoid an outage. Contact support if you would prefer to update your App Service certificate ahead of the expiration date. Otherwise, no action is required from you. | |
App Services Cert Expired | App Services | Warning: The Public CA-signed certificate has been updated for one or more App Services. | If you have pinned the certificate within your application, please download the updated certificate and update your application. Otherwise, no action is required. | |
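If your application pins the App Services certificate, it can help to know how long the pinned certificate remains valid. The Python sketch below is one possible way to compute that from a certificate's `notAfter` timestamp, using the standard library's `ssl.cert_time_to_seconds`. The function name `days_until_expiry` and the sample dates are assumptions made for illustration; nothing here is part of Capella, and how you obtain the `notAfter` value depends on your own application.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` is expected in the "Jan  5 09:34:43 2018 GMT" format that
    Python's ssl module reports in the dictionary returned by getpeercert().
    `now` should be a timezone-aware datetime.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since epoch (UTC)
    return (expires_at - now.timestamp()) / 86400


# Hypothetical values for illustration only.
remaining = days_until_expiry(
    "Jan  1 00:00:00 2030 GMT",
    datetime(2029, 12, 2, tzinfo=timezone.utc),
)
print(f"{remaining:.0f} days until the pinned certificate expires")
```

A check like this could run alongside your application's release process so that a pinned certificate is replaced before the expiration alert turns into an outage.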