Health Advisor Reference
Use the following as a reference for the different kinds of advice available in a Health Advisor report.
Capella Health Advisor provides a health assessment of your cluster by going through a series of health checks. It generates a report on its findings. The report offers advice on how you can improve the health of your cluster.
The report divides these health checks by severity level and cluster category. Health Advisor shares advice in the report when it flags a health check with a severity level of Needs Review or Warning.
Every health check has a Good severity level, but not every check can trigger both the Needs Review and Warning levels. Health Advisor flags some checks only with Needs Review, while it flags others directly with Warning.
For more information about Health Advisor, see View Health Advisor.
Health Advisor can give the following advice:
Advice | Severity | Category | Details | Description |
---|---|---|---|---|
Multi-Dimensional Scaling | Needs Review | Cluster | Your cluster is currently running multiple Services on a single node. Multi-Dimensional Scaling separates, isolates, and scales your cluster’s Services based on each Service’s individual workload. | Your cluster is currently running multiple Services on a single node. We recommend taking advantage of Multi-Dimensional Scaling to fine-tune your cluster’s Services and optimize resource consumption and performance. You can add nodes, modify Service Groups, or change node configurations to better support your Services. Review your current cluster configuration or contact Couchbase Support for help with getting the most out of Multi-Dimensional Scaling. |
Multiple Availability Zones | Needs Review | Cluster | Your cluster is currently not deployed using Multiple Availability Zones. Use Multiple Availability Zones to distribute your cluster nodes evenly across your Cloud Service Provider’s (CSP) availability zones. Multiple Availability Zones help keep your data available, even if 1 Availability Zone goes offline. | Your cluster is currently not deployed using Multiple Availability Zones. Your cluster could be vulnerable to an Availability Zone outage. You cannot change this setting after you deploy your cluster. If you have an enterprise cluster that requires high availability, create a support ticket to get help with migrating your cluster to Multiple Availability Zones. |
Scheduled Backups | Needs Review | Cluster | 1 or more buckets on your cluster have no backups scheduled. Use bucket backups or full cluster backups to automate backing up data on your cluster. Use backups to restore your cluster to a previous state. | 1 or more buckets on your cluster have no backups scheduled. We recommend creating a robust backup schedule and a cost-optimized retention policy as part of a recovery plan for the data on your cluster. |
Disk Auto-Scaling | Needs Review | Cluster | 1 or more Service Groups in your Azure cluster do not have Auto-Scaling enabled. Storage Auto-Scaling automatically expands storage on your cluster, making sure you do not run out of disk space. | 1 or more Service Groups in your Azure cluster do not have Auto-Scaling enabled. We recommend that you turn on Auto-Scaling to avoid running out of disk space on your cluster. |
Cross Data Center Replication | Needs Review | Cluster | 1 or more buckets on your cluster have no replications. Cross Data Center Replication (XDCR) replicates data from 1 cluster to another cluster. Use XDCR to protect against data center outages and increase data performance across distributed applications. | 1 or more buckets on your cluster have no replications set up. We recommend setting up XDCR to replicate data from 1 bucket to another bucket on a destination cluster. |
Index Replicas | Needs Review | Index | 1 or more indexes on your cluster do not have a replica. Use Index Replicas on secondary indexes across nodes in your cluster to ensure higher availability and performance. | 1 or more indexes on your cluster do not have a replica. If you lose an Index Service node, the Index Service can use another node with replicas of those indexes to keep running queries. We recommend that you add replicas to support availability and performance, using the ALTER INDEX statement. |
Private Networking | Needs Review | Cluster | You have not set up a private network connection for this cluster. Use private networks to make a private connection between your applications and your Capella clusters, using virtual private clouds or virtual network peering. Traffic over a private network does not cross the public internet, which reduces your latency and egress costs. | You have not set up a private network connection for applications to connect to this cluster. We recommend using private connections between your applications and Couchbase Capella whenever possible. |
Data Resident Ratio | Needs Review, Warning | Data | The resident ratio on 1 or more of your buckets is below the recommended threshold. A bucket memory quota sets the maximum memory for a bucket’s chosen storage engine. The resident ratio for a bucket is the percentage of its data that’s stored in RAM. | The resident ratio on 1 or more of your buckets is below the recommended threshold (20% for Couchstore, 4% for Magma). A low resident ratio might indicate that you have insufficient resources on your cluster. We recommend you increase the bucket memory quota to improve the resident ratio. |
Number of Indexes | Needs Review | Index | You might have too many secondary indexes on your cluster. Use indexes to improve query performance on your cluster. | You might have too many secondary indexes on your cluster for your current number of CPU cores and amount of memory. Even unused indexes consume memory. We recommend keeping 10 or fewer indexes per CPU core and GB of memory on your cluster. Review your indexes and remove any unneeded indexes. |
Unused Indexes | Needs Review | Index | 1 or more indexes on your cluster have never been scanned, or were last scanned more than 30 days ago. Use indexes to improve query performance on your cluster. Replica and system indexes are not included in this check. | 1 or more indexes on your cluster have never been scanned, or were last scanned more than 30 days ago. These indexes are still using memory resources on your cluster. Review your indexes and remove any unneeded indexes. |
Number of KV Client Connections | Warning | Data | Applications attempting to connect to your cluster might be creating too many connections to the Data Service. The Data Service allows up to 60,000 concurrent key-value connections to its client port, 11207. If you exceed this limit, clients will fail to connect to your Capella cluster. | Applications attempting to connect to your cluster might be creating too many connections to the Data Service. We recommend you analyze your application code for unnecessary client connections. Try to use a single cluster object shared throughout your application; the SDK supports connection pooling. |
Slow Operations | Warning | Data | We have noticed slow Data Service operations on your cluster. The Data Service requires appropriate disk, CPU, and memory to reduce slow operations and timeouts. | We recommend that you investigate what could be causing slow Data Service operations on your cluster. Try scaling your database to add more resources to the Data Service. |
Memory Usage | Needs Review | Node | The average memory usage on 1 or more of your cluster nodes is 90% or higher. Health Advisor checks for high memory usage on your cluster nodes over the last week. | The average memory usage on 1 or more of your cluster nodes is 90% or higher. We recommend reviewing your application performance and considering scaling your cluster to add more memory resources. |
CPU Usage | Needs Review | Node | The average CPU usage on 1 or more of your cluster nodes is 70% or higher. Health Advisor checks for high CPU usage on your cluster nodes over the last week. | The average CPU usage on 1 or more of your cluster nodes is 70% or higher. We recommend reviewing your application performance and considering scaling your cluster to add more CPU resources. |
Slow Queries | Needs Review | Query | We have noticed some slow-running queries on your cluster. Slow queries might indicate insufficient resources on your cluster, poor indexing, or poor query optimization. | We recommend using the query monitoring tools to check your slow-running queries and considering scaling your cluster to add more resources. |
Query Service Crashes | Needs Review | Query | We have noticed Query Service crashes on your cluster. Use the Query Service to query data on your cluster using the Couchbase SQL++ query language. | We noticed that the Query Service has crashed on this cluster. Create a support ticket to get help from Couchbase Capella Support. |
Disk Usage | Needs Review, Warning | Node | The disk usage on 1 or more of your cluster nodes is 80% or higher, or your disk usage rate is at risk of becoming too high. Health Advisor checks for high disk usage on your cluster nodes, which might indicate insufficient resources on your cluster. | The disk usage on 1 or more of your cluster nodes is 80% or higher, or your disk usage rate is at risk of becoming too high. We recommend you increase storage on the Service Groups with high disk usage. |
Number of Buckets | Needs Review | Data | You have more than the recommended number of buckets on your cluster. Each bucket requires a certain amount of resources on your cluster. | We recommend at least 0.2 CPU cores dedicated to each bucket on this cluster. Reduce the number of buckets in your cluster and try using scopes and collections instead. You can also consider increasing the number of CPU cores on your cluster nodes. |
Index Resident Ratio | Needs Review | Index | The resident ratio for 1 or more nodes running the Index Service in your cluster is low, indicating potential memory issues. An index resident ratio is the ratio of index data that can be cached in memory on a node running the Index Service. | The index resident ratio on 1 or more Index Service nodes is at 25% or less. Your data growth has also exceeded 10% in the last week. This can lead to degraded query performance and impact latency on index scans if index data continues to grow. We recommend you increase the memory resources on any nodes running the Index Service. Consider reviewing and removing unused indexes, merging 2 or more indexes into a single index, or using the Index Advisor to build more efficient indexes. |
Index Resident Ratio | Needs Review | Index | The resident ratio for 1 or more nodes running the Index Service in your cluster is low, indicating potential memory issues. An index resident ratio is the ratio of index data that can be cached in memory on a node running the Index Service. | The index resident ratio on 1 or more Index Service nodes is at 15% or less. This low ratio can lead to slower query performance and increased disk I/O, as your index data is read from disk more frequently. We recommend you increase the memory resources on any nodes running the Index Service as soon as possible. If your indexes are partitioned, add the Index Service to additional nodes in your cluster. You can also consider reviewing and removing unused indexes, merging 2 or more indexes into a single index, or using the Index Advisor to build more efficient indexes. |
Index Resident Ratio | Warning | Index | The resident ratio for 1 or more nodes running the Index Service in your cluster is low, indicating potential memory issues. An index resident ratio is the ratio of index data that can be cached in memory on a node running the Index Service. | The index resident ratio on 1 or more Index Service nodes is critically low. We recommend you take immediate action to restore query performance and reduce disk I/O for your index data by increasing the memory resources on any nodes running the Index Service. If your indexes are partitioned, add the Index Service to additional nodes in your cluster. Consider reviewing and removing unused indexes, merging 2 or more indexes into a single index, or using the Index Advisor to build more efficient indexes. Create a support ticket if you need more assistance. |
Primary Indexes | Needs Review | Index | You are using primary indexes on your cluster, which are not recommended in a production environment. Primary indexes provide the equivalent of "full table scan" functionality, but secondary indexes are recommended in production environments for performance. Replica and system indexes are not included in this check. | We recommend you drop your primary indexes and replace them with secondary indexes (Global Secondary Indexes, or GSIs). GSIs ensure optimal query performance in a production environment. |
Overlapping indexes | Needs Review | Index | You have multiple indexes that share index keys in the same order. Overlapping indexes occur when 1 index shares keys in the same order as another index. For example, 1 index with the keys "age" and "name", and another with the keys "age", "name", and "address". The second index can be used for any query covered by the first index. These overlapping indexes increase the work required to build your indexes. Replica and system indexes are not included in this check. | We recommend you review your overlapping indexes, and drop any indexes that are not needed. |
App Services CPU Usage | Needs Review | App Services | The average CPU usage on 1 or more of your App Services nodes is 70% or higher. Health Advisor checks for high CPU usage on your App Services nodes over the last week. | The average CPU usage on 1 or more of your App Services nodes is 70% or higher. We recommend reviewing your application performance and considering scaling your App Services nodes to add more CPU resources. |
App Services Allowed IPs | Warning | App Services | Your App Service is currently allowing any IP address to access the Admin and Metrics APIs. Setting an allowed IP address of 0.0.0.0/0 allows any IP address to connect to your App Service. | We recommend you restrict your allowed IP addresses to only the IP addresses that need access to your App Service. |
App Services Memory Usage | Needs Review | App Services | The average memory usage on 1 or more of your App Services nodes is 90% or higher. Health Advisor checks for high memory usage on your App Services nodes over the last week. | The average memory usage on 1 or more of your App Services nodes is 90% or higher. We recommend reviewing your application’s performance and considering scaling your App Services nodes to add more memory resources. |
App Endpoint Anonymous Auth | Needs Review | App Services | 1 or more of your App Service’s App Endpoints currently allow only anonymous authentication. Anonymous authentication allows users to access your App Endpoint without authenticating, which can be a security risk. | 1 or more of your App Service’s App Endpoints currently allow only anonymous authentication. If this is not intentional, we recommend you configure basic authentication or OIDC on your App Endpoint. |
App Endpoint Activity Check | Needs Review | App Services | 1 or more of your App Endpoints has not had any read or write activity in the last 7 days. A lack of read and write activity on an App Endpoint might indicate that the App Endpoint is not being used. | We recommend reviewing your current App Endpoints and removing any Endpoints that are no longer needed. |
Allowed IPs | Warning | Cluster | Your cluster is currently allowing access from any IP address. Setting an allowed IP address of 0.0.0.0/0 allows any IP address to connect to your cluster. This is a security risk in a production environment. | We recommend you restrict your allowed IP addresses to only the IP addresses that need access to your cluster. You can also consider configuring private networking. |
Paused Search Indexes | Needs Review | Search | 1 or more of your Search indexes are currently paused. You can pause a Search index to stop loading any new document mutations. While paused, the index does not reflect new changes to your documents. | 1 or more of the Search indexes on your cluster are currently paused. We recommend you review the paused indexes, and resume or delete them. Create a support ticket if you need help resuming or deleting your Search index. |
Search Index Replicas | Needs Review | Search | 1 or more of your Search indexes do not currently have a replica. Use replicas on your Search indexes to distribute them across other Search Service nodes, adding high availability and improving performance. | 1 or more Search indexes on your cluster do not have a replica. We recommend you add a replica to your Search indexes. |
Disk IO Utilization Time | Needs Review | Node | 1 or more nodes in your cluster have experienced a disk IO utilization time of 90% or higher. Health Advisor checks the percentage of time the disk is actively engaged in input and output operations on cluster nodes over the past week. | 1 or more nodes in your cluster have experienced a disk IO utilization time of 90% or higher over the past week. This suggests the disk is spending a significant amount of time servicing data requests, which can cause delays in application response time. We recommend you review your application performance to identify potential bottlenecks and analyze Couchbase Capella metrics to determine if affected nodes require scaling or configuration adjustments. |
Data Service Disk Write Queue | Needs Review | Data | 1 or more of your Data Service cluster nodes has a disk write queue of 10,000 or more items. The disk write queue for a Data Service node contains the items waiting to be written to disk. A large disk write queue could indicate that your storage is unable to keep up with the write load on your cluster. | 1 or more of your Data Service cluster nodes has a disk write queue of 10,000 or more items. We recommend you review your application performance and scale your Data Service nodes as needed. |
Hard OOM Errors | Warning | Data | 1 or more buckets on your cluster have encountered hard Out-of-Memory (OOM) errors. A hard OOM error indicates that a bucket has run out of available memory. Buckets can run out of memory due to changes in the incoming data, undersized nodes, undersized Service quotas, or long time-to-live (TTL) settings on documents. | 1 or more buckets on your cluster have encountered hard Out-of-Memory (OOM) errors. We recommend you review your application performance and scale your Data Service nodes as needed. |
Temp OOM Errors | Warning | Data | 1 or more buckets on your cluster have encountered temporary Out-of-Memory (OOM) errors. A temporary OOM error occurs when a bucket uses a high percentage of its total memory quota. Repeated, excessive memory usage eventually leads to hard OOM errors and failed read and write operations. | 1 or more buckets on your cluster have encountered temporary Out-of-Memory (OOM) errors for more than 1% of all operations. We recommend you review your application performance and scale your Data Service nodes as needed. |
OOM Errors | Warning | Node | 1 or more nodes in your cluster have encountered an Out-of-Memory (OOM) error. An OOM error can occur when a node runs out of memory and cannot allocate more memory to a process. | 1 or more nodes in your cluster have encountered an Out-of-Memory (OOM) error. We recommend you review your application performance and scale your affected nodes as needed. |
Disk Write Queue | Needs Review | Node | 1 or more of your cluster nodes has a disk write queue of 10,000 or more items. The disk write queue for a node contains the items waiting to be written to disk. A large disk write queue could indicate that your storage is unable to keep up with the write load on your cluster. | 1 or more of your cluster nodes has a disk write queue of 10,000 or more items. We recommend you review your application performance and scale your affected nodes as needed. |
Index Mutation Lag | Needs Review | Index | The number of queued and pending mutations to be indexed has passed the recommended threshold on 1 or more of your cluster nodes. The Index Service’s mutation lag is the time it takes for a new document mutation to be indexed. High mutation lag can indicate that the Index Service cannot keep up with your write load. | 1 or more of your cluster nodes has 10,000 or more queued mutations and 50,000 or more pending mutations. We recommend you review your application performance for the average mutation rate, and scale your Index Service nodes as needed. |
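The Overlapping indexes advice describes indexes that share keys in the same order, such as one index on "age" and "name" and another on "age", "name", and "address". The following is a minimal Python sketch of that comparison, not Health Advisor's implementation. It assumes "overlapping" means one index's key list is a leading prefix of another's; the function name and the input shape (a dict of hypothetical index names to ordered key lists, already excluding replica and system indexes) are illustrative.

```python
from itertools import permutations

def overlapping_pairs(indexes):
    """Return (covered, covering) pairs of index names.

    An index is "covered" when its keys form a leading prefix of another
    index's keys, in the same order (an assumed reading of the advice).
    """
    pairs = []
    for (name_a, keys_a), (name_b, keys_b) in permutations(indexes.items(), 2):
        # keys_b must start with every key of keys_a, in the same order
        if len(keys_a) <= len(keys_b) and keys_b[:len(keys_a)] == keys_a:
            pairs.append((name_a, name_b))
    return pairs
```

For the "age"/"name" example, the shorter index comes back as the covered one, which makes it the candidate to review before dropping. Key order matters: an index on "name" and "age" does not overlap one on "age", "name", and "address".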
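Both the Allowed IPs and the App Services Allowed IPs advice flag an allowed IP entry of 0.0.0.0/0. As a sketch, you could scan your own list of CIDR strings for such catch-all entries with Python's standard `ipaddress` module before applying it. The function name is hypothetical, and treating the IPv6 catch-all `::/0` the same way goes beyond what the advice text states.

```python
import ipaddress

def catch_all_entries(allowed_cidrs):
    """Return the allowed-IP entries that match every address (a /0 prefix)."""
    catch_all = []
    for cidr in allowed_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        # 0.0.0.0/0 covers all IPv4 addresses; ::/0 is the IPv6 equivalent
        if network.prefixlen == 0:
            catch_all.append(cidr)
    return catch_all
```

Any entry this returns is one the advice recommends replacing with the specific addresses or ranges that need access.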
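The Data Resident Ratio advice gives per-engine thresholds of 20% for Couchstore and 4% for Magma. The sketch below applies those thresholds to a bucket. It assumes the ratio is computed from item counts (items in memory divided by total items) and treats an empty bucket as fully resident; both are assumptions, and the function names are hypothetical.

```python
# Recommended minimums from the Data Resident Ratio advice, in percent
RESIDENT_RATIO_THRESHOLDS = {"couchstore": 20.0, "magma": 4.0}

def resident_ratio(items_in_memory, total_items):
    """Percentage of a bucket's items held in RAM (item-count basis assumed)."""
    if total_items == 0:
        return 100.0  # assumption: an empty bucket counts as fully resident
    return 100.0 * items_in_memory / total_items

def below_recommended(storage_engine, items_in_memory, total_items):
    """True if the bucket's resident ratio is under its engine's threshold."""
    threshold = RESIDENT_RATIO_THRESHOLDS[storage_engine.lower()]
    return resident_ratio(items_in_memory, total_items) < threshold
```

A 15% ratio would be flagged for a Couchstore bucket but a 5% ratio would not be flagged for a Magma bucket, which reflects Magma's lower recommended threshold.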
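The Number of Buckets advice recommends at least 0.2 CPU cores per bucket, which works out to at most 5 buckets per CPU core. The following sketch does that arithmetic with exact fractions to avoid floating-point rounding. The reference does not say whether the core count is per node or across the cluster, so the sketch leaves that choice to the caller. The function names are hypothetical.

```python
from fractions import Fraction

# Number of Buckets advice: at least 0.2 CPU cores per bucket
CORES_PER_BUCKET = Fraction(1, 5)

def cores_needed(bucket_count):
    """Minimum CPU cores suggested for a given number of buckets."""
    return bucket_count * CORES_PER_BUCKET

def max_recommended_buckets(cpu_cores):
    """Largest bucket count that still leaves 0.2 cores for each bucket."""
    return int(Fraction(cpu_cores) / CORES_PER_BUCKET)
```

For example, 4 cores support up to 20 buckets, and 30 buckets call for at least 6 cores.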
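Several node-level advice entries share a simple threshold pattern: Memory Usage (90%), CPU Usage (70%), Disk Usage (80%), Disk IO Utilization Time (90%), Disk Write Queue (10,000 items), and Index Mutation Lag (10,000 queued and 50,000 pending mutations). The following table-driven sketch evaluates one node against those thresholds. The `NodeStats` fields are hypothetical. Whether Health Advisor compares weekly averages or peaks for each metric is an assumption. The "disk usage rate at risk" part of the Disk Usage check is not modeled.

```python
from dataclasses import dataclass

@dataclass
class NodeStats:
    """Hypothetical per-node metrics, assumed pre-aggregated over the past week."""
    avg_memory_pct: float
    avg_cpu_pct: float
    disk_usage_pct: float
    disk_io_util_pct: float
    disk_write_queue: int
    queued_mutations: int = 0
    pending_mutations: int = 0

def node_advice(stats):
    """Return the names of the advice entries whose thresholds the node meets."""
    advice = []
    if stats.avg_memory_pct >= 90:
        advice.append("Memory Usage")
    if stats.avg_cpu_pct >= 70:
        advice.append("CPU Usage")
    if stats.disk_usage_pct >= 80:
        advice.append("Disk Usage")
    if stats.disk_io_util_pct >= 90:
        advice.append("Disk IO Utilization Time")
    if stats.disk_write_queue >= 10_000:
        advice.append("Disk Write Queue")
    # Mutation lag needs both conditions at once
    if stats.queued_mutations >= 10_000 and stats.pending_mutations >= 50_000:
        advice.append("Index Mutation Lag")
    return advice
```

Every comparison uses "or higher", matching the wording of the advice entries, so a node sitting exactly on a threshold is flagged.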
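The Unused Indexes advice flags indexes that have never been scanned or were last scanned more than 30 days ago, and skips replica and system indexes. The sketch below applies that rule to a list of index records. The record keys (`name`, `last_scan`, `is_replica`, `is_system`) are hypothetical and do not correspond to any Capella API field names.

```python
from datetime import datetime, timedelta, timezone

def unused_indexes(indexes, now=None, max_idle=timedelta(days=30)):
    """Names of indexes never scanned, or last scanned more than max_idle ago.

    Each index is a dict with hypothetical keys: "name", "last_scan"
    (a timezone-aware datetime, or None if never scanned), and optional
    "is_replica" / "is_system" flags.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    flagged = []
    for idx in indexes:
        # The check leaves out replica and system indexes
        if idx.get("is_replica") or idx.get("is_system"):
            continue
        last_scan = idx.get("last_scan")
        if last_scan is None or now - last_scan > max_idle:
            flagged.append(idx["name"])
    return flagged
```

An index scanned exactly 30 days ago is not flagged, because the advice says "more than 30 days ago".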
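The Temp OOM Errors advice applies when temporary OOM errors exceed 1% of all operations on a bucket. The following sketch computes that ratio per bucket. The input shape (bucket name mapped to a pair of temporary OOM error count and total operation count) and the function name are hypothetical.

```python
def buckets_with_excess_temp_oom(stats, threshold=0.01):
    """Buckets whose temporary OOM errors exceed `threshold` of all operations.

    `stats` maps a bucket name to (temp_oom_errors, total_operations).
    """
    flagged = []
    for bucket, (temp_oom, ops) in stats.items():
        # Skip buckets with no operations to avoid dividing by zero
        if ops and temp_oom / ops > threshold:
            flagged.append(bucket)
    return flagged
```

A bucket with exactly 1% temporary OOM errors is not flagged, because the advice uses "more than 1%".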
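The three Index Resident Ratio entries describe tiers: Needs Review at 25% or less combined with more than 10% data growth in the last week, Needs Review at 15% or less, and Warning when the ratio is "critically low". The reference gives no number for the Warning tier. The following sketch therefore takes that cutoff as a hypothetical, caller-supplied parameter rather than guessing one. The function name is also illustrative.

```python
def index_resident_ratio_severity(resident_pct, weekly_growth_pct, critical_pct):
    """Map an Index Service node's resident ratio to a severity level.

    critical_pct is a hypothetical, caller-supplied cutoff: the reference
    says only "critically low" for the Warning level and gives no number.
    """
    if resident_pct <= critical_pct:
        return "Warning"
    if resident_pct <= 15:
        return "Needs Review"
    if resident_pct <= 25 and weekly_growth_pct > 10:
        return "Needs Review"
    return "Good"
```

The checks run from most to least severe, so a node that meets several conditions receives only its most severe level.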
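The Number of KV Client Connections advice recommends using a single cluster object shared throughout your application. The following sketch shows one common way to do that in Python: create the object lazily on first use, guarded by a lock so that concurrent threads cannot each open their own. `connect_cluster` is a hypothetical stand-in for your SDK's connection call. It is not a real Couchbase SDK function.

```python
import threading

def connect_cluster():
    """Hypothetical stand-in for the SDK call that opens a cluster connection."""
    connect_cluster.calls += 1
    return object()

connect_cluster.calls = 0  # counts how many connections were opened

_cluster = None
_cluster_lock = threading.Lock()

def get_cluster():
    """Return the one shared cluster object, creating it on first use."""
    global _cluster
    with _cluster_lock:
        if _cluster is None:
            _cluster = connect_cluster()
    return _cluster
```

Every part of the application calls `get_cluster()` instead of connecting directly, so the SDK's connection pool is reused rather than multiplied.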