Alert Reference

Use this reference for the kinds of alerts that Capella AI Services can send you, including their alert conditions and descriptions.

Metric-Based Alerts

If Capella detects a measurable change in your Capella AI Services resource use, it sends notifications that describe potential causes and suggest remedial actions.

Capella delivers notifications for these alerts to you by:

Each alert below lists these fields in order: Display Name, Resource, Conditions, Description, and Related Documentation (where available).

Rate Limit Events

Language Model

Critical: At least 1 new rate limit event detected in the past 1 hour.

Rate limiting has been triggered on your cluster. This indicates that requests are being throttled because they exceed rate limits. Please review your request patterns and consider optimizing your queries or upgrading your cluster resources to handle the increased load. If you have further questions or need assistance, and Technical Support is included in your chosen cluster plan, you can create a support ticket by clicking the Create Support Ticket button.

Embedding Service Failure

Language Model

Critical: At least 1 new embedding service failure detected in the past 1 hour.

The embedding service has experienced failures while generating embeddings. Please check the embedding service status and logs for details, and verify that the service is properly configured and accessible. If you have further questions or need assistance, and Technical Support is included in your chosen cluster plan, you can create a support ticket by clicking the Create Support Ticket button.

Embedding Write Failures High

Language Model

Warning: Embedding write failure rate has exceeded 1% over the past 5 minutes.

This indicates issues with writing embeddings to source documents. Please reach out to support to help address the issue.

A node is reporting that the AI gateway service is down and requires immediate attention.

Language Model

Critical: A node is reporting that the AI gateway service is down and requires immediate attention.

A node is reporting that the AI gateway service is down and requires immediate attention. Please reach out to support to help address the issue.

AI Service Instances CPU Usage

Language Model

Critical: At least 1 AI service is experiencing critical CPU usage.

Warning: At least 1 AI service is experiencing high CPU usage.

Please review your AI service and consider scaling your resources to handle the increased load. Contact Support if you experience a critical impact.

Capella Model Services AI Gateway is observing query failures

Language Model

Warning: The Couchbase Model Service Gateway is observing a high number of query failures, exceeding the threshold of 50 failures within an hour.

We’ve detected a potential issue affecting usage of one of your Language Model Services. Please check your queries to identify the cause of these failures. Contact Support if you experience a critical impact.

Couchbase Model Service AI Gateway Queue Size Exceeded

Language Model

Critical: The Couchbase Model Service Gateway queue size has exceeded 50,000 within a 5-minute period.

We’ve detected that the AI Gateway queue size has grown beyond normal capacity, which may indicate performance issues or high load. Please review your AI Gateway workload and consider scaling your resources. Contact Support if you experience a critical impact.

Capella AI Functions failed invocations

Cluster

Warning: The Couchbase AI Functions Service is observing a high number of failed invocations, exceeding the threshold of 3 failures within a 5-minute period.

We have detected a potential issue affecting one of your AI Functions, making it unavailable. Please check your functions or create a new AI Function. Contact Support if you experience a critical impact.

Capella AI Workflow Service File Processing Failures

Cluster

Warning: The Couchbase AI Workflow Service is experiencing a high number of file processing failures, breaching the threshold of more than 1% within a 5-minute period.

We have detected a potential issue affecting your AI Workflow Service file processing operations. Please review your workflows to identify the cause of these failures. Contact Support if you experience a critical impact.

A node is reporting that the Vulcan service is down and requires immediate attention

Cluster

Critical: A node is reporting that the Vulcan service is down and requires immediate attention.

A node is reporting that the Vulcan service is down and requires immediate attention. Please reach out to support to help address the issue.
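
The numeric conditions in this reference can be read as simple threshold checks over a time window. The following Python sketch illustrates that reading only. It is not how Capella evaluates alerts and it does not call any Capella API. The metric keys (such as `gateway_query_failures_1h`), the `Rule` type, and the `evaluate` and `failure_rate_pct` helpers are hypothetical names introduced for illustration. The thresholds are copied from the Conditions entries above, and "at least N" is treated as inclusive while "exceeded" and "more than" are treated as exclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str         # display name of the alert, as listed in this reference
    severity: str     # "Critical" or "Warning"
    threshold: float  # value taken from the Conditions entry
    inclusive: bool   # True for "at least N", False for "more than N"


# Hypothetical metric keys; thresholds come from the Conditions entries above.
RULES = {
    "rate_limit_events_1h": Rule(
        "Rate Limit Events", "Critical", 1, True),
    "embedding_write_failure_pct_5m": Rule(
        "Embedding Write Failures High", "Warning", 1.0, False),
    "gateway_query_failures_1h": Rule(
        "Capella Model Services AI Gateway is observing query failures",
        "Warning", 50, False),
    "gateway_queue_size_5m": Rule(
        "Couchbase Model Service AI Gateway Queue Size Exceeded",
        "Critical", 50_000, False),
    "ai_function_failed_invocations_5m": Rule(
        "Capella AI Functions failed invocations", "Warning", 3, False),
}


def failure_rate_pct(failed: int, total: int) -> float:
    """Return failures as a percentage of total operations (0.0 if none ran)."""
    return 0.0 if total == 0 else 100.0 * failed / total


def evaluate(metric: str, value: float) -> Optional[str]:
    """Return 'Severity: Display Name' if the value breaches the rule, else None."""
    rule = RULES[metric]
    if rule.inclusive:
        fired = value >= rule.threshold
    else:
        fired = value > rule.threshold
    return f"{rule.severity}: {rule.name}" if fired else None


if __name__ == "__main__":
    # 3 failed writes out of 200 in the window is 1.5%, above the 1% threshold.
    print(evaluate("embedding_write_failure_pct_5m", failure_rate_pct(3, 200)))
    # Exactly 50 query failures does not exceed the threshold of 50.
    print(evaluate("gateway_query_failures_1h", 50))
```

A sketch like this can help when you reproduce a condition against your own request logs, for example to estimate how close a workload is to the 1% embedding write failure rate before an alert fires.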