Automatic Workload Repository

    • Couchbase Server 8.0
      +
      Monitor and optimize query performance and workload using Automatic Workload Repository (AWR).

      Overview

      Automatic Workload Repository (AWR) is a feature that captures and maintains performance statistics for queries executed on your Couchbase cluster. It acts as a centralized repository for query performance data, enabling you to track query activities, analyze workload trends, and identify performance bottlenecks.

      For example, some queries may run efficiently with minimal overhead, while others may consume more resources or take longer to complete. With AWR, you can understand these differences and optimize your queries accordingly. It also allows you to generate reports to compare query performances over time.

      When enabled, AWR automatically gathers detailed metrics from the Query Service for every query that you run on your cluster. These metrics include execution time, CPU usage, memory consumption, number of executions, and more. It then aggregates this data into snapshots and stores them in a workload repository.

      You can access the collected data by directly querying the repository or by using Couchbase’s report generation tool. For more information, see View AWR Data and Reports.

      Use Cases

      Here are some use cases of AWR:

      • Troubleshooting Real-Time Issues: You can quickly identify slow running queries or instances of high resource usage. You can extract the SQL ID of the problematic query from the AWR report and use it to trace the query in completed_requests.

      • Analyzing Performance: When rolling out changes, such as introducing new microservices, AWR lets you compare query performance before and after the update.

      • Analyzing Upgrade Impacts: You can assess query performance before and after a cluster upgrade to identify queries impacted by the new version.

      Workload Repository

      The workload repository is a centralized storage location where all AWR data is collected and maintained.

      Before AWR can start collecting data, you must configure this location in the system:awr catalog. The repository can be a bucket or a collection, but not a scope.

      If you specify only the bucket name, AWR uses the default scope (_default) and default collection (_default) within that bucket to store the data. If the bucket or collection does not exist, you must create it.

      For example, a valid location is travel-sample._default.awr, which you can create using the following query:

      CREATE COLLECTION `travel-sample`._default.awr IF NOT EXISTS;

      AWR checks the availability of repository location at the start of each reporting interval. Until this specified location is available, AWR remains in a quiescent (inactive) state. Once the location becomes accessible, AWR transitions to an active state and begins collecting data. If the location becomes unavailable at any point, AWR returns to the quiescent state and resumes activity only when the location is accessible again.

      For more information about setting up the repository location, see Enable and Configure AWR.

      Snapshots

      AWR stores query performance data in the form of snapshots. For each unique statement executed within a specified reporting interval, AWR generates a snapshot. This snapshot contains aggregate metrics for all executions of that statement during the interval. These metrics include execution time, CPU usage, memory consumption, and other performance indicators.

      Snapshots are stored as individual documents in the workload repository. Each document is uniquely identified by its document key (ID), which includes the start time of the reporting interval, making it easier to filter and analyze data.

      Snapshot Retention Management

      AWR retains snapshot documents for long-term analysis, but does not enforce retention policies by default. To manage storage effectively, you need to configure a Time-To-Live (TTL) or expiration for the AWR location. The TTL specifies how long the documents remain in that location before the system automatically purges them. For more information about configuring the TTL, see Expiration.

      Example

      Example 1. Set TTL on AWR Collection

      If you set a TTL of 7 days on a target AWR collection, say travel-sample._default.awr, all snapshot documents older than 7 days are automatically deleted. To create the collection with this TTL setting, use the following query:

      CREATE COLLECTION `travel-sample`._default.awr IF NOT EXISTS WITH { "maxTTL": (7*24*60*60) };

      For more information about creating collections, see CREATE COLLECTION.

      Enable and Configure AWR

      AWR is an opt-in feature that you must explicitly enable and configure. Once enabled, AWR starts collecting data as soon as the repository location is set and is available.

      You can manage these settings through the system:awr catalog.

      system:awr

      This catalog determines how AWR functions including where it stores snapshots, how often it collects statistics, and which queries to include in the report. You can adjust these settings using an UPDATE query on system:awr.

      Only admins or users with the query_manage_system_catalog role can modify settings in system:awr. For more information, see Authentication and Client Privileges.

      The catalog consists of the following attributes:

      Name Description Schema

      enabled

      Indicates whether AWR is enabled or disabled.

      Default: FALSE

      Boolean

      location

      The target keyspace (repository) where the snapshots are stored.

      This can only be a path to a bucket or collection; it cannot be a scope. For more information, see Workload Repository.

      Example: "travel-sample._default.awr" or "travel-sample", in which case it uses the default scope and default collection.

      String

      interval

      The duration of the reporting interval. That is, the time between each snapshot collection. If the interval is set to 10 minutes, AWR captures snapshots every 10 minutes.

      The interval must be at least 1 minute.

      Default: "10m0s"

      Example: "1m30s"

      String (duration)

      threshold

      The minimum time a statement must take to complete for it be captured and included in the snapshot.

      The threshold must be at least 0 seconds.

      Default: "0s", so that by default, all statements are captured by AWR regardless of their execution time.

      Example: "1m30s"

      String (duration)

      num_statements

      The maximum number of unique statements for which aggregate data is collected during each interval.

      Once the specified limit is reached during a reporting interval, AWR does not generate snapshots for any additional unique statements within that same interval.

      Default: 10000

      Max: 100000

      Positive integer

      queue_len

      Length of the processing queue. It’s recommended not to change this value.

      The default value and maximum allowable value for queue_len are internally calculated based on system resources.

      Positive integer

      Examples

      Example 2. Enable AWR and configure settings

      The following query enables AWR, sets the repository location to travel-sample._default.awr, and configures the reporting interval and threshold.

      UPDATE system:awr SET enabled = true, location = "`travel-sample`._default.awr",
      interval = "1m", threshold = "0s";

      If you execute this query in the Query Workbench, you’ll get a warning about running an UPDATE query without specifying a WHERE clause or USE KEYS. You can ignore this warning and proceed.

      Example 3. Retrieve current AWR settings

      The following query retrieves the current AWR configuration settings.

      Query
      SELECT * FROM system:awr;
      Result
      [
        {
          "awr": {
            "enabled": true,
            "interval": "1m0s",
            "location": "`default`:`travel-sample`.`_default`.`awr`",
            "num_statements": 10000,
            "queue_len": 160,
            "threshold": "0s"
          }
        }
      ]

      Monitor AWR

      The current status of AWR is recorded in the query.log and you can view this information in the system:vitals output.

      Example 4. Query AWR status in system:vitals
      Query
      SELECT awr FROM system:vitals;
      Result
      [
        {
          "awr": {
            "requests": 11,
            "snapshots": 6,
            "start": "2025-09-26T05:10:44.789Z",
            "state": "active"
          }
        }
      ]

      View AWR Data and Reports

      You can access the AWR data by:

      Report Generation Tool

      You can generate AWR reports using the cbqueryreportgen command line tool. It provides comprehensive and user-friendly reports by executing SQL++ queries against the collected AWR data.

      For optimal query performance with this tool, it is recommended to create an index on the document key (META().id) in your configured AWR location. If this index is not present, the tool will use sequential scans, which can impact performance.

      For example, if the target location is travel-sample._default.awr, you can create an index as follows:

      CREATE INDEX idx_awr ON `travel-sample`._default.awr(META().id);

      Querying AWR Data Directly

      You can query AWR data directly from the workload repository using SQL++ queries.

      The document keys (IDs) of the snapshot documents include the timestamp of the reporting interval’s start time. This allows you to filter documents based on time ranges without requiring additional indexes (as sequential scans support range-based key patterns). However, you can add indexes to further optimize your queries, if needed.

      Each document contains the following fields:

      Name Description Schema

      cnt

      The number of times the statement was executed.

      Number

      from

      The start time of the interval, represented as an Epoch timestamp in milliseconds.

      Number

      to

      The end time of the interval, represented as an Epoch timestamp in milliseconds.

      Number

      pln

      An array containing the encoded, compressed outlines of the execution plan for both the minimum and maximum execution times of the statement.

      This is just the outline of the plan listing operators and significant objects used. For full execution details, configure the completed_requests system keyspace to capture the executions of the statement.

      You can use UNCOMPRESS() to decompress the execution plan strings, and then pass them to DECODE_JSON() for formatting, if needed.

      Array of strings

      qc

      The query context value.

      String

      sqlID

      The unique hash identifier of the statement.

      This can be used to aggregate information across different reporting periods for the same statement. It’s also included in the completed_requests entries (collected independently of AWR).

      String

      sts

      An ordered array of 51 entries representing the total, min, and max values of 17 statistics. That is, each statistic is represented by three consecutive entries in the array: the total value, the minimum value, and the maximum value. These values have fixed array positions and appear in the sequence specified in the Statistics array.

      For example, the second statistic in the list is the CPU time. Therefore,

      • sts[3] represents the total CPU time.

      • sts[4] represents minimum CPU time.

      • sts[5] represents the maximum CPU time.

      Statistics array

      txt

      The statement text, possibly in a compressed format.

      Typically, this field is accessed using the UNCOMPRESS() function, and the function returns the raw text if it is not compressed.

      String

      ver

      The version of the data record.

      For this release, the value is always 1.

      Number

      Statistics

      Name Description Schema

      total time

      The total time taken for the request, that is the time from when the request was received until the results were returned.

      This includes time spent in the queue and is analogous to elapsedTime in the Query REST API response.

      Number

      cpu time

      The amount of time the operators in the execution plan spent executing operator code.

      This is analogous to cpuTime in the Query Service API response when profiling is enabled.

      Number

      memory usage (quota)

      The amount of document memory used to execute the request. A request will return its document memory usage only if memory-quota is set for the query, or if both node-quota and node-quota-val-percent are set. For more information about these settings, see Configure Queries.

      This is analogous to usedMemory in the Query Service API response.

      Number

      result count

      The total number of objects in the results.

      This is analogous to resultCount in the Query Service API response.

      Number

      result size

      The total number of bytes in the results.

      This is analogous to resultSize in the Query Service API response.

      Number

      error count

      The number of errors that occurred during the request.

      This is analogous to errorCount in the Query Service API response.

      Number

      run time

      The total amount of time taken to execute the query. It does not include time spent in the queue.

      Number

      fetch time

      The total amount of time spent fetching data from the Data service.

      This includes the time spent executing Fetch operator code and waiting for data from the Data service.

      Number

      primary scan time

      The total amount of time spent by primary scan operations.

      This includes the time spent executing the PrimaryScan operator code and waiting for data from the Index service.

      Number

      sequential scan time

      The amount of time spent by sequential scan operations.

      This includes the time spent executing the PrimaryScan operator code and waiting for data from the Data service.

      Number

      primary scan count

      The total number of index keys returned by primary index scans and processed by the Query engine.

      Number

      sequential scan count

      The total number of document keys returned by sequential scans and processed by the Query engine.

      Number

      index scan count

      The total number of items returned by index scans and processed by the Query engine.

      Number

      fetch count

      The total number of documents fetched from the Data service and processed by the Query engine.

      Number

      order count

      The number of items that were sorted.

      Number

      primary scan ops

      The number of primary scan operators in the execution plan.

      Number

      sequential scan ops

      The number of sequential scan operators in the execution plan.

      Number

      Example

      The following example fetches AWR data for a specific SQL ID, including the statement text, max execution plan, number of executions, total time, and max CPU usage.

      Query
      SELECT
          text,
          max_plan,
          the_count,
          avg_total_time,
          max_cpu
      FROM
          default.s1.awr
      LET
          text = uncompress(txt)
      WHERE
          sqlID = 'fcff011269f93c3b7903d746c2914dab'
      GROUP BY
          sqlID, text
      LETTING
          the_count = SUM(cnt),
          max_plan = json_decode(uncompress(MAX(pln[1]))),
          avg_total_time = duration_to_str(SUM(sts[0])/SUM(cnt)),
          max_cpu = duration_to_str(MAX(sts[5]));
      Result
      [
          {
              "text": "select awr from system:vitals;",
              "max_plan": {
                  "#operator": "Sequence",
                  "~children": [
                      {
                          "#operator": "PrimaryScan",
                          "index_id": "#primary",
                          "keyspace": "vitals"
                      },
                      {
                          "#operator": "Fetch",
                          "keyspace": "vitals"
                      },
                      {
                          "#operator": "InitialProject"
                      },
                      {
                          "#operator": "Stream"
                      }
                  ]
              },
              "the_count": 2,
              "avg_total_time": "38.844257ms",
              "max_cpu": "193.409µs"
          }
      ]

      Limitations

      • In AWR reports, COMMIT statements may often show the highest elapsed time. However, the report alone does not provide insights into why the statement took so long to execute.

      • AWR does not capture SQL++ statements that contain sensitive information, such as CREATE USER and ALTER USER.

      • In some cases, SQL++ statements may appear truncated in AWR reports or snapshot documents. To find the complete statement, use its sqlID to look for entries in completed_requests that have the same sqlID. Then use one of those entries to get the full statement text.