Couchbase Cluster Auto-scaling Best Practices

      Recommended best practices, derived from tested performance metrics, for configuring Couchbase cluster auto-scaling using the Couchbase Autonomous Operator.

      How to Use This Page

      This page provides guidance on how to configure the Autonomous Operator’s auto-scaling feature to effectively scale Couchbase clusters. Specifically, it discusses relevant metrics for scaling individual Couchbase Services, and provides recommended settings based on internal benchmark testing performed by Couchbase.

      Auto-scaling is a generic feature and it is possible to use other metrics and options outside those listed in these best practices. If you identify other metrics which you believe are more relevant to your application workload, we recommend that you consider the resources that require scaling, and that you test your specific scenarios with simulated workloads to ensure that your cluster scales as expected and meets the necessary service levels.

      Metrics

      With the exception of General Best Practices, the information on this page is organized into sections for each Couchbase Service, within which are subsections describing relevant metrics that can be used for auto-scaling those particular services.

      The subsections for each metric include a general discussion of the metric in the context of scaling, as well as recommendations on how to effectively use that metric to auto-scale the relevant Couchbase Service. These recommendations include optimal settings tested by Couchbase.

      How to Interpret Recommended Auto-scaling Settings

      Recommended auto-scaling settings are presented in a tabbed structure:

      • Recommended Settings: This tab presents the recommended settings in a table. Additional clarifying notes about the recommended settings, if any, are listed below the table.

      • Example Configs: This tab provides YAML snippets that show the recommended settings as they are applied in the relevant resources (e.g. CouchbaseCluster and HorizontalPodAutoscaler).

      Example Configs

      Example CouchbaseCluster Recommended Settings
      apiVersion: couchbase.com/v2
      kind: CouchbaseCluster
      metadata:
        name: cb-example
      spec:
        autoscaleStabilizationPeriod: 600s (1)
      1 Recommended Couchbase Stabilization Period
      Example HorizontalPodAutoscaler Recommended Settings
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: query-hpa
      spec:
        scaleTargetRef:
          apiVersion: couchbase.com/v2
          kind: CouchbaseAutoscaler
          name: query.scale-couchbase-cluster
        behavior:
          scaleUp:
            stabilizationWindowSeconds: 30 (1)
            policies:
            - type: Pods
              value: 1
              periodSeconds: 15
          scaleDown:
            stabilizationWindowSeconds: 300 (2)
        minReplicas: 2
        maxReplicas: 6
        metrics:
        - type: Resource
          resource:
            name: cpu (3)
            target:
              type: Utilization
              averageUtilization: 70 (4)
      1 Recommended HPA Stabilization Window for scaleUp
      2 Recommended HPA Stabilization Window for scaleDown
      3 Recommended Metric
      4 Recommended Threshold

      General Best Practices

      CouchbaseCluster Resource

      The following are general best practices that should be considered when configuring the CouchbaseCluster resource for cluster auto-scaling:

      • It is recommended that you always specify a value for couchbaseclusters.spec.autoscaleStabilizationPeriod.

        • When a valid value is set for this field, even if it is 0s, the Horizontal Pod Autoscaler will be kept in maintenance mode until the rebalance operation has completed.

        • When no value is set for this field, the Horizontal Pod Autoscaler may request scaling at any point during a rebalance operation. This is almost never desirable: metrics are likely to be unstable while a rebalance is in progress, and although any scaling request made during a rebalance will not be honored until the rebalance completes, the request itself may still be based on invalid metrics that maintenance mode would have avoided. Therefore, it is recommended that most users set this field to at least 0s, unless there is a high degree of confidence in the stability of a metric during rebalance.

        A general rule of thumb is to set this field to 0s until a high degree of confidence is established in the stability of a metric during rebalance. Refer to Couchbase Stabilization Period in the concept documentation for additional information about this setting.
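
        For reference, the following is a minimal sketch of where this field sits in the CouchbaseCluster resource, alongside an auto-scaled server class. The cluster and server class names are illustrative; setting autoscaleEnabled on a server class causes the Operator to create the CouchbaseAutoscaler resource (e.g. query.scale-couchbase-cluster) that the HorizontalPodAutoscaler targets.
        apiVersion: couchbase.com/v2
        kind: CouchbaseCluster
        metadata:
          name: scale-couchbase-cluster
        spec:
          autoscaleStabilizationPeriod: 0s # keep the HPA in maintenance mode for the duration of each rebalance
          servers:
          - name: query
            size: 2
            services:
            - query
            autoscaleEnabled: true # the Operator creates a CouchbaseAutoscaler named query.scale-couchbase-cluster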

      HorizontalPodAutoscaler Resource

      The following are general best practices that should be considered when configuring the HorizontalPodAutoscaler resource for cluster auto-scaling:

      • Setting a value for maxReplicas is required because it provides important protection against runaway scaling events. Due to the highly-configurable nature of cluster auto-scaling, setting a cap on the maximum number of scaling nodes is the simplest, most valuable protection you can give yourself against a potentially costly misconfiguration.

      • Setting minReplicas is important for maintaining service availability. If left undefined, Kubernetes uses the default value of 1 pod, which may be acceptable for non-production deployments. However, when using auto-scaling in production, it is recommended that you set minReplicas to 2 or greater (3 or greater for the Data Service). This ensures a minimum level of protection against single-node failures.

        You can technically set minReplicas to 0 by enabling the HPAScaleToZero feature gate. You should never do this, as the Autonomous Operator prevents server class configurations from having sizes less than 1.
      • Depending on the cloud provider, provisioning persistent volumes may take significantly longer than provisioning pods. The chances of exceeding a metric threshold while trying to reach its desired value are therefore higher when using persistent volumes. All disk-space-related thresholds should rely on volume expansion rather than pod auto-scaling, as sketched below.
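
        The following is a minimal sketch of addressing disk capacity through volume expansion on the CouchbaseCluster resource rather than through pod auto-scaling. It assumes a StorageClass (named expandable-sc here for illustration) that has allowVolumeExpansion enabled; volumes are grown by increasing the storage request in the volume claim template.
        apiVersion: couchbase.com/v2
        kind: CouchbaseCluster
        metadata:
          name: cb-example
        spec:
          volumeClaimTemplates:
          - metadata:
              name: couchbase-data # referenced by a server class via its volumeMounts
            spec:
              storageClassName: expandable-sc # illustrative; the StorageClass must set allowVolumeExpansion: true
              resources:
                requests:
                  storage: 100Gi # increase this value to expand existing volumes instead of adding pods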

      Data Service

      Metric: Bucket Memory Utilization

      Couchbase Server uses a fully integrated in-memory caching layer to provide high-speed access to bucket data. Every bucket has a configurable memory quota that determines how much memory should be consistently maintained within the caching layer for each individual bucket. Each bucket’s individual memory quota is subtracted from the overall memory quota assigned to the Data Service.

      Data associated with Couchbase buckets is stored in memory and persisted on disk, whereas data associated with Ephemeral buckets is exclusively maintained in memory.

      • If a Couchbase bucket’s memory quota is exceeded, infrequently used data items are ejected from memory, leaving only the persisted copies of those data items on disk.

      • If an Ephemeral bucket’s memory quota is exceeded, one of the following will occur depending on how the bucket is configured:

        • Resident data-items remain in memory, but requests to add new data are rejected until an administrator frees enough space to support continued data-ingestion

        • Resident data-items are ejected from memory to make way for new data (and because the data-items in Ephemeral buckets are not persisted on disk, they cannot be reacquired after ejection)

      If your goal is to prevent document ejection and maintain the active data-set for each bucket entirely within memory, then you must prevent each bucket’s memory utilization from ever reaching the high water mark. In practice this means that before the high water mark is reached, you must either increase the bucket’s memory quota, or scale out the cluster with additional nodes running the Data Service (thus increasing the total amount of memory reserved for all buckets).

      The first solution — increasing the bucket’s memory quota — requires manual intervention and may not be possible if there is no reservable memory left on Data Service nodes. This makes the second solution — scaling out the number of Data Service nodes — much more desirable, and an ideal use-case for cluster auto-scaling.
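
      For context, when buckets are managed by the Autonomous Operator, each bucket's memory quota is set on its CouchbaseBucket resource. The following is a minimal sketch; the quota value is illustrative.
      apiVersion: couchbase.com/v2
      kind: CouchbaseBucket
      metadata:
        name: travel-sample
      spec:
        memoryQuota: 512Mi # per-node memory quota for this bucket; illustrative value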

      Recommendations

      Recommended metric: cbbucketinfo_basic_quota_user_percent

      • Provided by Couchbase Prometheus Exporter

      • Represents the memory in use by a bucket as a percentage of that bucket's memory quota

      Example Configs

      The examples below are incomplete configurations that are meant to show the implementation of the recommended settings within the relevant resources. For a more complete example of how to implement the recommended auto-scaling settings, refer to Auto-scaling the Couchbase Data Service.
      Example CouchbaseCluster Recommended Settings
      apiVersion: couchbase.com/v2
      kind: CouchbaseCluster
      metadata:
        name: cb-example
      spec:
        autoscaleStabilizationPeriod: 0s (1)
      1 Recommended Couchbase Stabilization Period
      Example HorizontalPodAutoscaler Recommended Settings
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: data-hpa
      spec:
        scaleTargetRef:
          apiVersion: couchbase.com/v2
          kind: CouchbaseAutoscaler
          name: data.scale-couchbase-cluster
        behavior:
          scaleUp:
            stabilizationWindowSeconds: 30 (1)
            policies:
            - type: Pods
              value: 1
              periodSeconds: 15
        minReplicas: 3
        maxReplicas: 6
        metrics:
        - type: Pods
          pods:
            metric:
              name: cbbucketinfo_basic_quota_user_percent (2)
              selector:
                matchLabels:
                  bucket: travel-sample (3)
            target:
              type: AverageValue
              averageValue: 70 (4)
      1 Recommended HPA Stabilization Window for scaleUp
      2 Recommended Metric
      3 A label selector to specify the bucket from which the metric should be read.
      4 Recommended Threshold

      Query Service

      Metric: CPU Utilization

      Couchbase nodes that run the Query Service execute N1QL queries for your application needs. When a query string is sent to Couchbase Server, the Query Service inspects and parses the string, determines which indexes to use, and generates a query plan. These steps require computation and processing that is heavily dependent on CPU resources.

      Different queries have different performance requirements. A simple query might return results within milliseconds and have very little impact on CPU, while a complex query may require several seconds and have much greater CPU overhead. The Query Service balances query processing across Query Service nodes based on available resources. However, if average CPU utilization across Query Service nodes gets too high, it is likely that the overall latency of queries will start to rise.

      Recommendations

      Recommended metric: cpu

      • Provided by Kubernetes Metrics Server

      • Represents the total CPU utilization of all containers that belong to a pod, divided by the sum of their CPU requests. For example, a pod whose containers together request 1 CPU and currently use 700m reports 70% utilization.

      The following considerations should be taken into account when scaling on this metric:

      • The cpu metric represents overall pod CPU utilization, not just the Couchbase container. Therefore, this metric also includes the CPU usage of other containers such as the Couchbase Exporter and Logging sidecars.

        • When scaling on CPU utilization, it is particularly important that you set consistent CPU resource requests and limits for each container in a Couchbase pod, as in the sketch below.
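
        The following is a minimal sketch of setting consistent CPU requests and limits on the Couchbase Server container via a server class in the CouchbaseCluster resource; the values are illustrative. Sidecar containers, such as the Couchbase Exporter, have their own resource settings that should be sized just as deliberately.
        apiVersion: couchbase.com/v2
        kind: CouchbaseCluster
        metadata:
          name: cb-example
        spec:
          servers:
          - name: query
            size: 2
            services:
            - query
            resources:
              requests:
                cpu: "2" # illustrative; matching requests and limits keeps the utilization metric predictable
              limits:
                cpu: "2"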

      Example Configs

      The examples below are incomplete configurations that are meant to show the implementation of the recommended settings within the relevant resources. For a more complete example of how to implement the recommended auto-scaling settings, refer to Auto-scaling the Couchbase Query Service.
      Example CouchbaseCluster Recommended Settings
      apiVersion: couchbase.com/v2
      kind: CouchbaseCluster
      metadata:
        name: cb-example
      spec:
        autoscaleStabilizationPeriod: 600s (1)
      1 Recommended Couchbase Stabilization Period
      Example HorizontalPodAutoscaler Recommended Settings
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: query-hpa
      spec:
        scaleTargetRef:
          apiVersion: couchbase.com/v2
          kind: CouchbaseAutoscaler
          name: query.scale-couchbase-cluster
        behavior:
          scaleUp:
            stabilizationWindowSeconds: 300 (1)
            policies:
            - type: Pods
              value: 1
              periodSeconds: 15
        minReplicas: 2
        maxReplicas: 6
        metrics:
        - type: Resource
          resource:
            name: cpu (2)
            target:
              type: Utilization
              averageUtilization: 70 (3)
      1 Recommended HPA Stabilization Window for scaleUp
      2 Recommended Metric
      3 Recommended Threshold

      Index Service

      Metric: Memory Utilization

      The Index Service is used to create indexes from predefined subsets of bucket-data. Most notably, Global Secondary Indexes (also known as secondary indexes) are created by the Index Service to support queries made by the Query Service on attributes within documents.

      Similar to buckets, secondary indexes use available memory to provide high-speed performance for certain operations. However, unlike buckets, secondary indexes do not have individually-configurable memory quotas. All indexes utilize the overall memory quota assigned to the Index Service.

      The Index Service can be configured to use either the standard or memory-optimized storage setting. The index storage setting is cluster-wide: it is established at cluster initialization for all secondary indexes on the cluster, across all buckets. Standard is the default storage mode for secondary indexes: indexes are saved on disk, in a disk-optimized format that uses both memory and disk for index updates and scans. Memory-optimized index storage keeps the index fully in memory at all times, allowing high-speed maintenance and scanning; a snapshot of each memory-optimized index is maintained on disk to permit rapid recovery from node failures.

      • The performance of standard index storage depends on overall I/O performance. As memory becomes constrained, the Index Service relies more heavily on slower-performing disk.

      • Memory-optimized index storage requires that all nodes running the Index Service have a memory quota sufficient for the number and size of their resident indexes, and for the frequency with which the indexes will be updated and scanned.

        • If indexer memory usage exceeds 95% of the Index Service memory quota, the indexer enters Paused mode on that node; although the node's indexes remain in the Active state, traffic is routed away from the node.

      If your goal is to maintain peak performance for all index storage types, then you must prevent the memory utilization of the Index Service from ever reaching the point at which the indexer enters Paused mode. In practice this means that before the indexer is paused, you must either increase the Index Service’s memory quota, or scale out the cluster with additional nodes running the Index Service (thus increasing the amount of memory reserved for indexes on all nodes).

      The first solution — increasing the Index Service’s memory quota — requires manual intervention and may not be possible if there is no reservable memory left on Index Service nodes. This makes the second solution — scaling out the number of Index Service nodes — much more desirable, and an ideal use-case for cluster auto-scaling.

      Recommendations

      Recommended metric: cbindex_ram_percent

      • Provided by Couchbase Prometheus Exporter

      • Represents the amount of memory used by the Index Service as a percentage of the Index Service’s memory quota

      The following considerations should be taken into account when scaling on this metric:

      • Indexes must be partitioned in order for the Index Service to be auto-scaled. Indexes that don’t utilize partitioning reside on a single node with underlying memory and compute resources that cannot be resized in-place after creation. You will need to delete and re-create any non-partitioned indexes before you can auto-scale the underlying Index nodes.

        • When new Index nodes are added to or removed from the cluster, the rebalance operation attempts to move index partitions across the available Index nodes in order to balance resource consumption. The Index Service only attempts to balance resource consumption on a best-effort basis.

      • The index storage mode is controlled by couchbaseclusters.spec.cluster.indexer.storageMode, and defaults to memory-optimized (memory_optimized). To configure the Index Service to use the standard index storage mode, set this field to plasma, as shown in the sketch below.
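
      The following is a minimal sketch of this setting on the CouchbaseCluster resource, switching the cluster to standard (plasma) index storage:
      apiVersion: couchbase.com/v2
      kind: CouchbaseCluster
      metadata:
        name: cb-example
      spec:
        cluster:
          indexer:
            storageMode: plasma # standard index storage; defaults to memory_optimized when omitted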

      Example Configs

      The examples below are incomplete configurations that are meant to show the implementation of the recommended settings within the relevant resources. For a more complete example of how to implement the recommended auto-scaling settings, refer to Auto-scaling the Couchbase Index Service.
      Example CouchbaseCluster Recommended Settings
      apiVersion: couchbase.com/v2
      kind: CouchbaseCluster
      metadata:
        name: cb-example
      spec:
        autoscaleStabilizationPeriod: 0s (1)
      1 Recommended Couchbase Stabilization Period
      Example HorizontalPodAutoscaler Recommended Settings
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: index-hpa
      spec:
        scaleTargetRef:
          apiVersion: couchbase.com/v2
          kind: CouchbaseAutoscaler
          name: index.scale-couchbase-cluster
        behavior:
          scaleUp:
            stabilizationWindowSeconds: 30 (1)
            policies:
            - type: Pods
              value: 1
              periodSeconds: 15
        minReplicas: 2
        maxReplicas: 6
        metrics:
        - type: Pods
          pods:
            metric:
              name: cbindex_ram_percent (2)
            target:
              type: AverageValue
              averageValue: 50 (3)
      1 Recommended HPA Stabilization Window for scaleUp
      2 Recommended Metric
      3 Recommended Threshold