Couchbase Cluster Auto-scaling Best Practices

    Recommended best practices, derived from tested performance metrics, for configuring Couchbase cluster auto-scaling using the Couchbase Autonomous Operator.

    How to Use This Page

    This page provides guidance on how to configure the Autonomous Operator’s auto-scaling feature to effectively scale Couchbase clusters. Specifically, it discusses relevant metrics for scaling individual Couchbase Services, and provides recommended settings based on internal benchmark testing performed by Couchbase.

    Auto-scaling is a generic feature and it is possible to use other metrics and options outside those listed in these best practices. If you identify other metrics which you believe are more relevant to your application workload, we recommend that you consider the resources that require scaling, and that you test your specific scenarios with simulated workloads to ensure that your cluster scales as expected and meets the necessary service levels.

    Metrics

    With the exception of General Best Practices, the information on this page is organized into sections for each Couchbase Service, within which are subsections describing relevant metrics that can be used for auto-scaling those particular services.

    The subsections for each metric include a general discussion of the metric in the context of scaling, as well as recommendations on how to effectively use that metric to auto-scale the relevant Couchbase Service. These recommendations include optimal settings tested by Couchbase.

    How to Interpret Recommended Auto-scaling Settings

    Recommended auto-scaling settings are presented in a tabbed structure:

    • Recommended Settings: This tab presents the recommended settings in a table. Additional clarifying notes about the recommended settings, if any, are listed below the table.

    • Example Configs: This tab provides YAML snippets that show the recommended settings as they are applied in the relevant resources (e.g. CouchbaseCluster and HorizontalPodAutoscaler).


    Metric: The name of the metric that, when properly specified in the HorizontalPodAutoscaler resource, instructs the Horizontal Pod Autoscaler (HPA) to monitor that metric for auto-scaling purposes. (Example: cpu)

    Threshold: The target value for the specified metric that, when exceeded, prompts the HPA to consider scaling the cluster. (Example: 70%)

    HPA Stabilization Windows: The amount of time that the HPA will look into the past when considering scaleUp and scaleDown events. (Example: scaleUp: 30s)

    Couchbase Stabilization Period: The value specified in couchbaseclusters.spec.autoscaleStabilizationPeriod, which defines the amount of time that all HorizontalPodAutoscalers associated with the Couchbase cluster should remain in maintenance mode after the cluster has begun a rebalance operation. (Example: 600s)

    Example CouchbaseCluster Recommended Settings
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example
    spec:
      autoscaleStabilizationPeriod: 600s (1)
    1 Recommended Couchbase Stabilization Period
    Example HorizontalPodAutoscaler Recommended Settings
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: query-hpa
    spec:
      scaleTargetRef:
        apiVersion: couchbase.com/v2
        kind: CouchbaseAutoscaler
        name: query.scale-couchbase-cluster
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30 (1)
          policies:
          - type: Pods
            value: 1
            periodSeconds: 15
        scaleDown:
          stabilizationWindowSeconds: 300 (2)
      minReplicas: 2
      maxReplicas: 6
      metrics:
      - type: Resource
        resource:
          name: cpu (3)
          target:
            type: Utilization
            averageUtilization: 70 (4)
    1 Recommended HPA Stabilization Window for scaleUp
    2 Recommended HPA Stabilization Window for scaleDown
    3 Recommended Metric
    4 Recommended Threshold

    General Best Practices

    CouchbaseCluster Resource

    The following are general best practices that should be considered when configuring the CouchbaseCluster resource for cluster auto-scaling:

    • It is recommended that you always specify a value for couchbaseclusters.spec.autoscaleStabilizationPeriod.

      • When a valid value is set for this field, even if it is 0s, the Horizontal Pod Autoscaler will be kept in maintenance mode until the rebalance operation has completed.

      • When no value is set for this field, the Horizontal Pod Autoscaler may request scaling at any point during a rebalance operation. This is almost never desirable, because metrics are likely to be unstable while a rebalance is in progress. Although any scaling request made by the Horizontal Pod Autoscaler during a rebalance will not be honored until the current rebalance is complete, the request itself may still be based on invalid metrics that could have been avoided if the Horizontal Pod Autoscaler was in maintenance mode. Therefore, it is recommended that most users set the value of this field to at least 0s, unless there is a high degree of confidence in the stability of a metric during rebalance.

      A general rule of thumb is to set this field to 0s until a high degree of confidence is established in the stability of a metric during rebalance. Refer to Couchbase Stabilization Period in the concept documentation for additional information about this setting.

    HorizontalPodAutoscaler Resource

    The following are general best practices that should be considered when configuring the HorizontalPodAutoscaler resource for cluster auto-scaling:

    • Setting a value for maxReplicas is required because it provides important protection against runaway scaling events. Due to the highly-configurable nature of cluster auto-scaling, setting a cap on the maximum number of nodes is the simplest, most valuable protection you can give yourself against a potentially costly misconfiguration.

    • Setting minReplicas is important for maintaining service availability. If left undefined, Kubernetes uses the default value of 1 pod, which may be acceptable for non-production deployments. However, when using auto-scaling in production, it is recommended that you set minReplicas to at least 2 (at least 3 for the Data Service). This ensures a minimum level of protection against single-node failures.

      You can technically set minReplicas to 0 by enabling the HPAScaleToZero feature gate. You should never do this, as the Autonomous Operator prevents server class configurations from having sizes less than 1.
    • Depending on the cloud provider, provisioning persistent volumes may take significantly longer than provisioning pods. Therefore, the chances of exceeding a metric threshold while trying to reach its desired value are higher when using persistent volumes. All disk-space-related thresholds should rely on volume expansion rather than pod auto-scaling.
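    As a sketch of the volume-expansion alternative, the fragment below pairs an expandable StorageClass with a CouchbaseCluster volume claim template. The storage class name, provisioner, and size are placeholder assumptions for illustration; confirm that your storage class and cloud provider actually support volume expansion before relying on this approach.

```yaml
# Assumed names, provisioner, and sizes for illustration only.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-storage
provisioner: kubernetes.io/aws-ebs   # provider-specific
allowVolumeExpansion: true           # required for in-place expansion
---
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example
spec:
  volumeClaimTemplates:
  - metadata:
      name: couchbase-data
    spec:
      storageClassName: expandable-storage
      resources:
        requests:
          storage: 100Gi   # grow this value to expand existing volumes
```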

    Data Service

    Metric: Bucket Memory Utilization

    Couchbase Server uses a fully integrated in-memory caching layer to provide high-speed access to bucket data. Every bucket has a configurable memory quota that determines how much memory should be consistently maintained within the caching layer for each individual bucket. Each bucket’s individual memory quota is subtracted from the overall memory quota assigned to the Data Service.

    Data associated with Couchbase buckets is stored in memory and persisted on disk, whereas data associated with Ephemeral buckets is exclusively maintained in memory.

    • If a Couchbase bucket’s memory quota is exceeded, infrequently used data items are ejected from memory, leaving only the persisted copies of those data items on disk.

    • If an Ephemeral bucket’s memory quota is exceeded, one of the following will occur depending on how the bucket is configured:

      • Resident data-items remain in memory, but requests to add new data are rejected until an administrator frees enough space to support continued data-ingestion

      • Resident data-items are ejected from memory to make way for new data (and because the data-items in Ephemeral buckets are not persisted on disk, they cannot be reacquired after ejection)

    If your goal is to prevent document ejection and maintain the active data-set for each bucket entirely within memory, then you must prevent each bucket’s memory utilization from ever reaching the high water mark. In practice this means that before the high water mark is reached, you must either increase the bucket’s memory quota, or scale out the cluster with additional nodes running the Data Service (thus increasing the total amount of memory reserved for all buckets).

    The first solution — increasing the bucket’s memory quota — requires manual intervention and may not be possible if there is no reservable memory left on Data Service nodes. This makes the second solution — scaling out the number of Data Service nodes — much more desirable, and an ideal use-case for cluster auto-scaling.

    Recommendations

    Recommended metric: cbbucketinfo_basic_quota_user_percent


    Based on internal benchmark testing, we believe the following settings provide a good starting point for scaling the Data Service based on bucket memory utilization:

    Metric: cbbucketinfo_basic_quota_user_percent
    Threshold: 70%
    HPA Stabilization Windows: scaleUp: 30s; scaleDown: default
    Couchbase Stabilization Period: 0s

    A label selector must be used with cbbucketinfo_basic_quota_user_percent in order to scale the cluster based on the memory utilization of individual buckets. When configured to scale on this metric without a label selector, the Horizontal Pod Autoscaler takes the sum of memory utilization across all buckets. (For example, if a cluster has two buckets, bucket1 and bucket2, and both are utilizing 40% of their individual memory quotas, the Horizontal Pod Autoscaler will read the metric as 80% and thus scale the cluster if the threshold is set to 70%.) Refer to the Example Configs tab for an example of how to define a label selector for this metric.
    General notes:
    • Benchmark tests were based on buckets that had a 30% average write rate, 1 kilobyte average document sizes, and default compaction threshold.

    • The threshold needs to be set below the high water mark of the bucket’s memory quota, otherwise data may be ejected from memory before the cluster scales up.

      • The threshold needs to be set low enough that the cluster has enough time to finish rebalancing before the high water mark is reached — preferably at or just below the low water mark, as it indicates that memory usage is moving toward a critical point.

    • You should consider setting a lower threshold for buckets with a higher average write rate or larger average doc sizes. This is because rebalance will take longer, thus increasing the risk of exceeding the high water mark.

    • As a cost saving measure, you can consider setting a higher threshold for buckets that experience lighter workloads.

    Notes on Ephemeral buckets:
    • Ephemeral buckets are more costly, but exhibit more predictable performance when it comes to scaling because no fragmentation occurs. Couchbase buckets are more cost efficient, but they have less predictable scaling performance, thus making auto-scaling configurations more difficult.

    • Lower compaction thresholds result in fewer auto-scaling events, but higher rebalance times since there is more data per node. Therefore, the memory quota threshold may need to be adjusted down for workloads with set rates higher than 30%.

    • Higher compaction thresholds will result in more auto-scaling events, which is less cost efficient but leads to faster rebalances since less data exists per node.

    The examples below are incomplete configurations that are meant to show the implementation of the recommended settings within the relevant resources. For a more complete example of how to implement the recommended auto-scaling settings, refer to Auto-scaling the Couchbase Data Service.
    Example CouchbaseCluster Recommended Settings
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example
    spec:
      autoscaleStabilizationPeriod: 0s (1)
    1 Recommended Couchbase Stabilization Period
    Example HorizontalPodAutoscaler Recommended Settings
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: data-hpa
    spec:
      scaleTargetRef:
        apiVersion: couchbase.com/v2
        kind: CouchbaseAutoscaler
        name: data.scale-couchbase-cluster
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30 (1)
          policies:
          - type: Pods
            value: 1
            periodSeconds: 15
      minReplicas: 2
      maxReplicas: 6
      metrics:
      - type: Pods
        pods:
          metric:
            name: cbbucketinfo_basic_quota_user_percent (2)
            selector:
              matchLabels:
                bucket: travel-sample (3)
          target:
            type: AverageValue
            averageValue: 70 (4)
    1 Recommended HPA Stabilization Window for scaleUp
    2 Recommended Metric
    3 A label selector to specify the bucket from which the metric should be read.
    4 Recommended Threshold

    Query Service

    Metric: CPU Utilization

    Couchbase nodes that run the Query Service execute N1QL queries for your application needs. When a query string is sent to Couchbase Server, the Query Service will inspect the string and parse it, planning which indexes to query. Once this is done, it generates a query plan. These steps require computation and processing that is heavily dependent on CPU resources.

    Different queries have different performance requirements. A simple query might return results within milliseconds and have very little impact on CPU, while a complex query may require several seconds and have much greater CPU overhead. The Query Service balances query processing across Query Service nodes based on available resources. However, if average CPU utilization across Query Service nodes gets too high, it is likely that the overall latency of queries will start to rise.

    Recommendations

    Recommended metric: cpu

    • Provided by Kubernetes Metrics Server

    • Represents the total CPU utilization of all containers that belong to a pod divided by the sum of all of their requests

    The following considerations should be taken into account when scaling on this metric:

    • The cpu metric represents overall pod CPU utilization, not just the Couchbase container. Therefore, this metric also includes the CPU usage of other containers such as the Couchbase Exporter and Logging sidecars.

      • When scaling on CPU utilization, it is particularly important that you set consistent CPU resource requests and limits for each container in a Couchbase pod.
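    To make CPU utilization comparable across pods, CPU requests and limits can be pinned to the same value in the CouchbaseCluster server class. The following is a minimal sketch covering the Couchbase Server container only; the server-class name and CPU values are illustrative, and any sidecar containers would need matching settings configured separately.

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example
spec:
  servers:
  - name: query
    size: 2
    services:
    - query
    resources:
      requests:
        cpu: "4"    # request and limit kept equal so that CPU
      limits:       # utilization percentages are stable and comparable
        cpu: "4"
```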


    Based on internal benchmark testing, we believe the following settings provide a good starting point for scaling the Query Service based on CPU utilization:

    Metric: cpu
    Threshold: 70%
    HPA Stabilization Windows: scaleUp: 300s; scaleDown: default
    Couchbase Stabilization Period: 600s

    General notes:
    • Scaling on CPU utilization is most effective for queries which are CPU-bound. The Query Service should be scaled at 10-20% below peak CPU usage. Our testing revealed 70% to be a good threshold when experiencing a peak of 90% CPU usage.

      • On average, an additional pod provided 5% CPU relief for workloads averaging 10 queries/sec at 500ms. Peak CPU should be less than 95% to ensure scaling will immediately improve CPU usage.

    • The default threshold tolerance of the HPA scaling algorithm is 0.1, meaning the HPA takes no scaling action while the current metric value is within ±10% of the target. Auto-scaling the Query Service will not perform well if your workload's CPU usage deviates by ±10% or more at its steady state, because auto-scaling will continually counterbalance itself over time. Therefore you should be mindful of this window when choosing a CPU scaling threshold. Note that the absolute deviation window becomes smaller as the CPU threshold is lowered, and the window is relatively ideal at the recommended 70% threshold.

      • Using a 300s HPA Stabilization Window (both scaleUp and scaleDown) is recommended in order to mitigate quick directional changes in scaling.

      • A high 600s Couchbase Stabilization Period ensures additional nodes are fully acclimated to the workload.

    The examples below are incomplete configurations that are meant to show the implementation of the recommended settings within the relevant resources. For a more complete example of how to implement the recommended auto-scaling settings, refer to Auto-scaling the Couchbase Query Service.
    Example CouchbaseCluster Recommended Settings
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example
    spec:
      autoscaleStabilizationPeriod: 600s (1)
    1 Recommended Couchbase Stabilization Period
    Example HorizontalPodAutoscaler Recommended Settings
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: query-hpa
    spec:
      scaleTargetRef:
        apiVersion: couchbase.com/v2
        kind: CouchbaseAutoscaler
        name: query.scale-couchbase-cluster
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 300 (1)
          policies:
          - type: Pods
            value: 1
            periodSeconds: 15
      minReplicas: 2
      maxReplicas: 6
      metrics:
      - type: Resource
        resource:
          name: cpu (2)
          target:
            type: Utilization
            averageUtilization: 70 (3)
    1 Recommended HPA Stabilization Window for scaleUp
    2 Recommended Metric
    3 Recommended Threshold

    Index Service

    Metric: Memory Utilization

    The Index Service is used to create indexes from predefined subsets of bucket-data. Most notably, Global Secondary Indexes (also known as secondary indexes) are created by the Index Service to support queries made by the Query Service on attributes within documents.

    Similar to buckets, secondary indexes use available memory to provide high-speed performance for certain operations. However, unlike buckets, secondary indexes do not have individually-configurable memory quotas. All indexes utilize the overall memory quota assigned to the Index Service.

    The Index Service can be configured to use either the standard or memory-optimized storage setting. The index storage setting is cluster-wide: it is established at cluster initialization for all secondary indexes on the cluster, across all buckets. Standard is the default storage mode for secondary indexes: indexes are saved on disk, in a disk-optimized format that uses both memory and disk for index updates and scanning. Memory-optimized index storage allows high-speed maintenance and scanning, since the index is kept fully in memory at all times; a snapshot of the memory-optimized index is maintained on disk to permit rapid recovery if node failures are experienced.

    • The performance of standard index storage depends on overall I/O performance. As memory becomes constrained, the Index Service relies more heavily on slower-performing disk.

    • Memory-optimized index storage requires that all nodes running the Index Service have a memory quota sufficient for the number and size of their resident indexes, and for the frequency with which the indexes will be updated and scanned.

      • If indexer memory usage goes above 95% of the Index Service memory quota, the indexer goes into the Paused mode on that node; and although the indexes remain in Active state, traffic is routed away from the node.

    If your goal is to maintain peak performance for all index storage types, then you must prevent the memory utilization of the Index Service from ever reaching the point at which the indexer enters Paused mode. In practice this means that before the indexer becomes paused, you must either increase the Index Service's memory quota, or scale out the cluster with additional nodes running the Index Service (thus increasing the amount of memory reserved for indexes on all nodes).

    The first solution — increasing the Index Service’s memory quota — requires manual intervention and may not be possible if there is no reservable memory left on Index Service nodes. This makes the second solution — scaling out the number of Index Service nodes — much more desirable, and an ideal use-case for cluster auto-scaling.

    Recommendations

    Recommended metric: cbindex_ram_percent

    • Provided by Couchbase Prometheus Exporter

    • Represents the amount of memory used by the Index Service as a percentage of the Index Service’s memory quota

    The following considerations should be taken into account when scaling on this metric:

    • Indexes must be partitioned in order for the Index Service to be auto-scaled. Indexes that don’t utilize partitioning reside on a single node with underlying memory and compute resources that cannot be resized in-place after creation. You will need to delete and re-create any non-partitioned indexes before you can auto-scale the underlying Index nodes.

      • When new Index nodes are added to or removed from the cluster, the rebalance operation attempts to move index partitions across the available Index nodes in order to balance resource consumption. The Index Service only balances resource consumption on a best-effort basis.

    • The index storage mode is controlled by couchbaseclusters.spec.cluster.indexer.storageMode, and defaults to memory-optimized (memory_optimized). To configure the Index Service to use the standard index storage mode, set this field to plasma.
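    The storage-mode setting described above can be applied with a fragment like the following (the cluster name is illustrative):

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example
spec:
  cluster:
    indexer:
      storageMode: plasma   # standard index storage; use memory_optimized
                            # (the default) for memory-optimized indexes
```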


    Based on internal benchmark testing, we believe the following settings provide a good starting point for scaling the Index Service based on memory utilization:

    Metric: cbindex_ram_percent
    Threshold: 60%
    HPA Stabilization Windows: scaleUp: 30s; scaleDown: default
    Couchbase Stabilization Period: 0s

    General notes:
    • Benchmark tests were based on memory-optimized indexes with a 30% write rate and 16-byte document sizes. Higher write rates or larger document sizes should use a threshold lower than 60% to account for longer rebalance times.

    • Index RAM distribution is not deterministic, and clusters may perform better when an even number of Index nodes is available.

    • Testing revealed that Index Service RAM usage didn't always drop immediately when a single Index node was added, but RAM usage did drop when a pair of nodes was added.
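    If your cluster exhibits the same pairing behavior, one option worth testing is a scaleUp policy that adds Index pods two at a time. This sketch adjusts the policy value of the HorizontalPodAutoscaler example below; treat it as a starting point under that assumption rather than a universal recommendation.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: index-hpa
spec:
  scaleTargetRef:
    apiVersion: couchbase.com/v2
    kind: CouchbaseAutoscaler
    name: index.scale-couchbase-cluster
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 2          # add Index nodes in pairs
        periodSeconds: 15
```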

    The examples below are incomplete configurations that are meant to show the implementation of the recommended settings within the relevant resources. For a more complete example of how to implement the recommended auto-scaling settings, refer to Auto-scaling the Couchbase Index Service.
    Example CouchbaseCluster Recommended Settings
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example
    spec:
      autoscaleStabilizationPeriod: 0s (1)
    1 Recommended Couchbase Stabilization Period
    Example HorizontalPodAutoscaler Recommended Settings
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: index-hpa
    spec:
      scaleTargetRef:
        apiVersion: couchbase.com/v2
        kind: CouchbaseAutoscaler
        name: index.scale-couchbase-cluster
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30 (1)
          policies:
          - type: Pods
            value: 1
            periodSeconds: 15
      minReplicas: 2
      maxReplicas: 6
      metrics:
      - type: Pods
        pods:
          metric:
            name: cbindex_ram_percent (2)
          target:
            type: AverageValue
            averageValue: 60 (3)
    1 Recommended HPA Stabilization Window for scaleUp
    2 Recommended Metric
    3 Recommended Threshold