Guidelines and Best Practices

The Couchbase Kubernetes Operator makes deploying Couchbase Server incredibly simple. However, there are some external influences and configurations that can cause issues. This topic outlines some of the deployment best practices that can help you avoid some of the most common pitfalls.

Pod Scheduling

This section describes how to schedule Couchbase Server pods so that clusters maintain consistently high performance and suffer minimal disruption when a failure occurs.

Noisy Neighbors

The noisy neighbor problem occurs when multiple virtual machines (VMs) are running on the same hypervisor, and one VM adversely affects the others. This effect may result in any of the following conditions:

CPU Starvation: A highly CPU-intensive neighbor consumes more than its fair share of CPU resources, which results in fewer cycles allocated to the affected pod, higher latencies when processing data and network traffic, and increased cache latencies as the noisy neighbor flushes your entries to slower storage.
Memory Starvation: There are two common scenarios when considering memory starvation. In the cloud, a hypervisor VM will typically run without swap space to page out less-used memory pages. The result is that when memory is starved, the operating system has no other choice but to choose a process that is using too much memory and terminate it. When the hypervisor is on physical hardware with a swap file, performance will suffer as memory pages are transferred to and from a storage device, resulting in slower computation as memory accesses page-fault and the memory is swapped back in, higher latencies, and degraded I/O performance.
Disk I/O Starvation: A storage device can perform only a finite number of input/output operations per second (IOPS). For a spinning disk this is typically on the order of 100 IOPS. For solid state storage it is far higher, on the order of tens of thousands of IOPS. If a noisy neighbor is using a high proportion of that capacity, an application will see slower reads and writes to and from the file system. In a cloud architecture, where a storage volume may be attached via a network protocol (e.g. iSCSI), disk I/O also generates increased network load.
Network Starvation: Like disk I/O, the network has limited capacity. Modern data center network interface cards typically handle 10 Gbps of bandwidth. The problem is not limited to the local machine, however: top-of-rack leaf switches are often over-subscribed by a ratio of 6:1 (or higher), so a network-intensive process may be affected by a process on a completely separate hypervisor.

To mitigate as many of these factors as possible, deploy each Couchbase Server cluster on a set of Kubernetes nodes that is dedicated to it and separate from other workloads. The Kubernetes cluster administrator should be responsible for labeling nodes as belonging to a specific application type. Developers may then use the spec.servers.pod.spec.nodeSelector configuration property to control which nodes Couchbase pods are scheduled onto. Take care to ensure that other pods (e.g. stateless web applications) are not scheduled on the nodes that are allocated for use by Couchbase.
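As a minimal sketch of the node selection described above, the fragment below pins one server class to labeled nodes. It assumes the Operator 2.x API (couchbase.com/v2) and that the administrator has already applied a label such as app: couchbase-cluster to the dedicated nodes. The cluster name, server class name, and size are placeholders, and required fields such as the image and admin secret are omitted for brevity.

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example            # hypothetical cluster name
spec:
  servers:
  - name: data                # hypothetical server class
    size: 3
    services:
    - data
    pod:
      spec:
        # Schedule these pods only onto nodes that carry the label
        # assigned by the cluster administrator (label is an assumption).
        nodeSelector:
          app: couchbase-cluster
```

A node selector only attracts Couchbase pods to these nodes. Keeping other workloads away needs a separate mechanism, for example a taint on the nodes with a matching toleration on the Couchbase pods.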

Likewise, it is recommended that you deploy Couchbase Server clusters with anti-affinity enabled so that cluster pods are not scheduled on the same nodes. This can be achieved with the spec.antiAffinity configuration property.

Anti-affinity is only scoped to a specific cluster. In order to prevent pods from one cluster interfering with another, you should use a separate node label per Couchbase cluster.
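The following fragments sketch how two clusters might be kept apart, assuming the Operator 2.x API. Each cluster enables anti-affinity and selects a different node label value. The cluster names, the couchbase-cluster label key, and its values are all hypothetical, and required fields such as the image and admin secret are omitted.

```yaml
# First cluster: confined to nodes labeled couchbase-cluster=cb-orders.
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-orders
spec:
  antiAffinity: true          # do not co-locate pods of this cluster on a node
  servers:
  - name: all-services
    size: 3
    services: [data, index, query]
    pod:
      spec:
        nodeSelector:
          couchbase-cluster: cb-orders
---
# Second cluster: confined to a different set of nodes.
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-sessions
spec:
  antiAffinity: true
  servers:
  - name: all-services
    size: 3
    services: [data, index, query]
    pod:
      spec:
        nodeSelector:
          couchbase-cluster: cb-sessions
```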

Server Group Configuration

The server group feature of the Couchbase data platform allows you to group Couchbase Server nodes in such a way that data or index replicas are distributed over failure domains that are larger than a single server (e.g. rack, data center, availability zone, etc.). When server groups are defined, the Server Group Awareness feature distributes vBuckets on a best-effort basis, while ensuring each instance is under an equal load. As such, under certain circumstances, it’s possible for multiple replicas to be co-located in the same server group.

It is recommended that each server group have an equal number of pods per server class, which makes it simpler to maintain server group constraints. For the same reason, it is also recommended that you have more server groups than replicas.
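The recommendation above can be sketched as follows, assuming the Operator 2.x API and nodes that carry the standard Kubernetes zone label. The zone names, cluster name, and sizes are placeholders. Three server groups are used on the assumption that buckets are configured with at most two replicas, and the server class size is a multiple of the number of groups so that each group receives the same number of pods.

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example            # hypothetical cluster name
spec:
  # Availability zones to treat as server groups (placeholder names).
  serverGroups:
  - us-east-1a
  - us-east-1b
  - us-east-1c
  servers:
  - name: data
    size: 6                   # 6 pods over 3 groups: 2 pods per group
    services:
    - data
```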

For more information, refer to About Using Couchbase Server Groups With the Operator.

Storage

Storage options may have both beneficial and detrimental impacts on your clusters.

Persistent Volume Backed Clusters

Persistent volumes are recommended for production deployments. They allow a failed pod to be recovered quickly: the replacement pod warms up from the existing data and is back-filled via delta recovery, rather than requiring a full rebalance.

By having some of the pod’s file system persisted, logging data can be extracted from failed pods where this would normally be lost with a fully ephemeral file system. It also aids in data recovery situations which cannot be rectified by the Operator.

Table 1. Cluster Supportability with Persistent Volumes
Deployment Type	Services	Required Volume Mount(s)	Recoverable?	Supportable?
Production Stateful	Any	`default`	✓	✓
Production Stateless	`query`, `eventing`	`logs`	✗	✓

Production Stateful

These deployments require the default volume mount to be defined. This volume holds all data generated by Couchbase Server at run time, including logs. Clusters deployed this way can recover failed pods by reusing the persistent data on the default volume. The data, index and analytics Services must be deployed with this method to preserve data in the event of a total power failure.

The Operator technically allows the search Service to be deployed in a stateless configuration as defined below. However, it is strongly encouraged that you deploy the search Service in a stateful configuration; otherwise, the failure or removal of enough search nodes will leave the Service unresponsive for considerably longer than it would be with persistent storage. This is because, without a volume mount, all of the Full Text Indexes need to be completely rebuilt.

Production Stateless

These deployments require the logs volume mount to be defined. This volume covers the Couchbase Server logs only, so pods that fail cannot be recovered; however, the logs can still be accessed and downloaded with the support tool. The query and eventing Services may be deployed with this method, provided they are not running alongside a stateful Service. Logs are retained only in the event of a pod failure. They are deleted automatically when a pod is removed by a user-initiated action, e.g. scaling the cluster down.

Production clusters can also be deployed as a combination of Production Stateful and Production Stateless node deployments, depending on the services running on the individual server class configurations. The default and logs mounts in the CouchbaseCluster configuration are mutually exclusive. In addition, if cluster validation detects that any server class is supportable (that is, default or logs is specified under volumeMounts), then all other server classes must also have supportable configurations in order for the validation check to pass.
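A combined deployment might look like the sketch below, assuming the Operator 2.x API. One server class mounts default (Production Stateful) and the other mounts logs (Production Stateless), so every server class is supportable. The template names, storage class, sizes, and capacities are illustrative assumptions, and required fields such as the image and admin secret are omitted.

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example                  # hypothetical cluster name
spec:
  servers:
  - name: data                      # Production Stateful
    size: 3
    services:
    - data
    - index
    volumeMounts:
      default: couchbase            # all Couchbase Server data and logs
  - name: query                     # Production Stateless
    size: 2
    services:
    - query
    volumeMounts:
      logs: couchbase-logs          # Couchbase Server logs only
  volumeClaimTemplates:
  - metadata:
      name: couchbase
    spec:
      storageClassName: standard    # assumed storage class
      resources:
        requests:
          storage: 100Gi            # illustrative capacity
  - metadata:
      name: couchbase-logs
    spec:
      storageClassName: standard
      resources:
        requests:
          storage: 5Gi
```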

It is also highly recommended that in cloud deployments, storage be located in the same availability zone as the pods that are running in that server group. If storage is distributed randomly across a virtual private cloud and an availability zone suffers an outage, you lose not only all the pods that are resident in that availability zone, but also any pods elsewhere whose backing storage is provided by that availability zone.
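One general Kubernetes technique for keeping a volume in the same zone as its pod is a storage class with delayed volume binding. This is offered as a sketch rather than as the Operator's prescribed method. The storage class name is hypothetical and the provisioner shown is only an example, to be replaced with the one for your platform.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: couchbase-zonal             # hypothetical name
provisioner: ebs.csi.aws.com        # example provisioner; substitute your own
# Delay volume creation until the pod has been scheduled, so that the
# volume is provisioned in the same zone as the node that mounts it.
volumeBindingMode: WaitForFirstConsumer
```

A volume claim template in the CouchbaseCluster resource would then reference this class through its storageClassName.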

There will be a slight reduction in performance and a slight increase in latency when using a network-based storage backend.

Ephemeral Clusters

In some use cases, such as an in-memory cache, it is undesirable to have to provision persistent volumes in order to deploy a cluster. In other situations it may be impossible, for example if the platform doesn’t support dynamic persistent volume provisioning.

The Operator provides the option to run clusters with ephemeral storage only. In this mode of operation, the Operator can recover a partially down cluster, where Couchbase Server is unable to automatically fail over the pods, by forcing the down pods out of the cluster. This behavior is controlled by the couchbaseclusters.spec.recoveryPolicy attribute. The Operator will also fully rebuild a cluster that has had all pods deleted.
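A minimal sketch of an ephemeral cache cluster follows, assuming the Operator 2.x API. The PrioritizeUptime value is the recovery policy setting understood to enable forced removal of down pods; confirm it against the resource reference for your Operator version. Names and sizes are placeholders, and required fields such as the image and admin secret are omitted.

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-cache                    # hypothetical cluster name
spec:
  # Favor availability: allow the Operator to force down pods out of
  # the cluster when Couchbase Server cannot fail them over itself.
  recoveryPolicy: PrioritizeUptime
  servers:
  - name: cache                     # hypothetical server class
    size: 3
    services:
    - data
    # No volumeMounts are defined, so pods use ephemeral storage only.
```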

Because the pods are ephemeral, it is highly likely that data will be unrecoverable when a Couchbase Server node goes down, so there is little risk in using a forced failover. Ephemeral clusters favor caching use cases where the data can be repopulated by clients and does not need to be persisted.

Since fully-ephemeral Couchbase clusters only use ephemeral storage, Couchbase Server logs are highly likely to be unavailable in the event of a crash. This can make supporting an ephemeral cluster particularly difficult, and it is recommended that you exercise caution when using this type of deployment.

Starting with version 2.2, the Kubernetes Operator supports forwarding Couchbase Server logs. However, the current implementation requires the use of a (persistent) volume.

Security

Keeping your data safe is of paramount importance. This section contains some best practices that you should follow to keep your deployment secure.

Cluster Scope

When you deploy a cluster, you do so in a namespace. The Operator needs to be given a fairly broad set of permissions in order to dynamically create pods, services, etc. It is recommended that the Operator, and the clusters that it manages, be segregated in their own namespace. This limits the set of resources that the Operator can affect, and that support tools can collect data about, which aids in security compliance situations where confidential information may be leaked via Kubernetes API operations.

Production Deployments

This section contains some best practices for resource configuration values that are recommended when deploying Couchbase Server in a production environment. In some cases these values are different from the default values. They are not enforced by the Operator; however, the Dynamic Admission Controller (DAC) will warn users when values are set that are not recommended for production use.

Couchbase Clusters

The following are recommended values for the CouchbaseCluster resource:

spec.cluster.autoFailover: Auto failover settings should not be left at their default values, but should be tuned on a case-by-case basis.
spec.antiAffinity: This should be enabled for clusters in production.
spec.cluster.indexer.storageMode: The storage mode recommended for production clusters is plasma.
spec.buckets.synchronize: This should be disabled. It is intended for development use only.
spec.cluster.autoCompaction: It is recommended that auto-compaction settings be configured in production.
spec.servers.volumeMounts.default or spec.servers.volumeMounts.logs: To ensure that logs are persisted, one of these should be configured for at least one server class.
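The fragment below sketches how several of these recommendations might be expressed together. It assumes the Operator 2.x resource layout, in which cluster-wide settings sit under spec.cluster. The individual auto-failover field names and all values are assumptions for illustration and should be checked against the resource reference for your Operator version. Auto-compaction settings and required fields such as the image and admin secret are omitted.

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example                  # hypothetical cluster name
spec:
  antiAffinity: true                # enabled for production
  cluster:
    # Auto-failover tuned explicitly rather than left at defaults
    # (field names and values below are illustrative assumptions).
    autoFailoverTimeout: 30s
    autoFailoverMaxCount: 2
    indexer:
      storageMode: plasma           # recommended for production
  buckets:
    synchronize: false              # development use only
  servers:
  - name: data
    size: 3
    services:
    - data
    - index
    volumeMounts:
      default: couchbase            # persists data and logs
  volumeClaimTemplates:
  - metadata:
      name: couchbase
    spec:
      resources:
        requests:
          storage: 100Gi            # illustrative capacity
```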

Couchbase Buckets

The following are recommended values for the CouchbaseBucket resource:

Sample Buckets: These should not be used when deploying to production. They are intended for development purposes only.

Node Configuration

How your Kubernetes nodes are configured may also have a detrimental impact on your ability to deploy a Couchbase Server cluster. The settings in this section are not officially supported, but they are recommended for cluster administrators.

Node Services

It is recommended that the following services be running on all Kubernetes nodes.

NTP: Like all distributed systems, Couchbase Server needs to have synchronized time in order to resolve conflicts. Newer systems should have NTP enabled by default if running systemd-timesyncd. Older systems should run ntpd or similar, which can be deployed automatically with an orchestration tool such as Puppet or Ansible.

Kernel Parameters

Couchbase Server has some recommended kernel parameters which should be set. These can be applied via an orchestration solution or a Kubernetes DaemonSet resource.

mm.transparent_hugepage.enabled: Transparent huge pages (THP) attempt to allocate large areas of contiguous memory so that applications which process data sequentially, or that have high data locality, do so without causing a high number of kernel page faults. For a database such as Couchbase, this actually has a detrimental effect, since its access patterns are typically highly random. It is highly recommended that this parameter be set to never. On Linux this setting is exposed through sysfs at /sys/kernel/mm/transparent_hugepage/enabled rather than through sysctl.
mm.transparent_hugepage.defrag: Related to mm.transparent_hugepage.enabled, it is highly recommended that this parameter be set to never. It is exposed at /sys/kernel/mm/transparent_hugepage/defrag.
kernel.core_pattern: The core pattern is used to specify the pattern for core dump files. It is recommended that this be set to /bin/false to disable core dumps.
fs.suid_dumpable: This parameter controls the core dump of setuid binaries. It is recommended that this be set to 0 to disable core dumps.

Couchbase Server recommends that vm.swappiness be set to zero. Kubernetes does not allow swap in containers; therefore, this parameter can be safely ignored.

For example, you could run something similar to the following:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: couchbase-sysctls
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: couchbase-sysctls
  template:
    metadata:
      labels:
        name: couchbase-sysctls
    spec:
      # Apply to nodes marked as belonging to a couchbase cluster only
      nodeSelector:
        app: couchbase-cluster
      containers:
      - name: couchbase-sysctls
        image: busybox:latest
        # Run as root on the host system
        securityContext:
          privileged: true
        # Constantly reconcile the state to revert any changes
        command:
        - /bin/sh
        - -c
        - |
          set -o xtrace
          while true
          do
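            # THP is configured through sysfs rather than sysctl; the writes
            # succeed because the container is privileged
            echo never > /sys/kernel/mm/transparent_hugepage/enabled
            echo never > /sys/kernel/mm/transparent_hugepage/defrag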
            sysctl -w kernel.core_pattern='|/bin/false'
            sysctl -w fs.suid_dumpable=0
            sleep 60
          done

Ulimits

These ulimit settings are necessary when running under heavy load. If you are just doing light testing and development, you can omit these settings, and everything will still work.

Red Hat OpenShift 4.0 and above sets the default limit on the maximum number of process IDs in the cgroup to 1024; Red Hat documents this limit (Red Hat subscription required). A workaround is to increase the pids_limit in both the KubeletConfig and ContainerRuntimeConfig.
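The workaround might be sketched as the two OpenShift resources below. The resource names, the worker pool selector, and the limit of 4096 are illustrative assumptions, and the availability of these fields varies between OpenShift versions, so check the documentation for your release before applying them.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: couchbase-pids-limit        # hypothetical name
spec:
  # Apply to the worker machine config pool (assumed target).
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    podPidsLimit: 4096              # illustrative value
---
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: couchbase-pids-limit        # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    pidsLimit: 4096                 # illustrative value
```

Applying a machine configuration change typically causes the affected nodes to be restarted one at a time, so plan the change for a maintenance window.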

The Couchbase Pod ulimits are inherited from the default ulimits set on the container runtime daemon on each Kubernetes node. Configuration of this daemon varies according to the container runtime daemon used by the distribution and its service management system.

Systemd

Systemd-compatible hosts can configure ulimits for the daemon by editing its service configuration file.

Run the following command to detect the location of the configuration file for the Docker container runtime:

$ sudo systemctl cat docker
# /lib/systemd/system/docker.service

Add the following values to the [Service] section of the service file:

LimitNOFILE=40960
LimitCORE=infinity
LimitMEMLOCK=infinity

Reload the new configuration and restart the Docker daemon:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker

Upstart

Upstart-compatible hosts can configure ulimits for the Docker container runtime by editing the docker.conf file:

$ vi /etc/init/docker.conf

Add the following values to the service configuration:

limit nofile 40960
limit core unlimited unlimited
limit memlock unlimited unlimited

Restart the docker service to apply ulimits to the daemon:

$ sudo /etc/init.d/docker restart