CouchbaseCluster Reference Architecture
How to configure a reference production deployment of Couchbase Server.
Couchbase clusters can be configured in many different ways. Features are opt-in, so you have the freedom to configure your cluster to suit your environment. There is, however, a core set of best practices that we recommend.
This page gathers those best practices into a single architecture. While it may not suit your environment completely, it can form the basis of your clusters.
The majority of this page can be copied and used verbatim without modification. Non-dynamic elements of the configuration will be highlighted using admonitions such as this one.
The reference architecture makes use of third-party resources for ease of management and security compliance. Before continuing, ensure you have installed the following on your Kubernetes cluster:
Role-based access control (RBAC) should be managed by the Operator. This allows your Couchbase users and groups to be defined in code. Code can be kept under change control, is easily audited and reviewed, and, crucially, can be automated.
# Applications and the Operator are able to use this secret.
# For security, normal users within the namespace should be prohibited
# from access to Secrets. It is up to the administrator to also
# ensure the passwords are not leaked from the application.
apiVersion: v1
kind: Secret
metadata:
  name: application1-authentication
type: Opaque
stringData:
  password: pieWrewn5knyk&
---
# Applications should have a user that they can use, this allows
# strong guarantees of safety when using RBAC. While strictly not
# necessary, the labels provide filtering so resources are only
# picked up by specific clusters.
apiVersion: couchbase.com/v2
kind: CouchbaseUser
metadata:
  name: application1
  labels:
    cluster: cluster1
spec:
  authDomain: local
  authSecret: application1-authentication
---
# Groups control what the application is allowed to do. The administrator
# should limit group permissions to only what is absolutely necessary to
# allow the application to function. It also restricts those permissions
# to a specific bucket that the application needs to access. While strictly not
# necessary, the labels provide filtering so resources are only
# picked up by specific clusters.
apiVersion: couchbase.com/v2
kind: CouchbaseGroup
metadata:
  name: group1
  labels:
    cluster: cluster1
spec:
  roles:
  - bucket: bucket1
    name: bucket_full_access
---
# Role bindings are a bit of a misnomer. They follow the behaviour of standard
# Kubernetes role bindings by creating a relationship between users and groups.
apiVersion: couchbase.com/v2
kind: CouchbaseRoleBinding
metadata:
  name: group1
  labels:
    cluster: cluster1
spec:
  roleRef:
    kind: CouchbaseGroup
    name: group1
  subjects:
  - kind: CouchbaseUser
    name: application1
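As a sketch of how an application might consume these credentials without embedding them in its image, a pod can read the password from the Secret through an environment variable. The pod name, container image and environment variable names below are hypothetical placeholders. Only the Secret name `application1-authentication`, its `password` key and the user name `application1` come from the definitions above.

```yaml
# Hypothetical application pod: name, image and variable names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: application1
spec:
  containers:
  - name: application1
    image: registry.example.com/application1:latest  # placeholder image
    env:
    # Matches the name of the CouchbaseUser resource.
    - name: COUCHBASE_USERNAME
      value: application1
    # Read the password from the Secret rather than hard-coding it.
    - name: COUCHBASE_PASSWORD
      valueFrom:
        secretKeyRef:
          name: application1-authentication
          key: password
```

Referencing the Secret this way keeps the password in one place, so rotating it only requires updating the Secret and restarting the application.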
Buckets, like RBAC, should be managed by the Operator. Again, this provides change control, peer review and auditing. This is essential for data containers such as buckets, because they have quotas that should be centrally controlled and managed to prevent resource starvation through over-provisioning.
# Buckets control the amount of data an application can use, and control how that
# data is managed. We need at least one data replica to ensure applications can
# continue to work when a pod is evicted (usually for the purposes of Kubernetes
# rolling upgrades, which should be carried out once every 3-4 months to ensure that
# you are up to date with the latest security fixes from both Kubernetes and the
# Operator).
apiVersion: couchbase.com/v2
kind: CouchbaseBucket
metadata:
  name: bucket1
  labels:
    cluster: cluster1
spec:
  memoryQuota: 100Mi
  replicas: 2
  ioPriority: high
  enableIndexReplica: true
Security is extremely important when working in a cloud-based environment such as Kubernetes. To this end, we use managed TLS by default to protect data from eavesdroppers.
TLS is managed by third-party tooling rather than manually. This simplifies the process by removing manual steps, and makes it more secure through policy-based certificate rotation.
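As an illustration of policy-based rotation, the following sketch assumes the third-party tooling is cert-manager, which this page does not confirm. The Issuer name `cluster1-ca-issuer`, the namespace placeholder `my-namespace`, the lifetimes and the list of DNS names are all assumptions for illustration. Only the Secret name `cluster1-server-tls` and the cluster name `cluster1` come from the cluster definition on this page. Check the Operator's TLS documentation for the exact subject alternative names it requires.

```yaml
# Sketch only: assumes cert-manager is installed and that an Issuer named
# cluster1-ca-issuer (hypothetical) exists in the same namespace.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cluster1-server
spec:
  # The Secret referenced by the cluster's TLS configuration.
  secretName: cluster1-server-tls
  issuerRef:
    name: cluster1-ca-issuer
    kind: Issuer
  # Illustrative rotation policy: 30 day lifetime, renewed 7 days before expiry.
  duration: 720h
  renewBefore: 168h
  commonName: couchbase-server
  usages:
  - server auth
  dnsNames:
  # Replace my-namespace (a placeholder) with the namespace of the cluster.
  - "*.cluster1"
  - "*.cluster1.my-namespace"
  - "*.cluster1.my-namespace.svc"
  - "*.cluster1.my-namespace.svc.cluster.local"
  - "cluster1-srv"
  - "cluster1-srv.my-namespace"
  - "cluster1-srv.my-namespace.svc"
  - "localhost"
```

Because the certificate is declared as a resource, its lifetime and renewal window are reviewed and versioned in the same way as the rest of the configuration.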
The TLS subject alternative names described in this section contain the namespace of the cluster.
In this example, that namespace is
Cluster configuration is quite complicated, so it is not discussed at length here. Instead, comments are provided inline, where they are contextually relevant.
In general, the cluster is designed to be stable, fault tolerant and secure.
Cluster scheduling requires Kubernetes nodes to be manually labeled and tainted for exclusive use by the Couchbase cluster. An example of how to do this is documented in the cluster definition's comments.
There is no one-size-fits-all cluster topology. This page documents an arbitrary selection of services and server class sizes. Consult Couchbase solutions engineering to determine the correct cluster sizing for your workload.
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cluster1
spec:
  image: couchbase/server:6.6.2
  # Always enable anti-affinity to limit the "blast radius", and also ensure that any
  # assumptions about data replication hold i.e. a Kubernetes node going down will only
  # affect at most one pod.
  antiAffinity: true
  # Always select RBAC rules based on a label to prevent unexpectedly picking up
  # any unlabelled resources created in this namespace.
  security:
    adminSecret: administrator-authentication
    rbac:
      managed: true
      selector:
        matchLabels:
          cluster: cluster1
  # Always select buckets based on a label to prevent unexpectedly picking up
  # any unlabelled resources created in this namespace.
  buckets:
    managed: true
    selector:
      matchLabels:
        cluster: cluster1
  cluster:
    # Each service will be on its own pod, each pod will be on its own node.
    dataServiceMemoryQuota: 1Gi
    indexServiceMemoryQuota: 1Gi
    queryServiceMemoryQuota: 1Gi
    # Fast auto-failover ensures that replica data becomes live quickly and
    # minimises impact for applications in the face of trouble or upgrades.
    autoFailoverTimeout: 5s
    autoFailoverOnDataDiskIssues: true
    autoFailoverOnDataDiskIssuesTimePeriod: 5s
  # Auto resource allocation takes the memory quotas defined in the cluster
  # section and applies them to pods in the various server classes we will
  # define in the servers section. This manifests itself as Kubernetes
  # resource requests that ensure fair scheduling of pods across your
  # Kubernetes cluster.
  autoResourceAllocation:
    enabled: true
  # Enable managed TLS to protect all data from eavesdropping.
  networking:
    tls:
      secretSource:
        serverSecretName: cluster1-server-tls
  # Each server class will have its own storage template, this allows independent
  # scaling as the need arises, and also minimises the number of pods that are
  # affected by a particular change. Do not under provision storage, use the high
  # performance solid state variety.
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: premium-rwo
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: index
    spec:
      storageClassName: premium-rwo
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: query
    spec:
      storageClassName: premium-rwo
      resources:
        requests:
          storage: 1Gi
  # Each service is hosted on its own set of pods. This facilitates simple independent
  # scaling of services, simplifies memory allocation and reduces blast radius, affecting
  # only a single service at a time.
  servers:
  - name: data
    size: 3
    services:
    - data
    volumeMounts:
      default: data
    pod:
      spec:
        # By tainting all the nodes we intend to use, we ensure no other pods
        # are running on them, and we get exclusive use (avoiding noisy neighbours).
        # Example:
        # for i in gke-cluster-default-pool-94815a4b-jkhv \
        #          gke-cluster-default-pool-94815a4b-phl1 \
        #          gke-cluster-default-pool-94815a4b-w4bl \
        #          gke-cluster-default-pool-c9efc654-8krv \
        #          gke-cluster-default-pool-c9efc654-kt07 \
        #          gke-cluster-default-pool-c9efc654-kt5r; do \
        #   kubectl taint nodes $i application-specific=couchbase-server:NoExecute
        # done
        tolerations:
        - key: application-specific
          value: couchbase-server
          effect: NoExecute
        # By selecting only nodes labeled for our use, we don't run where we
        # should not.
        # Example:
        # for i in gke-cluster-default-pool-94815a4b-jkhv \
        #          gke-cluster-default-pool-94815a4b-phl1 \
        #          gke-cluster-default-pool-94815a4b-w4bl \
        #          gke-cluster-default-pool-c9efc654-8krv \
        #          gke-cluster-default-pool-c9efc654-kt07 \
        #          gke-cluster-default-pool-c9efc654-kt5r; do \
        #   kubectl label nodes $i application=couchbase-server
        # done
        nodeSelector:
          application: couchbase-server
  - name: index
    size: 2
    services:
    - index
    volumeMounts:
      default: index
    pod:
      spec:
        tolerations:
        - key: application-specific
          value: couchbase-server
          effect: NoExecute
        nodeSelector:
          application: couchbase-server
  - name: query
    size: 1
    services:
    - query
    volumeMounts:
      default: query
    pod:
      spec:
        tolerations:
        - key: application-specific
          value: couchbase-server
          effect: NoExecute
        nodeSelector:
          application: couchbase-server
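The cluster definition refers to an `administrator-authentication` Secret through `adminSecret`, which is not defined on this page. A minimal sketch of such a Secret follows, assuming the Operator reads the administrator credentials from `username` and `password` keys. Both values shown are placeholders and should be replaced before use.

```yaml
# Sketch only: the key names are an assumption about what the Operator expects,
# and both values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: administrator-authentication
type: Opaque
stringData:
  username: Administrator
  password: "change-me"  # placeholder, replace with a strong generated password
```

As with the application Secret, access to this resource should be restricted to administrators and the Operator.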