    How to configure a reference production deployment of Couchbase Server.

    Couchbase clusters can be configured in many different ways. We advertise features as opt-in, so you have the freedom to configure your cluster to suit your environment. There is, however, a core set of best practices that we recommend.

    This page aggregates those best practices into a single reference architecture. While it may not suit your environment completely, it can form the basis of your own clusters.

    The majority of this page can be copied and used verbatim. Non-dynamic elements of the configuration will be highlighted using admonitions such as this one.

    Prerequisites

    The reference architecture makes use of third-party resources for ease of management and security compliance. Before continuing, ensure you have installed the following on your Kubernetes cluster:

    Jetstack cert-manager, which is used later on this page to issue and automatically rotate TLS certificates.

    RBAC Management

    Role-based access control (RBAC) should be managed by the Operator. This allows your Couchbase users and groups to be defined in code. Code can be kept under a change control system, easily audited and reviewed, and, crucially, automated.

    yaml
    # Applications and the Operator are able to use this secret.
    # For security, normal users within the namespace should be prohibited
    # from access to Secrets. It is up to the administrator to also
    # ensure the passwords are not leaked from the application.
    apiVersion: v1
    kind: Secret
    metadata:
      name: application1-authentication
    type: Opaque
    stringData:
      password: pieWrewn5knyk&
    ---
    # Applications should have a user that they can use, this allows
    # strong guarantees of safety when using RBAC. While strictly not
    # necessary, the labels provide filtering so resources are only
    # picked up by specific clusters.
    apiVersion: couchbase.com/v2
    kind: CouchbaseUser
    metadata:
      name: application1
      labels:
        cluster: cluster1
    spec:
      authDomain: local
      authSecret: application1-authentication
    ---
    # Groups control what the application is allowed to do. The administrator
    # should limit group permissions to only what is absolutely necessary to
    # allow the application to function. It also restricts those permissions
    # to a specific bucket that the application needs to access. While strictly not
    # necessary, the labels provide filtering so resources are only
    # picked up by specific clusters.
    apiVersion: couchbase.com/v2
    kind: CouchbaseGroup
    metadata:
      name: group1
      labels:
        cluster: cluster1
    spec:
      roles:
      - bucket: bucket1
        name: bucket_full_access
    ---
    # Role bindings are a bit of a misnomer. They follow the behaviour of standard
    # Kubernetes role bindings by creating a relationship between users and groups.
    apiVersion: couchbase.com/v2
    kind: CouchbaseRoleBinding
    metadata:
      name: group1
      labels:
        cluster: cluster1
    spec:
      roleRef:
        kind: CouchbaseGroup
        name: group1
      subjects:
      - kind: CouchbaseUser
        name: application1
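
    An application can consume the credentials above by referencing the secret in its pod specification. The following is a minimal sketch; the pod name and container image are hypothetical, and only the secret reference is taken from the configuration above:

    yaml
    # Hypothetical application pod consuming the application1-authentication
    # secret defined above. The image name is a placeholder.
    apiVersion: v1
    kind: Pod
    metadata:
      name: application1
    spec:
      containers:
      - name: app
        image: example/application1:latest  # placeholder image
        env:
        - name: COUCHBASE_USER
          value: application1
        # The password is injected from the secret rather than hard-coded,
        # so it never appears in the pod specification itself.
        - name: COUCHBASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: application1-authentication
              key: password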

    Bucket Management

    Buckets, like RBAC, should be managed by the Operator. Again, this provides change control, peer review, and auditing. This is essential for data containers such as buckets: they have memory quotas, and these should be centrally controlled and managed to prevent resource starvation through over-provisioning.

    yaml
    # Buckets control the amount of data an application can use, and control how that
    # data is managed. We need at least one data replica to ensure applications can
    # continue to work when a pod is evicted (usually for the purposes of Kubernetes
    # rolling upgrades, which should be carried out once every 3-4 months to ensure that
    # you are up to date with the latest security fixes from both Kubernetes and the
    # Operator).
    apiVersion: couchbase.com/v2
    kind: CouchbaseBucket
    metadata:
      name: bucket1
      labels:
        cluster: cluster1
    spec:
      memoryQuota: 100Mi
      replicas: 2
      ioPriority: high
      enableIndexReplica: true

    Security Management

    Security is extremely important when working in a cloud-based environment such as Kubernetes. To this end we enable TLS by default to protect data from eavesdroppers.

    TLS is managed by third-party tooling, rather than manually. By doing this we simplify the process by removing manual steps, and also secure it by using policy-based certificate rotation.

    The TLS subject alternative names described in this section contain the namespace of the cluster. In this example, that namespace is default. If you wish to deploy in a different namespace, then this will need to be updated to reflect that change.
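    For example, to deploy in a hypothetical namespace named my-namespace, each default component of the subject alternative names would be replaced. This is a sketch of the affected dnsNames entries only; my-namespace is an assumed name:

    yaml
    # Subject alternative names with the namespace changed from "default"
    # to the hypothetical "my-namespace".
    dnsNames:
    - "*.cluster1"
    - "*.cluster1.my-namespace"
    - "*.cluster1.my-namespace.svc"
    - "*.cluster1.my-namespace.svc.cluster.local"
    - "cluster1-srv"
    - "cluster1-srv.my-namespace"
    - "cluster1-srv.my-namespace.svc"
    - "*.cluster1-srv.my-namespace.svc.cluster.local"
    - localhost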

    yaml
    # Admin password is required by the Operator. Like user secrets, this should be
    # protected from unauthorized reads.
    apiVersion: v1
    kind: Secret
    metadata:
      name: administrator-authentication
    type: Opaque
    stringData:
      username: Administrator
      password: Berv~fradrics3
    ---
    # The CA used to issue and sign server certificates is still a manual step.
    # This should be distributed to all clients who are going to consume this database
    # instance. This CA is fixed for the lifetime of the cluster, so must be kept secure,
    # but does mean that clients will continue to function even when the server certificates
    # are automatically rotated periodically.
    apiVersion: v1
    kind: Secret
    metadata:
      name: ca
    data:
      tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBODNvSHArVnNDSGs4UEtsQTJGc1FyNk1yQjhoenpMUjZVTjJtTVdtSFpqWDFIem5WCm1ZYkVSQnhHQmVjbVVHRlRQZWo3Z3Y0NDl5QlV2ZXFIVCtDSjhubHRSbWxFZ24vV3NTYzZUYUl1UWRnRHdreVEKVEZ4UjkvR3JlNEY2M1BwcGNLNFpLQ3FtNjZ5Sk5qTktUQ2hBZEJsdFJyT0hqcDB2TTJrY01JTFl2VDFVVkQxUwpLRDRSQTl3R01XajJTOUQ4eDVnRjFOZHljYndPRWJmRzY3bTVZcFkrcmlIdWEyMHpoamJiZk5RbzN1SVdCeCtmCkVhTmNYOE1MTS9Jc2tVYzFEbTdiK3djaE9ZY2FFbWhIUXhYcHRqMDhCRUN1Z3ZXL3FLTmtjaUI2cE9aYzdlNGUKcVFuRVEvN1hJZ0t4TURUVkJTb1RqMWVJby9sQXhGVS8yYWlHWHdJREFRQUJBb0lCQVFDdGE4WDRPTmx5VDZndwpMUDRiSFFJTm1GTVdBQms3UFhIQ0Y1NUFvOEhsYzVsYzNIemdGYlhHTGIxU2h3b3JScWRiK1k3c0J0ZmNiaEx1CkV4YStObGtMZEtINC9SSG5RZGRSNTNjSHhQVGR3VmNzRmd6UjF4QXJZdCtaNE9mNmJnS2NWK1ZqVHI0R0w2YXMKREd4blFtUm1UWllnUGMvWUxPMXAyUHhUTVYvZnFXTzI3TW9jZml4bUs2MUl3V010bXd0QWNXc1FQZmlhWlRpMApSQlR0TzRyVHV5VnZESURjRlh1ZHVUaHlGNW5KTENsTERiazU5eFIxWlFLTTVoQjcvdjN3LzFneW1FbFNoY1hjClU2ME5TRDFBOTBmV1VMUkxzZXE5RXlyT3U3bGdrNDZBWmpxaGt0NDdqZkM4K3JuR0dUNnh1MWtkUjRJa0FBcnoKM1lzZm1JZUJBb0dCQVBxcjBsckV6R1FqYlk2RjVrTjJGeXVvb0txMFMrakNkRUlONFB1NmhjTnpXazFlTzA5agpBTWJFWlBxVFhkYnF6TlVMOTlkLytkL1JqcXR4dTRXbzZPWHhZSnNKNVU1bVROandVcVFKL3Y1am1ndjVvenlvCjY0RFFQdjE2aG1qSXhwcHRJd25EWC9KL2trdnlPZlVuQzY4dnZEQ3hxQkVHbDZ3UVVJWE12aE01QW9HQkFQaW4KRGhUR3hYZmFvOHhkWVRPbTdFUXVZa3l2Q2h0NUp5Tm05MWRRNEhnOTZPMmRYc3BWSDNrUllSYzFjMFdJZVM3MApHT0lSOGcrNXQvNXYvZVQ2R0pWOGMyc3VZRjErS1pTY3Z0bFFPYjJacitnUEN4SEtDVDY0SE83MnNFVGNYRGpaCjZ1cThiN0JCdnkxZXFtR1FsS2VZSUpEb1RKL0lpbGhFdGFFRHVlNVhBb0dBQWdJUVhGUEpRMkFaUjVRQkJUZFQKOWpDU29PdHkxRG1DanVqbmpYeXdCNkhMN21TNzJ1WHpJcVIrSHBmQm43QWYxZkVUbWpGWFFoaStxTmJ2WnFHMAp3K3JNR0ZIYStXYk9aTXFBRHZwWmhaWXNyTDNpTmVFd2ljYWhTb3lKdVJzcXBDQU5zTTFVM205eEw1U1FMRXVVCngyRjlnM0pZNDFJSE13U3FjSGYwYWRrQ2dZQXFTZThoSlhVc0h5bEFkcGt6ZWE0eElscGhoRnVKdEo4dGJET2cKekFhQkxMWlN3ekw5NG1CSjdPVEFWN3pWRkpMWG8zZ2Y2c0ZxWDBHbHFsSmFBUmJ4UllzenJWMkNTUlMxUzd0QgpwbDFMbTduSkU5WGtIcUpYNG1RNVdBYytqdU80WDRlT2lLSE9Ma0JmYlB3NVA2ZW9vVHpZcUVsdjIyRjhCYU9HClVPWHNYUUtCZ1FEV0xHcThpeWRXK2FkK1k4blZxV1VUZnl4VW5yaW5TVG95ei9nT3dDZ2ZoS2FSdGpYclEvUmEKSVlTczNjR016SldwZXZkWGRDc2JSL0I2L09VSSsyTlRqcEo4c3FVZnhjbHpMeU4xQjlQUFhGeDF5WTV5ZDFJZApVQnRLTG5KRjBSQlRzWnNmVG0vUUpJeGY3Y2ZWUGRlTGcxMEdJeFhzaG5GV0FLUHhveUhSeGc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
      tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURTekNDQWpPZ0F3SUJBZ0lVQ1ptem1IQjI0cUFwUnY3WnpLUGFJclFHam5Nd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0ZqRVVNQklHQTFVRUF3d0xSV0Z6ZVMxU1UwRWdRMEV3SGhjTk1qQXhNakUzTVRJMU9ESXlXaGNOTXpBeApNakUxTVRJMU9ESXlXakFXTVJRd0VnWURWUVFEREF0RllYTjVMVkpUUVNCRFFUQ0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFQTjZCNmZsYkFoNVBEeXBRTmhiRUsrakt3ZkljOHkwZWxEZHBqRnAKaDJZMTlSODUxWm1HeEVRY1JnWG5KbEJoVXozbys0TCtPUGNnVkwzcWgwL2dpZko1YlVacFJJSi8xckVuT2syaQpMa0hZQThKTWtFeGNVZmZ4cTN1QmV0ejZhWEN1R1NncXB1dXNpVFl6U2t3b1FIUVpiVWF6aDQ2ZEx6TnBIRENDCjJMMDlWRlE5VWlnK0VRUGNCakZvOWt2US9NZVlCZFRYY25HOERoRzN4dXU1dVdLV1BxNGg3bXR0TTRZMjIzelUKS043aUZnY2ZueEdqWEYvREN6UHlMSkZITlE1dTIvc0hJVG1IR2hKb1IwTVY2Ylk5UEFSQXJvTDF2NmlqWkhJZwplcVRtWE8zdUhxa0p4RVArMXlJQ3NUQTAxUVVxRTQ5WGlLUDVRTVJWUDltb2hsOENBd0VBQWFPQmtEQ0JqVEFkCkJnTlZIUTRFRmdRVWdOdmtvVkY2dW5CSFFrS0p6Mzk4ZlJSUUs2VXdVUVlEVlIwakJFb3dTSUFVZ052a29WRjYKdW5CSFFrS0p6Mzk4ZlJSUUs2V2hHcVFZTUJZeEZEQVNCZ05WQkFNTUMwVmhjM2t0VWxOQklFTkJnaFFKbWJPWQpjSGJpb0NsRy90bk1vOW9pdEFhT2N6QU1CZ05WSFJNRUJUQURBUUgvTUFzR0ExVWREd1FFQXdJQkJqQU5CZ2txCmhraUc5dzBCQVFzRkFBT0NBUUVBeW9xakNxa1lJSmQ3dUF5TFRHSnB3cFRLd2JTSTJPcXBRNkVDRUNpZjBaWkYKNHBTT1ArYjg1bzF3VmptZ2wvTW92VXBPYTRxN3NSekcyM052Z0lKTjFzOXlYTURlRTB4TDE3dmpzWVFGNUlGSwo0bFEzTVArSFVLMGprUWthNTBNeFZQUTRDWldrUmV0V2d6M2l0bk8zcFVLbGc3bWpxV1hVc3dUYkw1S01PQzZ0CnBWZFBsRzRPaWJIa004czRrNzJhb1ovdzRaMStsVUpORXRNQldkMnI4LytlcEpMOUp2dXN0cGlPcnYvVkF5eC8KWW9lcjRTZitxWDEvand0ZWNHWFFmNHFKMkJwL2h6a3F4YzNETFFiSGFxaEZGQ3o2T2pWRGxzc0tqNjEzeEJoTwpYbkJNT1NCbVFEMkxoZWlVcC9aN1E5ZlovdDE2WW9NZ0tHSngwOUl3VlE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    ---
    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      name: ca
    spec:
      ca:
        secretName: ca
    ---
    # Certificates are generated with Jetstack Cert Manager. This simplifies
    # configuration by keeping it as code, avoiding having to use openssl (or
    # similar) directly. By using Cert Manager we can readily demonstrate to
    # security auditors that certificates are rotated on a periodic basis and also
    # conform to encryption strength constraints.
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: cluster1-certificate
    spec:
      secretName: cluster1-server-tls
      duration: 720h
      renewBefore: 24h
      commonName: couchbase-server
      isCA: false
      privateKey:
        algorithm: RSA
        encoding: PKCS8
        size: 2048
      usages:
      - server auth
      dnsNames:
      - "*.cluster1"
      - "*.cluster1.default"
      - "*.cluster1.default.svc"
      - "*.cluster1.default.svc.cluster.local"
      - "cluster1-srv"
      - "cluster1-srv.default"
      - "cluster1-srv.default.svc"
      - "*.cluster1-srv.default.svc.cluster.local"
      - localhost
      issuerRef:
        name: ca
        group: cert-manager.io
        kind: Issuer

    Cluster Management

    Cluster configuration is quite involved, so it will not be discussed at length here. Instead, comments are provided inline where they are contextually relevant.

    In general, the cluster is designed to be stable, fault tolerant and secure.

    Cluster scheduling requires Kubernetes nodes to be manually labeled (and tainted) for exclusive use by the Couchbase cluster. An example of how to perform this is documented in the cluster definition’s comments.

    There is no one-size-fits-all cluster topology. This page documents an arbitrary selection of services and server class sizes. Consult Couchbase solutions engineering to determine the correct cluster sizing for your workload.

    yaml
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cluster1
    spec:
      image: couchbase/server:7.6.0
      # Always enable anti-affinity to limit the "blast radius", and also ensure that any
      # assumptions about data replication hold i.e. a Kubernetes node going down will only
      # affect at most one pod.
      antiAffinity: true
      # Always select RBAC rules based on a label to prevent unexpectedly picking up
      # any unlabelled resources created in this namespace.
      security:
        adminSecret: administrator-authentication
        rbac:
          managed: true
          selector:
            matchLabels:
              cluster: cluster1
      # Always select buckets based on a label to prevent unexpectedly picking up
      # any unlabelled resources created in this namespace.
      buckets:
        managed: true
        selector:
          matchLabels:
            cluster: cluster1
      cluster:
        # Each service will be on its own pod, each pod will be on its own node.
        dataServiceMemoryQuota: 1Gi
        indexServiceMemoryQuota: 1Gi
        queryServiceMemoryQuota: 1Gi
        # Fast auto-failover ensures that replica data becomes live quickly and
        # minimises impact for applications in the face of trouble or upgrades.
        autoFailoverTimeout: 5s
        autoFailoverOnDataDiskIssues: true
        autoFailoverOnDataDiskIssuesTimePeriod: 5s
        # Auto resource allocation takes the memory quotas defined in the cluster
        # section and applies them to pods in the various server classes we will
        # define in the servers section. This manifests itself as Kubernetes
        # resource requests that ensure fair scheduling of pods across your
        # Kubernetes cluster.
        autoResourceAllocation:
          enabled: true
      # Enable managed TLS to protect all data from eavesdropping.
      networking:
        tls:
          secretSource:
            serverSecretName: cluster1-server-tls
      # Each server class will have its own storage template, this allows independent
      # scaling as the need arises, and also minimises the number of pods that are
      # affected by a particular change. Do not under-provision storage; use the
      # high-performance solid-state variety.
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          storageClassName: premium-rwo
          resources:
            requests:
              storage: 1Gi
      - metadata:
          name: index
        spec:
          storageClassName: premium-rwo
          resources:
            requests:
              storage: 1Gi
      - metadata:
          name: query
        spec:
          storageClassName: premium-rwo
          resources:
            requests:
              storage: 1Gi
      # Each service is hosted on its own set of pods. This facilitates simple independent
      # scaling of services, simplifies memory allocation and reduces blast radius, affecting
      # only a single service at a time.
      servers:
      - name: data
        size: 3
        services:
        - data
        volumeMounts:
          default: data
        pod:
          spec:
            # By tainting all the nodes we intend to use, we ensure no other pods
            # are running on them, and we get exclusive use (no noisy neighbours).
            # Example:
            #   for i in gke-cluster-default-pool-94815a4b-jkhv \
            #            gke-cluster-default-pool-94815a4b-phl1 \
            #            gke-cluster-default-pool-94815a4b-w4bl \
            #            gke-cluster-default-pool-c9efc654-8krv \
            #            gke-cluster-default-pool-c9efc654-kt07 \
            #            gke-cluster-default-pool-c9efc654-kt5r; do
            #     kubectl taint nodes $i application-specific=couchbase-server:NoExecute
            #   done
            tolerations:
            - key: application-specific
              value: couchbase-server
              effect: NoExecute
            # By selecting only nodes labeled for our use, we don't run where we
            # should not.
            # Example:
            #   for i in gke-cluster-default-pool-94815a4b-jkhv \
            #            gke-cluster-default-pool-94815a4b-phl1 \
            #            gke-cluster-default-pool-94815a4b-w4bl \
            #            gke-cluster-default-pool-c9efc654-8krv \
            #            gke-cluster-default-pool-c9efc654-kt07 \
            #            gke-cluster-default-pool-c9efc654-kt5r; do
            #     kubectl label nodes $i application=couchbase-server
            #   done
            nodeSelector:
              application: couchbase-server
      - name: index
        size: 2
        services:
        - index
        volumeMounts:
          default: index
        pod:
          spec:
            tolerations:
            - key: application-specific
              value: couchbase-server
              effect: NoExecute
            nodeSelector:
              application: couchbase-server
      - name: query
        size: 1
        services:
        - query
        volumeMounts:
          default: query
        pod:
          spec:
            tolerations:
            - key: application-specific
              value: couchbase-server
              effect: NoExecute
            nodeSelector:
              application: couchbase-server
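
    Because each service is hosted on its own server class, scaling later is a matter of editing the relevant class’s size field; the Operator then adds or removes pods and rebalances the cluster. A minimal sketch of such an edit (the new size value is purely illustrative):

    yaml
    # Excerpt of the servers section from the cluster definition above,
    # scaling the query service from one pod to two.
    servers:
    - name: query
      size: 2  # was 1; the Operator creates the new pod and rebalances
      services:
      - query
      volumeMounts:
        default: query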