CouchbaseCluster Reference Architecture

    How to configure a reference production deployment of Couchbase Server.

    Couchbase clusters can be configured in many different ways. Features are deliberately opt-in, so you have the freedom to configure your cluster to suit your environment. There is, however, a core set of best practices that we recommend.

    This page aggregates those best practices into a single reference architecture. While it may not suit your environment completely, it can form the basis of your own clusters.

    The majority of this page can be copied and used verbatim, without modification. Elements of the configuration that do need adapting to your environment are called out in admonitions, such as the one in the Security Management section.

    Prerequisites

    The reference architecture makes use of third-party resources for ease of management and security compliance. Before continuing, ensure you have installed the following on your Kubernetes cluster:

    * cert-manager, which is used in the Security Management section to issue and automatically rotate the cluster's TLS certificates.

    RBAC Management

    Role-based access control should be managed by the Operator. This allows your Couchbase users and groups to be defined as code. Code can be kept under change control, easily audited and reviewed, and, crucially, automated.

    # Applications and the Operator are able to use this secret.
    # For security, normal users within the namespace should be prohibited
    # from accessing Secrets.  It is up to the administrator to also
    # ensure the passwords are not leaked from the application.
    apiVersion: v1
    kind: Secret
    metadata:
      name: application1-authentication
    type: Opaque
    stringData:
      password: pieWrewn5knyk&
    ---
    # Applications should have a user that they can use, this allows
    # strong guarantees of safety when using RBAC.  While not strictly
    # necessary, the labels provide filtering so resources are only
    # picked up by specific clusters.
    apiVersion: couchbase.com/v2
    kind: CouchbaseUser
    metadata:
      name: application1
      labels:
        cluster: cluster1
    spec:
      authDomain: local
      authSecret: application1-authentication
    ---
    # Groups control what the application is allowed to do.  The administrator
    # should limit group permissions to only what is absolutely necessary to
    # allow the application to function.  It also restricts those permissions
    # to a specific bucket that the application needs to access.  While not strictly
    # necessary, the labels provide filtering so resources are only
    # picked up by specific clusters.
    apiVersion: couchbase.com/v2
    kind: CouchbaseGroup
    metadata:
      name: group1
      labels:
        cluster: cluster1
    spec:
      roles:
      - bucket: bucket1
        name: bucket_full_access
    ---
    # Role bindings are a bit of a misnomer: they follow the behaviour of standard
    # Kubernetes role bindings by creating a relationship between users and groups.
    apiVersion: couchbase.com/v2
    kind: CouchbaseRoleBinding
    metadata:
      name: group1
      labels:
        cluster: cluster1
    spec:
      roleRef:
        kind: CouchbaseGroup
        name: group1
      subjects:
      - kind: CouchbaseUser
        name: application1
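
    The password in the Secret above is a hard-coded placeholder. In practice you may prefer to generate credentials rather than write them by hand; the following is a minimal sketch using `openssl rand` (any cryptographically secure generator works equally well).

```shell
# Generate a 24-character random password for the application Secret.
# openssl's CSPRNG is used here; /dev/urandom-based tools also work.
PASSWORD=$(openssl rand -base64 18)

# The value can then be placed in the Secret's stringData.password field,
# for example with:
#   kubectl create secret generic application1-authentication \
#     --from-literal=password="${PASSWORD}"
echo "${PASSWORD}"
```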

    Bucket Management

    Buckets, like RBAC, should be managed by the Operator. Again, this provides change control, peer review, and auditing. This is essential for data containers such as buckets: they have memory quotas, and these should be centrally controlled and managed to prevent resource starvation through over-provisioning.

    # Buckets control the amount of data an application can use, and control how that
    # data is managed.  We need at least one data replica to ensure applications can
    # continue to work when a pod is evicted (usually for the purposes of Kubernetes
    # rolling upgrades, which should be carried out once every 3-4 months to ensure that
    # you are up to date with the latest security fixes from both Kubernetes and the
    # operator).
    apiVersion: couchbase.com/v2
    kind: CouchbaseBucket
    metadata:
      name: bucket1
      labels:
        cluster: cluster1
    spec:
      memoryQuota: 100Mi
      replicas: 2
      ioPriority: high
      enableIndexReplica: true
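
    The `replicas` setting drives the bucket's fault tolerance. The following is a minimal sketch of the arithmetic; the pod count mirrors the three-node data server class defined in the Cluster Management section.

```python
# Sketch: what `replicas: 2` means for the bucket above.  One active copy
# plus two replica copies of every document, each placed on a distinct
# data pod.
replicas = 2        # spec.replicas in the CouchbaseBucket
data_pods = 3       # size of the data server class in the cluster definition

copies_per_item = 1 + replicas   # active copy + replica copies
assert data_pods >= copies_per_item, "too few data pods to place all replicas"

print(f"each document is stored {copies_per_item} times")
print(f"the cluster tolerates the loss of up to {replicas} data pods")
```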

    Security Management

    Security is extremely important when working in a cloud-based environment such as Kubernetes. To this end, we enable TLS by default to protect data from eavesdroppers.

    TLS is managed by third-party tooling, rather than manually. This simplifies the process by removing manual steps, and secures it through policy-based certificate rotation.

    The TLS subject alternative names described in this section contain the namespace of the cluster. In this example, that namespace is default. If you wish to deploy in a different namespace, then this will need to be updated to reflect that change.
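
    For example, deploying the same cluster into a hypothetical tenant1 namespace would change the namespaced entries of the SAN list as follows (a fragment of the Certificate spec only; tenant1 is illustrative):

```yaml
# Fragment of the Certificate spec, adjusted for a hypothetical
# "tenant1" namespace instead of "default".
dnsNames:
  - "*.cluster1"
  - "*.cluster1.tenant1"
  - "*.cluster1.tenant1.svc"
  - "*.cluster1.tenant1.svc.cluster.local"
  - "cluster1-srv"
  - "cluster1-srv.tenant1"
  - "cluster1-srv.tenant1.svc"
  - "*.cluster1-srv.tenant1.svc.cluster.local"
  - localhost
```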

    # Admin password is required by the Operator.  Like user secrets, this should be
    # protected from unauthorized reads.
    apiVersion: v1
    kind: Secret
    metadata:
      name: administrator-authentication
    type: Opaque
    stringData:
      username: Administrator
      password: Berv~fradrics3
    ---
    # Creating the CA used to issue and sign server certificates is still a manual step.
    # This should be distributed to all clients who are going to consume this database
    # instance.  This CA is fixed for the lifetime of the cluster, so must be kept secure,
    # but does mean that clients will continue to function even when the server certificates
    # are automatically rotated periodically.
    apiVersion: v1
    kind: Secret
    metadata:
      name: ca
    data:
      tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBODNvSHArVnNDSGs4UEtsQTJGc1FyNk1yQjhoenpMUjZVTjJtTVdtSFpqWDFIem5WCm1ZYkVSQnhHQmVjbVVHRlRQZWo3Z3Y0NDl5QlV2ZXFIVCtDSjhubHRSbWxFZ24vV3NTYzZUYUl1UWRnRHdreVEKVEZ4UjkvR3JlNEY2M1BwcGNLNFpLQ3FtNjZ5Sk5qTktUQ2hBZEJsdFJyT0hqcDB2TTJrY01JTFl2VDFVVkQxUwpLRDRSQTl3R01XajJTOUQ4eDVnRjFOZHljYndPRWJmRzY3bTVZcFkrcmlIdWEyMHpoamJiZk5RbzN1SVdCeCtmCkVhTmNYOE1MTS9Jc2tVYzFEbTdiK3djaE9ZY2FFbWhIUXhYcHRqMDhCRUN1Z3ZXL3FLTmtjaUI2cE9aYzdlNGUKcVFuRVEvN1hJZ0t4TURUVkJTb1RqMWVJby9sQXhGVS8yYWlHWHdJREFRQUJBb0lCQVFDdGE4WDRPTmx5VDZndwpMUDRiSFFJTm1GTVdBQms3UFhIQ0Y1NUFvOEhsYzVsYzNIemdGYlhHTGIxU2h3b3JScWRiK1k3c0J0ZmNiaEx1CkV4YStObGtMZEtINC9SSG5RZGRSNTNjSHhQVGR3VmNzRmd6UjF4QXJZdCtaNE9mNmJnS2NWK1ZqVHI0R0w2YXMKREd4blFtUm1UWllnUGMvWUxPMXAyUHhUTVYvZnFXTzI3TW9jZml4bUs2MUl3V010bXd0QWNXc1FQZmlhWlRpMApSQlR0TzRyVHV5VnZESURjRlh1ZHVUaHlGNW5KTENsTERiazU5eFIxWlFLTTVoQjcvdjN3LzFneW1FbFNoY1hjClU2ME5TRDFBOTBmV1VMUkxzZXE5RXlyT3U3bGdrNDZBWmpxaGt0NDdqZkM4K3JuR0dUNnh1MWtkUjRJa0FBcnoKM1lzZm1JZUJBb0dCQVBxcjBsckV6R1FqYlk2RjVrTjJGeXVvb0txMFMrakNkRUlONFB1NmhjTnpXazFlTzA5agpBTWJFWlBxVFhkYnF6TlVMOTlkLytkL1JqcXR4dTRXbzZPWHhZSnNKNVU1bVROandVcVFKL3Y1am1ndjVvenlvCjY0RFFQdjE2aG1qSXhwcHRJd25EWC9KL2trdnlPZlVuQzY4dnZEQ3hxQkVHbDZ3UVVJWE12aE01QW9HQkFQaW4KRGhUR3hYZmFvOHhkWVRPbTdFUXVZa3l2Q2h0NUp5Tm05MWRRNEhnOTZPMmRYc3BWSDNrUllSYzFjMFdJZVM3MApHT0lSOGcrNXQvNXYvZVQ2R0pWOGMyc3VZRjErS1pTY3Z0bFFPYjJacitnUEN4SEtDVDY0SE83MnNFVGNYRGpaCjZ1cThiN0JCdnkxZXFtR1FsS2VZSUpEb1RKL0lpbGhFdGFFRHVlNVhBb0dBQWdJUVhGUEpRMkFaUjVRQkJUZFQKOWpDU29PdHkxRG1DanVqbmpYeXdCNkhMN21TNzJ1WHpJcVIrSHBmQm43QWYxZkVUbWpGWFFoaStxTmJ2WnFHMAp3K3JNR0ZIYStXYk9aTXFBRHZwWmhaWXNyTDNpTmVFd2ljYWhTb3lKdVJzcXBDQU5zTTFVM205eEw1U1FMRXVVCngyRjlnM0pZNDFJSE13U3FjSGYwYWRrQ2dZQXFTZThoSlhVc0h5bEFkcGt6ZWE0eElscGhoRnVKdEo4dGJET2cKekFhQkxMWlN3ekw5NG1CSjdPVEFWN3pWRkpMWG8zZ2Y2c0ZxWDBHbHFsSmFBUmJ4UllzenJWMkNTUlMxUzd0QgpwbDFMbTduSkU5WGtIcUpYNG1RNVdBYytqdU80WDRlT2lLSE9Ma0JmYlB3NVA2ZW9vVHpZcUVsdjIyRjhCYU9HClVPWHNYUUtCZ1FEV0xHcThpeWRXK2FkK1k4b2xZcVV1Znl4VW5yaW5TVG95ei9nT3dDZ2ZoS2FSdGpYclEvUmEKSVlTczNjR016SldwZXZkWGRDc2JSL0I2L09VSSsyTlRqcEo4c3FVZnhjbHpMeU4xQjlQUFhGeDF5WTV5ZDFJZApVQnRLTG5KRjBSQlRzWnNmVG0vUUpJeGY3Y2ZWUGRlTGcxMEdJeFhzaG5GV0FLUHhveUhSeGc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
      tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURTekNDQWpPZ0F3SUJBZ0lVQ1ptem1IQjI0cUFwUnY3WnpLUGFJclFHam5Nd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0ZqRVVNQklHQTFVRUF3d0xSV0Z6ZVMxU1UwRWdRMEV3SGhjTk1qQXhNakUzTVRJMU9ESXlXaGNOTXpBeApNakUxTVRJMU9ESXlXakFXTVJRd0VnWURWUVFEREF0RllYTjVMVkpUUVNCRFFUQ0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFQTjZCNmZsYkFoNVBEeXBRTmhiRUsrakt3ZkljOHkwZWxEZHBqRnAKaDJZMTlSODUxWm1HeEVRY1JnWG5KbEJoVXozbys0TCtPUGNnVkwzcWgwL2dpZko1YlVacFJJSi8xckVuT2syaQpMa0hZQThKTWtFeGNVZmZ4cTN1QmV0ejZhWEN1R1NncXB1dXNpVFl6U2t3b1FIUVpiVWF6aDQ2ZEx6TnBIRENDCjJMMDlWRlE5VWlnK0VRUGNCakZvOWt2US9NZVlCZFRYY25HOERoRzN4dXU1dVdLV1BxNGg3bXR0TTRZMjIzelUKS043aUZnY2ZueEdqWEYvREN6UHlMSkZITlE1dTIvc0hJVG1IR2hKb1IwTVY2Ylk5UEFSQXJvTDF2NmlqWkhJZwplcVRtWE8zdUhxa0p4RVArMXlJQ3NUQTAxUVVxRTQ5WGlLUDVRTVJWUDltb2hsOENBd0VBQWFPQmtEQ0JqVEFkCkJnTlZIUTRFRmdRVWdOdmtvVkY2dW5CSFFrS0p6Mzk4ZlJSUUs2VXdVUVlEVlIwakJFb3dTSUFVZ052a29WRjYKdW5CSFFrS0p6Mzk4ZlJSUUs2V2hHcVFZTUJZeEZEQVNCZ05WQkFNTUMwVmhjM2t0VWxOQklFTkJnaFFKbWJPWQpjSGJpb0NsRy90bk1vOW9pdEFhT2N6QU1CZ05WSFJNRUJUQURBUUgvTUFzR0ExVWREd1FFQXdJQkJqQU5CZ2txCmhraUc5dzBCQVFzRkFBT0NBUUVBeW9xakNxa1lJSmQ3dUF5TFRHSnB3cFRLd2JTSTJPcXBRNkVDRUNpZjBaWkYKNHBTT1ArYjg1bzF3VmptZ2wvTW92VXBPYTRxN3NSekcyM052Z0lKTjFzOXlYTURlRTB4TDE3dmpzWVFGNUlGSwo0bFEzTVArSFVLMGprUWthNTBNeFZQUTRDWldrUmV0V2d6M2l0bk8zcFVLbGc3bWpxV1hVc3dUYkw1S01PQzZ0CnBWZFBsRzRPaWJIa004czRrNzJhb1ovdzRaMStsVUpORXRNQldkMnI4LytlcEpMOUp2dXN0cGlPcnYvVkF5eC8KWW9lcjRTZitxWDEvand0ZWNHWFFmNHFKMkJwL2h6a3F4YzNETFFiSGFxaEZGQ3o2T2pWRGxzc0tqNjEzeEJoTwpYbkJNT1NCbVFEMkxoZWlVcC9aN1E5ZlovdDE2WW9NZ0tHSngwOUl3VlE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    ---
    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      name: ca
    spec:
      ca:
        secretName: ca
    ---
    # Certificates are generated with Jetstack Cert Manager.  This simplifies
    # configuration by keeping it as code, avoiding having to use openssl (or
    # similar) directly.  By using Cert Manager we can readily demonstrate to
    # security auditors that certificates are rotated on a periodic basis and also
    # conform to encryption strength constraints.
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: cluster1-certificate
    spec:
      secretName: cluster1-server-tls
      duration: 720h
      renewBefore: 24h
      commonName: couchbase-server
      isCA: false
      privateKey:
        algorithm: RSA
        encoding: PKCS8
        size: 2048
      usages:
        - server auth
      dnsNames:
        - "*.cluster1"
        - "*.cluster1.default"
        - "*.cluster1.default.svc"
        - "*.cluster1.default.svc.cluster.local"
        - "cluster1-srv"
        - "cluster1-srv.default"
        - "cluster1-srv.default.svc"
        - "*.cluster1-srv.default.svc.cluster.local"
        - localhost
      issuerRef:
        name: ca
        group: cert-manager.io
        kind: Issuer
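
    Creating the CA itself remains a manual, one-off step. One way to produce the key and certificate for the ca Secret is with openssl; the following is a sketch where the subject name and lifetime are illustrative.

```shell
# Generate a 2048-bit RSA key and a self-signed CA certificate valid for
# ten years.  The resulting files can populate the `ca` Secret directly,
# for example with:
#   kubectl create secret generic ca \
#     --from-file=tls.key=ca.key --from-file=tls.crt=ca.crt
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -sha256 -days 3650 \
  -key ca.key -subj "/CN=Couchbase CA" -out ca.crt
```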

    Cluster Management

    Cluster configuration is quite complicated, so it will not be discussed at length here. Instead, comments are provided inline, where they are contextually relevant.

    In general, the cluster is designed to be stable, fault-tolerant, and secure.

    Cluster scheduling requires Kubernetes nodes to be manually labeled for exclusive use by the Couchbase cluster. An example of how to perform this is documented in the cluster definition’s comments.

    There is no one-size-fits-all cluster topology. This documents an arbitrary selection of services and server class sizes. Consult Couchbase solutions engineering to determine the correct cluster sizing for your workload.

    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cluster1
    spec:
      image: couchbase/server:6.6.2
      # Always enable anti-affinity to limit the "blast radius", and also ensure that any
      # assumptions about data replication hold i.e. a Kubernetes node going down will only
      # affect at most one pod.
      antiAffinity: true
      # Always select RBAC rules based on a label to prevent unexpectedly picking up
      # any unlabelled resources created in this namespace.
      security:
        adminSecret: administrator-authentication
        rbac:
          managed: true
          selector:
            matchLabels:
              cluster: cluster1
      # Always select buckets based on a label to prevent unexpectedly picking up
      # any unlabelled resources created in this namespace.
      buckets:
        managed: true
        selector:
          matchLabels:
            cluster: cluster1
      cluster:
        # Each service will be on its own pod, each pod will be on its own node.
        dataServiceMemoryQuota: 1Gi
        indexServiceMemoryQuota: 1Gi
        queryServiceMemoryQuota: 1Gi
        # Fast auto-failover ensures that replica data becomes live quickly and
        # minimises impact for applications in the face of trouble or upgrades.
        autoFailoverTimeout: 5s
        autoFailoverOnDataDiskIssues: true
        autoFailoverOnDataDiskIssuesTimePeriod: 5s
      # Auto resource allocation takes the memory quotas defined in the cluster
      # section and applies them to pods in the various server classes we will
      # define in the servers section.  This manifests itself as Kubernetes
      # resource requests that ensure fair scheduling of pods across your
      # Kubernetes cluster.
      autoResourceAllocation:
        enabled: true
      # Enable managed TLS to protect all data from eavesdropping.
      networking:
        tls:
          secretSource:
            serverSecretName: cluster1-server-tls
      # Each server class will have its own storage template, this allows independent
      # scaling as the need arises, and also minimises the number of pods that are
      # affected by a particular change.  Do not under-provision storage; use the
      # high-performance solid-state variety.
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          storageClassName: premium-rwo
          resources:
            requests:
              storage: 1Gi
      - metadata:
          name: index
        spec:
          storageClassName: premium-rwo
          resources:
            requests:
              storage: 1Gi
      - metadata:
          name: query
        spec:
          storageClassName: premium-rwo
          resources:
            requests:
              storage: 1Gi
      # Each service is hosted on its own set of pods.  This facilitates simple independent
      # scaling of services, simplifies memory allocation and reduces blast radius, affecting
      # only a single service at a time.
      servers:
      - name: data
        size: 3
        services:
        - data
        volumeMounts:
          default: data
        pod:
          spec:
            # By tainting all the nodes we intend to use, we ensure no other pods
            # are running on them, and we get exclusive use (avoiding noisy neighbours).
            # Example:
            #   for i in gke-cluster-default-pool-94815a4b-jkhv \
            #            gke-cluster-default-pool-94815a4b-phl1 \
            #            gke-cluster-default-pool-94815a4b-w4bl \
            #            gke-cluster-default-pool-c9efc654-8krv \
            #            gke-cluster-default-pool-c9efc654-kt07 \
            #            gke-cluster-default-pool-c9efc654-kt5r; do \
            #     kubectl taint nodes $i application-specific=couchbase-server:NoExecute
            #   done
            tolerations:
            - key: application-specific
              value: couchbase-server
              effect: NoExecute
            # By selecting only nodes labeled for our use, we don't run where we
            # should not.
            # Example:
            #   for i in gke-cluster-default-pool-94815a4b-jkhv \
            #            gke-cluster-default-pool-94815a4b-phl1 \
            #            gke-cluster-default-pool-94815a4b-w4bl \
            #            gke-cluster-default-pool-c9efc654-8krv \
            #            gke-cluster-default-pool-c9efc654-kt07 \
            #            gke-cluster-default-pool-c9efc654-kt5r; do \
            #     kubectl label nodes $i application=couchbase-server
            #   done
            nodeSelector:
              application: couchbase-server
      - name: index
        size: 2
        services:
        - index
        volumeMounts:
          default: index
        pod:
          spec:
            tolerations:
            - key: application-specific
              value: couchbase-server
              effect: NoExecute
            nodeSelector:
              application: couchbase-server
      - name: query
        size: 1
        services:
        - query
        volumeMounts:
          default: query
        pod:
          spec:
            tolerations:
            - key: application-specific
              value: couchbase-server
              effect: NoExecute
            nodeSelector:
              application: couchbase-server
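
    As a rough sanity check, the service memory quotas and server-class sizes above imply the following minimum memory footprint. This is a lower bound only; autoResourceAllocation adds overhead on top of the raw quotas when computing the actual resource requests.

```python
# Minimum memory implied by the manifest: each server class runs exactly
# one service, so each pod needs at least that service's memory quota.
server_classes = {
    "data":  {"pods": 3, "quota_gi": 1},  # dataServiceMemoryQuota: 1Gi
    "index": {"pods": 2, "quota_gi": 1},  # indexServiceMemoryQuota: 1Gi
    "query": {"pods": 1, "quota_gi": 1},  # queryServiceMemoryQuota: 1Gi
}

total_pods = sum(c["pods"] for c in server_classes.values())
total_gi = sum(c["pods"] * c["quota_gi"] for c in server_classes.values())
print(f"{total_pods} pods, at least {total_gi}Gi of memory across the cluster")
```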