Couchbase Operator Architecture

    This section gives a high-level overview of the Operator architecture.

    Custom Resource Definitions

    A custom resource definition (CRD) is a user-defined type in Kubernetes. CRDs allow us to create domain-specific resources, such as Couchbase clusters or Couchbase buckets, that cannot be represented by native types.

    A CRD simply defines a type name within a group e.g. CouchbaseCluster in couchbase.com. The Operator distributes CRDs with full JSON schema definitions attached to them. This allows the Kubernetes API server to check the structure and types of incoming custom resources for validity.
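    To illustrate the kind of structural checking that a CRD schema makes possible, the following Python sketch validates types and required fields against a tiny subset of OpenAPI v3 schema keywords (`type`, `properties`, `required`, `items`). It is not how the Kubernetes API server is implemented, and the schema fragment and field names are simplified assumptions rather than the actual CouchbaseCluster schema.

```python
# Sketch of OpenAPI v3-style structural checks, assuming only a tiny subset of
# the schema language ("type", "properties", "required", "items").

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so rule it out explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate(value, schema, path="spec"):
    """Return a list of errors; an empty list means the value is valid."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if expected == "object":
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required field missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    elif expected == "array":
        for i, item in enumerate(value):
            errors.extend(validate(item, schema.get("items", {}), f"{path}[{i}]"))
    return errors


# Simplified, hypothetical fragment of a CouchbaseCluster spec schema.
CLUSTER_SPEC_SCHEMA = {
    "type": "object",
    "required": ["image"],
    "properties": {
        "image": {"type": "string"},
        "servers": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "size"],
                "properties": {
                    "name": {"type": "string"},
                    "size": {"type": "integer"},
                },
            },
        },
    },
}
```

    For example, a server `size` supplied as the string `"3"` is reported as `spec.servers[0].size: expected integer, got str`, before any controller ever sees the resource.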

    Dynamic Admission Controller

    Built-in Kubernetes resources behave differently from Couchbase custom resources. Kubernetes raises errors when a Pod resource is created with additional, illegal fields, and it inserts default values for certain attributes. Neither of these happens by default with custom resources. Although this functionality is available in Kubernetes 1.16+ with v1 CRDs, the Operator still supports earlier versions and so cannot rely on it. For this reason, the Operator comes with a dynamic admission controller (DAC).

    The DAC allows custom resources to be modified and interrogated before a resource is accepted and committed to etcd. Running the DAC allows us to add sensible defaults to Couchbase cluster configurations thus minimizing the size of specifications. It also allows us to maintain backwards compatibility when new attributes are added and must be populated. This makes the experience of using Couchbase resources similar to that of native resource types.
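    Mutating admission webhooks return their changes to the API server as an RFC 6902 JSON Patch. The following sketch, assuming defaults are described as a nested dictionary, computes the `add` operations needed to fill in missing values without overwriting anything the user set. The field names and values in `DEFAULTS` are illustrative assumptions, not the defaults the DAC actually applies.

```python
import copy

# Hypothetical defaults for illustration; not the DAC's real default values.
DEFAULTS = {
    "spec": {
        "cluster": {
            "dataServiceMemoryQuota": "256Mi",
            "indexServiceMemoryQuota": "256Mi",
        }
    }
}


def escape(key):
    """Escape a key for use in a JSON Pointer (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")


def default_patch(obj, defaults, pointer=""):
    """Return RFC 6902 "add" operations for defaults missing from obj.

    Values the user already set are never overwritten; nested dictionaries
    are descended into so that only the missing leaves are added.
    """
    ops = []
    for key, default in defaults.items():
        child = f"{pointer}/{escape(key)}"
        if key not in obj:
            ops.append({"op": "add", "path": child, "value": copy.deepcopy(default)})
        elif isinstance(default, dict) and isinstance(obj[key], dict):
            ops.extend(default_patch(obj[key], default, child))
    return ops
```

    Emitting only the missing leaves keeps the stored specification small for the user while still letting new attributes be populated on older resources.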

    Another benefit is that Couchbase-specific configuration errors are reported back to the user synchronously, when the resource is submitted, rather than appearing in the Operator log where they may go unnoticed.

    For these reasons, the DAC is a required component of the Operator and must be installed. The DAC is a standalone service that processes Couchbase cluster resources for the entire Kubernetes cluster, so it only needs to be installed once rather than in every namespace.

    By default, the DAC checks that Kubernetes Secrets and StorageClasses exist and are not misconfigured. This is important because, for example, the validity of TLS certificates can be checked before the Operator attempts to create the cluster. If the permissions these checks require are too permissive for your environment, you can remove them: the DAC ignores permission errors when polling for resources.
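    A minimal sketch of an existence check that tolerates removed permissions. `NotFound` and `Forbidden` are hypothetical stand-ins for the HTTP 404 and 403 errors an API client would report, `make_fake_lookup` is a hypothetical in-memory substitute for a real client, and the required key names are assumptions supplied by the caller.

```python
class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 returned by the Kubernetes API."""


class Forbidden(Exception):
    """Hypothetical stand-in for an HTTP 403 (RBAC denied) from the Kubernetes API."""


def check_secret(get_secret, namespace, name, required_keys=()):
    """Return configuration errors for a Secret; [] if valid or unverifiable."""
    try:
        secret = get_secret(namespace, name)
    except NotFound:
        return [f"secret {namespace}/{name} not found"]
    except Forbidden:
        # The administrator removed this permission: skip the check rather
        # than rejecting the resource.
        return []
    data = secret.get("data") or {}
    return [
        f"secret {namespace}/{name} is missing key {key!r}"
        for key in required_keys
        if key not in data
    ]


def make_fake_lookup(secrets, forbidden=False):
    """Hypothetical in-memory replacement for an API client's Secret lookup."""

    def get_secret(namespace, name):
        if forbidden:
            raise Forbidden(f"{namespace}/{name}")
        try:
            return secrets[(namespace, name)]
        except KeyError:
            raise NotFound(f"{namespace}/{name}") from None

    return get_secret
```

    Treating a permission error as "cannot verify" rather than "invalid" is what allows an administrator to trim the DAC's role without breaking admission.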

    Dynamic Admission Controller Architecture

    The following is a simplified illustration of how the admission controller works:

    Admission Controller Architecture

    1. A client connects to the Kubernetes API and sends a request to create a resource. The resource specification is encoded as JSON.

    2. The API forwards the JSON to the mutating endpoint of the admission controller. A mutating webhook is responsible for altering the resource (applying default values, for example). It may optionally choose to accept or reject the create request.

    3. The API forwards the JSON to the validating endpoint of the admission controller. A validating webhook is responsible for enforcing specification constraints beyond those checked by the JSON schema in the custom resource definition. It may optionally choose to accept or reject the create request.

    4. Once all admission checks have passed, the resource is persisted in the database (etcd).

    5. The API responds to the client that the create request has been accepted.

    If either admission check in stages 2 and 3 responds that the resource is not acceptable, the API goes directly to stage 5 and returns any errors reported by the admission controller.
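    The five stages above can be modeled as a short pipeline. This is a simplified sketch, assuming each mutator returns a possibly modified copy plus a list of errors and each validator returns a list of errors; the real API server also runs built-in admission plugins and schema validation, and calls validating webhooks in parallel.

```python
import copy


def admit(obj, mutators, validators, etcd):
    """Simulate stages 2 to 5 for a create request.

    Each mutator returns (possibly modified object, errors); each validator
    returns a list of errors. Any error short-circuits to the response, and
    the object is only persisted if every check passes.
    """
    obj = copy.deepcopy(obj)  # never alter the caller's request body
    for mutate in mutators:  # stage 2: mutating webhooks
        obj, errors = mutate(obj)
        if errors:
            return False, errors  # stage 5: reject
    for validate in validators:  # stage 3: validating webhooks
        errors = validate(obj)
        if errors:
            return False, errors  # stage 5: reject
    etcd.append(obj)  # stage 4: persist
    return True, []  # stage 5: accept
```

    Note that validation runs on the mutated object, so validators see any defaults the mutating stage inserted.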

    Dynamic Admission Controller Resources

    The admission controller is implemented as a simple web server. The application layer protocol is HTTP over TLS. The admission controller is deployed using Kubernetes native primitives, such as a Deployment, providing high availability and fault tolerance.

    The DAC is stateless, so more than one replica may be run for high availability. The DAC is a statically compiled binary, so it does not require an operating system image. It does not require any elevated privileges and may be run as any user.

    Figure 1. Dynamic Admission Controller Resources

    The dotted box in the diagram denotes namespaced resources. Resources highlighted in red must be created by an administrator who has permission to create cluster-scoped resources or resources that grant privilege escalation.

    The admission controller Deployment is associated with a ServiceAccount that, through a role, grants the admission controller permission to access other resources. Detailed role requirements are documented in the dynamic admission controller RBAC reference guide. Access to these resource types allows the admission controller to check that any referenced resources, such as Secrets, are present for the Operator to access and use. It also allows the admission controller to poll for existing CouchbaseCluster resources to check that certain specification attributes have not been changed.
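    The invariance check can be sketched as a comparison of the stored object with the incoming update. `IMMUTABLE_PATHS` is a hypothetical example; which attributes the DAC actually treats as immutable is not listed here. In this sketch a field may be set for the first time but not changed or removed afterwards.

```python
# Hypothetical immutable attributes, for illustration only.
IMMUTABLE_PATHS = [("spec", "cluster", "indexStorageSetting")]

_MISSING = object()  # sentinel distinguishing "absent" from a None value


def lookup(obj, path):
    """Follow a tuple of keys into nested dicts, returning _MISSING if absent."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return _MISSING
        obj = obj[key]
    return obj


def immutability_errors(old, new, paths=IMMUTABLE_PATHS):
    """Compare the stored object with an update; report changed immutable fields."""
    errors = []
    for path in paths:
        before, after = lookup(old, path), lookup(new, path)
        # A field may be set for the first time, but not changed or removed.
        if before is not _MISSING and before != after:
            errors.append(f"{'.'.join(path)}: field is immutable")
    return errors
```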

    A Secret is used to provide TLS certificates to the DAC container. A service endpoint is exposed with a Kubernetes Service resource that provides a stable DNS name, fault tolerance, and load balancing. The service endpoint is finally bound to the Kubernetes API with MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources. The webhooks identify the resource type and version, and the types of operation to respond to. They also define the TLS CA certificate to use for validation of the service endpoint and the HTTP path to route requests to.
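    A minimal sketch of the web server side of this arrangement, using only the Python standard library. It builds `admission.k8s.io/v1` `AdmissionReview` responses, routes requests by HTTP path, and serves HTTPS with a certificate and key mounted from a Secret. The route path, file paths, port, and the rule in `validate_cluster` are hypothetical; the DAC itself is a compiled binary, not this code.

```python
import base64
import json
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer


def review_response(review, allowed, message="", patch=None):
    """Build an admission.k8s.io/v1 AdmissionReview response for a request."""
    response = {"uid": review["request"]["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"code": 403, "message": message}
    if patch:
        # Mutating webhooks return a base64-encoded RFC 6902 JSON Patch.
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


def validate_cluster(review):
    """Hypothetical validating rule: reject clusters without any servers."""
    spec = review["request"]["object"].get("spec", {})
    if not spec.get("servers"):
        return review_response(review, False, "spec.servers must not be empty")
    return review_response(review, True)


# Hypothetical path; the real paths are set in the webhook configurations.
ROUTES = {"/couchbaseclusters/validate": validate_cluster}


class AdmissionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        handler = ROUTES.get(self.path)
        if handler is None:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        review = json.loads(self.rfile.read(length))
        body = json.dumps(handler(review)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(cert_file, key_file, port=8443):
    """Serve HTTPS using a certificate and key mounted from a Secret."""
    server = HTTPServer(("", port), AdmissionHandler)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(cert_file, key_file)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    server.serve_forever()
```

    Because each request carries the whole object and the response depends only on it, the handler keeps no state, which is what lets several replicas sit behind one Service.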

    Operator

    The Operator watches for events related to CouchbaseCluster resources. The Operator reacts to creation events by provisioning new resources and initializing the Couchbase cluster.

    During the lifetime of the Couchbase cluster, the Operator continually compares the state of Kubernetes resources with what is requested in the CouchbaseCluster resource, reconciling as necessary to make reality match the request. The Operator is also Couchbase Server-aware, so it can detect and fix faults that would not otherwise be visible to Kubernetes.
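    The comparison at the heart of reconciliation can be sketched as a function that diffs desired and observed state and returns the actions needed to converge. This assumes a hypothetical model in which the desired state is a size per server class and observed pods are `(name, server_class)` pairs; the real Operator also performs Couchbase-specific steps such as rebalancing, which are omitted here.

```python
from collections import defaultdict


def reconcile_plan(desired, pods):
    """Return the actions that move observed pods toward the desired sizes.

    desired: {server_class: size}; pods: list of (pod_name, server_class).
    Actions are ("create", server_class) or ("delete", pod_name).
    """
    by_class = defaultdict(list)
    for name, server_class in pods:
        by_class[server_class].append(name)
    actions = []
    # Visit every class that is either requested or currently running.
    for server_class in sorted(set(desired) | set(by_class)):
        want = desired.get(server_class, 0)
        have = sorted(by_class.get(server_class, []))
        if len(have) < want:
            actions += [("create", server_class)] * (want - len(have))
        elif len(have) > want:
            # Remove the highest-named surplus pods first.
            actions += [("delete", name) for name in have[want:]]
    return actions
```

    Running the same function repeatedly is safe: once reality matches the request it returns no actions.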

    Continually polling the Kubernetes API to check resource statuses is a costly operation, and etcd is commonly a bottleneck. To avoid causing unnecessary API traffic and database accesses, the Operator locally caches every resource type it manages. Consequently, it needs list and watch permissions on all managed resources. After the initial list operation, the API only informs the Operator of changes that have happened, reducing API traffic to a minimum.
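    The list-then-watch pattern can be sketched as a small cache: one initial list populates it, each subsequent watch event updates it in place, and reads never touch the API. The event types `ADDED`, `MODIFIED`, `DELETED`, `BOOKMARK` and `ERROR` follow the Kubernetes watch protocol; `list_fn` is a hypothetical stand-in for a list call returning the items plus the list's `resourceVersion`, and watch reconnection is omitted.

```python
class ResourceCache:
    """Local cache kept current by one list call followed by watch events."""

    def __init__(self, list_fn):
        # Initial list: the only full read of this resource type.
        items, self.resource_version = list_fn()
        self.objects = {obj["metadata"]["name"]: obj for obj in items}

    def apply(self, event):
        """Apply one watch event so the cache mirrors the API server."""
        kind, obj = event["type"], event["object"]
        if kind == "ERROR":
            # e.g. the resourceVersion expired; real clients re-list here.
            raise RuntimeError(obj.get("message", "watch error"))
        if kind in ("ADDED", "MODIFIED"):
            self.objects[obj["metadata"]["name"]] = obj
        elif kind == "DELETED":
            self.objects.pop(obj["metadata"]["name"], None)
        # All non-error events, including BOOKMARK, advance the version
        # from which the watch would be resumed.
        self.resource_version = obj["metadata"]["resourceVersion"]

    def get(self, name):
        """Serve reads locally instead of querying the API server."""
        return self.objects.get(name)
```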

    The Operator is designed to run in the same namespace as the Couchbase clusters it manages, so one Operator instance is needed per namespace in which Couchbase clusters are to be provisioned. The Operator is a statically compiled binary, so it does not require an operating system image. It does not require any elevated privileges and may be run as any user.

    Operator Resources

    The Operator is a basic application that uses a Deployment to provide high availability.

    Figure 2. Operator Resources

    The dotted box in the diagram denotes namespaced resources. Resources highlighted in red must be created by an administrator who has permission to create cluster-scoped resources or resources that grant privilege escalation.

    The Operator Deployment is associated with a ServiceAccount that grants the Operator permissions to discover, create, modify and delete resources required to manage a Couchbase cluster. Detailed role requirements are documented in the Operator RBAC reference guide.

    A Service is provided to allow access to Operator Prometheus metrics, if desired.

    Couchbase Cluster

    Couchbase clusters are created by the Operator in response to CouchbaseCluster resources.

    Figure 3. Couchbase Cluster Resources

    All resources are linked to their parent CouchbaseCluster resource with owner references. If a CouchbaseCluster is deleted, the deletion cascades to all of its child resources.
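    The cascade itself is carried out by Kubernetes garbage collection based on those owner references. As a simplified model, assuming each object's owner references are reduced to a list of owner UIDs (the names below are hypothetical), a dependent is removed once all of its owners are gone, and an object with no owner references is never collected:

```python
def cascade_delete(owners, root):
    """Return the UIDs removed when `root` is deleted.

    owners maps each object UID to the UIDs listed in its ownerReferences.
    A dependent is collected only when every one of its owners is deleted.
    """
    deleted = {root}
    changed = True
    while changed:  # repeat until no further dependents become orphaned
        changed = False
        for uid, refs in owners.items():
            if uid not in deleted and refs and all(ref in deleted for ref in refs):
                deleted.add(uid)
                changed = True
    return deleted
```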

    ConfigMaps are used to persist state required per-cluster.

    Pods are used to create Couchbase Server instances.

    PodDisruptionBudgets are used to control Kubernetes rolling upgrades. They prevent Kubernetes from draining nodes in a way that would result in data loss.
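    A PodDisruptionBudget works by limiting how many voluntary evictions, such as those issued while draining a node, are permitted at a time. The sketch below models the allowed-disruptions arithmetic for integer `minAvailable` or `maxUnavailable` values; percentage forms are omitted, and the values in the usage are illustrative rather than the Operator's actual settings.

```python
def disruptions_allowed(current_healthy, expected_pods, min_available=None, max_unavailable=None):
    """Number of further voluntary evictions a PodDisruptionBudget permits.

    Exactly one of min_available or max_unavailable must be given, as an
    integer (percentage forms are not modeled here).
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    # An eviction is refused once the budget reaches zero.
    return max(0, current_healthy - desired_healthy)
```

    For example, with three pods and `max_unavailable=1`, one eviction is allowed while all three are healthy, but none while one pod is already down, so a drain waits instead of taking a second pod away.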

    Services are used to establish DNS entries for communication with Couchbase Server endpoints. Per-node services can also be used to provide addressability to clients operating outside of the Kubernetes cluster.
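    Kubernetes derives these in-cluster DNS names from Service and Pod fields. A minimal sketch, assuming the default `cluster.local` cluster domain and a headless Service whose name matches the pods' `subdomain`; the names `cb-example` and `cb-example-0000` used in the checks are hypothetical.

```python
def service_dns(service, namespace, cluster_domain="cluster.local"):
    """DNS name Kubernetes assigns to a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def pod_dns(hostname, subdomain, namespace, cluster_domain="cluster.local"):
    """Per-pod DNS name, created when a pod sets spec.hostname and
    spec.subdomain and a headless Service named `subdomain` exists in the
    same namespace."""
    return f"{hostname}.{subdomain}.{namespace}.svc.{cluster_domain}"
```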

    Jobs and CronJobs are used to restore data to, and backup data periodically from, a Couchbase cluster.

    PersistentVolumeClaims are used to provide high-performance disaster recovery in the event of a Couchbase Server crash, accidental deletion, or data center failure. They are also used as backing storage for Couchbase backups. PersistentVolumeClaims related to Couchbase backups are not linked to the parent CouchbaseCluster, so they are not deleted when the parent is.