Configure Automated Backup and Restore

You can configure the Autonomous Operator to take periodic, automated backups of your Couchbase cluster using the existing functionality provided by cbbackupmgr.

Overview

The Autonomous Operator provides automated backup and restore capabilities through a native integration with the Couchbase Server tool cbbackupmgr.

Automated backup is enabled in the CouchbaseCluster resource. The configuration allows you to specify a Couchbase-provided container image that contains the cbbackupmgr tool.

Once automated backup is enabled, individual backup policies can be configured using CouchbaseBackup resources, which define parameters such as the schedule and backup strategy. Each CouchbaseBackup resource creates one or two Kubernetes CronJobs that spawn backup jobs according to the given Cron schedule(s). These backup jobs execute a helper script which performs logging and cleanup, and utilizes cbbackupmgr to perform backup and restore.

Because backup policies are configured with a separate resource, you can use custom resource RBAC to allow individuals who may not have access to CouchbaseCluster resources to still perform backup administration.
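
For example, a Role similar to the following sketch could give a backup administrator control over backup policies without granting any access to CouchbaseCluster resources (the role name and verb list here are illustrative, not prescriptive):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backup-admin
rules:
- apiGroups:
  - couchbase.com
  resources:
  - couchbasebackups
  - couchbasebackuprestores
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete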

The Autonomous Operator supports two of the backup strategies available in cbbackupmgr: Full Only and Full/Incremental. Complete descriptions and explanations of these strategies can be found in the cbbackupmgr documentation. The examples on this page assume a backup schedule based on the Full/Incremental strategy for both creating backups and performing restores.

Important Considerations

  • Only the official Autonomous Operator Backup image provided by Couchbase is supported. Note that this image is designed to only ever be pulled and run by the Autonomous Operator — it should not be used in any other context.

    In addition, you should ensure that your image source is trusted. The backup image requires access to the Couchbase cluster administrative credentials in order to log in and collect data. Granting these credentials to arbitrary code is potentially harmful.

  • The Autonomous Operator runs the backup utility in a separate Pod. Where this Pod is scheduled can have implications on backup performance, and can affect whether backup jobs are able to complete within the desired time window.

    You should schedule backup Pods onto Kubernetes nodes that have enough resources to successfully fulfill your backup schedule. It is also recommended that you do not schedule backup Pods onto Kubernetes nodes that host Couchbase cluster Pods, since your Couchbase cluster would be competing for resources with the backup utility. Refer to the Backup Scheduling section on this page for further details.

  • Backup Pods require access permissions that necessitate the creation of ServiceAccount, Role, and RoleBinding resources. This is covered in the Grant Backup Permissions section on this page.

  • You can enable and disable automated backup at any time in the CouchbaseCluster configuration. Disabling automated backup does not delete CouchbaseBackup resources, and when you re-enable automated backup, any applicable CouchbaseBackup resources will continue to be used (see the example after this list).
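
For example, automated backup can be disabled (and later re-enabled) with a one-line patch. The following sketch assumes a cluster named cb-example:

$ kubectl patch couchbasecluster cb-example --type merge -p '{"spec":{"backup":{"managed":false}}}'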

Enable Automated Backup

In order for the Autonomous Operator to manage the automated backup of a cluster, the feature must be enabled in the CouchbaseCluster resource.

apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  backup:
    managed: true (1)
    image: couchbase/operator-backup:6.5.0 (2)
    serviceAccountName: couchbase-backup (3)
1 The only required field to enable automated backup is spec.backup.managed.
2 If the spec.backup.image field is left unspecified, then the dynamic admission controller will automatically populate it with the most recent container image that was available when the installed version of the Autonomous Operator was released. The default image for open source Kubernetes comes from Docker Hub, and the default image for OpenShift comes from the Red Hat Container Catalog.

If pulling directly from the Red Hat Container Catalog, the path will be something similar to registry.connect.redhat.com/couchbase/operator-backup:6.5.0-4 (you can refer to the catalog for the most recent images). If ImagePullSecrets are required to access the image, they are inherited from the Couchbase Server Pod: they can be set explicitly with the spec.servers[].pod.spec.imagePullSecrets field, or implicitly with a service account specified in the spec.servers[].pod.spec.serviceAccountName field.

3 If left unspecified, spec.backup.serviceAccountName defaults to the value couchbase-backup. The ServiceAccount, along with the Role and RoleBinding that grant it the required permissions, must exist, otherwise backup jobs will not be able to complete successfully. Creating these resources is covered in the next section.

Grant Backup Permissions

Backup Pods need access to Kubernetes resources such as Pods, CronJobs, and Jobs, as well as write access to Events and to the CouchbaseBackup and CouchbaseBackupRestore custom resources.

To grant these permissions, run the following command, which creates the required Role, RoleBinding, and ServiceAccount.

cat <<EOF | kubectl create -f -
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: null
  name: couchbase-backup
rules:
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - get
  - list
- apiGroups:
  - batch
  resources:
  - cronjobs
  verbs:
  - get
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
- apiGroups:
  - couchbase.com
  resources:
  - couchbasebackups
  - couchbasebackuprestores
  verbs:
  - get
  - list
  - watch
  - patch
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: null
  name: couchbase-backup
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: couchbase-backup
subjects:
- kind: ServiceAccount
  name: couchbase-backup
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: null
  name: couchbase-backup
---
EOF

Without these resources, backup jobs will still run as scheduled, but they will ultimately fail as the pods won’t have the required permissions.
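
You can verify that the ServiceAccount has the expected permissions before the first scheduled job runs, for example with kubectl auth can-i (shown here for the default namespace):

$ kubectl auth can-i list pods --as=system:serviceaccount:default:couchbase-backup
$ kubectl auth can-i create events --as=system:serviceaccount:default:couchbase-backup

Both commands should print "yes" if the Role and RoleBinding are in place.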

Configure Backups

After automated backup is enabled for the cluster, individual backup policies can be configured using CouchbaseBackup resources. The following is a very simple configuration with only the minimum required fields set.

apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0" (1)
  incremental:
    schedule: "0 3 * * 1-6" (1)
  size: 20Gi (2)
1 On detection of the CouchbaseBackup resource, the Autonomous Operator creates the appropriate CronJobs for spec.full.schedule and spec.incremental.schedule. In this example, a full backup is performed at 3:00AM on Sunday, and an incremental backup is performed at 3:00AM on each of the other days of the week.
2 The Autonomous Operator also creates a Persistent Volume Claim, with the same name as specified in metadata.name, to store the backups and logs. In this example, if a PVC called "my-backup" does not yet exist, one will be created. The same happens if the PVC is deleted for some reason.
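
To create the backup policy, save the definition to a file and apply it as usual (the file name here is illustrative):

$ kubectl apply -f my-backup.yaml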

Once you have created a CouchbaseBackup resource, you can check for the expected behaviour by viewing the Operator logs.

  • Kubernetes

  • OpenShift

$ kubectl logs -f deployments/couchbase-operator
$ oc logs deployments/couchbase-operator

You should observe that a Persistent Volume Claim and the correct number of CronJobs have been created along with the CouchbaseBackup itself. The output should be similar to:

{"level":"info","ts":1587134718.3592374,"logger":"cluster","msg":"Backup Cronjob created","cbbackup":"my-backup","cronjob":"my-backup-incremental"}
{"level":"info","ts":1587134718.3727212,"logger":"cluster","msg":"Backup Cronjob created","cbbackup":"my-backup","cronjob":"my-backup-full"}
{"level":"info","ts":1587134718.3722592,"logger":"cluster","msg":"Backup PVC created","cbbackup":"my-backup"}
{"level":"info","ts":1587134718.3727608,"logger":"cluster","msg":"Backup created","cbbackup":"my-backup"}

You can then validate for yourself that these resources exist and check that their details match up with what was defined in the CouchbaseBackup configuration.

  • Kubernetes

  • OpenShift

$ kubectl get cronjob
$ kubectl get pvc
$ oc get cronjob
$ oc get pvc

For example, the output should look like:

NAME                        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
my-backup-full              0 3 * * 0     False     0        <none>          18s
my-backup-incremental       0 3 * * 1-6   False     0        <none>          18s
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-backup   Bound    pvc-0c3c717f-e10b-423e-9279-a99edf81019b   20Gi       RWO            standard       14s

Deleting Persistent Volume Claims or Persistent Volumes will delete the backup data and backup log data permanently.

Once the first Job has been spawned by a backup CronJob, the status fields of a CouchbaseBackup resource will update, and you can start monitoring backup progress.

Restoring From a Backup

Restoring from a backup requires that you create a CouchbaseBackupRestore resource.

apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  repo: cb-example-2020-02-12T19_00_03
  start:
    int: 1

A CouchbaseBackupRestore resource behaves differently from a CouchbaseBackup resource in that it spawns a single, one-time Job which attempts to restore the requested backup or range of backups.

In the example above, the CouchbaseBackupRestore resource configuration is restoring the first backup in the repo "cb-example-2020-02-12T19_00_03". The first backup in any repo will be a full backup since the Autonomous Operator performs a full backup of the cluster after the creation of each backup repo.

If you don’t know the name of the backup repo that you want to restore from, you can find the name without having to explore the contents of a Persistent Volume Claim by simply referring to the status object of the existing CouchbaseBackup resource.
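
For example, assuming the status layout shown later on this page, a JSONPath query can pull a repo name directly from the CouchbaseBackup resource (the index 0 here assumes that the most recent repo is listed first; inspect the full status if in doubt):

$ kubectl get couchbasebackup my-backup -o jsonpath='{.status.backups[0].name}'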

You also have the option to restore a range of backups from the latest backup repo.

apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  start:
    str: oldest
  end:
    str: latest

In the example above, the Autonomous Operator restores a range of backups from the latest backup repo. Omitting the spec.repo field means that the Autonomous Operator will look for the most recent backup repo, either from the CouchbaseBackup object defined by spec.backup or from the PVC of the same name. If the Autonomous Operator is unable to populate the spec.repo field, the resource will be rejected and no restore job will be created.
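
After applying a CouchbaseBackupRestore resource, you can follow the resulting one-time job in the usual way, for example:

$ kubectl get couchbasebackuprestore my-restore -o yaml
$ kubectl get jobs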

Monitor and Manage Backups

It’s important to regularly monitor backup performance to ensure you’re backing up all the required data within your desired time window.

For the simplest overview, run get commands on the CouchbaseBackup resources.

  • Kubernetes

  • OpenShift

$ kubectl get couchbasebackup my-backup -o yaml
$ oc get couchbasebackup my-backup -o yaml

The short names cbbackup and cbrestore are available for CouchbaseBackup and CouchbaseBackupRestore respectively, so instead of executing kubectl get couchbasebackup you can write kubectl get cbbackup. To find out which of your Kubernetes resources support short names, run kubectl api-resources.

The command output should show the given CouchbaseBackup specification and also a status section containing useful information similar to the following output.

status:
  archive: /data/backups
  backups:
  - full: 2020-02-12T15_25_10.712665995Z
    incrementals:
    - 2020-02-12T15_28_11.986341497Z
    - 2020-02-12T15_26_09.875255309Z
    name: cb-example-2020-02-12T15_25_09
  - full: 2020-02-12T15_15_08.443231128Z
    incrementals:
    - 2020-02-12T15_18_12.465643387Z
    - 2020-02-12T15_16_08.037612813Z
    - 2020-02-12T15_24_10.264088039Z
    - 2020-02-12T15_22_11.215924706Z
    name: cb-example-2020-02-12T15_15_07
  capacityUsed: 1.47Gi
  cronjob: cbbackup-full-incr-incremental
  duration: 17s
  job: cbbackup-full-incr-incremental-1587137280
  lastRun: "2020-02-12T15:28:11Z"
  lastSuccess: "2020-02-12T15:28:28Z"
  output: '{"location": "2020-02-12T15_28_11.986341497Z", "duration_seconds": "15.429305462",
    "avg_data_transfer_rate_bytes_sec": 1853, "total_items": 0, "total_items_size_bytes":
    28672, "buckets": {"default": {"mutations_backedup": "0", "mutations_failed":
    "0", "deletions_backedup": "0", "deletions_failed": "0"}}}'
  pod: cbbackup-full-incr-incremental-1581521912-mnng9
  repo: repo
  running: false

Furthermore, you can check that the CronJobs have updated and that their status fields look correct.

  • Kubernetes

  • OpenShift

$ kubectl get cronjob
$ oc get cronjob
NAME                        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
my-backup-full              0 3 * * 0     False     0        2d              2d
my-backup-incremental       0 3 * * 1-6   False     0        16h             2d
  • Kubernetes

  • OpenShift

$ kubectl get cronjob my-backup-full -o yaml
$ oc get cronjob my-backup-full -o yaml

Finally, you can check that the backup Jobs and their respective pods exist, and that no more are retained than the limits specified in spec.failedJobsHistoryLimit and spec.successfulJobsHistoryLimit (these default to 5 and 3 respectively).

  • Kubernetes

  • OpenShift

$ kubectl get jobs
$ oc get jobs
NAME                                        COMPLETIONS   DURATION   AGE
cbbackup-full-incr-full-1587138300          1/1           33s        11m
cbbackup-full-incr-incremental-1587138600   1/1           43s        6m8s
  • Kubernetes

  • OpenShift

$ kubectl get pods
$ oc get pods
NAME                                              READY   STATUS      RESTARTS   AGE
cb-example-0000                                   1/1     Running     0          72m
cb-example-0001                                   1/1     Running     0          72m
cb-example-0002                                   1/1     Running     0          72m
cbbackup-full-incr-full-1587138300-92rfp          0/1     Completed   0          11m
cbbackup-full-incr-incremental-1587138600-vzmd2   0/1     Completed   0          6m5s
couchbase-operator-admission-7ccbd85455-6g64p     1/1     Running     0          73m
couchbase-operator-b6496564f-qpqsb                1/1     Running     0          73m

Editing a Backup Configuration

CouchbaseBackup resources cannot be edited once created; if you need to change a backup configuration, delete the resource and recreate it with the desired changes.
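
A typical workflow is therefore to delete the resource and recreate it with the revised definition (the file name below is illustrative; take care not to delete the Persistent Volume Claim itself, per the warning earlier on this page):

$ kubectl delete couchbasebackup my-backup
$ kubectl apply -f my-backup-edited.yaml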

Viewing Detailed Logs

If anything goes wrong during a backup job and backup pods return the Error status, detailed logging is stored on the Persistent Volume Claim for the backup. You can access these logs by creating a Kubernetes Job whose pod mounts this PVC, and then using kubectl exec to shell into that pod. From there you can access the logs and backup data directly.

The following is an example file that creates such a Kubernetes job. The job creates a pod and mounts the PVC on the path /data as the backup and restore pods themselves would.

kind: Job
apiVersion: batch/v1
metadata:
  name: backup-exec
spec:
  template:
    spec:
      containers:
        - name: couchbase-cluster-backup-create
          image: couchbase/operator-backup:6.5.0
          command: ["sleep"]
          args: ["30000"] (1)
          volumeMounts:
            - name: "couchbase-cluster-backup-volume"
              mountPath: "/data" (2)
      volumes:
        - name: couchbase-cluster-backup-volume
          persistentVolumeClaim:
            claimName: my-backup (3)
      restartPolicy: Never
      serviceAccountName: couchbase-backup
1 The time in seconds to keep the pod running. Make sure you give this argument a sufficiently large value so that you are not interrupted by the pod completing and the exec connection shutting down.
2 The mountPath may be any valid path, but for purposes of consistency it should be set to /data.
3 The claimName refers to the name of the PVC to be accessed, which is also the name of the corresponding CouchbaseBackup resource.

Backups are available to view at /data/backups, and their respective logs at /data/scriptlogs. Inside /data/scriptlogs are three folders: full_only, incremental, and restore. The first two hold the logs from backups run under the corresponding CouchbaseBackup strategy, and the last is exclusively for CouchbaseBackupRestore operations.
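
Once the backup-exec Job from the example above is running, you can locate its pod via the job-name label that Kubernetes applies automatically, and then inspect the archive (the pod name suffix below is a placeholder):

$ kubectl get pods -l job-name=backup-exec
$ kubectl exec backup-exec-<pod-suffix> -- ls /data/backups
$ kubectl exec backup-exec-<pod-suffix> -- ls /data/scriptlogs/incremental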

Advanced Backup Management

Backup Scheduling

Because backups are performed in separate pods, you will need to consider how these pods are scheduled onto nodes in order to avoid performance issues and noisy-neighbour problems. The following YAML example builds upon the initial YAML in Enable Automated Backup.

apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  backup:
    managed: true
    image: couchbase/operator-backup:6.5.0
    serviceAccountName: couchbase-backup
    nodeSelector:
      instanceType: large (1)
    resources:
      requests:
        cpu: 100m
        memory: 100Mi (2)
    selector:
      matchLabels:
        cluster: my-cluster (3)
    tolerations: (4)
     - key: app
       operator: Equal
       value: cbbackup
       effect: NoSchedule
1 The nodeSelector field defines which Kubernetes nodes the pods running the automated backup process will be constrained to. In this case we have specified that backup pods will be constrained to running on nodes of instanceType large.
2 If your Kubernetes environment requires it, you can set requests and limits for the pods that run the backup and restore jobs.
3 If you have more than one CouchbaseCluster resource deployed in the same namespace, you’ll need to use resource label selection to ensure that CouchbaseBackup and CouchbaseBackupRestore resources get created on the correct cluster. As with other Couchbase custom resources, this means labelling your CouchbaseBackup and CouchbaseBackupRestore resources to match the corresponding label selector of the CouchbaseCluster resource that you want them aggregated to.
4 Tolerations are applied to pods, and allow (but do not require) pods to be scheduled onto nodes with matching taints. With taints and tolerations, you can grant backup pods exclusive access to specific nodes. In this example, to run all backup pods on a dedicated node and isolate them from the rest of the Autonomous Operator pods, we taint the node with the key-value pair app:cbbackup and define a matching toleration (an example taint command is shown after this list).
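
To pair with the toleration in the example above, the matching taint could be applied to a dedicated node as follows (the node name is a placeholder):

$ kubectl taint nodes <node-name> app=cbbackup:NoSchedule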

Further reference on all of these fields can be found in the CouchbaseCluster resource configuration. For more overall information please see Couchbase Scheduling and Isolation.

Backup Time Scheduling

When deciding on the Cron schedules for the Full/Incremental strategy, take care that the schedules cannot clash, with a full and an incremental backup triggered at overlapping times. For the example given on this page and in the cbbackupmgr documentation this is very unlikely, but a backup that is not given a large enough time window to complete can cause problems. This is particularly common in situations where backups have been scheduled too frequently.
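
As an illustrative sketch, the following schedule combination leaves no headroom: the hourly incremental run at 3:00AM on Sunday would fire at the same time as the weekly full backup.

spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0"   # weekly full backup at 3:00AM on Sunday
  incremental:
    schedule: "0 * * * *"   # hourly incrementals, clashing with the full backup at 3:00AM on Sunday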