Configure Automated Backup and Restore
You can configure the Autonomous Operator to take periodic, automated backups of your Couchbase cluster with the existing functionality provided by cbbackupmgr.
This page details how to back up a Couchbase cluster and restore data in the face of disaster. A conceptual overview of using the Autonomous Operator to back up and restore Couchbase clusters can be found in Couchbase Backup and Restore.
The Autonomous Operator supports two of the backup strategies available in
cbbackupmgr: Full Only and Full/Incremental.
Complete descriptions and explanations of these strategies can be found in the cbbackupmgr documentation.
The examples on this page assume a backup schedule based on the Full/Incremental strategy for both creating backups and performing restores.
Backup and restore jobs rely on a shared persistent volume claim (PVC) while in use.
On Kubernetes platforms you must specify a value for couchbaseclusters.spec.securityContext.fsGroup so that backup pods are able to write to the shared volume.
For further information about setting file system groups see the persistent volume concepts page.
In order for the Autonomous Operator to manage the automated backup of a cluster, the feature must be enabled in the CouchbaseCluster resource:
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  backup:
    managed: true (1)
    image: couchbase/operator-backup:1.3.2 (2)
    serviceAccountName: couchbase-backup (3)
|1||The only required field to enable automated backup is couchbaseclusters.spec.backup.managed.|
|2||When running on Red Hat OpenShift, you will want to modify this to use the Red Hat Container Catalog image.
The image will be something similar to registry.connect.redhat.com/couchbase/operator-backup:1.3.2.|
|3||If left unspecified, the service account name defaults to couchbase-backup.|
Backup pods need read-only access to Kubernetes resources such as Pods and CronJobs.
They also need write access to Events and to the CouchbaseBackup and
CouchbaseBackupRestore custom resources.
Without these resources, backup jobs will still run as scheduled, but they will ultimately fail as the pods won’t have the required permissions.
You can use the
cao tool to create the resources that grant the required permissions.
The following command creates the necessary resources in the default namespace:
$ bin/cao create backup
To create the resources in a custom namespace, use the -n flag:
$ bin/cao create backup -n my-namespace
To make your own edits to these resources, you can use
cao generate backup to generate the YAML output instead of creating the resources in Kubernetes immediately.
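For example, assuming the tool writes the generated YAML to stdout, you can save the manifests to a file, edit them, and then apply them (the filename here is arbitrary):

$ bin/cao generate backup > backup-resources.yaml
$ kubectl apply -f backup-resources.yaml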
After automated backup is enabled for the cluster, individual backup policies can be configured using the CouchbaseBackup custom resource.
The following is a very simple configuration with only the minimum required fields set.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0" (1)
  incremental:
    schedule: "0 3 * * 1-6" (1)
  size: 20Gi (2)
|1||On detection of the CouchbaseBackup resource, the Autonomous Operator creates a CronJob for each schedule. With this configuration, a full backup runs at 03:00 every Sunday, and an incremental backup runs at 03:00 on every other day of the week.|
|2||The Autonomous Operator will also create a PersistentVolumeClaim (PVC) to store the backups and logs, with the same name as the one specified in couchbasebackups.metadata.name (my-backup in this example).|
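Apply the resource in the usual way; here we assume the manifest above has been saved to a file named my-backup.yaml (the filename is arbitrary):

$ kubectl apply -f my-backup.yaml

$ oc apply -f my-backup.yaml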
Once you have created a CouchbaseBackup resource, you can check for the expected behavior by viewing the Operator logs:
$ kubectl logs -f deployments/couchbase-operator
$ oc logs deployments/couchbase-operator
You should observe from the Operator logs that the expected CronJobs and PVC for the CouchbaseBackup are created.
You can then validate for yourself that these resources exist and check that their details match what was defined in the CouchbaseBackup specification:
$ kubectl get cronjob
$ kubectl get pvc
$ oc get cronjob
$ oc get pvc
For example, the output should look like:
NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
my-backup-full          0 3 * * 0     False     0        <none>          18s
my-backup-incremental   0 3 * * 1-6   False     0        <none>          18s
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-backup   Bound    pvc-0c3c717f-e10b-423e-9279-a99edf81019b   20Gi       RWO            standard       14s
Deleting Persistent Volume Claims or Persistent Volumes will delete the backup data and backup log data permanently.
Once the first Job has been spawned by a backup cron job, the status fields of a
CouchbaseBackup resource will update, and you can start monitoring backup progress.
Restoring from a backup requires that you create a CouchbaseBackupRestore resource:
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  repo: cb-example-2020-02-12T19_00_03
  start:
    int: 1
A CouchbaseBackupRestore resource behaves differently from a
CouchbaseBackup resource in that it spawns just a single, one-time Job which attempts to restore the requested backup or range of backups.
In the example above, the CouchbaseBackupRestore resource configuration restores the first backup in the repository cb-example-2020-02-12T19_00_03.
The first backup in any repository will be a full backup since the Autonomous Operator performs a full backup of the cluster after the creation of each backup repository.
If you don’t know the name of the backup repository that you want to restore from, you can find the name without having to explore the contents of a Persistent Volume Claim by simply referring to the couchbasebackups.status object of the existing CouchbaseBackup resource.
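For example, the repository names can be listed directly with a JSONPath query (the my-backup name comes from the earlier examples):

$ kubectl get couchbasebackup my-backup -o jsonpath='{.status.backups[*].name}'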
You also have the option to restore a range of backups from the latest backup repository.
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  start:
    str: oldest
  end:
    str: latest
In the example above, the Autonomous Operator would restore a range of backups from the latest backup repository.
The omission of the
spec.repo field means that the Autonomous Operator will look for the most recent backup repository.
Backups allow data to be filtered so that you only back up what you need, minimizing storage space and improving performance. Backup options can only be modified on creation of a new backup repository, so when using a full/incremental backup strategy, modifications will be deferred until the next full backup.
Consider the following specification:
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  data:
    include: (1)
    - bucket1
    - bucket1.scope
    - bucket2.scope.collection
    exclude: (2)
    - bucket3
  services: (3)
    analytics: true
    bucketConfig: true
    bucketQuery: true
    clusterAnalytics: true
    clusterQuery: true
    data: true
    eventing: true
    ftsAliases: true
    ftsIndexes: true
    gsIndexes: true
    views: true
  threads: 16 (4)
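|1||Only the listed buckets, scopes, and collections are included in the backup.|
|2||The listed buckets, scopes, and collections are excluded from the backup.|
|3||Controls which services have their data and configuration backed up.|
|4||The number of threads for cbbackupmgr to use when performing the backup.|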
Further details can be found on the
CouchbaseBackup resource reference.
Additional options from
cbbackupmgr restore may also be specified.
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  repo: cb-example-2020-02-12T19_00_03
  start:
    int: 1
  data:
    include: (1)
    - default
    - peanutbutter
    - princess.caroline
    exclude:
    - horseman
    map: (2)
    - source: default
      target: new-default
    - source: peanutbutter
      target: pickles
    filterKeys: "^cat.*" (3)
  services: (4)
    analytics: true
    bucketConfig: false
    bucketQuery: true
    clusterAnalytics: true
    clusterQuery: true
    data: true
    eventing: true
    ftAlias: true
    ftIndex: true
    gsiIndex: true
    views: true
  threads: 1 (5)
The map field requires a pair of fields named source and target for each entry; data from the source bucket, scope, or collection is restored into the target.
To skip restoring a particular service, simply set the service to false.
It’s important to regularly monitor backup performance to ensure you’re backing up all the required data within your desired time window.
For the simplest overview, run get commands on the CouchbaseBackup resource:
$ kubectl get couchbasebackup my-backup -o yaml
$ oc get couchbasebackup my-backup -o yaml
The short names cbbackup and cbrestore can be used in place of couchbasebackup and couchbasebackuprestore respectively.
The command output should show the given
CouchbaseBackup specification and also a
couchbasebackups.status section containing useful information similar to the following output.
status:
  archive: /data/backups
  backups:
  - full: 2020-02-12T15_25_10.712665995Z
    incrementals:
    - 2020-02-12T15_28_11.986341497Z
    - 2020-02-12T15_26_09.875255309Z
    name: cb-example-2020-02-12T15_25_09
  - full: 2020-02-12T15_15_08.443231128Z
    incrementals:
    - 2020-02-12T15_18_12.465643387Z
    - 2020-02-12T15_16_08.037612813Z
    - 2020-02-12T15_24_10.264088039Z
    - 2020-02-12T15_22_11.215924706Z
    name: cb-example-2020-02-12T15_15_07
  capacityUsed: 1.47Gi
  duration: 17s
  job: cbbackup-full-incr-incremental-1587137280
  lastRun: "2020-02-12T15:28:11Z"
  lastSuccess: "2020-02-12T15:28:28Z"
  repo: repo
  running: false
Furthermore, you can check that the CronJobs have updated and that their status fields look correct.
$ kubectl get cronjob
$ oc get cronjob
NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
my-backup-full          0 3 * * 0     False     0        2d              2d
my-backup-incremental   0 3 * * 1-6   False     0        16h             2d
$ kubectl get cronjob my-backup-full -o yaml
$ oc get cronjob my-backup-full -o yaml
And finally, we can check that the backup Jobs and their respective pods are present, and that no more are retained than the limits specified in couchbasebackups.spec.successfulJobsHistoryLimit and couchbasebackups.spec.failedJobsHistoryLimit.
These default to 5 and 3 respectively.
$ kubectl get jobs
$ oc get jobs
NAME                                        COMPLETIONS   DURATION   AGE
cbbackup-full-incr-full-1587138300          1/1           33s        11m
cbbackup-full-incr-incremental-1587138600   1/1           43s        6m8s
$ kubectl get pods
$ oc get pods
NAME                                              READY   STATUS      RESTARTS   AGE
cb-example-0000                                   1/1     Running     0          72m
cb-example-0001                                   1/1     Running     0          72m
cb-example-0002                                   1/1     Running     0          72m
cbbackup-full-incr-full-1587138300-92rfp          0/1     Completed   0          11m
cbbackup-full-incr-incremental-1587138600-vzmd2   0/1     Completed   0          6m5s
couchbase-operator-admission-7ccbd85455-6g64p     1/1     Running     0          73m
couchbase-operator-b6496564f-qpqsb                1/1     Running     0          73m
Only the preexisting schedules and volume size of a
CouchbaseBackup resource can be edited.
Attempts to edit things like the name or strategy will fail.
A Backup PVC that is referenced by an existing
CouchbaseBackup resource can be resized manually by the user, or automatically by the Autonomous Operator.
A Backup PVC can only be resized if its associated StorageClass is configured to allow volume expansion.
This means the default StorageClass in your Kubernetes environment should have allowVolumeExpansion set to true.
Ensure that the StorageClass is configured to allow volume expansion before creating the CouchbaseBackup resource.
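For reference, a StorageClass that permits volume expansion looks similar to the following sketch; the provisioner shown is illustrative and will vary by platform:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd # illustrative; use the provisioner appropriate to your platform
allowVolumeExpansion: true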
To perform a manual resize, simply edit
couchbasebackups.spec.size and change it to a value that is larger than the current size.
The resize will then be performed with the next scheduled backup job.
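For example, to grow the my-backup volume from the earlier examples to 30Gi, you could edit the resource directly or apply a merge patch:

$ kubectl patch couchbasebackup my-backup --type merge -p '{"spec":{"size":"30Gi"}}'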
|The underlying StorageClass must be configured to allow volume expansion in order to modify the size of the Backup PVC (as stated previously). If it is not, changes to the volume size may still be accepted, but the Autonomous Operator will raise errors until the change is reverted.|
A CouchbaseBackup resource can be modified to allow the Autonomous Operator to automatically resize the Backup PVC once a specific percentage of space is left.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0"
  incremental:
    schedule: "0 3 * * 1-6"
  size: 20Gi (1)
  autoscaling:
    thresholdPercent: 20 (2)
    incrementPercent: 20 (3)
    limit: 100Gi (4)
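|1||The current size of the Backup PVC.|
|2||When the free space on the volume falls below this percentage, the Autonomous Operator triggers a resize.|
|3||The percentage by which the volume size is increased on each automatic resize.|
|4||The upper bound for automatic resizing; the volume will not be grown beyond this size.|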
|The underlying StorageClass must be configured to allow volume expansion in order to modify the size of the Backup PVC (as stated previously). If it is not, changes to the volume size may still be accepted, but the Autonomous Operator will raise errors until the change is reverted.|
When a CouchbaseBackup resource is deleted, any associated CronJobs are deleted.
Jobs and their respective Pods from those CronJobs are orphaned; the number of these resources that are left over is determined by the history limits described above.
If a backup job is running whilst the parent
CouchbaseBackup is deleted then the job will continue until completion or eventual failure.
If anything goes wrong during a backup job, and backup pods return the
Error status, detailed logging is stored on the Persistent Volume Claim for the backup.
You can access these logs by creating a Kubernetes job that creates a pod that mounts this PVC and then running
kubectl exec to shell into this pod.
From there you can access the logs and backup data directly.
The following is an example file that creates such a Kubernetes job.
The job creates a pod and mounts the PVC on the path
/data as the backup and restore pods themselves would.
kind: Job
apiVersion: batch/v1
metadata:
  name: backup-exec
spec:
  template:
    spec:
      containers:
      - name: couchbase-cluster-backup-create
        image: couchbase/operator-backup:1.3.2
        command: ["sleep"]
        args: ["30000"] (1)
        volumeMounts:
        - name: "couchbase-cluster-backup-volume"
          mountPath: "/data" (2)
      volumes:
      - name: couchbase-cluster-backup-volume
        persistentVolumeClaim:
          claimName: my-backup (3)
      restartPolicy: Never
      serviceAccountName: couchbase-backup
|1||The time in seconds to keep the pod running. Make sure you give this argument sufficient time so that you are not interrupted by the pod completing and any active exec sessions being terminated.|
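|2||The PVC is mounted at /data, the same path used by the backup and restore pods themselves.|
|3||The name of the Backup PVC to mount, which matches the name of the CouchbaseBackup resource (my-backup in the earlier examples).|

Once the pod is running, you can open a shell inside it. The pod name carries a generated suffix, so look it up via the job-name label that the Job controller applies:

$ kubectl get pods --selector=job-name=backup-exec
$ kubectl exec -it backup-exec-<suffix> -- sh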
Backups are available to view at /data/backups and their respective logs at /data/scriptlogs.
Inside /data/scriptlogs will be three folders: full_only, full_incremental, and restore.
The first two folders contain the logs of runs under the relevant CouchbaseBackup strategy, and the last folder is for CouchbaseBackupRestore operations exclusively.
Because backups are performed in separate pods, you will need to consider how these pods are scheduled onto nodes in order to avoid performance issues and noisy-neighbor problems. The following YAML example builds upon the initial YAML in Enable Automated Backup.
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  backup:
    managed: true
    image: couchbase/operator-backup:1.3.2
    serviceAccountName: couchbase-backup
    nodeSelector:
      instanceType: large (1)
    resources:
      requests:
        cpu: 100m
        memory: 100Mi (2)
    selector:
      matchLabels:
        cluster: my-cluster (3)
    tolerations: (4)
    - key: app
      operator: Equal
      value: cbbackup
      effect: NoSchedule
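|1||A node selector restricts backup pods to nodes whose labels match the given key-value pairs.|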
|2||If your Kubernetes environment requires it, you can set requests and limits for the pods that run the backup and restore jobs.|
|3||If you have more than one Couchbase cluster running in the same namespace, use a label selector to control which CouchbaseBackup and CouchbaseBackupRestore resources are associated with this cluster.|
|4||Tolerations are applied to pods, and allow (but do not require) the pods to be scheduled onto nodes with matching taints.
With taints and tolerations, you can grant backup pods exclusive access to specific nodes.
In this example, if we wish to run all backup pods on a dedicated node and isolate them from the rest of the Autonomous Operator pods, we can do this by tainting a node with the key-value pair app=cbbackup and the NoSchedule effect.|
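For example, assuming a node named worker-1, the matching taint could be applied with:

$ kubectl taint nodes worker-1 app=cbbackup:NoSchedule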
When deciding on the cron schedules for the Full/Incremental strategy, take care that the schedules are not defined in a way that allows full and incremental backups to clash.
For the examples given in this documentation and the
cbbackupmgr documentation, such a clash is very unlikely, but in a scenario where a backup is not given a large enough time window to complete, this could cause problems.
This is particularly common in situations where backups have been scheduled too frequently.
Scheduling of backup and restore jobs is exactly the same as the mechanism used for Couchbase Server pods. The affinity and anti-affinity mechanisms are described in Couchbase Scheduling and Isolation.
If you are running a Couchbase cluster version 6.6.x or higher and using the backup image
operator-backup:1.3.2 or higher, the ability to back up and restore to and from AWS, Azure, and GCP is available.
There are two ways to configure access to a cloud store: manually, by providing credentials via a Secret, or automatically, by using the instance metadata API to grant role access.
When manually providing credentials, a separate
Secret must be created that holds the credentials required for the cloud store. More information on the individual fields can be found under cbbackupmgr.
For AWS S3, three fields are expected in the secret: the region name, access key ID, and secret access key, under the keys region, access-key-id, and secret-access-key respectively.
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
type: Opaque
data:
  region: <aws-region>
  access-key-id: <access key id>
  secret-access-key: <secret access key>
For Azure Blob Storage, the account name and the account key are expected under the keys access-key-id and secret-access-key respectively.
apiVersion: v1
kind: Secret
metadata:
  name: az-secret
type: Opaque
data:
  access-key-id: <account name>
  secret-access-key: <account key>
For Google Cloud Storage, a client ID, client secret, and refresh token are expected under the keys access-key-id, secret-access-key, and refresh-token respectively.
apiVersion: v1
kind: Secret
metadata:
  name: gcp-secret
type: Opaque
data:
  access-key-id: <client id>
  secret-access-key: <client secret>
  refresh-token: <refresh-token>
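Note that values under data must be base64 encoded. Alternatively, you can let kubectl perform the encoding for you, for example (with placeholder values):

$ kubectl create secret generic s3-secret \
    --from-literal=region=<aws-region> \
    --from-literal=access-key-id=<access key id> \
    --from-literal=secret-access-key=<secret access key>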
When creating a CouchbaseBackup or CouchbaseBackupRestore object, you will need to reference this secret so the Operator knows where to extract the credentials from.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
spec:
  ...
  objectStore:
    secret: cloud-secret
    uri: [az|s3|gs]://example
To allow backup to automatically use the instance metadata API for authentication, enable the
couchbasebackups.spec.objectStore.useIAM parameter. By default this is disabled.
When using Azure and GCP this is all that is required; for AWS, a secret with the region key set must also be provided.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
spec:
  ...
  objectStore:
    useIAM: true
    secret: s3-region-secret
    uri: s3://example-bucket
When using AWS, if you have attached the IAM Role to an EKS node directly then this is sufficient configuration.
If you have set up IAM Roles for Service Accounts, the role ARN annotation must be applied to the backup service account, either manually or when running:
$ bin/cao create backup --iam-role-arn arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>
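If you prefer to annotate the service account manually, the standard IAM Roles for Service Accounts annotation looks like the following sketch (account ID and role name are placeholders):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: couchbase-backup
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>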
Applications running on a GKE cluster will attempt to use the default Compute Engine service account. To provide more granular control, use Workload Identity. For how to set up Workload Identity with GCP, follow Use Workload Identity.
By default, the service account to annotate will be couchbase-backup.
The cloud store that should hold the backups needs to be specified in the desired CouchbaseBackup or CouchbaseBackupRestore resource.
Note that the prefix must be either s3://, gs://, or
az://, otherwise the Admission Controller will not allow the creation of the resource.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0"
  incremental:
    schedule: "0 3 * * 1-6"
  size: 20Gi
  objectStore:
    uri: s3://my-backup-bucket
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  start:
    str: oldest
  end:
    str: latest
  objectStore:
    uri: s3://my-backup-bucket
Note that operations involving remote cloud stores take more time to complete than regular backups to PVCs, so bear this in mind when configuring your automated backup schedules.
Backing up to a cloud store still requires a local PVC with enough space to stage backup data locally before it is uploaded.
If you are using Couchbase Operator version 2.4.0 or higher, and Couchbase Operator Backup version 1.3.2 or higher, the ability to back up and restore using a compatible cloud store is available. Please see Compatible Object Stores for limitations.
Users wishing to use a compatible store should set
couchbasebackups.spec.objectStore.endpoint.url to the host/address of the object store.
couchbasebackups.spec.objectStore.endpoint.secret can be set to the name of the secret containing the CA certificate the compatible object endpoint is using.
For example, to use MinIO, an S3-compatible object store:
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0"
  incremental:
    schedule: "0 3 * * 1-6"
  size: 20Gi
  objectStore:
    secret: s3-secret
    uri: s3://example-bucket
    endpoint:
      url: https://minio.minio (1)
      secret: my-tls-secret (2)
      useVirtualPath: false (3)
|1||The only required parameter for using a custom object endpoint.|
|2||Only required if the custom object store is using a custom CA certificate for communication.|
|3||Only required if the custom object store uses virtual-hosted-style addressing instead of path-style addressing, e.g. https://example-bucket.minio.minio rather than https://minio.minio/example-bucket.|
The secret my-tls-secret contains the certificate of the endpoint, similar to the example below.
apiVersion: v1
kind: Secret
metadata:
  name: my-tls-secret
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1J...
If you are using Couchbase Operator version 2.4.0 or higher, and Couchbase Operator Backup version 1.3.2 or higher, the ability to back up and restore using a generic ephemeral volume is available.
This can only be used when backing up or restoring from a remote cloud store and may be useful for high availability setups.
To enable ephemeral staging volumes for backup, set
couchbasebackups.spec.ephemeralVolume to true; it defaults to false.
couchbasebackups.spec.size will apply to the ephemeral PVC.
When enabled, the backup PVC shares its lifecycle with the backup/restore pod, and will not be removed until the pod is removed.
It may therefore be useful to tweak the job history limits described earlier, since completed pods, and with them their ephemeral volumes, are retained until those limits are exceeded.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0"
  incremental:
    schedule: "0 3 * * 1-6"
  size: 20Gi
  objectStore:
    secret: s3-secret
    uri: s3://example-bucket
  ephemeralVolume: true
When restoring, if a backup PVC is not found, an ephemeral volume will be used instead. To change either the size of this volume or the storage class, use the couchbasebackuprestores.spec.stagingVolume fields:
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  start:
    str: oldest
  end:
    str: latest
  objectStore:
    uri: s3://my-backup-bucket
  stagingVolume:
    size: 20Gi
    storageClassName: "default"