Configure Automated Backup and Restore
You can configure the Autonomous Operator to take periodic, automated backups of your Couchbase cluster with the existing functionality provided by cbbackupmgr.
Overview
This page details how to back up a Couchbase cluster and restore data in the face of disaster. A conceptual overview of using the Autonomous Operator to back up and restore Couchbase clusters can be found in Couchbase Backup and Restore.
The Autonomous Operator supports two of the backup strategies available in cbbackupmgr: Full Only and Full/Incremental.
Complete descriptions and explanations of these strategies can be found in the cbbackupmgr documentation.
The examples on this page assume a backup schedule based on the Full/Incremental strategy for both creating backups and performing restores.
Enable Automated Backup
In order for the Autonomous Operator to manage the automated backup of a cluster, the feature must be enabled in the CouchbaseCluster resource.
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  backup:
    managed: true (1)
    image: couchbase/operator-backup:1.1.0 (2)
    serviceAccountName: couchbase-backup (3)
1 | The only required field to enable automated backup is couchbaseclusters.spec.backup.managed. |
2 | If the couchbaseclusters.spec.backup.image field is left unspecified, then the dynamic admission controller will automatically populate it with the most recent container image that was available when the installed version of the Autonomous Operator was released.
The default image for open source Kubernetes comes from Docker Hub, and the default image for OpenShift comes from the Red Hat Container Catalog.
If pulling directly from the Red Hat Container Catalog, note that the image path will differ from the Docker Hub path. |
3 | If left unspecified, couchbaseclusters.spec.backup.serviceAccountName will default to the value of couchbase-backup.
The service account and the RBAC resources that back it must exist, otherwise backup jobs will not have the required permissions to complete successfully.
This is covered in the next section. |
Grant Backup Permissions
Backup Pods need read-only access to Kubernetes resources such as Pods, CronJobs, and Jobs.
They also need write access to Events and the CouchbaseBackup/CouchbaseBackupRestore custom resources.
Without these resources, backup jobs will still run as scheduled, but they will ultimately fail as the pods won’t have the required permissions.
You can use the cbopcfg tool to create the resources that grant the required permissions.
The following command creates the necessary resources in the default namespace:
$ bin/cbopcfg create backup
To create the resources in a custom namespace, use the -n flag:
$ bin/cbopcfg create backup -n my-namespace
To make your own edits to these resources, you can use cbopcfg generate backup to generate the YAML output instead of creating the resources in Kubernetes immediately.
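For example, the generated output can be redirected to a file, reviewed, and then applied manually (the file name backup-rbac.yaml is illustrative):

$ bin/cbopcfg generate backup -n my-namespace > backup-rbac.yaml
$ kubectl apply -f backup-rbac.yaml

For illustration only, a Role granting the kind of permissions described above might look roughly like the following sketch. This is an assumption based on the access requirements listed in this section, not the authoritative definition; always treat the output of cbopcfg generate backup as the source of truth.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: couchbase-backup
rules:
# Read-only access to pods, jobs, and cron jobs (assumed verbs)
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: ["batch"]
  resources: ["jobs", "cronjobs"]
  verbs: ["get", "list"]
# Write access to events and the Couchbase backup custom resources (assumed verbs)
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "update"]
- apiGroups: ["couchbase.com"]
  resources: ["couchbasebackups", "couchbasebackuprestores"]
  verbs: ["get", "list", "watch", "create", "update"]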
Configure Backups
After automated backup is enabled for the cluster, individual backup policies can be configured using CouchbaseBackup resources.
The following is a very simple configuration with only the minimum required fields set.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0" (1)
  incremental:
    schedule: "0 3 * * 1-6" (1)
  size: 20Gi (2)
1 | On detection of the CouchbaseBackup resource, the Autonomous Operator creates the correct cron jobs for the spec.full.schedule and the spec.incremental.schedule.
In this example, a full backup is performed at 3:00 AM every Sunday, and an incremental backup at 3:00 AM on every other day of the week. |
2 | The Autonomous Operator will also create a PersistentVolumeClaim (PVC) to store the backups and logs, using the name specified in metadata.name.
So in this case, if a PVC called "my-backup" does not yet exist, one will be created.
The same would happen if for some reason the PVC was deleted. |
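Assuming the configuration above is saved to a file (the name my-backup.yaml is illustrative), create the resource with kubectl (or oc on OpenShift):

$ kubectl apply -f my-backup.yaml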
Once you have created a CouchbaseBackup resource, you can check for the expected behavior by viewing the Operator logs.
$ kubectl logs -f deployments/couchbase-operator
$ oc logs deployments/couchbase-operator
You should observe that a Persistent Volume Claim and the correct number of cron jobs have been created along with the CouchbaseBackup itself. The output should be similar to:
{"level":"info","ts":1587134718.3592374,"logger":"cluster","msg":"Backup Cronjob created","cbbackup":"my-backup","cronjob":"my-backup-incremental"}
{"level":"info","ts":1587134718.3727212,"logger":"cluster","msg":"Backup Cronjob created","cbbackup":"my-backup","cronjob":"my-backup-full"}
{"level":"info","ts":1587134718.3722592,"logger":"cluster","msg":"Backup PVC created","cbbackup":"my-backup"}
{"level":"info","ts":1587134718.3727608,"logger":"cluster","msg":"Backup created","cbbackup":"my-backup"}
You can then validate for yourself that these resources exist and check that their details match up with what was defined in the CouchbaseBackup configuration.
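For example, listing the cron jobs and the PVC (the oc equivalents behave the same on OpenShift) should produce output similar to the tables below:

$ kubectl get cronjobs
$ kubectl get pvc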
NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
my-backup-full          0 3 * * 0     False     0        <none>          18s
my-backup-incremental   0 3 * * 1-6   False     0        <none>          18s

NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-backup   Bound    pvc-0c3c717f-e10b-423e-9279-a99edf81019b   20Gi       RWO            standard       14s
Deleting Persistent Volume Claims or Persistent Volumes will delete the backup data and backup log data permanently.
Once the first Job has been spawned by a backup cron job, the status fields of a CouchbaseBackup resource will update, and you can start monitoring backup progress.
Restoring From a Backup
Restoring from a backup requires that you create a CouchbaseBackupRestore resource.
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  repo: cb-example-2020-02-12T19_00_03
  start:
    int: 1
A CouchbaseBackupRestore resource behaves differently from a CouchbaseBackup resource in that it spawns just a single, one-time job which attempts to restore the requested backup or range of backups.
In the example above, the CouchbaseBackupRestore resource configuration is restoring the first backup in the repository cb-example-2020-02-12T19_00_03.
The first backup in any repository will be a full backup, since the Autonomous Operator performs a full backup of the cluster after the creation of each backup repository.
If you don’t know the name of the backup repository that you want to restore from, you can find it without having to explore the contents of a Persistent Volume Claim by simply referring to the couchbasebackups.status object of the existing CouchbaseBackup resource.
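For example, the available repository names can be read straight from the status with a JSONPath query; the field layout assumed here matches the status example shown later on this page:

$ kubectl get couchbasebackup my-backup -o jsonpath='{.status.backups[*].name}'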
You also have the option to restore a range of backups from the latest backup repository.
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  start:
    str: oldest
  end:
    str: latest
In the example above, the Autonomous Operator would restore a range of backups from the latest backup repository.
The omission of the spec.repo field means that the Autonomous Operator will look for the most recent backup repository, either from the CouchbaseBackup object defined by spec.backup or from the PVC of the same name.
If the spec.repo field cannot be populated by the Autonomous Operator, then the resource will be rejected and no restore job will be created.
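As with backups, the restore resource is created with kubectl (the file name my-restore.yaml is illustrative), and the resulting one-time job can then be watched until it completes:

$ kubectl apply -f my-restore.yaml
$ kubectl get jobs --watch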
Additional Restore Options
Additional options from cbbackupmgr restore may also be specified.
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  repo: cb-example-2020-02-12T19_00_03
  start:
    int: 1
  buckets:
    include: (1)
    - default
    - peanutbutter
    exclude: (1)
    - horseman
    bucketMap: (2)
    - source: default
      destination: new-default
    - source: peanutbutter
      destination: pickles
  services: (3)
    analytics: true
    bucketConfig: false
    data: true
    eventing: true
    ftAlias: true
    ftIndex: true
    gsiIndex: true
    views: true
  threads: 1 (4)
1 | couchbasebackuprestores.spec.buckets.include : Explicitly restore only the specified list of buckets.
couchbasebackuprestores.spec.buckets.exclude : Explicitly skip restoring the specified list of buckets.
Note that include and exclude cannot both be used at the same time; the example above shows both for illustration only. |
2 | couchbasebackuprestores.spec.buckets.bucketMap : Specified when you want to restore a backup to a destination bucket that has a different name than the bucket that was originally backed up.
This field requires a pair of fields named source and destination for each renamed bucket, as shown above.
Multiple mappings may be specified. |
3 | couchbasebackuprestores.spec.services : By default, all data and configuration settings, for all services, are restored to the Couchbase cluster, apart from bucket configuration settings.
In order to skip restoring a particular service, simply set the service to false. |
4 | couchbasebackuprestores.spec.threads : An integer that specifies the number of concurrent cbbackupmgr clients to use when restoring data.
Refer to the cbbackupmgr restore --threads documentation for more information. |
Monitor and Manage Backups
It’s important to regularly monitor backup performance to ensure you’re backing up all the required data within your desired time window.
For the simplest overview, run get commands on the CouchbaseBackup resources.
$ kubectl get couchbasebackup my-backup -o yaml
$ oc get couchbasebackup my-backup -o yaml
The command output should show the given CouchbaseBackup specification, as well as a couchbasebackups.status section containing useful information similar to the following output.
status:
  archive: /data/backups
  backups:
  - full: 2020-02-12T15_25_10.712665995Z
    incrementals:
    - 2020-02-12T15_28_11.986341497Z
    - 2020-02-12T15_26_09.875255309Z
    name: cb-example-2020-02-12T15_25_09
  - full: 2020-02-12T15_15_08.443231128Z
    incrementals:
    - 2020-02-12T15_18_12.465643387Z
    - 2020-02-12T15_16_08.037612813Z
    - 2020-02-12T15_24_10.264088039Z
    - 2020-02-12T15_22_11.215924706Z
    name: cb-example-2020-02-12T15_15_07
  capacityUsed: 1.47Gi
  cronjob: cbbackup-full-incr-incremental
  duration: 17s
  job: cbbackup-full-incr-incremental-1587137280
  lastRun: "2020-02-12T15:28:11Z"
  lastSuccess: "2020-02-12T15:28:28Z"
  output: '{"location": "2020-02-12T15_28_11.986341497Z", "duration_seconds": "15.429305462",
    "avg_data_transfer_rate_bytes_sec": 1853, "total_items": 0, "total_items_size_bytes":
    28672, "buckets": {"default": {"mutations_backedup": "0", "mutations_failed":
    "0", "deletions_backedup": "0", "deletions_failed": "0"}}}'
  pod: cbbackup-full-incr-incremental-1581521912-mnng9
  repo: repo
  running: false
Furthermore, you can check that the cron jobs have updated and that their status fields look correct.
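Listing the cron jobs should show LAST SCHEDULE times consistent with the configured schedules:

$ kubectl get cronjobs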
NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
my-backup-full          0 3 * * 0     False     0        2d              2d
my-backup-incremental   0 3 * * 1-6   False     0        16h             2d
$ kubectl get cronjob my-backup-full -o yaml
$ oc get cronjob my-backup-full -o yaml
And finally, you can check that the backup Jobs and their respective pods are present, and that no more of them are retained than the limits specified in couchbasebackups.spec.failedJobsHistoryLimit and couchbasebackups.spec.successfulJobsHistoryLimit.
These default to 5 and 3 respectively.
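The jobs and pods can be listed as follows (oc behaves the same on OpenShift):

$ kubectl get jobs
$ kubectl get pods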
NAME                                        COMPLETIONS   DURATION   AGE
cbbackup-full-incr-full-1587138300          1/1           33s        11m
cbbackup-full-incr-incremental-1587138600   1/1           43s        6m8s

NAME                                              READY   STATUS      RESTARTS   AGE
cb-example-0000                                   1/1     Running     0          72m
cb-example-0001                                   1/1     Running     0          72m
cb-example-0002                                   1/1     Running     0          72m
cbbackup-full-incr-full-1587138300-92rfp          0/1     Completed   0          11m
cbbackup-full-incr-incremental-1587138600-vzmd2   0/1     Completed   0          6m5s
couchbase-operator-admission-7ccbd85455-6g64p     1/1     Running     0          73m
couchbase-operator-b6496564f-qpqsb                1/1     Running     0          73m
Editing a Backup Configuration
Only the preexisting schedules and volume size of a CouchbaseBackup resource can be edited.
Attempts to edit other fields, such as the name or strategy, will fail.
Online Backup Volume Resizing
A Backup PVC that is referenced by an existing CouchbaseBackup resource can be resized manually by the user, or automatically by the Autonomous Operator.
A Backup PVC can only be resized if its associated StorageClass is configured to allow volume expansion.
This means the default StorageClass in your Kubernetes environment should have allowVolumeExpansion set to true. Ensure that the StorageClass is configured to allow volume expansion before creating the CouchbaseBackup resource.
Manual Backup Volume Resizing
To perform a manual resize, simply edit couchbasebackups.spec.size and change it to a value that is larger than the current size.
The resize will then be performed with the next scheduled backup job.
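For example, either of the following updates the size field (the 30Gi value is illustrative):

$ kubectl edit couchbasebackup my-backup
$ kubectl patch couchbasebackup my-backup --type merge -p '{"spec":{"size":"30Gi"}}'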
The underlying StorageClass must be configured to allow volume expansion in order to modify the size of the Backup PVC (as stated previously). If it is not, changes to the volume size may appear to go through, but the Autonomous Operator will error until the change is reverted.
Automated Backup Volume Resizing
A CouchbaseBackup resource can be modified to allow the Autonomous Operator to automatically resize the Backup PVC once only a specific percentage of free space is left.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0"
  incremental:
    schedule: "0 3 * * 1-6"
  size: 20Gi (1)
  autoScaling:
    thresholdPercent: 20 (2)
    incrementPercent: 20 (3)
    limit: 100Gi (4)
1 | couchbasebackups.spec.size is set to the initial size when the CouchbaseBackup resource is created.
Here, the size is set to 20Gi (the default). |
2 | couchbasebackups.spec.autoScaling.thresholdPercent represents the percentage of free space remaining on the volume at which point a volume expansion will be triggered.
Here, the threshold is set to 20 (the default).
In this case, if the volume is currently 80 GiB, a volume expansion will be triggered once the used capacity reaches 64 GiB and free space is less than 16 GiB. |
3 | couchbasebackups.spec.autoScaling.incrementPercent controls how much the volume is increased each time the threshold is exceeded.
Here, the increment is set to 20 (the default).
In this case, if the volume is currently 80 GiB when the threshold is reached, the volume will be expanded by 20 percent to 96 GiB. |
4 | couchbasebackups.spec.autoScaling.limit imposes a hard limit on the size of the Backup PVC, at which point the volume size will no longer be incremented.
When this field is not defined, no bounds are imposed. |
The underlying StorageClass must be configured to allow volume expansion in order to modify the size of the Backup PVC (as stated previously). If it is not, changes to the volume size may appear to go through, but the Autonomous Operator will error until the change is reverted.
Deleting a Backup Configuration
When a CouchbaseBackup resource is deleted, any associated CronJobs are deleted.
Jobs and their respective Pods from those CronJobs are orphaned; the number of these resources that are left over is determined by the limits spec.successfulJobsHistoryLimit and spec.failedJobsHistoryLimit.
If a backup job is running whilst the parent CouchbaseBackup is deleted, then the job will continue until completion or eventual failure.
Viewing Detailed Logs
If anything goes wrong during a backup job, and backup pods return the Error status, detailed logging is stored on the Persistent Volume Claim for the backup.
You can access these logs by creating a Kubernetes job that creates a pod that mounts this PVC, and then running kubectl exec to shell into this pod.
From there you can access the logs and backup data directly.
The following is an example file that creates such a Kubernetes job.
The job creates a pod and mounts the PVC on the path /data, just as the backup and restore pods themselves would.
kind: Job
apiVersion: batch/v1
metadata:
  name: backup-exec
spec:
  template:
    spec:
      containers:
      - name: couchbase-cluster-backup-create
        image: couchbase/operator-backup:1.1.0
        command: ["sleep"]
        args: ["30000"] (1)
        volumeMounts:
        - name: "couchbase-cluster-backup-volume"
          mountPath: "/data" (2)
      volumes:
      - name: couchbase-cluster-backup-volume
        persistentVolumeClaim:
          claimName: my-backup (3)
      restartPolicy: Never
      serviceAccountName: couchbase-backup
1 | The time in seconds to keep the pod running. Make sure you give this argument sufficient time so you are not interrupted by the pod completing and any exec connection shutting down. |
2 | The mountPath may be any valid path, but for purposes of consistency it should be set to /data. |
3 | The claimName refers to the name of the PVC to be accessed, which is the same as the name of the CouchbaseBackup resource. |
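A minimal usage sketch, assuming the manifest above is saved as backup-exec.yaml and that the image provides a shell at /bin/sh; the pod name suffix is generated, so look up the real name with kubectl get pods first:

$ kubectl apply -f backup-exec.yaml
$ kubectl get pods
$ kubectl exec -it backup-exec-x7k2m -- /bin/sh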
Backups are available to view at /data/backups, and their respective logs at /data/scriptlogs.
Inside /data/scriptlogs will be three folders: full_only, incremental, and restore.
The first two folders contain the logs for runs under the corresponding CouchbaseBackup strategy, and the last folder is for CouchbaseBackupRestore operations exclusively.
Advanced Backup Management
Backup Scheduling
As backups are performed on separate pods, you will need to consider careful node scheduling for these pods in order to avoid performance issues and noisy neighbor problems. The following YAML example builds upon the initial YAML in Enable Automated Backup.
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  backup:
    managed: true
    image: couchbase/operator-backup:1.1.0
    serviceAccountName: couchbase-backup
    nodeSelector:
      instanceType: large (1)
    resources:
      requests:
        cpu: 100m
        memory: 100Mi (2)
    selector:
      matchLabels:
        cluster: my-cluster (3)
    tolerations: (4)
    - key: app
      operator: Equal
      value: cbbackup
      effect: NoSchedule
1 | The nodeSelector field defines which Kubernetes nodes the pods running the automated backup process will be constrained to.
In this case we have specified that backup pods will be constrained to running on nodes of instanceType large. |
2 | If your Kubernetes environment requires it, you can set requests and limits for the pods that run the backup and restore jobs. |
3 | If you have more than one CouchbaseCluster resource deployed in the same namespace, you’ll need to use resource label selection to ensure that CouchbaseBackup and CouchbaseBackupRestore resources get created on the correct cluster.
Like with other Couchbase custom resources, this means specifying a label for RBAC resources which matches the corresponding label selector of the CouchbaseCluster resource that you want the resources aggregated to. |
4 | Tolerations are applied to pods, and allow (but do not require) the pods to be scheduled onto nodes with matching taints.
With taints and tolerations, you can grant backup pods exclusive access to specific nodes.
In this example, if we wish to run all backup pods on a dedicated node and isolate them from the rest of the Autonomous Operator pods, we can do this by tainting a node with the key-value of app:cbbackup and defining a matching toleration. |
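For example, a node could be dedicated to backup pods by applying a taint that matches the toleration shown above (the node name worker-node-3 is illustrative):

$ kubectl taint nodes worker-node-3 app=cbbackup:NoSchedule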
Further reference for all of these fields can be found in the couchbaseclusters.spec.backup resource configuration.
For more general information, see Couchbase Scheduling and Isolation.
Backup Time Scheduling
When deciding on the cron schedules for the Full/Incremental strategy, you should take care that the schedules are not defined in a way that allows full and incremental backups to clash.
For the example given in this documentation and the cbbackupmgr documentation, such a clash is very unlikely, but in a scenario where a backup is not given enough of a time window to complete, this could cause problems.
This is particularly common in situations where backups have been scheduled too frequently.
Pod Scheduling
Scheduling of backup and restore jobs is exactly the same as the mechanism used for Couchbase Server pods. The affinity and anti-affinity mechanisms are described in Couchbase Scheduling and Isolation.
Backup and restore job affinity can be set, per CouchbaseCluster, with the couchbaseclusters.spec.backup.nodeSelector attribute, and toleration of anti-affinity rules can be set with the couchbaseclusters.spec.backup.tolerations attribute.
Backup and Restore to S3
If you are running Couchbase Server version 6.6.x or higher and using the backup image operator-backup:1.1.0 or higher, you can back up to, and restore from, AWS S3 buckets.
This option requires some extra configuration related to your AWS credentials and the name of the S3 bucket to perform operations on.
First of all, a separate Secret must be created that holds the AWS region name, access key ID, and secret access key.
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
type: Opaque
data:
  region: aGV5IG1h...
  access-key-id: bXVzaHJvb20ga2l...
  secret-access-key: cm9zY29lJ3Mgd2V0IHN...
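Equivalently, the Secret can be created from literal values with kubectl, which base64-encodes them for you; the region and the AWS example credentials below are placeholders:

$ kubectl create secret generic s3-secret \
    --from-literal=region=us-east-1 \
    --from-literal=access-key-id=AKIAIOSFODNN7EXAMPLE \
    --from-literal=secret-access-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY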
In your CouchbaseCluster object, you will need to reference this Secret so the Operator knows where to extract the credentials from.
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  backup:
    managed: true
    image: couchbase/operator-backup:1.1.0
    serviceAccountName: couchbase-backup
    s3Secret: s3-secret
And finally, the S3 bucket that will hold the backups needs to be specified in the desired CouchbaseBackup and CouchbaseBackupRestore resources.
Note that the s3:// prefix is required, otherwise the admission controller will not allow the resource to be created.
apiVersion: couchbase.com/v2
kind: CouchbaseBackup
metadata:
  name: my-backup
spec:
  strategy: full_incremental
  full:
    schedule: "0 3 * * 0"
  incremental:
    schedule: "0 3 * * 1-6"
  size: 20Gi
  s3bucket: s3://my-backup-bucket
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore
spec:
  backup: my-backup
  start:
    str: oldest
  end:
    str: latest
  s3bucket: s3://my-backup-bucket
Please note that operations involving S3 take more time to complete than regular backups to PVCs, so bear this in mind when configuring your automated backup schedules.
Backing up to S3 still requires a local PVC with enough space to stage the backup data before it is uploaded to the S3 bucket.