cbbackupmgr tutorial

A quick guide to using cbbackupmgr

DESCRIPTION

A tutorial that goes gives examples of how to use all of the commands in cbbackupmgr effectively.

TUTORIAL

In this tutorial we will show how to take backups and restore data using cbbackupmgr. This tutorial uses a cluster that contains both the travel-sample and beer-sample buckets installed and requires modifying some of the documents in the travel-sample bucket. To make it easier to set up a cluster and edit/get documents the following scripts are provided at http://github.com/couchbaselabs/backup-tutorial. You can then find scripts corresponding to your version of Couchbase. We will reference other scripts in this github repository later in the tutorial so it is recommended that you download these scripts. The only requirement for running the scripts is that you have curl installed. To automatically setup the cluster in the appropriate state required for this tutorial download and install Couchbase and then run the 01-initialize.sh script. If you do not want to use this script then you can navigate through the Couchbase setup process and initialize the cluster will all available services and install the travel-sample and beer-sample sample data bucket.

Using this cluster we will show how the incremental/merge approach taken by cbbackupmgr reduces time and overhead on your cluster.

Configuring a Backup

Before getting started with cbbackupmgr we must first decide the directory where to store all of our backups. This directory is referred to as the backup archive. The backup archive contains one or more backup repositories. These backup repositories are where your backups will be contained. The easiest way to think of a backup repository is that it corresponds directly to a single cluster that you want to back up. The backup repository also contains a configuration for how to back that cluster up. A backup repository is created by using the config subcommand. In this tutorial we will use a backup archive located at /data/backup. The backup archive is automatically created if the directory specified is empty. Below is an example of how to create a backup repository called "cluster" which will backup all data and index definitions from all buckets in the target cluster.

$ cbbackupmgr config -a /data/backup -r cluster
Backup repository `cluster` created successfully in archive `/data/backup`

One of the most important aspects of backup repository creation is that we can configure that backup repository in many different ways to change the way backups in each backup repository are taken. Let’s say we want a separate backup of only the index definitions in the travel-sample bucket. To do this we can create a separate backup repository called "single".

$ cbbackupmgr config -a /data/backup -r single \
 --include-buckets travel-sample --disable-data
Backup repository `single` created successfully in archive `/data/backup`

The config subcommand provides many options in order to customize how you backup your data. See the cbbackupmgr-config page for more information about what options are available and how they are used.

Backing up a Cluster

Now that we have created some backup repositories we should take a look at our backup archive to see what it looks like. The easiest way to do this is to use the list subcommand. This subcommand is used to examine a backup archive and gives information about how much data is stored in it. To see the entire backup archive we can run the command below.

$ cbbackupmgr list -a /data/backup
Size      Items          Name
0B        -              /
0B        -              + cluster
0B        -              + single

The list subcommand gives us a directory print out of all of the backup repositories and backups in out backup archive. Since there are no backups yet we can just see our archives list in the output of this command. There is also information about how much disk space each folder and file contains and, if applicable, how many items are backed up in those folders/files. More information about the list subcommand can be found in the cbbackupmgr-list page.

Now that we have our backup repositories configured it’s time to start taking backups. Since the backup repository contains all of the configuration information for how the backup should be taken we just need to specify the backup repository name and the information for the target cluster we intend to back up. Below is an example of how to take a backup on the "cluster" backup repository. We will assume that we have our cluster running on localhost.

$ cbbackupmgr backup -a /data/backup -r cluster \
 -c couchbase://127.0.0.1 -u Administrator -p password
Backing up to 2016-03-22T10_26_08.933579821-07_00
Copied all data in 6s (Avg. 6.67MB/Sec)        38894 items / 40.02MB
travel-sample           [==================================] 100.00%
beer-sample             [==================================] 100.00%
Backup successfully completed

When the backup command is executed it will by default print out a progress bar which is helpful for understand how long your backup will take to complete and the rate of data movement. While the bakcup is running the progress bar will give an estimated time to completion, but this will change to average backup rate when the backup finishes. Information is also provided on the total data and items already backed up and the current rate of data movement. If the backup completes successfully you will see the "Backup completed successfully" message as the last line printed.

Let’s also run the backup on the "single" backup repository to see how the two backup runs differ.

$ cbbackupmgr backup -a /data/backup -r single \
 -c couchbase://127.0.0.1 -u Administrator -p password
Backing up to 2016-03-22T10_33_20.812668465-07_00
Copied all data in 1s (Avg. 480B/Sec)                 0 items / 480B
travel-sample           [==================================] 100.00%

Since the "single" backup repository is only configured to back up index definitions for the travel-sample bucket we can we do not see a progress bar for the beer-sample bucket. We can also see that the backup executed quicker since the was much less data to actually back up.

Since we now have backups in our backup archive let’s take a look at the state of our backup archive has changed by using the list subcommand.

$ cbbackupmgr list -a /data/backup
Size      Items          Name
154.25MB  -              /
154.21MB  -              + cluster
154.21MB  -                  + 2016-03-22T10_26_08.933579821-07_00
55.85MB   -                      + beer-sample
298B      0                          bucket-config.json
55.84MB   7303                       + data
55.84MB   7303                           shard_0.fdb
2B        0                          full-text.json
10.07KB   8                          gsi.json
784B      1                          views.json
98.36MB   -                      + travel-sample
300B      0                          bucket-config.json
98.35MB   31591                      + data
98.35MB   31591                          shard_0.fdb
2B        0                          full-text.json
10.07KB   8                          gsi.json
1.72KB    1                          views.json
40.08KB   -              + single
40.08KB   -                  + 2016-03-22T10_33_20.812668465-07_00
40.08KB   -                      + travel-sample
300B      0                          bucket-config.json
28.00KB   0                          + data
28.00KB   0                              shard_0.fdb
2B        0                          full-text.json
10.07KB   8                          gsi.json
1.72KB    1                          views.json

Now that we have some backups defined the output of the list subcommand is much more useful. We can see that our "cluster" backup repository contains one backup with a name corresponding to the time the backup was taken. That backup also contains two buckets and we can see various files in each of those backups with their size and item counts. The "single" backup repository also contains one backup, but this backup only contains the travel-sample bucket and contains 0 data items.

One of the most important features of cbbackupmgr is that it is an incremental-only backup utility. This means that once we have backed up some data we will never need to back it up again. In order to simulate some changes on the cluster we can run the 02-modify.sh script from the backup-tutorial github repository mentioned at the beginning of the tutorial. If you do not have this script then you will need to modify two documents and add two new documents to the travel-sample bucket. After we have modified some data we will run the backup subcommand on the "cluster" backup repository again.

$ cbbackupmgr backup -a /data/backup -r cluster \
 -c couchbase://127.0.0.1 -u Administrator -p password
Backing up to 2016-03-22T14_00_38.668068342-07_00
Copied all data in 3s (Avg. 18.98KB/Sec)           4 items / 56.95KB
travel-sample           [==================================] 100.00%
beer-sample             [==================================] 100.00%
Backup successfully completed

In this backup notice that since we updated 2 items and created two items that this is all that we need back up during this run. If we list the backup archive using the list subcommand then we will see that the backup archive looks like something like what is below.

$ cbbackupmgr list -a /data/backup
Size      Items          Name
254.31MB  -              /
254.28MB  -              + cluster
154.19MB  -                  + 2016-03-22T10_26_08.933579821-07_00
55.84MB   -                      + beer-sample
298B      0                          bucket-config.json
55.83MB   7303                       + data
55.83MB   7303                           shard_0.fdb
2B        0                          full-text.json
9.99KB    8                          gsi.json
784B      1                          views.json
98.35MB   -                      + travel-sample
300B      0                          bucket-config.json
98.34MB   31591                      + data
98.34MB   31591                          shard_0.fdb
2B        0                          full-text.json
9.99KB    8                          gsi.json
1.72KB    1                          views.json
100.08MB  -                  + 2016-03-22T14_00_38.668068342-07_00
50.03MB   -                      + beer-sample
298B      0                          bucket-config.json
50.02MB   0                          + data
50.02MB   0                              shard_0.fdb
2B        0                          full-text.json
9.99KB    8                          gsi.json
784B      1                          views.json
50.05MB   -                      + travel-sample
300B      0                          bucket-config.json
50.04MB   4                          + data
50.04MB   4                              shard_0.fdb
2B        0                          full-text.json
9.99KB    8                          gsi.json
1.72KB    1                          views.json
40.08KB   -              + single
40.08KB   -                  + 2016-03-22T10_33_20.812668465-07_00
40.08KB   -                      + travel-sample
300B      0                          bucket-config.json
28.00KB   0                          + data
28.00KB   0                              shard_0.fdb
2B        0                          full-text.json
10.07KB   8                          gsi.json
1.72KB    1                          views.json

Restoring a Backup

Now that we have some backup data let’s restore that data backup to the cluster. In order to restore data we just need to know the name of the backup that we want to restore. To find the name we can again use the list subcommand in order to see what is in our backup archive. The backup name will always be a timestamp. For example, let’s say we want to restore the 2016-03-22T10_26_08.933579821-07_00 from the "cluster" backup repository. In order to do this we run the command below.

$ cbbackupmgr restore -a /tmp/backup -r cluster \
 -c http://127.0.0.1:9000 -u Administrator -p password \
 --start 2016-03-22T14_00_16.892277632-07_00 \
 --end 2016-03-22T14_00_16.892277632-07_00 --force-updates
(1/1) Restoring backup 2016-03-22T14_00_16.892277632-07_00
Copied all data in 2s (Avg. 19.96MB/Sec)       38894 items / 39.91MB
travel-sample           [==================================] 100.00%
beer-sample             [==================================] 100.00%
Restore completed successfully

In the command above we use the --start and --end flags to specify the range of backups we want to restore. Since we are only restoring one backup we specify the same value for both --start and --end. We also added the --force-updates flag in order to skip Couchbase conflict resolution. This tells cbbackupmgr to force overwrite key-value pairs being restored even if the key-value pair on the cluster is newer and the one being restored. If we look at the two values that we updated on the cluster we will now see that they have been reverted back to what they were at the time we took the initial backup. If you used the script in the backup-tutorial github repository to update documents then you an use the 03-insepct.sh script to see the state of the updated documents after the restore.

The restore subcommand also allows for you to exclude data that was backed up from the restore and provides various other options. See the cbbackupmgr-restore page for more information on restoring data.

Merging backups

Using an incremental backup solution means that each backup we take increases disk space. Since disk space in not infinite we need to be able to reclaim this disk space. In order to do this we use the cbbackupmgr-merge subcommand to merge two or more backups together. Since we have two backups in the "cluster" backup repository we will merge these backups together using the command below.

$ cbbackupmgr merge -a /data/backup -r cluster \
 --start 2016-03-22T14_00_16.892277632-07_00 \
 --end 2016-03-22T14_00_38.668068342-07_00
Merge completed successfully

After merging the backups together we can use the list subcommand to see the effect the merge subcommand we just ran had on the backup archive.

$ cbbackupmgr list --archive /data/backup
Size      Items          Name
154.41MB  -              /
154.37MB  -              + cluster
154.37MB  -                  + 2016-03-22T14_00_38.668068342-07_00
55.84MB   -                      + beer-sample
298B      0                          bucket-config.json
55.83MB   7303                       + data
55.83MB   7303                           shard_0.fdb
2B        0                          full-text.json
9.99KB    8                          gsi.json
784B      1                          views.json
98.53MB   -                      + travel-sample
300B      0                          bucket-config.json
98.52MB   31593                      + data
98.52MB   31593                          shard_0.fdb
2B        0                          full-text.json
9.99KB    8                          gsi.json
1.72KB    1                          views.json
40.08KB   -              + single
40.08KB   -                  + 2016-03-22T10_33_20.812668465-07_00
40.08KB   -                      + travel-sample
300B      0                          bucket-config.json
28.00KB   0                          + data
28.00KB   0                              shard_0.fdb
2B        0                          full-text.json
10.07KB   8                          gsi.json
1.72KB    1                          views.json

We can see from the list command that there is now a single backup in the "cluster" backup repository. This backup has a name that reflects the name of the most recent backup in the merge. It also has 31593 data items in the travel-sample bucket. This is two more items than the original backup we took because the second backup had two new items. The two items that were updated were de-duplicated during the merge so they do not add extra items to the count displayed by the list subcommand.

For more information on how the merge command works as well as information on other ways the merge command can be used see the cbbackupmgr-merge page.

Removing a Backup Repository

If no longer need a backup repository then we can use the remove subcommand to remove the backup repository. Below is an example showing how to remove the "cluster" backup repository.

$ cbbackupmgr remove -a /data/backup -r cluster
Backup repository `cluster` deleted successfully from archive `/data/backup`

If we now run the list subcommand you will see that the "cluster" backup repository no longer exists. For more information on the remove subcommand see the cbbackupmgr-remove page.

CBBACKUPMGR

Part of the cbbackupmgr suite