cbbackupmgr merge


    Merges two or more backups together

    SYNOPSIS

    cbbackupmgr merge [--archive <archive_dir>] [--repo <repo_name>]
                      [--start <backup>] [--end <backup>] [--threads <num>]
                      [--date-range <range>]

    DESCRIPTION

    The merge command merges two or more backups together. Because cbbackupmgr always takes incremental backups, disk space must be reclaimed from time to time. Merging de-duplicates keys that appear in more than one of the backups being merged, keeping the newest value of each key, and produces a single, smaller backup. Merging can therefore replace the periodic full backup step: multiple incremental backups of a Couchbase cluster are converted into a single full backup. Because this process takes place entirely within the backup archive, merging data places no load on the cluster. See cbbackupmgr-strategies for suggestions on using the merge command in your backup process.

    OPTIONS

    The parameters accepted by the merge command are listed below.

    Required

    -a,--archive <archive_dir>

    The archive directory to merge data in.

    -r,--repo <repo_name>

    The name of the backup repository to merge data in.

    Optional

    --start <backup>

    The name of the first backup to merge, or an index value that references an incremental backup. Valid index values are any positive integer, "oldest", and "latest". A positive integer is the position of the backup counting from the oldest to the most recent: "1" is the oldest backup, "2" is the second oldest backup, and so on. "oldest" selects the oldest backup and "latest" selects the most recent backup.

    --end <backup>

    The name of the last backup to merge, or an index value that references an incremental backup. Valid index values are any positive integer, "oldest", and "latest". A positive integer is the position of the backup counting from the oldest to the most recent: "1" is the oldest backup, "2" is the second oldest backup, and so on. "oldest" selects the oldest backup and "latest" selects the most recent backup.
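
    The bash sketch below shows how a --start or --end value could be resolved to a backup name under the rules above. The function resolve_backup is hypothetical and is not part of cbbackupmgr; it assumes the backup names are passed oldest first, and it does not reproduce how cbbackupmgr itself looks up backups.

```shell
# Hypothetical helper, not part of cbbackupmgr: resolve a --start/--end value
# ("oldest", "latest", a positive integer, or a backup name) to a backup name.
# Assumes the remaining arguments are backup names ordered oldest first.
resolve_backup() {
  local spec=$1; shift
  local backups=("$@")
  local count=${#backups[@]} b n
  case $spec in
    oldest) echo "${backups[0]}" ;;
    latest) echo "${backups[count-1]}" ;;
    '' | *[!0-9]*)
      # Anything that is not a plain integer is treated as a backup name.
      for b in "${backups[@]}"; do
        if [[ $b == "$spec" ]]; then echo "$b"; return 0; fi
      done
      echo "no backup named '$spec'" >&2; return 1 ;;
    *)
      # Positive integer: 1 is the oldest backup, 2 the second oldest, ...
      n=$((10#$spec))
      if (( n < 1 || n > count )); then
        echo "index $spec is outside 1..$count" >&2; return 1
      fi
      echo "${backups[n-1]}" ;;
  esac
}

backups=(2016-03-01T16_27_10.093782029-08_00 2016-03-01T16_27_51.349151165-08_00)
resolve_backup 1 "${backups[@]}"       # prints 2016-03-01T16_27_10.093782029-08_00
resolve_backup latest "${backups[@]}"  # prints 2016-03-01T16_27_51.349151165-08_00
```

    With these two backups, "--start 1 --end latest" and "--start oldest --end 2" select the same pair.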

    --date-range <range>

    A comma-separated pair specifying the first and the last backup to merge (inclusive). Each endpoint may be a date in dd-mm-yy format, a backup directory name, or a backup index, where the first backup is number 0. See BACKUP RANGES for more detail on the accepted formats.

    -t,--threads <num>

    Specifies the number of vBuckets that are merged concurrently. Increasing the number of threads makes the merge faster but also increases the resources used by the client. This parameter defaults to 1; it is recommended to increase it to match the number of CPUs on the client machine.
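
    As a sketch of the recommendation above, the following bash snippet sets --threads to the number of online CPUs. nproc is from GNU coreutils, and getconf _NPROCESSORS_ONLN is used as a fallback where nproc is unavailable. The archive path and repository name are placeholder values taken from EXAMPLES, and the command is printed rather than run so that it can be reviewed first.

```shell
# Number of online CPUs: GNU nproc, falling back to getconf where nproc is missing.
cpu_count() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

threads=$(cpu_count)

# Print the merge command instead of running it; the archive path and repository
# name are placeholders taken from the EXAMPLES section.
echo cbbackupmgr merge -a /data/backups -r example \
  --start oldest --end latest --threads "$threads"
```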

    BACKUP RANGES

    The merge command accepts a pair of dates, indices, or backup directory names through the --date-range argument, which can be used to refine which backups in the repository are merged.

    When given the backup range '0,5', merge merges all of the backups in chronological order, starting with the first backup (index 0) and finishing with the backup at index 5.
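
    Because index ranges count the oldest backup as number 0 (see --date-range), the range '0,5' covers six backups, index 0 through index 5. The hypothetical bash function below, which is not part of cbbackupmgr, shows that selection on a list of placeholder backup names (b0, b1, ...) ordered oldest first.

```shell
# Hypothetical helper: print the backups selected by an inclusive index range
# such as "0,5", where index 0 is the oldest backup. Backups are passed oldest first.
select_by_index_range() {
  local range=$1; shift
  local backups=("$@")
  local first=${range%%,*} last=${range##*,}
  if (( first < 0 || first > last || last >= ${#backups[@]} )); then
    echo "range $range does not fit ${#backups[@]} backups" >&2
    return 1
  fi
  # Slice of length last-first+1 starting at index first.
  printf '%s\n' "${backups[@]:first:last-first+1}"
}

select_by_index_range 0,5 b0 b1 b2 b3 b4 b5 b6   # prints b0 through b5, one per line
```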

    When given the backup range '20-08-2019,23-08-2019', merge merges all of the backups that fall within these two dates (inclusive). The format must be day-month-year, so '01-30-19' is an invalid date and is rejected by merge.
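
    The bash sketch below illustrates the date form of a range: it converts each day-month-year endpoint to year-month-day, rejects values such as '01-30-19' whose month is out of range, and keeps the backups whose names start with a date inside the range. The function names are hypothetical, two-digit years are assumed to mean 20yy, and the comparison uses only the date at the start of each backup name and ignores its time zone; cbbackupmgr's own date handling may differ.

```shell
# Hypothetical helper: convert dd-mm-yyyy (or dd-mm-yy, assumed to mean 20yy)
# to yyyy-mm-dd, failing on malformed dates or an out-of-range day or month.
to_iso_date() {
  local d m y
  local two='^[0-9]{2}$' year='^([0-9]{2}|[0-9]{4})$'
  IFS=- read -r d m y <<< "$1"
  if ! [[ $d =~ $two && $m =~ $two && $y =~ $year ]] ||
     (( 10#$d < 1 || 10#$d > 31 || 10#$m < 1 || 10#$m > 12 )); then
    echo "invalid day-month-year date: $1" >&2
    return 1
  fi
  if (( ${#y} == 2 )); then y="20$y"; fi
  printf '%s-%s-%s\n' "$y" "$m" "$d"
}

# Print the backups whose names begin with a date inside the inclusive range.
select_by_date_range() {
  local range=$1; shift
  local start end name day
  start=$(to_iso_date "${range%%,*}") || return 1
  end=$(to_iso_date "${range##*,}") || return 1
  for name in "$@"; do
    day=${name:0:10}    # e.g. 2019-08-20 from 2019-08-20T11_39_34.232308323Z
    if [[ ! $day < $start && ! $day > $end ]]; then
      echo "$name"
    fi
  done
}

# Both names fall inside the range, so both are printed.
select_by_date_range 20-08-2019,23-08-2019 \
  2019-08-20T11_39_34.232308323Z 2019-08-23T09_36_56.957232625Z
```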

    Merge also accepts a backup range made of backup names, e.g. '2019-08-23T09_36_56.957232625Z'. For example, given the backup range '2019-08-20T11_39_34.232308323Z,2019-08-23T09_36_56.957232625Z', the command merges all of the backups that fall between these two backups (inclusive).
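
    A range of backup names can be sketched the same way. The hypothetical bash function below keeps every backup whose name sorts between the two endpoints (inclusive). It assumes all names are timestamps in the same fixed-width format and time zone, so that comparing them as strings also compares them chronologically; this is an illustration, not cbbackupmgr's own lookup.

```shell
# Hypothetical helper: print the backups whose names lie between the two names in
# "first,last" (inclusive). Assumes the timestamp-style names sort chronologically.
select_by_name_range() {
  local range=$1; shift
  local first=${range%%,*} last=${range##*,} name
  for name in "$@"; do
    if [[ ! $name < $first && ! $name > $last ]]; then
      echo "$name"
    fi
  done
}

select_by_name_range \
  2019-08-20T11_39_34.232308323Z,2019-08-23T09_36_56.957232625Z \
  2019-08-20T11_39_34.232308323Z 2019-08-23T09_36_56.957232625Z
```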

    EXAMPLES

    In order to merge data, you need to have a backup repository with at least two backups. Below is an example of merging a backup repository named "example" that has two backups in it. The first backup contains the initial dataset. The second backup was taken after four items were updated.

    $ cbbackupmgr list -a /data/backups
    Size      Items          Name
    148.70MB  -              /
    148.70MB  -              + example
    98.66MB   -                  + 2016-03-01T16_27_10.093782029-08_00
    98.66MB   -                      + travel-sample
    300B      0                          bucket-config.json
    98.66MB   31592                      + data
    98.66MB   31592                          shard_0.fdb
    2B        0                          full-text.json
    4B        0                          gsi.json
    1.72KB    1                          views.json
    50.04MB   -                  + 2016-03-01T16_27_51.349151165-08_00
    50.04MB   -                      + travel-sample
    300B      0                          bucket-config.json
    50.04MB   4                          + data
    50.04MB   4                              shard_0.fdb
    2B        0                          full-text.json
    4B        0                          gsi.json
    1.72KB    1                          views.json
    $ cbbackupmgr merge -a /data/backups -r example \
     --start 2016-03-01T16_27_10.093782029-08_00 \
     --end 2016-03-01T16_27_51.349151165-08_00
    $ cbbackupmgr list -a /data/backups
    Size      Items          Name
    98.84MB   -              /
    98.84MB   -              + example
    98.84MB   -                  + 2016-03-01T16_27_51.349151165-08_00
    98.84MB   -                      + travel-sample
    300B      0                          bucket-config.json
    98.84MB   31592                      + data
    98.84MB   31592                          shard_0.fdb
    2B        0                          full-text.json
    4B        0                          gsi.json
    1.72KB    1                          views.json

    After the merge completes, the merged backup contains the same number of items (31592) as the first backup. This is because the keys in the second backup were also contained in the first backup; the second backup held newer values for those keys, and these overwrote the older values during the merge. The merged backup is also smaller than the two original backups combined (98.84MB compared with 148.70MB). The backup folder keeps the timestamp of the latest backup because the merged backup is a snapshot of the cluster at that point in time.
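
    The overwrite rule described above can be modelled with a small bash and awk sketch. The function merge_snapshots is hypothetical, and it treats each backup as a text file of key=value lines, which is a simplification and not the format of the shard files shown in the listing. Files are read oldest first and later values win, so the merged item count equals the number of distinct keys (5 here) rather than the sum of the items in both inputs (7).

```shell
# Hypothetical model: each file holds key=value lines and files are given oldest
# first. Each key is printed once, in first-seen order, with its newest value.
merge_snapshots() {
  awk -F= '
    !($1 in latest) { order[++n] = $1 }  # first time this key is seen
    { latest[$1] = $0 }                  # a later file overwrites an earlier value
    END { for (i = 1; i <= n; i++) print latest[order[i]] }
  ' "$@"
}

tmp=$(mktemp -d)
printf '%s\n' a=1 b=1 c=1 d=1 e=1 > "$tmp/first"   # initial dataset: 5 items
printf '%s\n' b=2 d=2 > "$tmp/second"              # incremental: 2 items updated
merge_snapshots "$tmp/first" "$tmp/second"         # prints a=1 b=2 c=1 d=2 e=1, one per line
rm -rf "$tmp"
```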

    DISCUSSION

    The merge command must be able to merge backups without corrupting the backup archive or leaving it in an intermediate state. To guarantee this, cbbackupmgr always creates a new backup and completely merges all of the data before removing any backup files. When a merge starts, a .merge_status file is created in the backup repository to track the merge progress. cbbackupmgr then copies the first backup to the .merge folder and merges each of the other backups into it, updating the .merge_status file after each backup is merged. If all of the backups are merged successfully, cbbackupmgr deletes the old backups and then copies the fully merged backup into a folder with the same name as the backup specified by the --end flag. If cbbackupmgr fails during this process, the next invocation of cbbackupmgr either completes the merge or removes the partial merge files from the backup repository.
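
    The ordering of these steps is what keeps the archive consistent: nothing is deleted until every backup has been merged into .merge. The bash sketch below walks through the same sequence on a toy repository in which each backup is a directory holding one items file of key=value lines. The function merge_repo and that layout are hypothetical; the sketch renames .merge into place rather than copying it, it does not model how cbbackupmgr completes or cleans up an interrupted merge, and the progress text it writes to .merge_status is illustrative only.

```shell
# Toy model of the merge sequence; hypothetical layout: <repo>/<backup>/items
# holds key=value lines. Arguments: repository directory, then backups oldest first.
merge_repo() {
  local repo=$1; shift
  local backups=("$@")
  local work="$repo/.merge" status="$repo/.merge_status"
  local end=${backups[${#backups[@]}-1]} b

  # 1. Record that a merge is in progress, then copy the first backup to .merge.
  echo "started: ${backups[*]}" > "$status"
  rm -rf "$work"
  cp -R "$repo/${backups[0]}" "$work"

  # 2. Merge each later backup into .merge (newer values win) and record progress.
  for b in "${backups[@]:1}"; do
    awk -F= '!($1 in v) { k[++n] = $1 } { v[$1] = $0 }
             END { for (i = 1; i <= n; i++) print v[k[i]] }' \
      "$work/items" "$repo/$b/items" > "$work/items.tmp"
    mv "$work/items.tmp" "$work/items"
    echo "merged: $b" >> "$status"
  done

  # 3. Only now delete the original backups, then give the merged backup
  #    the name of the last (--end) backup and clear the status file.
  for b in "${backups[@]}"; do
    rm -rf "${repo:?}/${b:?}"
  done
  mv "$work" "$repo/$end"
  rm -f "$status"
}

repo=$(mktemp -d)
mkdir "$repo/b1" "$repo/b2"
printf '%s\n' a=1 b=1 c=1 > "$repo/b1/items"
printf '%s\n' b=2 > "$repo/b2/items"
merge_repo "$repo" b1 b2
cat "$repo/b2/items"    # a=1, b=2, c=1 on separate lines
rm -rf "$repo"
```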

    Because the merge command creates the new backup before it removes the old ones, there must be at least as much free space available as the combined size of the backups being merged.
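
    A pre-flight check for this requirement can be sketched in bash with du and the POSIX df -P output format. The function enough_space_to_merge is hypothetical, sizes are compared in KiB, and the repository path /data/backups/example in the commented-out call is an assumption based on the archive layout shown by cbbackupmgr list in EXAMPLES.

```shell
# Hypothetical pre-flight check: succeed only if the file system holding the
# repository has at least as much free space as the backups to be merged.
enough_space_to_merge() {
  local repo=$1; shift
  local needed=0 size available b
  for b in "$@"; do
    size=$(du -sk "$repo/$b" | awk '{print $1}')            # backup size in KiB
    needed=$(( needed + size ))
  done
  available=$(df -Pk "$repo" | awk 'NR == 2 {print $4}')     # free KiB
  echo "need ${needed} KiB, ${available} KiB available" >&2
  (( available >= needed ))
}

# Example call (placeholder paths):
# enough_space_to_merge /data/backups/example \
#   2016-03-01T16_27_10.093782029-08_00 2016-03-01T16_27_51.349151165-08_00
```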

    For suggestions on how to use the merge command in your backup process, see cbbackupmgr-strategies.

    ENVIRONMENT AND CONFIGURATION VARIABLES

    CB_ARCHIVE_PATH

    Specifies the path to the backup archive. If the archive path is supplied as a command line argument then this value is overridden.

    FILES

    .merge_status

    File storing information on the progress of the merge.

    .merge

    Directory storing intermediate merge data.

    CBBACKUPMGR

    Part of the cbbackupmgr suite