cbbackupmgr restore

    Restores data from the backup archive to a Couchbase cluster

    SYNOPSIS

    cbbackupmgr restore [--archive <archive_dir>] [--repo <repo_name>]
                        [--cluster <host>] [--username <username>]
                        [--password <password>] [--start <backup>] [--end <backup>]
                        [--include-buckets <buckets>]
                        [--exclude-buckets <buckets>] [--map-buckets <list>]
                        [--disable-bucket-config] [--disable-analytics]
                        [--disable-views] [--disable-gsi-indexes]
                        [--disable-ft-indexes] [--disable-ft-alias]
                        [--disable-data] [--disable-eventing] [--replace-ttl <type>]
                        [--replace-ttl-with <timestamp>] [--force-updates]
                        [--threads <integer>] [--vbucket-filter <integer_list>]
                        [--no-progress-bar] [--auto-create-buckets]
                        [--continue-on-cs-failure] [--restore-partial-backups]
                        [--obj-access-key-id <access_key_id>] [--obj-cacert <cert_path>]
                        [--obj-endpoint <endpoint>] [--obj-read-only-mode]
                        [--obj-no-ssl-verify] [--obj-region <region>]
                        [--obj-staging-dir <staging_dir>]
                        [--obj-secret-access-key <secret_access_key>]
                        [--s3-force-path-style] [--s3-log-level <level>]

    DESCRIPTION

    Restores data from the backup archive to a target Couchbase cluster. By default, all data, index definitions, view definitions, full-text index definitions, and bucket configuration data are restored to the cluster unless specified otherwise in the repo's backup config or through command-line parameters when running the restore command. For example, if you changed bucket configuration settings since your last backup, then restoring a previous backup will by default overwrite these settings unless you explicitly tell cbbackupmgr not to restore the bucket settings using the --disable-bucket-config flag.

    The restore command is capable of restoring a single backup or a range of backups. When restoring a single backup, all data from that backup is restored. If a range of backups is restored, then cbbackupmgr will take into account any failovers that may have occurred in between the time that the backups were originally taken. If a failover did occur in between the backups, and the backup archive contains data that no longer exists in the cluster, then the data that no longer exists will be skipped during the restore. If no failovers occurred in between backups then restoring a range of backups will restore all data from each backup. If all data must be restored regardless of whether a failover occurred in between the original backups, then data should be restored one backup at a time.

    The restore command is guaranteed to work during rebalances and failovers. If a rebalance is taking place, cbbackupmgr will track the movement of vbuckets around a Couchbase cluster and ensure that data is restored to the appropriate node. If a failover occurs during the restore then the client will wait 180 seconds for the failed node to be removed from the cluster. If the failed node is not removed in 180 seconds then the restore will fail, but if the failed node is removed before the timeout then data will continue to be restored.

    Note that if you are restoring indexes then it is highly likely that you will need to take some manual steps in order to properly restore them. This is because by default indexes will only be built if they are restored to the exact same index node that they were backed up from. If the index node they were backed up from does not exist then the indexes will be restored in round-robin fashion among the current indexer nodes. These indexes will be created, but not built, and will require the administrator to manually build them. We do this because we cannot know the optimal index topology ahead of time. By not building the indexes, the administrator can move each index between nodes and build them when they deem that the index topology is optimal.

    OPTIONS

    Below is a list of required and optional parameters for the restore command.

    Required

    -a,--archive <archive_dir>

    The directory containing the backup repository to restore data from. When restoring from an archive stored in S3, prefix the archive path with s3://${BUCKET_NAME}/.

    -r,--repo <repo_name>

    The name of the backup repository to restore data from.

    -c,--cluster <hostname>

    The hostname of one of the nodes in the cluster to restore data to. See the Host Formats section below for hostname specification details.

    -u,--username <username>

    The username for cluster authentication. The user must have the appropriate privileges to take a backup.

    -p,--password <password>

    The password for cluster authentication. The user must have the appropriate privileges to take a backup. If no password is supplied to this option then you will be prompted to enter your password.

    Optional

    --start <backup>

    The name of the first backup in the backup repository to restore or an index value which references an incremental backup. Valid index values are any positive integer, "oldest", and "latest". If a positive integer is used then it should reference the index of the incremental backup starting from the oldest to the most recent backup. For example, "1" corresponds to the oldest backup, "2" corresponds to the second oldest backup, and so on. Specifying "oldest" means that the index of the oldest backup should be used and specifying "latest" means the index of the most recent backup should be used. If this flag is not specified then the restore will start with the oldest backup in the backup repository.

    --end <backup>

    The name of the last backup in the backup repository to restore or an index value which references an incremental backup. Valid index values are any positive integer, "oldest", and "latest". If a positive integer is used then it should reference the index of the incremental backup starting from the oldest to the most recent backup. For example, "1" corresponds to the oldest backup, "2" corresponds to the second oldest backup, and so on. Specifying "oldest" means that the index of the oldest backup should be used and specifying "latest" means the index of the most recent backup should be used. If this flag is not specified then the restore will end with the most recent backup in the backup repository.
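    The index rules for --start and --end can be sketched as a small shell function. resolve_backup is a hypothetical helper, not part of cbbackupmgr; it assumes the backup names are passed oldest first and treats any non-numeric value as a literal backup name.

```shell
# resolve_backup VALUE NAME...
# Hypothetical helper: maps a --start/--end value to a backup name.
# NAME... must be listed oldest first.
resolve_backup() {
    value=$1
    shift
    case $value in
        oldest) value=1 ;;
        latest) value=$# ;;
    esac
    case $value in
        ''|*[!0-9]*)
            # Not an index: assume it is already a backup name.
            printf '%s\n' "$value"
            return 0
            ;;
    esac
    if [ "$value" -lt 1 ] || [ "$value" -gt "$#" ]; then
        echo "no backup at index $value" >&2
        return 1
    fi
    # Drop the names that come before the requested index.
    shift $((value - 1))
    printf '%s\n' "$1"
}

# Index 2 selects the second oldest of three backups.
resolve_backup 2 2016-03-08T14_41_10.757145596-08_00 \
    2016-03-09T14_42_24.024494032-08_00 \
    2016-03-10T14_42_58.743250296-08_00
# prints 2016-03-09T14_42_24.024494032-08_00
```

    Under these assumptions, "--start 2 --end latest" would cover every backup from the second oldest through the most recent one.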

    --include-buckets <buckets>

    Only restore the buckets in the comma separated list <buckets>.

    --exclude-buckets <buckets>

    Skip the restore for the buckets in the comma separated list <buckets>.

    --filter-keys

    Only restore data where the key matches a particular regular expression. The regular expressions provided must follow RE2 syntax.

    --filter-values

    Only restore data where the value matches a particular regular expression. The regular expressions provided must follow RE2 syntax.

    --enable-bucket-config

    Enables restoring the bucket configuration.

    --disable-views

    Skips restoring view definitions for all buckets.

    --disable-gsi-indexes

    Skips restoring gsi index definitions for all buckets.

    --disable-ft-indexes

    Skips restoring full-text index definitions for all buckets.

    --disable-ft-alias

    Skips restoring full-text alias definitions.

    --disable-data

    Skips restoring all key-value data for all buckets.

    --disable-analytics

    Skips restoring analytics data.

    --disable-eventing

    Skips restoring the eventing service metadata.

    --force-updates

    Forces data in the Couchbase cluster to be overwritten even if the data in the cluster is newer. By default updates are not forced and all updates use Couchbase’s conflict resolution mechanism to ensure that if newer data exists on the cluster it is not overwritten by older restore data.

    --map-buckets <bucket_mapping>

    Specified when you want to restore a backup to a destination bucket that has a different name than the bucket that was originally backed up. This parameter takes a list of mappings since multiple buckets may be restored at the same time. Each bucket mapping is separated by an "=" and if multiple bucket mappings are specified then they should be comma separated. If we have two buckets, bucket-1 and bucket-2, and we want to restore them to renamed-1 and renamed-2 then we would denote the mapping as "bucket-1=renamed-1,bucket-2=renamed-2". This option will only restore data to the Data service and will not restore the metadata for any other service.
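    Because a malformed mapping string is easy to miss, it can help to print the mappings before running a restore. The sketch below only splits the string on "," and "=" as described above; print_bucket_mappings is a hypothetical helper and performs no validation against the cluster or the archive.

```shell
# print_bucket_mappings "src=dst[,src=dst...]"
# Hypothetical helper: prints one "source -> destination" line per mapping.
print_bucket_mappings() (
    IFS=,      # split the argument on commas (the subshell keeps IFS local)
    set -f     # disable globbing while the unquoted expansion is split
    for pair in $1; do
        # Text before the first "=" is the source, the rest is the destination.
        printf '%s -> %s\n' "${pair%%=*}" "${pair#*=}"
    done
)

print_bucket_mappings 'bucket-1=renamed-1,bucket-2=renamed-2'
# bucket-1 -> renamed-1
# bucket-2 -> renamed-2
```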

    --replace-ttl <type>

    Sets a new expiration (time-to-live) value for the specified keys. This parameter can either be set to "none", "all" or "expired" and should be used along with the --replace-ttl-with flag. If "none" is supplied then the TTL values are not changed. If "all" is specified then the TTL values for all keys are replaced with the value of the --replace-ttl-with flag. If "expired" is set then only keys which have already expired will have their TTL replaced.

    --replace-ttl-with <timestamp>

    Updates the expiration for the keys specified by the --replace-ttl parameter. The parameter has to be set when --replace-ttl is set to "all". There are two options, RFC3339 time stamp format (2006-01-02T15:04:05-07:00) or "0". When "0" is specified the expiration will be removed. Please note that the RFC3339 value is converted to a Unix time stamp on the cbbackupmgr client. It is important that the time on both the client and the Couchbase Server are the same to ensure expiry happens correctly.
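    To preview the Unix time stamp that an RFC3339 value corresponds to, the conversion can be reproduced locally. This is a sketch that assumes GNU date (coreutils); BSD and macOS date take different options. rfc3339_to_unix is a hypothetical helper and is not how cbbackupmgr performs the conversion internally.

```shell
# rfc3339_to_unix TIMESTAMP
# Hypothetical helper: prints the Unix time stamp for an RFC3339 value.
# Assumes GNU date, which parses the "T" separator and numeric offsets.
rfc3339_to_unix() {
    date -d "$1" +%s
}

rfc3339_to_unix 2006-01-02T15:04:05-07:00   # prints 1136239445
```

    Comparing the result with the output of "date +%s" on the client and on a cluster node is one way to spot clock drift before replacing expirations.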

    --vbucket-filter <list>

    Specifies a list of vBuckets that should be restored. vBuckets are specified as a comma separated list of integers. If this parameter is not set then all vBuckets which were backed up are restored.
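    Typing a long list of vBucket ids by hand is error prone. The sketch below builds the comma separated list for a contiguous range; vbucket_range is a hypothetical helper and assumes the first id is not greater than the last.

```shell
# vbucket_range FIRST LAST
# Hypothetical helper: prints FIRST..LAST as a comma separated list.
vbucket_range() {
    first=$1
    last=$2
    list=$first
    id=$((first + 1))
    while [ "$id" -le "$last" ]; do
        list="$list,$id"
        id=$((id + 1))
    done
    printf '%s\n' "$list"
}

vbucket_range 0 3   # prints 0,1,2,3
```

    The output can then be passed as the flag value, for example --vbucket-filter "$(vbucket_range 0 3)".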

    --no-ssl-verify

    Skips the SSL verification phase. Specifying this flag will allow a connection using SSL encryption, but will not verify the identity of the server you connect to. You are vulnerable to a man-in-the-middle attack if you use this flag. Either this flag or the --cacert flag must be specified when using an SSL encrypted connection.

    --cacert <cert_path>

    Specifies a CA certificate that will be used to verify the identity of the server being connected to. Either this flag or the --no-ssl-verify flag must be specified when using an SSL encrypted connection.

    -t,--threads <num>

    Specifies the number of concurrent clients to use when restoring data. Fewer clients means restores will take longer, but fewer cluster resources will be used to complete the restore. More clients means faster restores, but at the cost of more cluster resource usage. This parameter defaults to 1 if it is not specified and it is recommended that this parameter is not set higher than the number of CPUs on the machine where the restore is taking place.
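    One way to follow the CPU recommendation is to cap the requested thread count at the number of online CPUs on the client machine. This is a sketch: cap_threads is a hypothetical helper, and it assumes getconf supports _NPROCESSORS_ONLN, as it does on Linux and macOS.

```shell
# cap_threads REQUESTED
# Hypothetical helper: prints REQUESTED, limited to the online CPU count.
cap_threads() {
    requested=$1
    cpus=$(getconf _NPROCESSORS_ONLN)
    if [ "$requested" -gt "$cpus" ]; then
        printf '%s\n' "$cpus"
    else
        printf '%s\n' "$requested"
    fi
}

# Prints 4 on a machine with at least four CPUs, otherwise the CPU count.
cap_threads 4
```

    The result can be passed directly as --threads "$(cap_threads 4)".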

    --no-progress-bar

    By default, a progress bar is printed to stdout so that the user can see how long the restore is expected to take, the amount of data that is being transferred per second, and the amount of data that has been restored. Specifying this flag disables the progress bar and is useful when running automated jobs.

    --auto-create-buckets

    Creates the destination buckets if they are not present on the cluster.

    --continue-on-cs-failure

    It’s possible that during a restore, a checksum validation will fail; in this case the restore will fail fast. Supplying this flag will mean that the restore will attempt to continue upon receiving a checksum failure. See CHECKSUM FAILURE for more information.

    --restore-partial-backups

    Allow a restore to continue when the final backup in the restore range is incomplete. This flag is incompatible with the --obj-read-only flag.

    Cloud integration

    Required

    --obj-staging-dir <staging_dir>

    When performing an operation on an archive which is located in the cloud such as AWS, the staging directory is used to store local meta data files. This directory can be temporary (it’s not treated as a persistent store) and is only used during the backup. NOTE: Do not use /tmp as your obj-staging-dir. See OBJECT STORE STAGING DIRECTORY in cbbackupmgr-config for more information.

    Optional

    --obj-access-key-id <access_key_id>

    The access key id which has access to your chosen object store. This option can be omitted when using the shared config functionality provided by your chosen object store. Can alternatively be provided using the CB_OBJSTORE_ACCESS_KEY_ID environment variable.

    --obj-cacert <cert_path>

    Specifies a CA certificate that will be used to verify the identity of the object store being connected to.

    --obj-endpoint <endpoint>

    The host/address of your object store.

    --obj-read-only

    Enable read only mode. When interacting with a cloud archive, modifications will be made, e.g. a lockfile will be created, log rotation will take place and the modified logs will be uploaded upon completion of the subcommand. This flag disables these features should you wish to interact with an archive in a container where you lack write permissions. This flag should be used with caution and you should be aware that your logs will not be uploaded to the cloud. This means that it’s important that if you encounter an error you don’t remove your staging directory (since logs will still be created in there and collected by the collect-logs subcommand).

    --obj-no-ssl-verify

    Skips the SSL verification phase when connecting to the object store. Specifying this flag will allow a connection using SSL encryption, but you are vulnerable to a man-in-the-middle attack.

    --obj-region <region>

    The region in which your bucket/container resides. For AWS this option may be omitted when using the shared config functionality. See the AWS section of the cloud documentation for more information.

    --obj-secret-access-key <secret_access_key>

    The secret access key which has access to your chosen object store. This option can be omitted when using the shared config functionality provided by your chosen object store. Can alternatively be provided using the CB_OBJSTORE_SECRET_ACCESS_KEY environment variable.

    AWS S3 Options

    Optional

    --s3-force-path-style

    By default the updated virtual style paths will be used when interfacing with AWS S3. This option will force the AWS SDK to use the alternative path style URLs which are often required by S3 compatible object stores.

    --s3-log-level <level>

    Set the log level for the AWS SDK. By default logging will be disabled. Valid options are debug, debug-with-signing, debug-with-body, debug-with-request-retries, debug-with-request-errors, and debug-with-event-stream-body.
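    The cloud options above combine with the s3:// archive prefix roughly as follows. This is a dry-run sketch that only prints a command line: s3_restore_command is a hypothetical helper, the bucket name, archive path, repository and cluster details are placeholders, and credentials and region are assumed to come from the shared config or the CB_OBJSTORE_* environment variables. Rejecting subdirectories of /tmp as well as /tmp itself is a conservative assumption.

```shell
# s3_restore_command BUCKET ARCHIVE_PATH STAGING_DIR
# Hypothetical helper: prints a restore command for an archive stored in S3.
s3_restore_command() {
    bucket=$1
    archive=$2
    staging=$3
    case $staging in
        /tmp|/tmp/*)
            # The staging directory must not live in /tmp.
            echo "choose a staging directory outside /tmp" >&2
            return 1
            ;;
    esac
    printf 'cbbackupmgr restore -a s3://%s/%s -r example --obj-staging-dir %s -c couchbase://127.0.0.1 -u Administrator -p password\n' \
        "$bucket" "$archive" "$staging"
}

s3_restore_command my-bucket archive /mnt/staging
```

    Printing the command first makes it easy to review the archive path and staging directory before running it.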

    HOST FORMATS

    When specifying a host for the cbbackupmgr command the following formats are expected:

    • couchbase://<addr>

    • <addr>:<port>

    • http://<addr>:<port>

    It is recommended to use the couchbase://<addr> format for standard installations. The other two formats allow an option to take a port number which is needed for non-default installations where the admin port has been set up on a port other than 8091.

    EXAMPLES

    The restore command can be used to restore a single backup or range of backups in a backup repository. In the examples below, we will look at a few different ways to restore data from a backup repository. All examples will assume that the backup archive is located at /data/backups and that all backups are located in the "example" backup repository.

    The first thing to do when getting ready to restore data is to decide which backups to restore. The easiest way to do this is to use the list command to see which backups are available to restore.

    $ cbbackupmgr list --archive /data/backups --repo example
    Size      Items          Name
    2.24GB    -              + example
    1.11GB    -                  + 2016-03-08T14_41_10.757145596-08_00
    1.11GB    -                      + default
    295B      0                          bucket-config.json
    1.11GB    983797                     + data
    1.11GB    983797                         shard_0.fdb
    2B        0                          full-text.json
    128B      0                          gsi.json
    2B        0                          views.json
    430.52MB  -                  + 2016-03-09T14_42_24.024494032-08_00
    430.52MB  -                      + default
    295B      0                          bucket-config.json
    430.52MB  334400                     + data
    430.52MB  334400                         shard_0.fdb
    2B        0                          full-text.json
    128B      0                          gsi.json
    2B        0                          views.json
    728.72MB  -                  + 2016-03-10T14_42_58.743250296-08_00
    728.72MB  -                      + default
    295B      0                          bucket-config.json
    728.72MB  607500                     + data
    728.72MB  607500                         shard_0.fdb
    2B        0                          full-text.json
    128B      0                          gsi.json
    2B        0                          views.json

    From listing the backup repository we can see we have three backups that we can restore in the "example" backup repository. If we just want to restore one of them we set the --start and --end flags in the restore command to the same backup name and specify the cluster that we want to restore the data to. In the example below we will restore only the oldest backup.

    $ cbbackupmgr restore -a /data/backups -r example \
     -c couchbase://127.0.0.1 -u Administrator -p password \
     --start 2016-03-08T14_41_10.757145596-08_00 \
     --end 2016-03-08T14_41_10.757145596-08_00

    If we want to restore only the two most recent backups then we specify the --start and --end flags with different backup names in order to specify the range we want to restore.

    $ cbbackupmgr restore -a /data/backups -r example \
     -c couchbase://127.0.0.1 -u Administrator -p password \
     --start 2016-03-09T14_42_24.024494032-08_00 \
     --end 2016-03-10T14_42_58.743250296-08_00

    If we want to restore all of the backups in the "example" backup repository then we can omit the --start and --end flags since their default values are the oldest and most recent backup in the backup repository.

    $ cbbackupmgr restore -a /data/backups -r example \
     -c couchbase://127.0.0.1 -u Administrator -p password
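    As the DESCRIPTION notes, restoring a range may skip data if a failover occurred between backups, so restoring one backup at a time is the way to restore everything. The sketch below prints one restore command per backup, oldest first, with --start and --end set to the same name. print_single_restores is a hypothetical helper, and the archive, repository and credentials are the placeholder values used in the other examples.

```shell
# print_single_restores NAME...
# Hypothetical helper: prints one single-backup restore command per NAME.
# Pass the backup names oldest first so they are replayed in order.
print_single_restores() {
    for backup in "$@"; do
        printf 'cbbackupmgr restore -a /data/backups -r example -c couchbase://127.0.0.1 -u Administrator -p password --start %s --end %s\n' \
            "$backup" "$backup"
    done
}

print_single_restores 2016-03-08T14_41_10.757145596-08_00 \
    2016-03-09T14_42_24.024494032-08_00 \
    2016-03-10T14_42_58.743250296-08_00
```

    After reviewing the printed commands, they can be piped to sh to run them in sequence.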

    Restore also allows filtering the data restored by document key and/or value by passing regular expressions to the --filter-keys and --filter-values flags respectively. Say we have backed up the sample bucket 'beer-sample' and want to restore only the documents that have a key that starts with '21st_amendment_brewery_cafe'. This can be done using the --filter-keys flag as shown below.

    $ cbbackupmgr restore -c http://127.0.0.1:8091 -u Administrator -p password \
     -a /data/backups -r beer --filter-keys '^21st_amendment_brewery_cafe.*'

    Restore also allows filtering by value. Let’s say we only want to restore documents that contain the JSON field address. This could be done by passing the regular expression {.*"address":.*} to the --filter-values flag as illustrated below.

    $ cbbackupmgr restore -c http://127.0.0.1:8091 -u Administrator -p password \
     -a /data/backups -r beer --filter-values '{.*"address":.*}'

    Finally, we can combine both flags to filter by both key and value. Imagine you want to restore the values for beers that start with the key '21st_amendment_brewery_cafe' and have the JSON field "category":"North American Ale". This can be done by using the command below.

    $ cbbackupmgr restore -c http://127.0.0.1:8091 -u Administrator -p password \
     -a /data/backups -r beer --filter-values '{.*"category":"North American Ale".*}' \
     --filter-keys '^21st_amendment_brewery_cafe.*'

    The regular expressions provided must follow RE2 syntax.
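    A key pattern can be tried against a few sample keys before it is used in a restore. The sketch below uses grep -E, which implements POSIX extended regular expressions rather than RE2, so it is only a rough preview for simple patterns such as anchors and ".*". preview_key_filter is a hypothetical helper and the sample keys are made up for illustration.

```shell
# preview_key_filter PATTERN
# Hypothetical helper: reads keys on stdin and prints those matching PATTERN.
# grep -E is POSIX ERE, not RE2; results can differ for advanced syntax.
preview_key_filter() {
    grep -E -- "$1"
}

printf '%s\n' 21st_amendment_brewery_cafe-21a_ipa abbey_wright_brewing \
    | preview_key_filter '^21st_amendment_brewery_cafe.*'
# prints 21st_amendment_brewery_cafe-21a_ipa
```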

    CHECKSUM FAILURE

    A checksum failure may occur during a restore and indicates that a document has changed since the creation of the backup. Depending on the type of corruption we may be able to restore by skipping only the corrupted documents. However, if the size of the data file has changed (e.g. not a bit flip or byte for byte modification) all documents after the corruption (for that vBucket) will be unusable.

    DISCUSSION

    The restore command works by replaying the data recorded in backup files. During a restore each key-value pair backed up by cbbackupmgr will be sent to the cluster as either a "set" or "delete" operation. The restore command replays data from each file in order of backup time to guarantee that older backup data does not overwrite newer backup data. The restore command uses Couchbase’s conflict resolution mechanism by default to ensure this behavior. The conflict resolution mechanism can be disabled by specifying the --force-updates flag when executing a restore.

    Starting in Couchbase 4.6 each bucket can have different conflict resolution mechanisms. cbbackupmgr will back up all meta data used for conflict resolution, but since each conflict resolution mechanism is different cbbackupmgr will prevent restores to a bucket when the source and destination conflict resolution methods differ. This is done because by default cbbackupmgr will use the conflict resolution mechanism of the destination bucket to ensure an older value does not overwrite a newer value. If you want to restore a backup to a bucket with a different conflict resolution type you can do so by using the --force-updates flag. This is allowed because forcing updates means that cbbackupmgr will skip doing conflict resolution on the destination bucket.

    Also keep in mind that unlike backups, restores cannot be resumed if they fail.

    ENVIRONMENT AND CONFIGURATION VARIABLES

    CB_CLUSTER

    Specifies the hostname of the Couchbase cluster to connect to. If the hostname is supplied as a command line argument then this value is overridden.

    CB_USERNAME

    Specifies the username for authentication to a Couchbase cluster. If the username is supplied as a command line argument then this value is overridden.

    CB_PASSWORD

    Specifies the password for authentication to a Couchbase cluster. If the password is supplied as a command line argument then this value is overridden.

    CB_ARCHIVE_PATH

    Specifies the path to the backup archive. If the archive path is supplied as a command line argument then this value is overridden.

    CB_OBJSTORE_STAGING_DIRECTORY

    Specifies the path to the staging directory. If the --obj-staging-dir argument is provided in the command line then this value is overridden.

    CB_OBJSTORE_REGION

    Specifies the object store region. If the --obj-region argument is provided in the command line then this value is overridden.

    CB_OBJSTORE_ACCESS_KEY_ID

    Specifies the object store access key id. If the --obj-access-key-id argument is provided in the command line this value is overridden.

    CB_OBJSTORE_SECRET_ACCESS_KEY

    Specifies the object store secret access key. If the --obj-secret-access-key argument is provided in the command line this value is overridden.

    CB_AWS_ENABLE_EC2_METADATA

    By default cbbackupmgr will disable fetching EC2 instance metadata. Setting this environment variable to true will allow the AWS SDK to fetch metadata from the EC2 instance endpoint.
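    The connection variables above can replace the matching flags on the command line. The sketch below assumes, as the descriptions imply, that CB_ARCHIVE_PATH, CB_CLUSTER, CB_USERNAME and CB_PASSWORD stand in for --archive, --cluster, --username and --password, so only the repository still has to be named. The values are the placeholders used in EXAMPLES, and the guard exists only so the snippet is harmless on a machine where cbbackupmgr is not installed.

```shell
# Placeholder values taken from the EXAMPLES section.
export CB_ARCHIVE_PATH=/data/backups
export CB_CLUSTER=couchbase://127.0.0.1
export CB_USERNAME=Administrator
export CB_PASSWORD=password

# Run the restore only if the tool is available on this machine.
if command -v cbbackupmgr >/dev/null 2>&1; then
    cbbackupmgr restore -r example
fi
```

    Any flag given explicitly on the command line still takes precedence over the corresponding variable.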

    CBBACKUPMGR

    Part of the cbbackupmgr suite