cbdocloader

      +

      (Deprecated) A utility to import sample buckets into Couchbase Server.

      Description

      cbdocloader has been deprecated, please use cbimport with the --format sample flag instead.

      The cbdocloader tool loads Couchbase sample datasets into Couchbase Server. Sample datasets are zip files, provided by Couchbase, which contain documents and index definitions. These datasets are intended to allow users to explore Couchbase features prior to loading their own datasets.

      For Linux, the tool’s location is /opt/couchbase/bin/cbdocloader; for Windows, C:\Program Files\Couchbase\Server\bin\cbdocloader; and for Mac OS X, /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/cbdocloader.

      Syntax

      The syntax for cbdocloader is as follows:

      cbdocloader --cluster [hostname] --username [username]
      --p [password] --bucket [bucket-name] --bucket-quota [quota]
      --dataset [path] --thread [number] --verbose

      Options

      Command options are as follows:

      Option Description

      -h --help

      Show this help message and exit.

      -c [hostname], --cluster [hostname]

      The hostname of one of the nodes in the cluster to load data into. See the Host Formats section below for hostname specification details.

      -u [username], --user [username]

      The username for cluster authentication. The user must have the appropriate privileges to create a bucket, write data, and create view, secondary index and full-text index definitions. This can be specified by means of the environment variable CB_REST_USERNAME.

      -p [password], --password [password]

      The password for cluster authentication. The user must have the appropriate privileges to create a bucket, write data, and create view, secondary index and full-text index definitions. This can be specified by means of the environment variable CB_REST_PASSWORD.

      -b [bucket-name], --bucket [bucket-name]

      The name of the bucket to create and load data into. If the bucket already exists, bucket creation is skipped, and data is loaded into the existing bucket.

      -m [quota], --bucket-quota [quota]

      The amount of memory to assign to the bucket. If the bucket already exists, this parameter is ignored.

      -d [path], --dataset [path]

      The path of the dataset to be loaded. The path can refer to either a zip file or a directory to load data from.

      -t [number],--threads [number]

      Specifies the number of concurrent clients to use when loading data. Fewer clients means data loading will take longer, but with fewer cluster resources used. More clients means faster data loading, but with higher cluster-resource usage. This parameter defaults to 1 if it is not specified. This parameter is recommended not to be set to be higher than the number of CPUs on the machine where the command is being run.

      -v,--verbose

      Prints log messages to stdout. This flag is useful for debugging errors in the data loading process.

      --no-ssl-verify

      Skips the SSL verification phase. Specifying this flag will allow a connection using SSL encryption, but will not verify the identity of the server you connect to. You are vulnerable to a man-in-the-middle attack if you use this flag. Either this flag or the --cacert flag must be specified when using an SSL encrypted connection.

      --cacert

      Specifies a CA certificate that will be used to verify the identity of the server being connecting to. Either this flag or the --no-ssl-verify flag must be specified when using an SSL encrypted connection.

      Host Formats

      When a host is specified for the cbdoclader command, the following formats are expected:

      • couchbase://<addr>

      • <addr>:<port>

      • http://<addr>:<port>

      The couchbase://<addr> format is recommended for standard installations. The other two formats allow an option to take a port number, which is needed for non-default installations where the admin port has been set up on a port other than 8091.

      Examples

      To load the dataset travel-sample.zip, which is located in /opt/couchbase/samples/, into a bucket with a memory quota of 1024MB, run the following command.

      cbdocloader -c couchbase://127.0.0.1 -u Administrator \
      -p password -m 1024 -b travel-sample \
      -d /opt/couchbase/samples/travel-sample.zip

      To increase the parallelism of data-loading, use the threads option. In the example below, 4 threads are specified.

      cbdocloader -c couchbase://127.0.0.1 -u Administrator \
      -p password -m 1024 -b travel-sample \
      -d /opt/couchbase/samples/travel-sample.zip -t 4

      The cbdocloader command can also be used to load data from a folder. The folder must contain files that correspond to the samples format. See the Sample Data Format section below, for more information. The following example show to load data from folder /home/alice/census-sample:

      cbdocloader -c couchbase://127.0.0.1 -u Administrator \
      -p password -m 1024 -b census-sample -d /home/alice/census-sample

      Sample Data Format

      The cbdocloader command is used to load data from zip files or folders that correspond to the Couchbase sample data format. This is exemplified as follows:

      + sample_folder
        + design_docs
          indexes.json
          design_doc.json
        + docs
          document1.json
          document2.json
          document3.json
          document4.json

      The top level directory can be given any name and will always contain two folders. The design_docs folder is where index definitions are kept. This folder will contain zero or more JSON files that contain the various indexes that should be created when the sample dataset is loaded. Global Secondary Indexes (GSI) should always be in a file named indexes.json. Below is an example of the format for defining GSI indexes.

      {
       "statements": [
         {
           "statement": "CREATE PRIMARY INDEX on `bucket1`",
           "args": null
         },
         {
           "statement": "CREATE INDEX by_type on `bucket1`(name) WHERE _type='User'"
           "args": null
         }
       ]
      }

      GSI indexes are defined as a JSON document where each index definition is contained in a list called statements. Each element in the list is an object that contains two keys. The statement key contains the actual index definition, and the args key is used if the statement contains any positional arguments.

      All other files in the design_docs folder are used to define view design documents, and each design document should be put into a separate file. These files can be named anything, but should always have the .json file extension. Below is an example of a view design document definition.

      {
         "_id": "_design/players"
         "views": {
           "total_experience": {
             "map": "function(doc,meta){if(doc.jsonType ==
             "reduce": "_sum"
           },
           "player_list": {
             "map": "function (doc, meta){if(doc.jsonType ==
           }
         }
       }

      In the document above, the _id field is used to name the design document. This name should always be prefixed with _design/. The other field in the top level of the document is the views field. This field contains a map of view definitions. The key for each element in the map is the name of the view. Each view must contain a map element that defines the map function, and may also contain an optional reduce element that defines the reduce function.

      View design documents support map-reduce views as well as spatial views. Below is an example of a spatial view definition. Spatial views follow similar rules as the map-reduce views above.

       {
         "_id": "_design/spatial"
         "spatial": {
      	 	"position": "<spatial view function definition>",
      		"location": "<spatial view function definition>"
         }
       }

      Note that spatial views only use a single function to define the index. As a result this function is defined as the value of the spatial views name.

      The other folder at the top level directory of a sample data folder is the docs folder. This folder will contain all of the documents to load into Couchbase. Each document in this folder is contained in a separate file and each file should contain a single JSON document. The key name for the document will be the name of the file. Each file should also have a .json file extension which will be removed from the key name when the data is loaded. Since each document to be loaded is in a separate file, there can potentially be many files. The docs folder allows subfolders to help categorize documents.