You are viewing the documentation for a prerelease version.

View Latest

cbdocloader

The cbdocloader tool loads Couchbase sample datasets into Couchbase Server.

Description

The cbdocloader tool loads Couchbase sample datasets into Couchbase Server. Sample datasets are zip files, provided by Couchbase, which contain documents and index definitions. These datasets are intended to allow users to explore Couchbase features prior to loading their own datasets.

For Linux, the tool’s location is /opt/couchbase/bin/cbdocloader; for Windows, C:\Program¬†Files\Couchbase\Server\bin\cbdocloader; and for Mac OS X, /Applications/Couchbase¬†Server.app/Contents/Resources/couchbase-core/bin/cbdocloader.

Syntax

The syntax for cbdocloader is as follows:

cbdocloader --cluster [hostname] --username [username]
--p [password] --bucket [bucket-name] --bucket-quota [quota]
--dataset [path] --thread [number] --verbose

Options

Command options are as follows:

Option Description

-h --help

Show this help message and exit.

-c [hostname], --cluster [hostname]

The hostname of one of the nodes in the cluster to load data into. See the Host Formats section below for hostname specification details.

-u [username], --user [username]

The username for cluster authentication. The user must have the appropriate privileges to create a bucket, write data, and create view, secondary index and full-text index definitions. This can be specified by means of the environment variable CB_REST_USERNAME.

-p [password], --password [password]

The password for cluster authentication. The user must have the appropriate privileges to create a bucket, write data, and create view, secondary index and full-text index definitions. This can be specified by means of the environment variable CB_REST_PASSWORD.

-b [bucket-name], --bucket [bucket-name]

The name of the bucket to create and load data into. If the bucket already exists, bucket creation is skipped, and data is loaded into the existing bucket.

-m [quota], --bucket-quota [quota]

The amount of memory to assign to the bucket. If the bucket already exists, this parameter is ignored.

-d [path], --dataset [path]

The path of the dataset to be loaded. The path can refer to either a zip file or a directory to load data from.

-t [number],--threads [number]

Specifies the number of concurrent clients to use when loading data. Fewer clients means data loading will take longer, but with fewer cluster resources used. More clients means faster data loading, but with higher cluster-resource usage. This parameter defaults to 1 if it is not specified. This parameter is recommended not to be set to be higher than the number of CPUs on the machine where the command is being run.

-v,--verbose

Prints log messages to stdout. This flag is useful for debugging errors in the data loading process.

Host Formats

When a host is specified for the cbdoclader command, the following formats are expected:

  • couchbase://<addr>

  • <addr>:<port>

  • http://<addr>:<port>

The couchbase://<addr> format is recommended for standard installations. The other two formats allow an option to take a port number, which is needed for non-default installations where the admin port has been set up on a port other than 8091.

Examples

To load the dataset travel-sample.zip, which is located in /opt/couchbase/samples/, into a bucket with a memory quota of 1024MB, run the following command.

cbdocloader -c couchbase://127.0.0.1 -u Administrator \
-p password -m 1024 -b travel-sample \
-d /opt/couchbase/samples/travel-sample.zip

To increase the parallelism of data-loading, use the threads option. In the example below, 4 threads are specified.

cbdocloader -c couchbase://127.0.0.1 -u Administrator \
-p password -m 1024 -b travel-sample \
-d /opt/couchbase/samples/travel-sample.zip -t 4

The cbdocloader command can also be used to load data from a folder. The folder must contain files that correspond to the samples format. See the Sample Data Format section below, for more information. The following example show to load data from folder /home/alice/census-sample:

cbdocloader -c couchbase://127.0.0.1 -u Administrator \
-p password -m 1024 -b census-sample -d /home/alice/census-sample

Sample Data Format

The cbdocloader command is used to load data from zip files or folders that correspond to the Couchbase sample data format. This is exemplified as follows:

+ sample_folder
  + design_docs
    indexes.json
    design_doc.json
  + docs
    document1.json
    document2.json
    document3.json
    document4.json

The top level directory can be given any name and will always contain two folders. The design_docs folder is where index definitions are kept. This folder will contain zero or more JSON files that contain the various indexes that should be created when the sample dataset is loaded. Global Secondary Indexes (GSI) should always be in a file named indexes.json. Below is an example of the format for defining GSI indexes.

{
 "statements": [
   {
     "statement": "CREATE PRIMARY INDEX on `bucket1`",
     "args": null
   },
   {
     "statement": "CREATE INDEX by_type on `bucket1`(name) WHERE _type='User'"
     "args": null
   }
 ]
}

GSI indexes are defined as a JSON document where each index definition is contained in a list called statements. Each element in the list is an object that contains two keys. The statement key contains the actual index definition, and the args key is used if the statement contains any positional arguments.

All other files in the design_docs folder are used to define view design documents, and each design document should be put into a separate file. These files can be named anything, but should always have the .json file extension. Below is an example of a view design document definition.

{
   "_id": "_design/players"
   "views": {
     "total_experience": {
       "map": "function(doc,meta){if(doc.jsonType ==
       "reduce": "_sum"
     },
     "player_list": {
       "map": "function (doc, meta){if(doc.jsonType ==
     }
   }
 }

In the document above, the _id field is used to name the design document. This name should always be prefixed with _design/. The other field in the top level of the document is the views field. This field contains a map of view definitions. The key for each element in the map is the name of the view. Each view must contain a map element that defines the map function, and may also contain an optional reduce element that defines the reduce function.

View design documents support map-reduce views as well as spatial views. Below is an example of a spatial view definition. Spatial views follow similar rules as the map-reduce views above.

 {
   "_id": "_design/spatial"
   "spatial": {
	 	"position": "<spatial view function definition>",
		"location": "<spatial view function definition>"
   }
 }

Note that spatial views only use a single function to define the index. As a result this function is defined as the value of the spatial views name.

The other folder at the top level directory of a sample data folder is the docs folder. This folder will contain all of the documents to load into Couchbase. Each document in this folder is contained in a separate file and each file should contain a single JSON document. The key name for the document will be the name of the file. Each file should also have a .json file extension which will be removed from the key name when the data is loaded. Since each document to be loaded is in a separate file, there can potentially be many files. The docs folder allows subfolders to help categorize documents.