Advanced usage

Important: A newer version of this software with updated documentation is available. Visit the Couchbase Developer Portal for more information.

Advanced settings

The Couchbase Plug-in for Elasticsearch has several settings in a YAML file that you can update. These additional settings include:

  • couchbase.port: The port the plug-in listens on. Default: 9091.
  • couchbase.username: The username for HTTP basic authentication. Default: Administrator.
  • couchbase.password: The password for HTTP basic authentication. No default.
  • couchbase.num_vbuckets: The number of data partitions Elasticsearch should specify to Couchbase Server. The default corresponds to the number of partitions that Couchbase Server expects and that exist on the source Couchbase cluster: 64 on Mac OS X and 1024 on all other platforms.
  • couchbase.defaultDocumentType: The type of documents stored in Elasticsearch. These documents contain indexing information from Elasticsearch. Default: couchbaseDocument. You can change this if you define and implement your own document type that provides specialized Elasticsearch search features.
  • couchbase.checkpointDocumentType: Type of document that stores status information about replication. Default: couchbaseCheckpoint.
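  • couchbase.typeSelector: The mechanism used to select the Elasticsearch type from the document ID.
  • couchbase.documentTypes: Maps an Elasticsearch type name to a regular expression that is matched against the document ID.
  • couchbase.documentTypeParentFields: The document field to use as the parent for a given type.
  • couchbase.documentTypeRoutingFields: The document field to use for routing for a given type.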

Setting document type

The plug-in lets you set the Elasticsearch type for each document.

The following example uses a regular expression selector, which sets the Elasticsearch type by matching regular expressions against the document ID. The settings must be in the Elasticsearch configuration file, which by default is config/elasticsearch.yml.


couchbase.typeSelector: org.elasticsearch.transport.couchbase.capi.RegexTypeSelector


# Set the Elasticsearch type by matching regular expressions against the document ID.
# The following example matches document IDs such as review-123 or review-789.
couchbase.documentTypes.review: ^review-(\d)+$

# Configure the parent field:
couchbase.documentTypeParentFields.review: doc.bookId

# Configure routing:
couchbase.documentTypeRoutingFields.review: doc.bookId

# Here bookId is a Couchbase document key. The plug-in requires it to be a string.
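
To make the matching behaviour concrete, the following Python sketch mimics how a regular expression selector could map document IDs to types. It is only an illustration, not the plug-in's implementation (the plug-in itself runs inside Elasticsearch). The names DOCUMENT_TYPES and select_type are hypothetical, and falling back to the default document type when no pattern matches is an assumption.

```python
import re

# Hypothetical table mirroring the couchbase.documentTypes.* settings above.
DOCUMENT_TYPES = {"review": r"^review-(\d)+$"}

# Documented default for couchbase.defaultDocumentType.
DEFAULT_TYPE = "couchbaseDocument"


def select_type(doc_id, patterns=DOCUMENT_TYPES, default=DEFAULT_TYPE):
    """Return the first type whose pattern matches doc_id, else the default.

    Falling back to the default type is an assumption made for this sketch.
    """
    for type_name, pattern in patterns.items():
        if re.search(pattern, doc_id):
            return type_name
    return default


if __name__ == "__main__":
    for doc_id in ("review-123", "review-789", "review-", "book-42"):
        print(doc_id, "->", select_type(doc_id))
```

With the pattern above, IDs such as review-123 select the review type, while IDs such as review- or book-42 do not match and fall through to the default.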
            

Understanding metadata

As your usage of the Couchbase Plug-in for Elasticsearch becomes more advanced, it helps to understand what the plug-in actually sends and how Elasticsearch uses it. When you send a JSON document to Couchbase Server to store, it looks similar to the following:



{
   "name": "Green Monsta Ale",
   "abv": 7.3,
   "ibu": 0,
   "srm": 0,
   "upc": 0,
   "type": "beer",
   "brewery_id": "wachusetts_brewing_company",
   "updated": "2010-07-22 20:00:20",
   "description": "A BIG PALE ALE with an awsome balance of Belgian malts with Fuggles and East Kent Golding hops.",
   "style": "American-Style Strong Pale Ale",
   "category": "North American Ale"
}

                

Here we have a JSON document with all the information for a beer in our application. When Couchbase stores this document, it adds metadata about the document so that we now have JSON in Couchbase that looks like this:


{
    {
       "id": "wachusetts_brewing_company-green_monsta_ale",
       "rev": "1-00000005ce01e6210000000000000000",
       "expiration": 0,
       "flags": 0,
       "type": "json"
    }
    {
       "name": "Green Monsta Ale",
       "abv": 7.3,
       "ibu": 0,
       "srm": 0,
       "upc": 0,
       "type": "beer",
       "brewery_id": "wachusetts_brewing_company",
       "updated": "2010-07-22 20:00:20",
       "description": "A BIG PALE ALE with an awsome balance of Belgian malts with Fuggles and East Kent Golding hops.",
       "style": "American-Style Strong Pale Ale",
       "category": "North American Ale"
    }
}

                

The metadata that Couchbase Server stores with our beer document contains the key for the document, an internal revision number, the expiration, flags, and the type of document. When Couchbase Server replicates data to Elasticsearch via the plug-in, it sends this entire JSON, including the metadata. Elasticsearch then indexes the document and stores the following JSON containing the document metadata:


{
  "id": "wachusetts_brewing_company-green_monsta_ale",
  "rev": "1-00000005ce01e6210000000000000000",
  "expiration": 0,
  "flags": 0,
  "type": "json"
}

            

Finally, when you query Elasticsearch, the result set contains only the document metadata:


{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.18642133,
        "hits": [
            {
                "_index": "beer-sample",
                "_type": "couchbaseDocument",
                "_id": "wachusetts_brewing_company-green_monsta_ale",
                "_score": 0.18642133,
                "_source": {
                    "meta": {
                        "id": "wachusetts_brewing_company-green_monsta_ale",
                        "rev": "1-00000005ce01e6210000000000000000",
                        "flags": 0,
                        "expiration": 0
                    }
                }
            }
        ]
    }
}
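
Because each hit carries only metadata, an application that needs the full documents would typically collect the document IDs from the result set and then look them up in Couchbase Server. The following Python sketch shows only the first step, under the assumption that the response has already been parsed into a dictionary shaped like the result set above. The function name extract_document_ids is hypothetical, and the Couchbase lookup itself is left out.

```python
def extract_document_ids(response):
    """Collect Couchbase document IDs from a parsed Elasticsearch search response."""
    ids = []
    for hit in response.get("hits", {}).get("hits", []):
        meta = hit.get("_source", {}).get("meta", {})
        # Prefer meta.id; fall back to the hit's _id, which holds the
        # same value in the example result set.
        doc_id = meta.get("id") or hit.get("_id")
        if doc_id is not None:
            ids.append(doc_id)
    return ids


# Trimmed copy of the example result set, used here as sample input.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "beer-sample",
                "_type": "couchbaseDocument",
                "_id": "wachusetts_brewing_company-green_monsta_ale",
                "_source": {
                    "meta": {
                        "id": "wachusetts_brewing_company-green_monsta_ale",
                        "rev": "1-00000005ce01e6210000000000000000",
                        "flags": 0,
                        "expiration": 0,
                    }
                },
            }
        ],
    }
}

if __name__ == "__main__":
    print(extract_document_ids(sample_response))
```

The resulting list of IDs can then be passed to whichever Couchbase client the application already uses to retrieve the full documents.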