You can make some adjustments to tune system performance with Couchbase and Elasticsearch. You can omit some fields from indexing, add more Elasticsearch nodes, change the number of concurrent replications, and make sure to configure your Elasticsearch nodes properly.
When any search engine has to index large blocks of data, the process is more CPU intensive than smaller blocks of data.
So if you have objects with large amounts of text that are not important for search results, you can provide a custom mapping and omit those fields from indexing using the setting
For more detailed information about disabling fields, see Object Type in the Elasticsearch Reference.
To limit the number of index entries, you can filter the documents on the Elasticsearch side. The document filter chooses whether to index or ignore certain documents according to their ID. To configure filtering, see Filtering Documents on the Elasticsearch Side.
If your Couchbase Server cluster experiences a backlog of items in the replication queue, consider adding additional Elasticsearch nodes. Adding additional nodes should increase the speed of items indexing by the search engine.
If you are running your Couchbase cluster and Elasticsearch cluster on hardware with high-performance CPUs, you can adjust settings to improve replication speed between the two clusters. For more information about adjusting XDCR parameters, see XDCR Advanced Settings.
Key parameters you can use to adjust XDCR performance are
XDCR Source Nozzles per Node and
XDCR Target Nozzles per Node.
These parameters increase or decrease the maximum concurrent replication by a Couchbase node.
Each replication can require multiple TCP connections and both the concurrent replications and the number of connections may overwhelm the Elasticsearch node.
If this does occur, you may see errors in the Couchbase Web Console.
Click XDCR and look in the Ongoing Replications section, where the
capi_… string is the ID of the CAPI nozzle, a component in XDCR replication, which writes to the target:
"capi_f4268e62702130298bf87f17cc481219/default/default_172.23.105.146:9091_1 – Connection reset failed"
This message indicates that Couchbase Server cannot communicate with Elasticsearch in the time that it expects.
Couchbase Server can recover from these types of errors and retry replication.
However, your replication may take longer to complete or operate with higher latency because the operations must be later retried.
In such case, you could lower the total number of concurrent replications that can be handled by your Elasticsearch node by decreasing the default
XDCR Source Nozzles per Node setting and adjusting the
XDCR Target Nozzles per Node accordingly.
Check Elasticsearch documentation to make sure that you have configured your Elasticsearch nodes for best performance.
The following tips, which are not specific to Couchbase, will also tune system performance:
When bulk loading, increase the Elasticsearch setting
index.refresh_intervalto improve indexing performance. This setting determines how quickly Elasticsearch makes the indexed documents available to query. For a large bulk load, where realtime indexing is not needed, setting this setting to a large number such as 30s can significantly increase throughput.
If you know that some documents are always or nearly always queried together, you can configure document routing in the plugin so that those documents end up on the same Elasticsearch shard. This can provide a significant boost in query performance.
Mapping documents to different Elasticsearch types and then searching only within a specific type will always be faster than searching the whole Elasticsearch index.