Index Storage Settings

Capella Operational

concept

March 23, 2025

+ 12

All indexes in Couchbase Capella use the Couchbase Plasma storage engine.

Couchbase Capella indexes implement multi-version concurrency control (MVCC) to provide consistent index scan results and high throughput.

Index Storage

In Couchbase Capella, indexes are saved on disk, in a disk-optimized format that uses both memory and disk for index-update and scanning. The performance of standard index storage depends on overall I/O performance.

Couchbase Capella index storage supports indexes for both Couchbase buckets and Memory-only buckets. See Manage Buckets.

The standard global secondary index uses a B-Tree index and keeps the optimal working set of data in the buffer. This means the total size of the index can be much bigger than the amount of memory available in each index node.

Standard index-storage is supported by the Plasma storage engine. Plasma is highly scalable and performant storage engine that is optimized specifically for indexes. Compaction is handled automatically.

Plasma Memory Enhancements

Couchbase Capella provides the following enhancements for Plasma:

Per page Bloom Filters.
In-memory compression.

These are described below.

Per Page Bloom Filters

A Bloom filter gives guidance as to whether a searched-for item resides on disk. By indicating that the item is not on disk, the filter prevents unnecessary on-disk searches.

If Bloom filters are enabled (which is the default), when a lookup occurs, and the correct Plasma page is located, the Bloom filter indicates either that the item is not on the page, or that it may be on the page. If the filter indicates that:

The item is not on the page, then the item is not on disk, and no disk read need occur.
The item may be on the page, then the item can continue to be searched for, and a disk read must therefore occur.

The consequent reduction in disk reads promotes the efficiency of mutation processing, when the mutations are insert heavy.

In-Memory Compression

In Couchbase Capella, Plasma memory-management routinely performs the compression of a subset of items, in order to free memory; and thereby, due to the additional memory made available, keep a greater number of items in memory overall. By keeping more items in memory, the need for disk reads is reduced, as are corresponding latencies.

The selection of items to be compressed occurs periodically. Only items that have already been flushed to disk are compressed: after compression, such items are principal candidates for subsequent ejection.

Disk-flushing occurs every ten minutes: items not yet flushed to disk are not compressed; nor is any recently used item. In consequence, items most likely to be accessed remain uncompressed in memory, and are therefore accessible with the least latency; while items less likely to be accessed are retained in memory in compressed form, until their ejection; beyond which, they must be accessed through disk reads.

This new model of memory usage leads to higher residence ratios, and greater access-efficiency; at the cost of some additional CPU utilization, due to the more frequent performance of compression and decompression routines.