Storage Architecture

Couchbase Server consists of various services and components that have different storage requirements. Each component uses a storage engine purpose-built and configured for its workload.

As an administrator, you can independently control the data and index storage paths within the file system on a per-node basis. This lets data and index storage use separate I/O subsystems, enabling independent tuning and isolation; a minimal configuration sketch appears after the list below. There are multiple storage engines in use in Couchbase Server:

  • Data Service, MapReduce Views, and Spatial Views

    These use Couchstore. Each vBucket is represented as a separate Couchstore file in the file system. Couchstore uses a B+tree structure to quickly access items through their keys. To ensure efficient and safe writes, Couchstore uses an append-only write model for each file.

  • Index Service

    In Couchbase Server Enterprise Edition, the Index Service uses the Plasma storage engine for standard indexes. Plasma maintains indexes partly on disk and partly in memory, thereby conserving memory resources, and supports high-performance concurrent disk access.

    In Couchbase Server Enterprise Edition, the Index Service also uses the Nitro storage engine, in support of Memory-Optimized indexes. Nitro provides fast index scanning, low-overhead snapshot creation, and scalable garbage collection. Additionally, although designed principally to handle indexes in memory, Nitro saves indexes to disk at regular intervals: this allows the Indexer to recover from a restart, without needing to rebuild indexes.

    For more information on Plasma and Nitro, see Global Secondary Index Storage.

    Note that in Couchbase Server Community Edition, the Index Service does not support Memory-Optimized indexes, and uses the ForestDB storage engine for standard indexes, rather than Plasma.
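
As a concrete illustration of per-node storage paths, the sketch below initializes a node with separate data and index directories over the administrative REST interface. The endpoint (/nodes/self/controller/settings) and the path and index_path form parameters are assumptions to verify against the REST API reference for your release; the host, credentials, and directories are example values.

    # Illustrative sketch: point a node's data and index storage at separate
    # file systems before the node joins a cluster. The endpoint and parameter
    # names are assumptions; verify them in the REST API reference.
    import requests

    NODE = "http://127.0.0.1:8091"         # administration port of the target node
    AUTH = ("Administrator", "password")   # example administrator credentials

    resp = requests.post(
        f"{NODE}/nodes/self/controller/settings",
        auth=AUTH,
        data={
            "path": "/mnt/data-ssd/couchbase/data",           # Data Service storage path
            "index_path": "/mnt/index-nvme/couchbase/index",  # Index Service storage path
        },
    )
    resp.raise_for_status()
    print("Storage paths set:", resp.status_code)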

Append-only and Compaction

As mutations arrive, writes append new pages to the end of the file and invalidate the links to previous versions of the updated pages. With an append-only write model, a compaction process is needed to clean up the orphaned or fragmented space in the files.
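
The following sketch makes the append-only idea concrete: every mutation is appended to the file, the in-memory index is repointed at the new offset, and the superseded version is left behind as orphaned space. It is a conceptual Python model only and does not reflect the Couchstore file format.

    # Conceptual model of an append-only data file: writes always append,
    # and superseded versions become orphaned space that compaction reclaims.
    # This is an illustration, not the Couchstore on-disk format.
    import json
    import os

    class AppendOnlyFile:
        def __init__(self, path):
            self.path = path
            self.index = {}      # key -> (offset, length) of the live version
            self.orphaned = 0    # bytes held by superseded versions
            open(path, "wb").close()

        def write(self, key, value):
            record = json.dumps({"key": key, "value": value}).encode() + b"\n"
            offset = os.path.getsize(self.path)   # appends land at the current end
            with open(self.path, "ab") as f:
                f.write(record)
            if key in self.index:                 # the previous version is now orphaned
                self.orphaned += self.index[key][1]
            self.index[key] = (offset, len(record))

        def fragmentation(self):
            size = os.path.getsize(self.path)
            return 0.0 if size == 0 else 100.0 * self.orphaned / size

    store = AppendOnlyFile("vbucket_0.data")
    store.write("doc::1", {"n": 1})
    store.write("doc::1", {"n": 2})   # the update appends; the first version is orphaned
    print(f"fragmentation: {store.fragmentation():.0f}%")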

In Couchbase Server, the compaction process reads the existing file and writes a new contiguous file that no longer contains the orphaned items. The compaction process runs in the background and is designed to minimize the impact on front-end performance.
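
Continuing the AppendOnlyFile sketch above, compaction can be modelled as copying only the live version of each key, in key order, into a new contiguous file and then switching over to it. This illustrates the technique, not Couchbase Server's implementation.

    # Conceptual compaction for the AppendOnlyFile sketch above: copy the live
    # version of each key into a new contiguous file, then swap it into place.
    import os

    def compact(store):
        new_path = store.path + ".compact"
        new_index = {}
        with open(store.path, "rb") as src, open(new_path, "wb") as dst:
            for key in sorted(store.index):            # rewrite in key order
                offset, length = store.index[key]
                src.seek(offset)
                new_index[key] = (dst.tell(), length)
                dst.write(src.read(length))            # orphaned versions are skipped
        os.replace(new_path, store.path)               # swap in the compacted file
        store.index, store.orphaned = new_index, 0

    compact(store)
    print(f"fragmentation after compaction: {store.fragmentation():.0f}%")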

The compaction process can be manual, scheduled, or automated based on the percentage of fragmentation. Compaction of an entire dataset is parallelized across multiple nodes, as well as across multiple files within those nodes.

In the figure below, as updated data is received by Couchbase Server, the previous versions are orphaned. After compaction, the orphaned references are removed and a contiguous file is created.

Figure 1. Compaction in Couchbase Server

Writes with Circular Reuse

When you enable writes with "circular reuse", as mutations arrive, write operations attempt to reuse the orphaned space in the file instead of simply appending new pages to its end. If there is not enough orphaned space in the file to accommodate a write, the operation falls back to appending. Even with circular reuse, a compaction process is still needed to create a contiguous (defragmented) file.
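
One simplified way to picture circular reuse is a free list of orphaned regions: a write takes the first region large enough to hold the record and appends only when none fits. The sketch below models that idea with fixed-size record slots; it is not the engine's actual block-reuse strategy.

    # Simplified model of writes with circular reuse: orphaned regions go on a
    # free list and are reused when a new record fits; otherwise the write
    # appends. Fixed-size slots keep the example short; this is not the actual
    # block-reuse strategy.
    import json
    import os

    SLOT = 64   # fixed record size, for illustration only

    class CircularReuseFile:
        def __init__(self, path):
            self.path = path
            self.index = {}    # key -> (offset, length) of the live version
            self.free = []     # (offset, length) regions orphaned by updates
            open(path, "wb").close()

        def write(self, key, value):
            record = json.dumps({"key": key, "value": value}).encode().ljust(SLOT)
            for i, (offset, length) in enumerate(self.free):
                if length >= len(record):      # reuse an orphaned region
                    del self.free[i]
                    break
            else:
                offset = os.path.getsize(self.path)   # nothing reusable: append
            with open(self.path, "r+b") as f:
                f.seek(offset)
                f.write(record)
            if key in self.index:
                self.free.append(self.index[key])     # old version becomes reusable
            self.index[key] = (offset, len(record))

    store = CircularReuseFile("vbucket_0_reuse.data")
    store.write("doc::1", {"n": 1})
    store.write("doc::1", {"n": 2})   # appends; the first slot becomes reusable
    store.write("doc::2", {"n": 3})   # lands in the slot orphaned by doc::1
    print(store.index, store.free)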

With circular reuse, full compaction operates in the same way: the compaction process reads the existing file and writes a new contiguous file, ordered by key, that no longer contains the orphaned items. Because space is reclaimed during writes, compaction runs less often with circular reuse enabled; it still runs in the background and is designed to minimize the impact on front-end performance.

The compaction process can be manual, scheduled, or automated based on the percentage of fragmentation. See Configuring Auto-Compaction for details. Compaction of an entire dataset is parallelized across multiple nodes, as well as across multiple files within those nodes.
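
For the automated case, the sketch below sets a cluster-wide database fragmentation threshold over the REST interface. The endpoint (/controller/setAutoCompaction) and the databaseFragmentationThreshold[percentage] and parallelDBAndViewCompaction parameters are assumptions to check against the REST API reference for your release; Configuring Auto-Compaction remains the authoritative guide.

    # Illustrative only: ask the cluster to auto-compact data files once they
    # reach 30% fragmentation. Endpoint and parameter names are assumptions;
    # confirm them in the REST API reference for your release.
    import requests

    CLUSTER = "http://127.0.0.1:8091"
    AUTH = ("Administrator", "password")   # example administrator credentials

    resp = requests.post(
        f"{CLUSTER}/controller/setAutoCompaction",
        auth=AUTH,
        data={
            "databaseFragmentationThreshold[percentage]": 30,
            "parallelDBAndViewCompaction": "false",   # compact data and view files sequentially
        },
    )
    resp.raise_for_status()
    print("Auto-compaction threshold updated")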