Storage Properties

      Couchbase Server provides persistence: certain items are stored on disk as well as in memory, which enhances reliability.

      Understanding Couchbase Storage

      Couchbase Server stores certain items in compressed form on disk; and, whenever required, removes them from memory. This allows data-sets to exceed the size permitted by available memory resources, since undeleted items not currently in memory can be restored to memory from disk, as needed. It also facilitates backup-and-restore procedures.

      Generally, a client’s interactions with the server are not blocked during disk-access procedures. However, if a specific item is being restored from disk to memory, the item is not made available to the client until the item’s restoration is complete.

      Not all items are written to disk: Ephemeral buckets and their items are maintained in memory only. See Buckets for information.

      Items written to disk are always written in compressed form. Based on bucket configuration, items may be maintained in compressed form in memory also. See Compression for information.
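
      As a minimal sketch, a bucket’s compression mode can be changed through the bucket REST API; the compressionMode field and the endpoint used here are assumptions to verify against the Compression documentation, and the host, credentials, and bucket name are placeholders.

          # Sketch: keep documents compressed in memory as well as on disk by
          # switching the bucket's compression mode (field name assumed).
          import requests

          resp = requests.post(
              "http://127.0.0.1:8091/pools/default/buckets/travel-sample",  # placeholder bucket
              auth=("Administrator", "password"),                           # placeholder credentials
              data={"compressionMode": "active"},  # other modes: "off", "passive"
          )
          resp.raise_for_status()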

      Items can be removed from the disk based on a configured point of expiration, referred to as Time-To-Live. See Expiration for information.

      For illustrations of how Couchbase Server saves new Couchbase-bucket items and updates existing ones, employing both memory and storage resources, see Memory and Storage.

      Threading

      Couchbase Server uses multiple threads when reading and writing data. It offers several configuration settings you can change to optimize performance for your workload and hardware.

      Reader and Writer Threads

      Synchronized, multithreaded readers and writers provide simultaneous, high-performance operations for data on disk. Conflicts are avoided by assigning each thread (reader or writer) a specific subset of the 1024 vBuckets for each Couchbase bucket.

      Couchbase Server allows the number of threads allocated per node for reading and writing to be configured by the administrator. The maximum thread-allocation that can be specified for each is 64; the minimum is 1.

      A high thread-allocation may improve performance on systems whose hardware resources are commensurately supportive (for example, where the number of CPU cores is high). In particular, a high number of writer threads on such systems may significantly optimize the performance of durable writes: see Durability for information.

      A high number of reader and writer threads benefits disk-based workloads that require high throughput, especially when using high-end disk drives such as NVMe SSDs. This is likely to be the case when using Magma as the storage engine. In that case, it is best to choose the 'Disk i/o optimized' mode for the Reader and Writer thread settings.

      Note, however, that a high thread-allocation might impair some aspects of performance on less appropriately resourced nodes. Consequently, changes to the default thread-allocation should not be made to production systems without prior testing. A starting point for experimentation is to set the number of reader threads and the number of writer threads each equal to the queue depth of the underlying I/O subsystem.

      See the General-Settings information on Data Settings for details on how to establish appropriate numbers of reader and writer threads.

      Note that the number of threads can also be configured for the NonIO and AuxIO thread pools:

      • The NonIO thread pool is used to run in-memory tasks — for example, the durability timeout task.

      • The AuxIO thread pool is used to run auxiliary I/O tasks — for example, the access log task.

      Again, the maximum thread-allocation that can be specified for each is 64; the minimum is 1.

      Thread status can be viewed by means of the cbstats command, specified with the raw workload option. See cbstats for information.

      For information about using the REST API to manage thread counts, see Setting Storage Thread Allocations.
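
      As an illustrative sketch, the reader and writer thread counts can be changed with a single REST call; the /pools/default/settings/memcached/global endpoint and setting names follow the Setting Storage Thread Allocations reference, and the host and credentials below are placeholders.

          # Sketch: set per-node reader and writer thread counts via the
          # thread-allocation REST API.
          import requests

          resp = requests.post(
              "http://127.0.0.1:8091/pools/default/settings/memcached/global",
              auth=("Administrator", "password"),     # placeholder credentials
              data={
                  "num_reader_threads": 16,           # 1-64
                  "num_writer_threads": 16,           # 1-64
              },
          )
          resp.raise_for_status()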

      Magma Flushing and Compaction Threads

      Couchbase Server compacts the data it writes to disk for Magma buckets. It allocates a thread pool (containing 20 threads by default) for background compaction and flushing operations for these buckets. You can change the number of threads in this pool using the num_storage_threads setting of the thread allocation REST API.

      Two types of threads share the Magma thread pool: compactor threads that compact data and flusher threads that write data to disk. By default, Couchbase Server allocates 20% of the threads to flushing data and 80% to compacting data. With the default thread pool size and the default flusher allocation, Couchbase Server uses 4 threads to flush data and 16 threads to compact data for Magma buckets. You can also change the percentage of flusher threads using the magma_flusher_thread_percentage setting of the thread allocation REST API.
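
      As a rough sketch, both Magma settings can be changed together through the same thread-allocation REST API; the endpoint path is an assumption (the setting names are those given above), and, as noted below, Couchbase Support should be consulted before changing them in production.

          # Sketch: resize the Magma storage thread pool and shift the
          # flusher/compactor split (contact Couchbase Support first).
          import requests

          resp = requests.post(
              "http://127.0.0.1:8091/pools/default/settings/memcached/global",
              auth=("Administrator", "password"),
              data={
                  "num_storage_threads": 20,              # total flusher + compactor threads
                  "magma_flusher_thread_percentage": 30,  # raising this shrinks the compactor share
              },
          )
          resp.raise_for_status()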

      For most workloads, the default thread pool size and flusher allocation percentage work well. If you notice CPU use spikes during heavy data mutation workloads, you might want to investigate whether the compactor threads are the cause.

      You can monitor the compaction and flushing activity using the kv_magma_compactions metric. This metric counts the number of compactions Couchbase Server has performed. You can view this metric via the Statistics REST API or through Prometheus if you have configured it to collect Couchbase Server metrics. For more information about using Prometheus with Couchbase Server, see Configure Prometheus to Collect Couchbase Metrics.
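
      As a sketch, the counter can be sampled directly from the Statistics REST API; the /pools/default/stats/range/<metric> path used here is an assumption to verify against your server version, and the host and credentials are placeholders.

          # Sketch: read the kv_magma_compactions counter for the last minute.
          import requests

          resp = requests.get(
              "http://127.0.0.1:8091/pools/default/stats/range/kv_magma_compactions",
              params={"start": -60},                  # seconds relative to now (assumed)
              auth=("Administrator", "password"),
          )
          resp.raise_for_status()
          print(resp.json())                          # inspect the returned time series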

      If you see the CPU use of the memcached Linux process on your nodes spike while the kv_magma_compactions count is increasing, the compactor threads may be the cause of the spike. In this case, you may want to reduce the number of compactor threads by increasing the percentage of flusher threads. This reduction limits the compactor’s ability to spike CPU use. However, reducing the number of compactor threads may lead to higher latency before Couchbase Server writes data to disk.

      The number of threads in the compactor and flusher pool and the percentages of flusher threads are advanced settings. Contact Couchbase Support before making changes to them. Support can help you determine the best settings for your workload and hardware.

      Deletion

      Items can be deleted by a client application, either by immediate action or by setting a Time-To-Live (TTL) value. The TTL is held in a metadata field of the item and specifies a future point in time for the item’s expiration. When that point in time is reached, Couchbase Server deletes the item.

      Following deletion by either method, a tombstone is maintained by Couchbase Server, as a record (see below).

      An item’s TTL can be established either directly on the item itself or via the bucket that contains the item. For information, see Expiration.
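
      As a minimal sketch of setting a TTL directly on an item, the Couchbase Python SDK accepts an expiry with each write; the 4.x-style import paths, bucket name, and document key below are assumptions and placeholders.

          # Sketch: write a document with a one-hour TTL, then adjust the TTL later.
          from datetime import timedelta
          from couchbase.auth import PasswordAuthenticator
          from couchbase.cluster import Cluster
          from couchbase.options import ClusterOptions, UpsertOptions

          cluster = Cluster("couchbase://127.0.0.1",
                            ClusterOptions(PasswordAuthenticator("Administrator", "password")))
          collection = cluster.bucket("travel-sample").default_collection()

          # The item expires one hour after this write.
          collection.upsert("session::123", {"user": "alice"},
                            UpsertOptions(expiry=timedelta(hours=1)))

          # touch() updates the TTL of an existing item without changing its value.
          collection.touch("session::123", timedelta(minutes=30))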

      Tombstones

      A tombstone is a record of an item that has been removed. Tombstones are maintained to provide eventual consistency, between nodes and between clusters.

      Tombstones are created for the following:

      • Individual documents. The tombstone is created when the document is deleted; and contains the former document’s key and metadata.

      • Collections. The tombstone is created when the collection is dropped; and contains information that includes the collection-id, the collection’s scope-id, and a manifest-id that records the dropping of the collection.

        All documents that were in the dropped collection are deleted when the collection is dropped. No tombstones are maintained for such documents: moreover, any tombstones for deleted documents that existed in the collection prior to its dropping are themselves removed when the collection is dropped; and consequently, only a collection-tombstone remains when a collection is dropped. The collection-tombstone is replicated via DCP as a single message (ordered with respect to mutations occurring in the vBucket), to replicas and other DCP clients, to notify such recipients that the collection has indeed been dropped. It is then the responsibility of each recipient to purge anything it still contains that belonged to the dropped collection.

      The Metadata Purge Interval establishes the frequency with which Couchbase Server purges itself of tombstones of both kinds; that is, removes them fully and finally. The Metadata Purge Interval setting runs as part of auto-compaction (see Append-Only Writes and Auto-Compaction, below).

      For more information, see Post-Expiration Purging, in Expiration.
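
      As a sketch, the purge interval can be adjusted as part of the global auto-compaction settings; the /controller/setAutoCompaction endpoint and the purgeInterval and parallelDBAndViewCompaction field names are assumptions to verify against the Global Compaction API reference.

          # Sketch: shorten the Metadata Purge Interval to one day.
          import requests

          resp = requests.post(
              "http://127.0.0.1:8091/controller/setAutoCompaction",
              auth=("Administrator", "password"),
              data={
                  "purgeInterval": 1,                      # days; fractional values are allowed
                  "parallelDBAndViewCompaction": "false",
              },
          )
          resp.raise_for_status()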

      Disk Paths

      At node-initialization, Couchbase Server allows up to four custom paths to be established for the saving of data to the filesystem: these are for the Data Service, the Index Service, the Analytics Service, and the Eventing Service. Note that the paths are node-specific: consequently, the data for any of these services may occupy a different filesystem-location, on each node.

      For information on setting data-paths, see Initialize a Node.
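
      As a sketch, the four per-node paths can be set before the node joins a cluster, using the node-initialization REST endpoint; the endpoint and the path, index_path, cbas_path, and eventing_path field names are assumptions to check against Initialize a Node, and the filesystem locations are placeholders.

          # Sketch: set per-node storage paths for the Data, Index, Analytics,
          # and Eventing Services during node initialization.
          import requests

          resp = requests.post(
              "http://127.0.0.1:8091/nodes/self/controller/settings",
              auth=("Administrator", "password"),
              data={
                  "path": "/mnt/disk1/couchbase/data",              # Data Service
                  "index_path": "/mnt/disk2/couchbase/index",       # Index Service
                  "cbas_path": "/mnt/disk3/couchbase/analytics",    # Analytics Service
                  "eventing_path": "/mnt/disk4/couchbase/eventing", # Eventing Service
              },
          )
          resp.raise_for_status()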

      Append-Only Writes and Auto-Compaction

      Couchbase Server uses an append-only file-write format, which helps to ensure the internal consistency of the files and reduces the risk of corruption. Necessarily, this means that every change made to a file — whether an addition, a modification, or a deletion — results in a new entry being created at the end of the file: therefore, a file whose user-data is diminished by deletion actually grows in size.

      File-sizes should be periodically reduced by means of compaction. This operation can be performed either manually, on a specified bucket; or on an automated, scheduled basis, either for specified buckets or for all buckets.

      For information on performing manual compaction with the CLI, see bucket-compact. For information on configuring auto-compaction with the CLI, see setting-compaction.

      For all information on using the REST API for compaction, see the Global Compaction API or Per-bucket Compaction API.

      For information on configuring auto-compaction with Couchbase Web Console, see Auto-Compaction.
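
      As a sketch of the per-bucket REST approach, compaction of a single bucket can be triggered as follows; the controller/compactBucket path is an assumption to verify against the Per-bucket Compaction API reference, and the bucket name is a placeholder.

          # Sketch: manually compact one bucket.
          import requests

          bucket = "travel-sample"
          resp = requests.post(
              f"http://127.0.0.1:8091/pools/default/buckets/{bucket}/controller/compactBucket",
              auth=("Administrator", "password"),
          )
          resp.raise_for_status()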

      Disk I/O Priority

      Disk I/O — reading items from and writing them to disk — does not block client-interactions: disk I/O is thus considered a background task. The priority of disk I/O (along with that of other background tasks, such as item-paging and DCP stream-processing) is configurable per bucket. This means, for example, that one bucket’s disk I/O can be granted priority over another. For further information, see Create a Bucket.
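
      As a sketch only: bucket priority has historically been exposed through a threadsNumber field on the bucket REST API (3 for low priority, 8 for high); that field name is an assumption here, and Create a Bucket remains the authoritative reference.

          # Sketch: give an existing bucket high disk I/O priority
          # (field name assumed; verify against the bucket REST API docs).
          import requests

          resp = requests.post(
              "http://127.0.0.1:8091/pools/default/buckets/travel-sample",
              auth=("Administrator", "password"),
              data={"threadsNumber": 8},   # 3 = low priority, 8 = high priority (assumed values)
          )
          resp.raise_for_status()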

      Ejection Policy

      Ejection is the policy that Couchbase Server applies when a bucket’s memory is exhausted, determining which data (if any) is removed from memory. The policies available depend on the type of bucket being created.

      Note that in Capella, Couchbase buckets are referred to as Memory and Disk buckets; while Ephemeral buckets are referred to as Memory Only buckets.

      Table 1. Ejection policies

      Policy              | Bucket type | Description
      No Ejection         | Ephemeral   | If memory is exhausted, the bucket is set to read-only to prevent data loss. This is the default policy for Ephemeral buckets.
      NRU[1] Ejection     | Ephemeral   | Documents that have not been recently used are ejected from memory.
      Value Only Ejection | Couchbase   | In low-memory situations, values are ejected from memory, but keys and metadata are retained. This is the default policy for Couchbase buckets.
      Full Ejection       | Couchbase   | Keys, metadata, and values are all ejected from memory.

      The policy can be set using the REST API when the bucket is created. For more information on ejection policies, read https://blog.couchbase.com/a-tale-of-two-ejection-methods-value-only-vs-full/

      Full Ejection is recommended when Magma is used as the storage engine for a bucket. This is especially the case when the ratio of memory to data is very low (Magma supports a memory-to-data ratio as low as 1%).
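
      As a sketch, a Magma-backed bucket with Full Ejection might be created as follows; the field names reflect commonly documented bucket-creation parameters, and the bucket name, quota, and credentials are placeholders.

          # Sketch: create a Magma bucket that uses Full Ejection.
          import requests

          resp = requests.post(
              "http://127.0.0.1:8091/pools/default/buckets",
              auth=("Administrator", "password"),
              data={
                  "name": "orders",                  # placeholder bucket name
                  "bucketType": "couchbase",
                  "storageBackend": "magma",
                  "ramQuotaMB": 1024,                # placeholder quota
                  "evictionPolicy": "fullEviction",  # default for Couchbase buckets is "valueOnly"
              },
          )
          resp.raise_for_status()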


      1. Not Recently Used