A newer version of this documentation is available.

View Latest

Architecture of a single node

A Couchbase Server cluster consists of a group of interchangeable, largely self-sufficient nodes. There is just one Couchbase Server node type; every Couchbase node consists of the same components, services, and interfaces. Having a single node type greatly simplifies the installation, configuration, management and troubleshooting of a Couchbase Server cluster, both in terms of what you, as a human operator, must do and what the automatic management needs to do.

The main components of a Couchbase Server node are the cluster manager, the data service, the query service, the index service, and the underlying managed cache and storage components. By dividing up potentially conflicting workloads in this way, a Couchbase Server node can achieve maximum throughput and resource utilization and minimum latency.

The Cluster Manager handles the cluster level operations and the coordination between nodes in a Couchbase Server cluster. These management functions include configuring nodes, monitoring nodes, gathering statistics, logging, and controlling data rebalancing among cluster nodes. The cluster manager determines the node’s membership in a cluster, responds to heartbeat requests, monitors its own services and repairs itself if possible.

Although each node runs its own local Cluster Manager, there is only one node chosen from among them, called the orchestrator, that supervises the cluster at a given point in time. The orchestrator maintains the authoritative copy of the cluster configuration, and performs the necessary node management functions to avoid any conflicts from multiple nodes interacting. If a node becomes unresponsive for any reason, the orchestrator notifies the other nodes in the cluster and promotes the relevant replicas to active status. This process is called failover, and it can be done automatically or manually. If the orchestrator fails or loses communication with the cluster for any reason, the remaining nodes detect the failure when they stop receiving its heartbeat, so they immediately elect a new orchestrator. This is done immediately and is transparent to the operations of the cluster.

The Data Service handles core data management operations, such as Key Value GET/SET. The data service is also responsible for building and maintaining MapReduce views. Note that even though MapReduce views can act as indexes, they are always managed by the data service, not the indexing service.

The Data Manager performs data storage and retrieval. It consists of a managed cache (based on memcached), a storage engine, and the view query engine.

The Indexing Service efficiently maintains indexes for fast query execution. This involves the creation and deletion of indexes, keeping those indexes up to date based on change notifications from the data service and responding to index-scan requests from the query service. You can access and manage the indexing service through the query service.

The Query Service handles N1QL query parsing, optimization, and execution. This service interacts with both the indexing service and the data service to process and return those queries back to the requesting application.

server cluster single node type
Figure 1. Couchbase Server cluster showing single node type

The Storage Layer persists data from RAM to disk and reads data back into memory when needed. Couchbase Server’s storage engines are responsible for persisting data to disk. They write every change operation that a node receives to an append-only file (AOF), meaning that mutations go to the end of the file and in-place updates are forbidden. This file is replayed if necessary at node startup to repopulate the locally managed cache.

Couchbase Server periodically compacts its data files and index files to save space (auto-compaction). During compaction, each node remains online and continues to process read and write requests. Couchbase Server performs compaction in parallel, both across the many nodes of a cluster as well as across the many files in persistent storage. This highly parallel processing allows very large datasets to be compacted incrementally.

Each data node relies on a built-in multithreaded Managed Cache that is used both for the data service and separately for the indexing service. These caches are actively managed as opposed to relying upon the operating system’s file buffer cache and/or memory-mapped files.

The data service’s integral managed cache is an object-managed cache based on memcached. All key-value operations are performed via the cache. Applications can access Couchbase using memcached compatible APIs such as get, set, delete, append, and prepend. Because this API is complete and transparent to the application, Couchbase Server can be used as a drop in replacement for memcached.

Couchbase Server overcomes many challenges that are faced with memcached. Couchbase Server handles clustering, online rebalancing, and auto-sharding of data across nodes, so that memory and other resources can be elastically increased or reduced as needed to support the workload. Because Couchbase Server automatically replicates data, cache availability is greatly improved and end users rarely hit a cold cache even if system components fail. Finally, Couchbase Server presents a built-in Web Console for monitoring and managing the entire cluster.

The managed caching layer within the indexing service is built into the underlying storage engine (ForestDB in 4.0). It manages the caching of individual portions of indexes as RAM is available to speed performance both of inserting or updating information as well as servicing index scans.