Data access

Couchbase provides multiple ways of accessing data: Key-Value access, MapReduce views, and N1QL queries.

At a high level, Key-Value access provides a high-performance path directly to data, using the key of the stored item. For applications that need sub-millisecond latencies, Key-Value access is the fastest way to reach your data. However, an application does not always have the key in hand, and that is where queries come in. Queries can be run using MapReduce views or N1QL. N1QL is a flexible, declarative query language similar to the SQL provided by relational databases. It is well suited to fast operations such as secondary lookups on attributes in your JSON documents: for example, looking up users by email address rather than by the userID that is used as the document key.
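The difference between the two access paths can be illustrated with a minimal in-memory sketch. It does not use the Couchbase SDK: the `bucket` dictionary, the key format, and the helper names are all assumptions made for illustration. Key-Value access is a single lookup by key, while a lookup by email needs a secondary index that maps the attribute back to document keys.

```python
# In-memory stand-in for a bucket: document key -> JSON-like document.
# Keys, field names, and helpers are hypothetical, not Couchbase APIs.
bucket = {
    "user::1001": {"name": "Ada", "email": "ada@example.com"},
    "user::1002": {"name": "Grace", "email": "grace@example.com"},
}


def get_by_key(key):
    """Key-Value style access: one direct lookup with the key in hand."""
    return bucket.get(key)


def build_email_index(docs):
    """Secondary index sketch: map an attribute value to document keys."""
    index = {}
    for key, doc in docs.items():
        if "email" in doc:
            index.setdefault(doc["email"], []).append(key)
    return index


def find_by_email(index, email):
    """Query style access: resolve the attribute to keys, then fetch documents."""
    return [bucket[key] for key in index.get(email, [])]


email_index = build_email_index(bucket)
```

Here `get_by_key("user::1001")` needs no index at all, whereas `find_by_email(email_index, "grace@example.com")` depends on an index having been built first, which is roughly the role a secondary index plays for a query.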

Views also provide a way to index data with user-defined map and reduce functions. They are a powerful option for complex reshaping and pre-aggregation of data, and an ideal solution for interactive reporting over your data.

SQL queries with N1QL

Couchbase Server can be programmed using SQL. Given that nearly all programmers already know SQL, most developers will be able to get started quickly on Couchbase. And since most organizations already have significant amounts of SQL code, Couchbase Server fits into the technology landscape easily. Support for SQL by JDBC and ODBC drivers opens the ecosystem of tools for analytics and data integration such as Microsoft Excel and Tableau.

Couchbase created its own SQL dialect called N1QL in order to give developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data. The N1QL query engine is optimized for modern, highly parallel multi-core execution. N1QL has special extensions that allow it to deal with documents with variable and/or nested structures. Just as SQL operates on rows, columns and tables of an RDBMS and returns rows and columns to the application, N1QL operates on JSON and returns JSON to the application.

N1QL provides a full range of functionality immediately familiar to anyone who has used SQL, and it can even query across document relationships:

  • SELECT statements for queries and sub-queries

  • UPDATE, UPSERT, INSERT, and DELETE statements (DML statements are beta in Couchbase Server 4.0)

  • JOIN clauses to combine results from multiple documents into a single set

  • WHERE clauses to define filters that narrow query selectivity and constrain UPDATE, UPSERT, INSERT and DELETE statements

  • Aliases to rename elements for convenience and clarity

  • Transformations such as GROUP BY, ORDER BY, LIMIT and OFFSET

  • Set operators, both distinct and non-distinct: UNION, INTERSECT, and EXCEPT

  • Aggregate functions such as AVG, SUM, MAX, and COUNT

  • A full range of expressions including string comparisons with LIKE and string operators such as UPPER and SUBSTR

  • Prepared statements that get and cache a query plan independently of query execution, eliminating unnecessary work and therefore lowering query latency

N1QL also has extensions to SQL that enable it to better express the semantics of processing JSON, which is more varied and flexible than RDBMS schemas and which has more embedded structure. N1QL’s extensions to SQL reduce application-side complexity of post-processing and filtering the query results, and also reduce the load on the network by transmitting less data:

  • Paths expressed with dot notation to address embedded JSON elements like arrays and objects. Paths work with both named attributes and numbered indexes: for example, route.schedule[0].day returns just the day of the first schedule object in an array.

  • NEST and UNNEST commands to either construct or flatten an array in a query result, for example to create a clean list of items to display on a web page. NEST/UNNEST can be chained with JOIN in any combination.

  • USE KEYS bypasses the index scan to directly access one or more documents using a primary key lookup. Because the hash of the key is used to physically place data on the appropriate Couchbase node, the request can be completed without any other lookup. This makes USE KEYS operations extremely fast, nearly as fast as a Couchbase Key-Value operation.

  • JOIN… ON KEYS… As with USE KEYS, the JOIN can be fulfilled by direct access, using a highly efficient nested loop join algorithm.

  • Ranging over collections with the operators ANY and EVERY to check whether any or all members of an array meet a particular condition.

  • Mapping with filtering using the ARRAY command

  • WITHIN walks a hierarchical JSON structure of arbitrary depth. It can be combined with the SET and UNSET commands to add or remove nodes in the hierarchy.

  • Dynamic JSON object construction using the result of a query.

  • Nested traversal of structures so that queries can "see" and directly address parts of documents including embedded properties and arrays.

  • MISSING, a special query value that indicates the lack of a given field within a document. Unlike RDBMSs, Couchbase documents can simply omit properties that are not applicable rather than being forced to include them for consistency and setting them to NULL. Since NULL is also a valid data type in JSON, MISSING allows programmers to express different handling for each case if they so choose.
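Two of the ideas in this list, path expressions and the distinction between MISSING and NULL, can be sketched in plain Python. This is an illustration of the semantics only, not how the N1QL engine is implemented. The `resolve` helper, the `MISSING` sentinel, and the sample document are assumptions introduced for this example.

```python
import re


class _Missing:
    """Sentinel for a field that is absent, distinct from JSON null (None)."""

    def __repr__(self):
        return "MISSING"


MISSING = _Missing()

# A path token is either an attribute name or a numeric index like [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(doc, path):
    """Follow a dotted path with optional [n] indexes through dicts and lists."""
    current = doc
    for name, index in _TOKEN.findall(path):
        if name:
            if not isinstance(current, dict) or name not in current:
                return MISSING
            current = current[name]
        else:
            position = int(index)
            if not isinstance(current, list) or position >= len(current):
                return MISSING
            current = current[position]
    return current


# Hypothetical document shaped like the route example above.
doc = {
    "route": {
        "schedule": [{"day": 0, "utc": "10:13:00"}],
        "equipment": None,  # present, but explicitly null
    }
}
```

With this document, `resolve(doc, "route.schedule[0].day")` returns only the day of the first schedule entry, `resolve(doc, "route.equipment")` returns `None` (the field exists and is null), and `resolve(doc, "route.stops")` returns `MISSING` (the field is not there at all), so the caller can handle the two cases differently.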

N1QL also has functions to help with querying:

  • Date functions that convert a date string into something which can be sorted and/or compared in a query

  • Regular expressions, a powerful alternative to LIKE

  • String concatenation functions

  • Type coercion to parse a string as a number, for example

N1QL uses native JSON data types for Numbers, Strings, Boolean, Null, Arrays, and Objects. In addition, N1QL also supports binary types as required to interoperate with the Key-Value store, as well as MISSING, mentioned above. All data types have well defined semantics for comparison, sort order, date handling, and more. An interactive N1QL shell, called cbq, can be used to quickly issue queries and examine their results. To help novices get started quickly, an interactive tutorial is built into the query server itself.

See Querying with N1QL for information on how to query using N1QL and N1QL reference for the N1QL language reference and the N1QL REST API reference.

Key-Value operations

At the heart of Couchbase is the distributed Key-Value (KV) store. A KV store is an extremely simple, schema-less approach to data management that, as the name implies, stores a unique ID (key) together with a piece of arbitrary information (value); it may be thought of as a hash map or dictionary. The KV store itself can accept any data, whether it be a binary blob or a JSON document, and Couchbase features such as N1QL and MapReduce make use of the KV store’s ability to process JSON documents.

Due to their simplicity, KV operations execute with extremely low latency, often sub-millisecond. While the Query service is accessed by a defined query language (N1QL), the KV store is accessed using simple CRUD (Create, Read, Update, Delete) APIs, and thus provides a simpler interface when accessing documents using their IDs (primary keys).

The KV store contains the authoritative, most up-to-date state for each item. To improve performance, the query and MapReduce services use eventually consistent indexes that, by default, may reflect a slightly out-of-date version of the data. A request can instead elect to wait briefly so that the index has had a chance to update before the query is answered. By contrast, reading from the KV store directly always accesses the latest version of the data.
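The trade-off between a possibly stale index and waiting for the index to catch up can be shown with a toy model. The `LaggingIndex` class and its `wait` flag are hypothetical names used only for this sketch; they are not Couchbase APIs and say nothing about how the real indexers work.

```python
class LaggingIndex:
    """Toy store whose secondary index is updated only when refreshed."""

    def __init__(self):
        self.docs = {}      # authoritative state: key -> document
        self.pending = []   # keys changed since the last index refresh
        self.by_city = {}   # index: city -> set of document keys

    def upsert(self, key, doc):
        """Write to the authoritative store; the index is not touched yet."""
        self.docs[key] = doc
        self.pending.append(key)

    def refresh(self):
        """Apply all pending changes to the index."""
        for key in self.pending:
            for keys in self.by_city.values():
                keys.discard(key)
            city = self.docs[key].get("city")
            if city is not None:
                self.by_city.setdefault(city, set()).add(key)
        self.pending.clear()

    def query_city(self, city, wait=False):
        """Query the index; with wait=True, let it catch up first."""
        if wait:
            self.refresh()
        return sorted(self.by_city.get(city, set()))
```

After `upsert("user::1", {"city": "Oslo"})`, a direct read of `docs["user::1"]` sees the new document immediately, `query_city("Oslo")` may still return an empty list, and `query_city("Oslo", wait=True)` returns the key at the cost of doing the index update first.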

While N1QL provides a richer query interface, applications will use the KV store when speed, consistency, and simplified access patterns are preferred over flexible query options.

Each KV operation is atomic on its own, which means that a Read followed by an Update is two individual operations rather than a single transaction. To avoid conflicts that might arise from multiple concurrent updates to the same document, applications may use Compare-And-Swap (CAS), a per-document checksum that Couchbase modifies each time the document is changed. An update that presents an outdated CAS value can then be rejected.
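The read, modify, and write-if-unchanged pattern that CAS enables can be sketched with a small in-memory store. `TinyStore`, `CasMismatch`, and `add_points` are hypothetical names for this illustration and are not the Couchbase SDK API; a simple counter stands in for the CAS value.

```python
class CasMismatch(Exception):
    """Raised when the supplied CAS no longer matches the stored document."""


class TinyStore:
    """In-memory stand-in for a KV store that tracks a CAS value per key."""

    def __init__(self):
        self._items = {}   # key -> (value, cas)
        self._counter = 0  # source of new CAS values

    def upsert(self, key, value):
        """Store a value unconditionally and return its new CAS."""
        self._counter += 1
        self._items[key] = (value, self._counter)
        return self._counter

    def get(self, key):
        """Return (value, cas) for the key."""
        return self._items[key]

    def replace(self, key, value, cas):
        """Store the value only if the document is unchanged since `cas`."""
        _, current_cas = self._items[key]
        if cas != current_cas:
            raise CasMismatch(key)
        return self.upsert(key, value)


def add_points(store, key, points, retries=3):
    """Optimistic update: re-read and retry if another writer got in first."""
    for _ in range(retries):
        doc, cas = store.get(key)
        updated = dict(doc, points=doc.get("points", 0) + points)
        try:
            return store.replace(key, updated, cas)
        except CasMismatch:
            continue
    raise CasMismatch(key)
```

The design choice here is optimistic rather than pessimistic: nothing is locked, and a writer that loses the race simply reads the fresh document and tries again.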

MapReduce Views

Developers can write custom JavaScript MapReduce programs to specify complex indexing and aggregation of items stored in Couchbase. MapReduce is a programming model for distributed data processing on highly parallelizable data: the map function reads all documents across the cluster, filters them to select the relevant information, and then emits the results; the reduce function sorts and aggregates the results. MapReduce is most useful for highly customized data processing on large input sets.

MapReduce data processing is incremental, so the output continues to update as the underlying data undergoes mutations.
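The map and reduce roles described above can be sketched as follows. Couchbase views are written in JavaScript; this Python version only illustrates the programming model, and the document fields (`type`, `country`, `total`) and function names are assumptions made for the example.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: filter for relevant documents and emit (key, value) pairs."""
    if doc.get("type") == "order":
        yield doc["country"], doc["total"]


def reduce_sum(values):
    """Reduce step: aggregate all values emitted under one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Apply map to every document, group by emitted key, then reduce."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Results are ordered by key before being returned.
    return {key: reduce_fn(groups[key]) for key in sorted(groups)}
```

Note that this sketch recomputes the whole result on every call. It does not attempt to model the incremental updating described above.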

Programmers can also write Spatial MapReduce programs that operate on geometric data in the form of GeoJSON, n-dimensional numeric data (hyper-cubes), or a combination of the two. This can be used to handle queries about geometries, for example, to return a list of all items within a particular bounding box.
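A bounding-box query of the kind mentioned above comes down to a range check on each coordinate. The following is a minimal sketch for GeoJSON Point geometries only, using hypothetical helper names and item fields; it performs a linear scan and does not reflect how spatial views index data.

```python
def in_bbox(geometry, bbox):
    """True if a GeoJSON Point lies inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat); GeoJSON stores
    coordinates as [longitude, latitude].
    """
    if geometry.get("type") != "Point":
        return False
    lon, lat = geometry["coordinates"][:2]
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def query_bbox(items, bbox):
    """Return the names of all items whose geometry falls inside bbox."""
    return [item["name"] for item in items if in_bbox(item["geometry"], bbox)]
```

For example, calling `query_bbox` with a box covering roughly western Europe would keep an item located in Paris and drop one located in New York.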

MapReduce programs output either MapReduce Views or Spatial Views, which are described further in Indexing.

Couchbase Server performs search queries using Couchbase FTS (Developer Preview), an integrated full text search engine. With Couchbase full text search, you, as a developer, can easily add full-text search capabilities to your application without deploying additional components, which reduces operational complexity. Alternatively, if you are using external search engines such as Elasticsearch or Lucidworks, you can leverage the available connectors to continuously replicate data from the Couchbase Server cluster to the search engines.