Introduction

Couchbase Analytics is a Developer Preview of a parallel data management capability for Couchbase Server. It is designed to run complex queries efficiently over many records. By complex queries, we mean large ad hoc join, set, aggregation, and grouping operations, any of which may result in long-running queries, high CPU usage, high memory consumption, and excessive network latency in data fetching and cross-node coordination.

Regardless of the technology used, analytic queries can be predetermined or ad hoc, and can be cheap or expensive depending on how much data processing they need. Performance challenges can arise when queries access large numbers of documents and when queries are not supported by a secondary index, as often happens with ad hoc analytics such as the queries you might issue from a visualization tool.

Couchbase Analytics is designed to support truly ad hoc queries in a reasonable amount of time, even when scans are required. Because it supports efficient parallel query processing and bulk data handling, Couchbase Analytics is also preferred for expensive queries, even when those queries are predetermined and might therefore be supported by an index.

The Couchbase Analytics approach has significant advantages compared to alternatives:

  • Common data model: Couchbase Analytics natively supports the same rich, flexible-schema document data model used in Couchbase Server, rather than trying to force your data into an RDBMS model.
  • Workload isolation: Operational query latency and throughput are protected from slowdowns caused by the analytical query workload, without the complexity of operating a separate analytical database.
  • High data freshness: Couchbase Analytics uses DCP, a fast memory-to-memory protocol that Couchbase Server nodes use to synchronize data among themselves. Because of this, analytics run on data that’s extremely current, without extract, transform, load (ETL) or other hassles and delays.

SQL++ Query Language

Couchbase Analytics is programmed using SQL++, a next-generation declarative query language. SQL++ has much in common with SQL, but it also includes a small number of extensions that account for the differences between the data models the two languages were designed to serve. Compared to SQL, SQL++ is much newer and targets the nested, schema-optional or even schema-less world of modern NoSQL systems.
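
To make the idea of nested-data extensions concrete, the following is a minimal sketch of a SQL++ query rather than an example taken from the product documentation. It assumes a hypothetical dataset named orders in which each document carries a custid field and a nested items array whose elements have qty and price fields; none of these names come from this document. The UNNEST clause flattens the nested array so that ordinary SQL-style grouping and aggregation can be applied to its elements.

```sql
-- Hypothetical dataset and field names, chosen for illustration only.
-- Each document in orders is assumed to look like:
--   { "custid": "C41", "items": [ { "qty": 2, "price": 9.99 }, ... ] }
SELECT o.custid,
       COUNT(*)             AS line_items,
       SUM(i.qty * i.price) AS total_spend
FROM orders AS o
UNNEST o.items AS i          -- one result row per element of the nested array
GROUP BY o.custid;
```

Apart from UNNEST, the query reads like standard SQL, which is the sense in which SQL++ adds only a small number of extensions.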

You may wonder why Couchbase Analytics uses a query language other than N1QL, the query language used by the Couchbase Server query service. The answer is that this is a temporary situation. SQL++ and N1QL are quite similar; in fact, N1QL was inspired by the open SQL++ language specification and is a proper subset of that specification. In the long term, the two query languages will merge so that Couchbase Server can be queried using a single query language.

For more details, refer to the SQL++ Language Reference section.