Analyze Large Datasets

      The Analytics Service provides a parallel data-management capability for running complex analytical queries.

      About the Analytics Service

      The Analytics Service is a parallel data-management capability designed to run complex queries efficiently over many records. It supports large join, set, aggregation, and grouping operations, any of which may result in long-running queries, high CPU usage, high memory consumption, or excessive network latency due to data fetching and cross-node coordination.

      The Analytics Service enables you to create up to eight datasets, which contain shadow copies of the data that you want to analyze. When the Analytics datasets are linked to the operational data, changes in the operational data are reflected in your Analytics data in real time. The Analytics Service also enables you to create external links to analyze data from external sources.
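
      As a rough illustration of how a dataset is tied to operational data, the sketch below builds the kind of SQL++ for Analytics statement that could create one. It assumes the CREATE ANALYTICS COLLECTION form of the DDL; the dataset name and the bucket, scope, and collection path are hypothetical, so check the SQL++ for Analytics Language Reference for the exact syntax that your database accepts.

```python
def create_collection_statement(name: str, keyspace: str) -> str:
    """Build DDL that shadows one operational collection in Analytics.

    `keyspace` is a dotted bucket.scope.collection path. Each part is
    wrapped in backticks so that names containing hyphens stay valid.
    """
    source = ".".join(f"`{part}`" for part in keyspace.split("."))
    return f"CREATE ANALYTICS COLLECTION `{name}` ON {source};"


# Hypothetical names, used only for illustration.
print(create_collection_statement("airports", "travel-sample.inventory.airport"))
```

      The helper only formats text. The resulting statement still has to be submitted to the Analytics Service, for example through the Analytics Workbench.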

      The Analytics Service supports the SQL++ for Analytics query language, a next-generation declarative query language for JSON data. SQL++ for Analytics has much in common with SQL, but it also includes a small number of extensions that address the different data models that the two languages were designed to query. For detailed information, refer to SQL++ for Analytics vs. SQL++ for Query and the SQL++ for Analytics Language Reference.

      Using the Analytics Service

      Like the other Couchbase services, the Analytics Service can be deployed during database creation or added to an existing database. The Analytics Service depends on the Data Service, so the Data Service must also be deployed on the database in order to use the Analytics Service. (Information about how these services interact with one another can be found in the Couchbase Server documentation.)

      If a database has the Analytics Service deployed, SQL++ for Analytics queries can be issued using the Couchbase SDK or the interactive Analytics Workbench.
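
      A minimal sketch of issuing a query through the Couchbase Python SDK follows. It assumes SDK 4.x, a dataset named airports (hypothetical), and connection details supplied through the environment variables CB_CONNSTR, CB_USERNAME, and CB_PASSWORD, which are names chosen for this example only. The query is a simple grouping and aggregation of the kind the service is designed for.

```python
import os
from datetime import timedelta

# Hypothetical dataset and field names: count airports per country.
TOP_COUNTRIES = """
SELECT a.country, COUNT(*) AS airport_count
FROM airports AS a
GROUP BY a.country
ORDER BY airport_count DESC
LIMIT 5;
"""


def run_analytics_query(connection_string, username, password, statement):
    """Run one SQL++ for Analytics statement and return its rows as a list."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions

    cluster = Cluster(
        connection_string,
        ClusterOptions(PasswordAuthenticator(username, password)),
    )
    cluster.wait_until_ready(timedelta(seconds=10))
    result = cluster.analytics_query(statement)
    return list(result.rows())


# Only connect when connection details are provided in the environment.
connection_string = os.environ.get("CB_CONNSTR")
if connection_string:
    rows = run_analytics_query(
        connection_string,
        os.environ["CB_USERNAME"],
        os.environ["CB_PASSWORD"],
        TOP_COUNTRIES,
    )
    for row in rows:
        print(row)
```

      Reading the connection details from the environment keeps credentials out of the source file. The same statement can be pasted into the Analytics Workbench instead.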

      Use the Cost-Based Optimizer for Analytics to select the most efficient query operations. For more information, see Cost-Based Optimizer for Analytics.