A newer version of this documentation is available.

View Latest

Incremental MapReduce Views

    MapReduce views (also called views) uses a user defined map and reduce function that can define arbitrarily complex logic for indexing. This makes views a powerful solution for interactive reporting queries that require powerful reshaping of data that can provide responses at low latencies.

    Views process the map and reduce functions to precalculate and store the answer, hence reducing the need for just-in-time calculations to answer the queries. As a result, you can define a complex logic in the index without paying a penalty of a large latency at query time.

    Views can only process data from a single bucket. Views are also grouped into design documents for optimized processing of mutations on the bucket. There can be one or many views in the design document.

    Indexing with Incremental MapReduce Views

    Incremental MapReduce views take in a user defined map and reduce function. The map function of a view is called for each mutation on the bucket. Map function can call the emit() function to generate an index item for the mutation being processed. One unique aspect of MapReduce views is that it can generate zero or more items for the mutation processed.

    The emit() function generates both a view key and a view value, and preserves the document key. The view value specified in the emit() function enables you to include attributes from the document directly in the view. You can set the view value to null in the emit() function to return just the view keys and document keys. When using views without values with queries, the query output can use the document keys to look up an individual document. Views that emit values can include enough data to support queries without having to look up each document in the bucket.

    For example, create a view on contact information that outputs the JSON items by the contact’s first and last name, and with a value containing the contact’s address. If the view does not contain the information you need, you can use the returned key with the memcached protocol to obtain the complete record.

    When a mutation arrives at the bucket, the mutation is processed in memory by the vBucket manager that is responsible for the document key. The mutation is then queued for replication with DCP to propagate the mutation to replica vBuckets and to the design documents that contain the views on the bucket. Automated view updates or incoming queries that use staleness parameters such as stale=false can trigger updates to the view. The indexer for each design document picks up the mutation from DCP and processes the map and reduce functions defined by the user for the views inside the design document. Indexers then queue the view writes for disk I/O to persist the changes to the view.

    indexing with incremental map reduce views
    Figure 1. Indexing with incremental map-reduce views

    View Storage

    Views store data using Couchstore. Many characteristics of the Couchstore storage engine described in the Bucket disk storage are applicable to view storage as well. However, tombstone maintenance is different.

    Views are distributed and aligned with the data distribution. That means if the vBuckets for a given document key is on node A, the view partition that indexes the document key is also on node A. However, unlike the vBuckets that partition data to many files on a single node, there is one unified view file per design document on each node. This optimization is done to quicken index scans.

    Views and High Availability

    Views have replicas, just like buckets do and the replica count is adjusted to be aligned with the bucket replica count. The placement of replicas have the same characteristics as the buckets. If a node fails, automatic failover kicks in where a new replica view vBucket is promoted to an active view vBucket to continue servicing view queries.

    Views and View Partitioning

    Views are automatically partitioned and aligned with data. Just like buckets data where the server initiates a rebalance operation when new nodes are added to the data service, views are redistributed to balance the load across all nodes.