Documents

    Couchbase supports CRUD operations, various data structures, and binary documents.

    Although query and path-based (Sub-Document) services are available, the simplicity of the document-based key-value (KV) interface makes it the fastest way to perform operations involving single documents.

    Document

    A document refers to an entry in the database (other databases may refer to the same concept as a row). A document has an ID (primary key in other databases), which is unique to the document and by which it can be located. The document also has a value which contains the actual application data.

    Document IDs (keys) are assigned by the application. A valid document ID must:

    • Conform to UTF-8 encoding

    • Be no longer than 250 bytes

      There is a difference between bytes and characters: most non-Latin characters occupy more than a single byte.
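
    The two rules above can be checked before a write is attempted. The following is a minimal sketch, not part of any SDK: the helper name is_valid_doc_id is hypothetical, and it checks only the UTF-8 and 250-byte rules stated here.

```python
MAX_ID_BYTES = 250  # limit stated above; measured in bytes, not characters


def is_valid_doc_id(doc_id: str) -> bool:
    """Return True if doc_id encodes to UTF-8 and fits in 250 bytes."""
    try:
        encoded = doc_id.encode("utf-8")
    except UnicodeEncodeError:
        # For example a lone surrogate, which has no UTF-8 encoding.
        return False
    return len(encoded) <= MAX_ID_BYTES


print(is_valid_doc_id("a" * 250))       # True: 250 characters, 250 bytes
print(is_valid_doc_id("\u00e9" * 126))  # False: 126 characters, 252 bytes
```

    The second call shows the byte/character difference: "é" is one character but two bytes in UTF-8, so 126 of them exceed the limit.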

    You are free to choose any ID (key) for your document, so long as it conforms to the above restrictions. Unlike some other databases, Couchbase does not automatically generate IDs for you. You may use a separate counter to generate serial numbers, or use UUIDs as keys; the best choice depends on your use case.
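
    As an illustration of the two approaches, the sketch below builds keys from a UUID and from a serial number. The "prefix::suffix" naming scheme and both helper names are assumptions made for the example, and the local itertools counter merely stands in for a counter document kept in Couchbase.

```python
import itertools
import uuid


def uuid_key(prefix: str) -> str:
    """Build an ID such as 'order::<random UUID>' (prefix scheme is hypothetical)."""
    return f"{prefix}::{uuid.uuid4()}"


# Local stand-in for a counter document; a real application would
# increment a counter stored in Couchbase instead.
_serial = itertools.count(start=1)


def serial_key(prefix: str) -> str:
    """Build an ID such as 'order::1', 'order::2', and so on."""
    return f"{prefix}::{next(_serial)}"
```

    UUID keys need no coordination between clients, while serial keys are shorter and ordered but require one shared counter.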

    The document value contains the actual application data; for example, a product document may contain information about the price and description. Documents are usually (but not always) stored as JSON on the server. Because JSON is a structured format, it can be subsequently searched and queried.

    {
        "type": "product",
        "sku": "CBSRV45DP",
        "msrp": [5.49, "USD"],
        "ctime": "092011",
        "mfg": "couchbase",
        "tags": ["server", "database", "couchbase", "nosql", "fast", "json", "awesome"]
    }

    Primitive Key-Value Operations

    upsert(docid, document)
    insert(docid, document)
    replace(docid, document)
    get(docid)
    remove(docid)

    In Couchbase, documents are stored using one of three operations: upsert, insert, or replace. Each of these operations writes a JSON document with a given document ID (key) to the database. They differ in how they treat the existing state of the document:

    • insert will only create the document if the given ID is not found within the database.

    • replace will only replace the document if the given ID already exists within the database.

    • upsert will always write the document, creating it if the ID does not exist and replacing it if it does.

    Documents can be retrieved using the get operation and removed using the remove operation.

    Since Couchbase’s KV store may be thought of as a distributed hashmap or dictionary, the following pseudo-code illustrates the behavior of Couchbase’s storage and retrieval operations:

    map<string,object> KV_STORE;
    
    void insert(string doc_id, object value) {
        if (!KV_STORE.contains(doc_id)) {
            KV_STORE.put(doc_id, value);
        } else {
            throw DocumentAlreadyExists();
        }
    }
    
    void replace(string doc_id, object value) {
        if (KV_STORE.contains(doc_id)) {
            KV_STORE.put(doc_id, value);
        } else {
            throw DocumentNotFound();
        }
    }
    
    void upsert(string doc_id, object value) {
        KV_STORE.put(doc_id, value);
    }
    
    object get(string doc_id) {
        if (KV_STORE.contains(doc_id)) {
            return KV_STORE.get(doc_id);
        } else {
            throw DocumentNotFound();
        }
    }
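
    For readers who want to run the model, the pseudo-code above translates directly to Python with a plain dict. This is a sketch of the semantics only, not SDK code. The remove function is not in the pseudo-code, and its behavior for a missing ID is an assumption.

```python
class DocumentAlreadyExists(Exception):
    pass


class DocumentNotFound(Exception):
    pass


KV_STORE = {}  # doc_id -> value


def insert(doc_id, value):
    # Create only: fail if the ID is already present.
    if doc_id in KV_STORE:
        raise DocumentAlreadyExists(doc_id)
    KV_STORE[doc_id] = value


def replace(doc_id, value):
    # Overwrite only: fail if the ID is absent.
    if doc_id not in KV_STORE:
        raise DocumentNotFound(doc_id)
    KV_STORE[doc_id] = value


def upsert(doc_id, value):
    # Create or overwrite unconditionally.
    KV_STORE[doc_id] = value


def get(doc_id):
    if doc_id not in KV_STORE:
        raise DocumentNotFound(doc_id)
    return KV_STORE[doc_id]


def remove(doc_id):
    # Assumption: removing a missing ID is reported as "not found".
    if doc_id not in KV_STORE:
        raise DocumentNotFound(doc_id)
    del KV_STORE[doc_id]
```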

    You can also use SQL++ queries (formerly N1QL) and Full Text Search to access documents by means other than their IDs. However, Couchbase eventually translates these query operations into primitive key-value operations, and the query services exist as separate services outside the data store.

    Storing and Updating Documents

    Documents can be stored and updated using the SDK, the command line, or the Web UI. When using a storage operation, the full content of the document is replaced with a new value.

    The following example shows a document being stored using the cbc utility. The ID of the document is docid and its value is JSON containing a single field (json) with the value of value.

    # When storing JSON data using cbc, ensure it is properly quoted for your shell:
    $ cbc create -u Administrator -P password docid -V '{"json":"value"}' -M upsert -U couchbase://cluster-node/bucket-name
    docid               Stored. CAS=0x8234c3c0f213

    You can also specify additional options when storing a document in Couchbase:

    • Expiry (or TTL) value which will instruct the server to delete the document after a given amount of time. This option is useful for transient data (such as sessions). By default documents do not expire. See Expiry for more information on expiration.

    • CAS value to protect against concurrent updates to the same document.

    • Durability Requirements
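
    To show how a CAS value protects against concurrent updates, here is a small in-memory sketch. CasStore and CasMismatch are hypothetical names, and the incrementing integer is a toy replacement for the opaque CAS values the server returns.

```python
class CasMismatch(Exception):
    """Raised when the supplied CAS no longer matches the stored one."""


class CasStore:
    def __init__(self):
        self._docs = {}      # doc_id -> (value, cas)
        self._next_cas = 1   # toy CAS generator; real CAS values are opaque

    def upsert(self, doc_id, value):
        cas = self._next_cas
        self._next_cas += 1
        self._docs[doc_id] = (value, cas)
        return cas

    def get(self, doc_id):
        return self._docs[doc_id]  # (value, cas)

    def replace(self, doc_id, value, cas):
        _, current = self._docs[doc_id]
        if cas != current:
            raise CasMismatch(doc_id)
        return self.upsert(doc_id, value)


store = CasStore()
store.upsert("cart", ["apple"])
value_a, cas_a = store.get("cart")   # client A reads
value_b, cas_b = store.get("cart")   # client B reads the same version
store.replace("cart", value_a + ["pear"], cas_a)      # A writes first
try:
    store.replace("cart", value_b + ["plum"], cas_b)  # B's CAS is stale
except CasMismatch:
    value_b, cas_b = store.get("cart")                # B re-reads and retries
    store.replace("cart", value_b + ["plum"], cas_b)
print(store.get("cart")[0])  # ['apple', 'pear', 'plum']
```

    Without the CAS check, client B's write would silently discard the item added by client A.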

    If you wish to only modify certain parts of a document, you can use sub-document operations which operate on specific subsets of documents:

    collection.mutate_in("customer123", [SD.upsert("fax", "311-555-0151")])

    or a SQL++ (formerly N1QL) UPDATE statement to update documents based on specific query criteria:

    UPDATE `default` SET sale_price = msrp * 0.75 WHERE msrp < 19.95;

    Retrieving Documents

    This section discusses retrieving documents using their IDs, or primary keys. Documents can also be accessed using secondary lookups via SQL++ queries and MapReduce Views. Primary key lookups are performed using the key-value API, which simplifies use and increases performance (as applications may interact with the KV store directly, rather than a secondary index or query processor).

    In Couchbase, documents are stored with their IDs. Retrieving a document via its ID is the simplest and quickest operation in Couchbase.

    >>> result = cb.get('docid')
    >>> print(result.value)
    {'json': 'value'}

    $ cbc cat docid
    docid                CAS=0x8234c3c0f213, Flags=0x0. Size=16
    {"json":"value"}

    Once a document is retrieved, it is accessible in the native format by which it was stored; meaning that if you stored the document as a list, it is now available as a list again. The SDK will automatically deserialize the document from its stored format (usually JSON) to a native language type. It is possible to store and retrieve non-JSON documents as well, using a transcoder.
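
    The round trip described above can be sketched with the standard json module. This is not the SDK's transcoder interface: the encode and decode helpers and the two format labels are hypothetical, chosen only to show JSON and non-JSON values coming back in the form they were stored.

```python
import json

# Hypothetical format labels; real SDKs define their own flag values.
FORMAT_JSON = "json"
FORMAT_RAW = "raw"


def encode(value):
    """Serialize a native value to a (payload bytes, format) pair."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value), FORMAT_RAW
    return json.dumps(value).encode("utf-8"), FORMAT_JSON


def decode(payload, fmt):
    """Turn a stored payload back into a native value."""
    if fmt == FORMAT_RAW:
        return payload
    return json.loads(payload.decode("utf-8"))


stored = encode(["a", "b", 3])
print(decode(*stored))  # ['a', 'b', 3]
```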

    You can also modify a document’s expiration time while retrieving it; this is known as get-and-touch and allows you to keep temporary data alive while retrieving it in one atomic and efficient operation.

    Documents can also be retrieved with SQL++. While SQL++ is generally used for secondary queries, it can also be used to retrieve documents by their primary keys (IDs), though it is recommended to use the key-value API if the ID is known. Lookups may be done either by comparing META(from-term).id or by using the USE KEYS [...] clause:

    SELECT * FROM default USE KEYS ["docid"];

    or

    SELECT * FROM default WHERE META(default).id = "docid";

    You can also retrieve parts of documents using sub-document operations, by specifying one or more sections of the document to be retrieved:

    name, email = cb.retrieve_in('user:kingarthur', 'contact.name', 'contact.email')

    Counters

    You can atomically increment or decrement the numerical value of a special counter document. Examples can be found in the practical K-V Howto document.

    Do not increment or decrement counters if using XDCR. Within a single cluster, incr() is atomic, as is decr(). Across XDCR, however, if two clients connected to two different (bidirectional) clusters issue incr() concurrently, the value may (and most likely will) be incremented only once in total. The same applies to decr().
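
    The lost increment can be pictured with two dictionaries standing in for the two clusters. This is a deliberately simplified sketch: it assumes replication settles the conflict by keeping one side's whole document rather than merging the two increments.

```python
# Both clusters start with the same replicated counter value.
cluster_a = {"counter_id": 10}
cluster_b = {"counter_id": 10}

# Two clients increment concurrently, one on each cluster.
cluster_a["counter_id"] += 1   # atomic within cluster A -> 11
cluster_b["counter_id"] += 1   # atomic within cluster B -> 11

# Assumption: replication resolves the conflict by keeping one side's
# whole document; it does not merge the two increments.
winner = cluster_a["counter_id"]
cluster_a["counter_id"] = cluster_b["counter_id"] = winner

print(cluster_a["counter_id"])  # 11, although two increments were issued
```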

    A document may be used as a counter if its value is a simple ASCII number, like 42. Couchbase allows you to increment and decrement these values atomically using a special counter operation in the Binary.Collection. The example below shows a counter being initialised, then being incremented and decremented:

    >>> cb.counter('counter_id', delta=20, initial=100).value
    100
    >>> cb.counter('counter_id', delta=1).value
    101
    >>> cb.counter('counter_id', delta=-50).value
    51

    In the above example, a counter is created by using the counter method with an initial value. The initial value is the value the counter uses if the counter ID does not yet exist.

    Once created, the counter can be incremented or decremented atomically by a given amount or delta. Specifying a positive delta increments the value and specifying a negative one decrements it. When a counter operation is complete, the application receives the current value of the counter, after the increment or decrement.

    Couchbase counters are 64-bit unsigned integers and do not wrap around if decremented beyond 0. However, counters will wrap around if incremented past their maximum value (the maximum value of an unsigned 64-bit integer, 2^64 - 1). Many SDKs will limit the delta argument to the range of a signed 64-bit integer.
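
    The arithmetic rules above can be modeled in a few lines. This is a sketch against a plain dict, not the SDK's counter implementation. It assumes that "does not wrap around" below 0 means the value stays at 0, and that the first call with an initial value stores that value without applying the delta, as in the earlier example.

```python
from typing import Optional

U64_MAX = 2**64 - 1


def apply_delta(current: int, delta: int) -> int:
    """Apply delta to an unsigned 64-bit counter value."""
    result = current + delta
    if result < 0:
        return 0               # assumption: stay at 0 instead of wrapping
    return result & U64_MAX    # wrap on overflow past 2**64 - 1


def counter(store: dict, doc_id: str, delta: int,
            initial: Optional[int] = None) -> int:
    """Mimic the counter call used above against a plain dict."""
    if doc_id not in store:
        if initial is None:
            raise KeyError(doc_id)
        store[doc_id] = initial    # first call stores initial, ignores delta
    else:
        store[doc_id] = apply_delta(store[doc_id], delta)
    return store[doc_id]


store = {}
print(counter(store, "counter_id", delta=20, initial=100))  # 100
print(counter(store, "counter_id", delta=1))                # 101
print(counter(store, "counter_id", delta=-50))              # 51
print(apply_delta(5, -10))                                  # 0
print(apply_delta(U64_MAX, 1))                              # 0
```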

    Expiration times can also be specified when using counter operations.

    CAS values are not used with counter operations since counter operations are atomic. The intent of the counter operation is to simply increment the current server-side value of the document. If you wish to only increment the document if it is at a certain value, then you may use a normal upsert function with CAS:

    rv = cb.get('counter_id')
    value, cas = rv.value, rv.cas
    if should_increment_value(value):
        cb.upsert('counter_id', value + increment_amount, cas=cas)

    You can also use sub-document counter operations to increment numeric values within a document containing other content. An example can be found in the practical sub-doc page.

    Use Cases

    The SDK provides a high-level abstraction over the simple incr()/decr() of Couchbase Server’s memcached binary protocol, using collections.binary(). This enables you to work with counters using get() and upsert() operations, which allows, among other things, the use of durability options with the operations. You will find several ways of working with counters in the API docs.

    Expiration Overview

    Most data in a database is there to be persisted and long-lived. However, the need for transient or temporary data does arise in applications, such as in the case of user sessions, caches, or temporary documents representing a given process ownership. You can use expiration values on documents to handle transient data.

    In databases without a built-in expiration feature, dealing with transient data may be cumbersome. To provide "expiration" semantics, applications are forced to record a time stamp in a record, and then upon each access of the record check the time stamp and, if invalid, delete it.

    Since some logically ‘expired’ documents might never be accessed by the application, to ensure that temporary records do not persist and occupy storage, a scheduled process is typically also employed to scan the database for expired entries routinely, and to purge those entries that are no longer valid.

    Workarounds such as those described above are not required for Couchbase, as it allows applications to declare the lifetime of a given document, eliminating the need to embed "validity" information in documents and eliminating the need for a routine "purge" of logically expired data.

    When an application attempts to access a document which has already expired, the server will indicate to the client that the item is not found. The server internally handles the process of determining the validity of the document and removing older, expired documents.

    Setting Document Expiration

    By default, Couchbase documents do not expire. However, the expiration value may be set for the upsert, replace, and insert operations when modifying data.

    Couchbase offers two additional operations for setting the document’s expiration without modifying its contents:

    • The get-and-touch operation allows an application to retrieve a document while modifying its expiration time. This method is useful when reading session data from the database: since accessing the data is indicative of it still being "alive", get-and-touch provides a natural way to extend its lifetime.

    • The touch operation allows an application to modify a document’s expiration time without otherwise accessing the document. This method is useful when an application is handling a user session but does not need to access the database (for example, if a particular document is already cached locally).
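
    The difference between the two operations can be sketched with a toy store whose clock is passed in explicitly, so the behavior is deterministic. ExpiringStore and its method signatures are hypothetical and do not mirror any SDK; KeyError stands in for the "not found" response.

```python
class ExpiringStore:
    """Toy store; 'now' and 'ttl' are plain numbers of seconds."""

    def __init__(self):
        self._docs = {}  # doc_id -> (value, expires_at or None)

    def upsert(self, doc_id, value, now, ttl=None):
        expires_at = now + ttl if ttl is not None else None
        self._docs[doc_id] = (value, expires_at)

    def _live(self, doc_id, now):
        value, expires_at = self._docs[doc_id]   # KeyError if never stored
        if expires_at is not None and now > expires_at:
            raise KeyError(doc_id)               # expired reads as not found
        return value

    def get(self, doc_id, now):
        return self._live(doc_id, now)

    def touch(self, doc_id, now, ttl):
        """Reset the expiry without returning the value."""
        value = self._live(doc_id, now)
        self._docs[doc_id] = (value, now + ttl)

    def get_and_touch(self, doc_id, now, ttl):
        """Return the value and reset the expiry in one step."""
        value = self._live(doc_id, now)
        self._docs[doc_id] = (value, now + ttl)
        return value


sessions = ExpiringStore()
sessions.upsert("session", {"user": "arthur"}, now=0, ttl=60)
print(sessions.get_and_touch("session", now=50, ttl=60))  # expiry moves to 110
print(sessions.get("session", now=100))                   # still alive
```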

    Couchbase SDKs which accept simple integer expiry values (as opposed to a proper date or time object) allow expiration to be specified in two ways:

    1. As an offset from the current time.

    2. As an absolute Unix time stamp.

    If the expiry value is less than 30 days (60 * 60 * 24 * 30 = 2,592,000 seconds), it is considered an offset. If the value is greater, it is considered an absolute time stamp.

    It might be preferable for applications to normalize the expiration value, such as by always converting it to an absolute time stamp. This avoids issues when the intended offset is larger than 30 days: such a value is taken to mean a Unix time stamp, which for any realistic offset lies in the distant past, and, as a result, the document will expire as soon as it is stored.
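
    The pitfall and the suggested normalization can be worked through numerically. The sketch below only models the 30-day rule as described here for positive integer values. Both function names are hypothetical, and the handling of a value of exactly 30 days is an assumption, since the text does not specify it.

```python
THIRTY_DAYS = 60 * 60 * 24 * 30  # 2,592,000 seconds


def interpret(expiry: int, now: int) -> int:
    """Absolute expiry time implied by a raw integer under the 30-day rule."""
    # Assumption: a value of exactly 30 days is treated as absolute here.
    return now + expiry if expiry < THIRTY_DAYS else expiry


def to_absolute(offset_seconds: int, now: int) -> int:
    """Normalize an intended offset to an absolute Unix time stamp."""
    return now + offset_seconds


now = 1_700_000_000                 # example "current" Unix time
offset = 45 * 24 * 60 * 60          # intended lifetime: 45 days

# Sent as-is, 3,888,000 is read as a time stamp in early 1970.
print(interpret(offset, now) < now)                              # True
# Normalized first, it expires when intended.
print(interpret(to_absolute(offset, now), now) == now + offset)  # True
```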

    • If you wish to use the expiration feature, then you should supply the expiry value for every mutation operation.

    • When dealing with expiration, it is important to note that most operations will implicitly remove any existing expiration. Thus, when modifying a document with expiration, it is important to pass the desired expiration time.

    • A document is expired as soon as the current time on the Couchbase Server node responsible for the document exceeds the expiration value. Bear this in mind in situations where the time on your application servers differs from the time on your Couchbase Server nodes.

    Note that expired documents are not deleted from the server as soon as they expire. While a request to the server for an expired document will receive a response indicating the document does not exist, expired documents are actually deleted (i.e. cease to occupy storage and RAM) when the expiry pager runs. The expiry pager is a routine internal process which scans the database for items which have expired and removes them from storage.

    When gathering resource usage statistics, note that expired-but-not-purged items (for example, items the expiry pager has not yet scanned) will still count toward the overall storage size and item count.
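
    The gap between "reads as not found" and "actually purged" can be modeled as follows. LazyExpiryStore is a hypothetical toy, not server code: reads hide expired items immediately, while the item count only drops once the pager method has run.

```python
class LazyExpiryStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (value, expires_at)

    def upsert(self, doc_id, value, expires_at):
        self._docs[doc_id] = (value, expires_at)

    def get(self, doc_id, now):
        entry = self._docs.get(doc_id)
        if entry is None or now > entry[1]:
            raise KeyError(doc_id)   # expired documents read as "not found"
        return entry[0]

    def item_count(self):
        # Still includes expired-but-not-purged items.
        return len(self._docs)

    def run_expiry_pager(self, now):
        """Purge expired items and return how many were removed."""
        expired = [k for k, (_, exp) in self._docs.items() if now > exp]
        for k in expired:
            del self._docs[k]
        return len(expired)


store = LazyExpiryStore()
store.upsert("session", "data", expires_at=100)
store.upsert("cache", "data", expires_at=500)
print(store.item_count())           # 2
print(store.run_expiry_pager(200))  # 1 ("session" is purged)
print(store.item_count())           # 1
```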

    Although the API only sets expiry values per document, it is possible that elsewhere in the server, an expiry value is being set for every document in a bucket. Should this be the case, the document TTL may be reduced, and the document may become unavailable to the application sooner than expected.