Data Operations

The Data Service offers the simplest way to retrieve or mutate data when the key is known. This page covers CRUD operations, document expiration, and optimistic locking with CAS.

Documents

A document refers to an entry in the database (other databases may refer to the same concept as a row). A document has an ID (primary key in other databases), which is unique to the document and by which it can be located. The document also has a value which contains the actual application data. See the concept guide to Documents for a deeper dive into documents in the Couchbase Data Platform. Or read on, for a hands-on introduction to working with documents from the Ruby SDK.

CRUD Operations

The core interface to Couchbase Server is simple KV operations on full documents. Make sure you’re familiar with the basics of authorization and connecting to a Cluster from the Start Using the SDK section. We’re going to expand on the short Upsert example we used there, adding options as we move through the various CRUD operations. Here is the Insert operation, with simple error handling:

begin
  collection.insert("document-key", {"title" => "My Blog Post"})
rescue Error::DocumentExists
  puts "The document already exists!"
end

Setting a Compare and Swap (CAS) value is a form of optimistic locking - dealt with in depth in the CAS page. Here we just note that the CAS is a value representing the current state of an item; each time the item is modified, its CAS changes. The CAS value is returned as part of a document’s metadata whenever a document is accessed. Without explicitly setting it, a newly-created document would have a CAS value of 0.

collection.upsert("my-document", {"initial" => true})

result = collection.get("my-document")
content = result.content
content["modified"] = true
content["initial"] = false
collection.replace("my-document", content, Options::Replace(cas: result.cas))
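
To show how the CAS value supports optimistic locking, here is a minimal, self-contained sketch of a read-modify-write retry loop. It does not talk to Couchbase: InMemoryCollection, CasMismatch, and update_with_retry are hypothetical stand-ins written for this illustration, and the limit of five attempts is an arbitrary assumption.

```ruby
# Hypothetical in-memory stand-in for a collection; NOT the Couchbase SDK.
class CasMismatch < StandardError; end

GetResult = Struct.new(:content, :cas)

class InMemoryCollection
  def initialize
    @docs = {}
    @next_cas = 0
  end

  # Stores the document unconditionally and assigns it a fresh CAS.
  def upsert(id, content)
    @docs[id] = [deep_copy(content), next_cas]
  end

  # Returns a copy of the content together with the current CAS.
  def get(id)
    content, cas = @docs.fetch(id)
    GetResult.new(deep_copy(content), cas)
  end

  # Replaces the document only if the caller's CAS is still current.
  def replace(id, content, cas:)
    _, current = @docs.fetch(id)
    raise CasMismatch, "stale CAS for #{id}" unless current == cas

    @docs[id] = [deep_copy(content), next_cas]
  end

  private

  def next_cas
    @next_cas += 1
  end

  def deep_copy(obj)
    Marshal.load(Marshal.dump(obj))
  end
end

# Re-reads the document and reapplies the block whenever the CAS has changed.
def update_with_retry(collection, id, max_attempts: 5)
  max_attempts.times do
    found = collection.get(id)
    updated = yield(found.content)
    begin
      collection.replace(id, updated, cas: found.cas)
      return updated
    rescue CasMismatch
      next # someone else modified the document; fetch it again
    end
  end
  raise CasMismatch, "gave up after #{max_attempts} attempts"
end

store = InMemoryCollection.new
store.upsert("my-document", {"initial" => true})
update_with_retry(store, "my-document") do |doc|
  doc.merge("modified" => true, "initial" => false)
end
```

With the real SDK, the same loop shape would wrap collection.get and collection.replace (passing the fetched CAS through Options::Replace) and rescue the SDK's CAS mismatch error instead of the stand-in class used here.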

Expiration sets an explicit time to live (TTL) for a document. For a discussion of item (Document) vs Bucket expiration, see the Expiration Overview page.

collection.upsert("my-document", {"doc" => true},
                  Options::Upsert(expiry: 2 * 60 * 60))

# or with ActiveSupport::Duration
require 'active_support/core_ext/numeric/time'
collection.upsert("my-document", {"doc" => true},
                  Options::Upsert(expiry: 2.hours))

# Time instances are also accepted as absolute time points
expiry = Time.now + 30 # 30 seconds from now
collection.upsert("my-document", {"doc" => true},
                  Options::Upsert(expiry: expiry))

Durability

Writes in Couchbase are written to a single node, and from there the Couchbase Server will take care of sending that mutation to any configured replicas.

The optional durability_level parameter, which all mutating operations accept, allows the application to wait until this replication (or persistence) is successful before proceeding.

It can be used like this:

collection.upsert("my-document", {"doc" => true},
                Options::Upsert(durability_level: :majority))

If no argument is provided, success is reported as soon as the primary node has acknowledged the mutation in its memory. However, there are times when the application needs the extra certainty that especially vital mutations have been successfully replicated, and the other durability options provide the means to achieve this.

The options differ depending on what Couchbase Server version is in use. If 6.5 or above is being used, you can take advantage of the Durable Write feature, in which Couchbase Server will only return success to the SDK after the requested replication level has been achieved. The three replication levels are:

:majority - The server will ensure that the change is available in memory on the majority of configured replicas.
:majority_and_persist_to_active - Majority level, plus persisted to disk on the active node.
:persist_to_majority - Majority level, plus persisted to disk on the majority of configured replicas.

The options are listed in increasing order of safety. Note that nothing comes for free: for a given node, waiting for a write to reach storage is considerably slower than waiting for it to be available in memory. These trade-offs, as well as which settings may be tuned, are discussed in the durability page.

If a version of Couchbase Server earlier than 6.5 is being used, the application can fall back to 'client verified' durability. Here the SDK polls the replicas and only returns once the requested durability level is achieved. It can be used like this:

collection.upsert("my-document", {"doc" => true},
                Options::Upsert(persist_to: :none, replicate_to: :two))

To stress, durability is a useful feature but should not be the default for most applications, as there is a performance consideration, and the default level of safety provided by Couchbase will be reasonable for the majority of situations.

Sub-Document Operations

All of these operations involve fetching the complete document from the Cluster. Where the number of operations or other circumstances make bandwidth a significant issue, the SDK can work on just a specific path of the document with Sub-Document Operations.

Retrieving full documents

To retrieve a full document, pass its key to the #get method, in a similar fashion to the other operations:

begin
  get_result = collection.get("document-key")
  title = get_result.content["title"]
  puts title
  #=> My Blog Post
rescue Error::DocumentNotFound
  puts "Document not found!"
end

You can then add in logic to filter on the fields returned:

found = collection.get("document-key")
content = found.content
if content["author"] == "mike"
  # do something
else
  # do something else
end

Removing

When removing a document, you will have the same concern for durability as with any additive modification to the Bucket:

begin
  collection.remove("my-document")
rescue Error::DocumentNotFound
  puts "Document did not exist when trying to remove"
end

Expiration / TTL

Couchbase Server includes an option to have particular documents automatically expire after a set time. This can be useful for some use-cases, such as user sessions, caches, or other temporary documents.

You can set an expiry value when creating a document:

collection.upsert("my-document", {"doc" => true},
                  Options::Upsert(expiry: 2 * 60 * 60))

# or with ActiveSupport::Duration
require 'active_support/core_ext/numeric/time'
collection.upsert("my-document", {"doc" => true},
                  Options::Upsert(expiry: 2.hours))

# Time instances are also accepted as absolute time points
expiry = Time.now + 30 # 30 seconds from now
collection.upsert("my-document", {"doc" => true},
                  Options::Upsert(expiry: expiry))

When getting a document, the expiry is not provided automatically by Couchbase Server but it can be requested:

found = collection.get("my-document", Options::Get(with_expiry: true))
puts "Expiry of found doc: #{found.expiry_time}"
#=> Expiry of found doc: 2020-07-26 21:52:22 +0300

The value returned by #expiry_time is a Time, and always represents the absolute time at which the document will expire. The #expiry method, which returned an integer number of seconds since the epoch, is deprecated and will be removed in release 3.1.
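
Code that still holds integer expiry values (seconds since the epoch) can bridge to Time with plain Ruby. The following is a small sketch using only the standard Time class; the helper names expiry_seconds_to_time and seconds_until are hypothetical and are not part of the SDK.

```ruby
# Hypothetical helpers, not SDK methods.

# Converts an integer number of seconds since the Unix epoch into a Time.
def expiry_seconds_to_time(epoch_seconds)
  Time.at(epoch_seconds)
end

# Returns how many whole seconds remain until the expiry (never negative).
def seconds_until(expiry_time, now: Time.now)
  [(expiry_time - now).ceil, 0].max
end

expires_at = expiry_seconds_to_time(Time.now.to_i + 120)
puts "Seconds left: #{seconds_until(expires_at)}"
```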

Note that when updating the document, special care must be taken to avoid resetting the expiry to zero. Here’s how:

found = collection.get("my-document", Options::Get(with_expiry: true))

collection.replace("my-document", {"content" => "something new"},
                   Options::Replace(expiry: found.expiry_time))

Some applications may find #get_and_touch useful: it fetches a document while updating its expiry field. It can be used like this:

collection.get_and_touch("my-document", 24 * 60 * 60)

# or with ActiveSupport::Duration
require 'active_support/core_ext/numeric/time'
collection.get_and_touch("my-document", 1.day)

If the expiry is given as an integer smaller than 30 days in seconds (60 * 60 * 24 * 30), it is treated as an offset from the current time. If the value is greater, it is treated as an absolute Unix timestamp. For more on expiration, see the expiration section of our documents discussion.
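
The 30-day rule can be sketched in plain Ruby as follows. This is an illustration of the rule as described here, not SDK or server code: interpret_expiry is a hypothetical helper, it assumes non-negative integer input, it assumes that a value of exactly 30 days counts as an offset, and it assumes that zero means "no expiry".

```ruby
THIRTY_DAYS_IN_SECONDS = 30 * 24 * 60 * 60

# Hypothetical helper: returns the Time at which a document would expire,
# or nil when the value means "no expiry".
def interpret_expiry(value, now: Time.now)
  return nil if value.zero? # assumption: zero disables expiration

  if value <= THIRTY_DAYS_IN_SECONDS
    now + value    # small values are offsets from the current time
  else
    Time.at(value) # large values are absolute Unix timestamps
  end
end

puts interpret_expiry(2 * 60 * 60)   # two hours from now
puts interpret_expiry(1_900_000_000) # an absolute point in time
```

Passing a Time or an ActiveSupport::Duration, as in the examples above, avoids this ambiguity because the intent is explicit.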

Atomic Counters

The value of a document can be increased or decreased atomically using #increment() and #decrement() on the Couchbase::BinaryCollection. See the API Guide for more information.

Increment & Decrement are considered part of the ‘binary’ API and as such may still be subject to change.

Increment

# increment binary value by 1 (default)
binary_collection = collection.binary
res = binary_collection.increment("foo")
res.content
#=> 1

# Create a document and assign it to 10 -- counter works atomically
# by first creating a document if it doesn't exist. If it exists,
# the same method will increment/decrement per the "delta" parameter
res = binary_collection.increment("counter",
           Options::Increment(initial: 10, delta: 2))
res.value
#=> 10

Decrement

# decrement binary value by 1 (default)
res = binary_collection.decrement("foo")
res.content
#=> 0

Decrement (with options)

# Decrement the existing counter (currently 10) by 4, giving 6
res = binary_collection.decrement("counter",
           Options::Decrement(initial: 10, delta: 4))
res.value
#=> 6
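
The way the initial and delta options interact can be sketched with a small in-memory model. InMemoryCounters is a hypothetical class written for this illustration and is not the SDK; it assumes that initial is used only when the counter does not exist yet, that delta is applied only to an existing counter, and that a decrement never takes the value below zero.

```ruby
# Hypothetical in-memory model of counter semantics; NOT the Couchbase SDK.
class InMemoryCounters
  def initialize
    @values = {}
  end

  def increment(id, delta:, initial:)
    apply(id, delta, initial)
  end

  def decrement(id, delta:, initial:)
    apply(id, -delta, initial)
  end

  private

  def apply(id, signed_delta, initial)
    @values[id] =
      if @values.key?(id)
        # Existing counter: apply the delta, clamping at zero (assumption).
        [@values[id] + signed_delta, 0].max
      else
        # Missing counter: create it with the initial value, ignoring delta.
        initial
      end
  end
end

counters = InMemoryCounters.new
puts counters.increment("page-views", initial: 100, delta: 5) # created at 100
puts counters.increment("page-views", initial: 100, delta: 5) # now 105
puts counters.decrement("page-views", initial: 100, delta: 5) # back to 100
```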

Setting the document expiry time only works when the document is created; it is not possible to update the expiry time of an existing counter document with the increment method. To change the expiry of an existing counter, use the #touch method.

Atomicity Across Data Centers

If you are using Cross Data Center Replication (XDCR), be sure to avoid modifying the same counter in more than one datacenter. If the same counter is modified in multiple datacenters between replications, the counter will no longer be atomic, and its value can change in unspecified ways.

A counter must be incremented or decremented by only a single datacenter. Each datacenter must have its own set of counters that it uses — a possible implementation would be including a datacenter name in the counter document ID.
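
One possible shape for such a naming scheme is sketched below. The helper names, the "::" delimiter, and the datacenter names are assumptions made for this illustration; the values hash stands in for counter values that the application has already read from each datacenter's counter document.

```ruby
# Hypothetical helpers, not SDK methods.

# Builds a counter document ID that is unique per datacenter.
def counter_id(base_name, datacenter)
  "#{base_name}::#{datacenter}"
end

# Sums the per-datacenter values that belong to one logical counter.
def total_count(values_by_id, base_name)
  prefix = "#{base_name}::"
  values_by_id.sum { |id, value| id.start_with?(prefix) ? value : 0 }
end

values = {
  counter_id("page-views", "us-east") => 3,
  counter_id("page-views", "eu-west") => 4,
  counter_id("logins", "us-east") => 9
}
puts total_count(values, "page-views")
```

Each datacenter only ever increments or decrements its own document, and the application adds the values together when it needs the overall figure.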

KV Range Scan

A range scan gives you documents from a collection, even if you don’t know the document IDs. This feature requires Couchbase Server 7.6 or newer.

KV range scan is suitable for use cases that require relatively low concurrency and tolerate relatively high latency. If your application does many scans at once, or requires low latency results, we recommend using SQL++ (with a primary index on the collection) instead of KV range scan.

Range scan

Here’s an example of a KV range scan that gets all documents in a collection:

KV Range Scan for all documents in a collection

result = collection.scan(RangeScan.new) # (1)
result.each do |item|
  puts "ID: #{item.id}, Content: #{item.content}"
end

(1) The RangeScan class has two optional attributes: from and to. If you omit them, as in this example, you’ll get all documents in the collection. These attributes are for advanced use cases; you probably won’t need to specify them. Instead, it’s more common to use the "prefix" scan type shown in the next example.

Prefix scan

KV range scan can also give you all documents whose IDs start with the same prefix. Imagine you have a collection where documents are named like this: <username>::<uuid>. In other words, the document ID starts with the name of the user associated with the document, followed by a delimiter, and then a UUID. If you use this document naming scheme, you can use a prefix range scan to get all documents associated with a user. For example, to get all documents associated with user "alice", you would write:

KV Range Scan for all documents in a collection whose IDs start with alice::

result = collection.scan(PrefixScan.new('alice::')) # (1)
result.each do |item|
  puts "ID: #{item.id}, Content: #{item.content}"
end

(1) Note that the scan type is PrefixScan.
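
The <username>::<uuid> naming scheme itself can be sketched without a cluster. The helpers below are hypothetical and only use Ruby's standard library; ids_with_prefix imitates locally what a prefix scan selects, to show why the delimiter belongs in the prefix.

```ruby
require "securerandom"

# Hypothetical helper: builds a document ID of the form <username>::<uuid>.
def user_document_id(username)
  "#{username}::#{SecureRandom.uuid}"
end

# Local imitation of prefix matching over a list of document IDs.
def ids_with_prefix(ids, prefix)
  ids.select { |id| id.start_with?(prefix) }
end

ids = [
  user_document_id("alice"),
  user_document_id("alice"),
  user_document_id("alice2"),
  user_document_id("bob")
]

puts ids_with_prefix(ids, "alice::").size # only alice's two documents
puts ids_with_prefix(ids, "alice").size   # also picks up alice2's document
```

Including the delimiter in the prefix keeps users whose names merely start with the same characters out of the result.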

Sample scan

If you want to get random documents from a collection, use a sample scan.

KV Range Scan for 100 random documents

result = collection.scan(SamplingScan.new(100))
result.each do |item|
  puts "ID: #{item.id}, Content: #{item.content}"
end

Get IDs instead of full documents

If you only want the document IDs, set the ids_only attribute of Options::Scan to true, like this:

KV Range Scan for all document IDs in a collection

result = collection.scan(RangeScan.new, Options::Scan.new(ids_only: true))
result.each do |item|
  puts "ID: #{item.id}"
end

Scoped KV Operations

It is possible to perform scoped key-value operations on named Collections with Couchbase Server release 7.x. See the API docs for more information.

Here is an example showing an upsert in the users collection, which lives in the travel-sample.tenant_agent_00 keyspace:

agent_scope = bucket.scope("tenant_agent_00")
users_collection = agent_scope.collection("users")
document = {"name" => "John Doe", "preferred_email" => "johndoe111@test123.test"}

result = users_collection.upsert("user-key", document)

Additional Resources

Working on just a specific path within a JSON document will reduce network bandwidth requirements - see the Sub-Document pages.

Our Query Engine enables retrieval of information using the SQL-like syntax of SQL++ (formerly N1QL).