Concurrent Document Mutations

      You can use the CAS value to control how concurrent document modifications are handled. It helps avoid and control potential race conditions in which some mutations may be inadvertently lost or overridden by mutations made by other clients.

      The CAS is a value representing the current state of an item. Each time the item is modified, its CAS changes.

      The CAS value itself is returned as part of a document’s metadata whenever a document is accessed. In the SDK, this is presented as the cas field in the result object from any operation which executes successfully.

      CAS is an acronym for Compare And Swap, and is a form of optimistic locking. The CAS can be supplied as a parameter to the replace and remove operations. When an application provides the CAS, the server checks the application-provided CAS against the CAS of the document on the server:

      • If the two CAS values match (they compare successfully), then the mutation operation succeeds.

      • If the two CAS values differ, then the mutation operation fails.

      On the server side, CAS might be implemented along these lines (pseudocode):

      uint Replace(string docid, object newValue, uint oldCas=0) {
          object existing = this.kvStore.get(docid);
          if (!existing) {
              throw DocumentDoesNotExist();
          } else if (oldCas != 0 && oldCas != existing.cas) {
              throw CasMismatch();
          }
          uint newCas = ++existing.cas;
          existing.value = newValue;
          return newCas;
      }
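The pseudocode above can be turned into a small runnable sketch. This is a toy in-memory model, not how Couchbase Server is implemented: the `KvStore` class, its `insert` helper, and the exception names are hypothetical, and the CAS is a plain counter only to keep the example short.

```python
class DocumentDoesNotExist(Exception):
    pass


class CasMismatch(Exception):
    pass


class KvStore:
    """Toy in-memory store (hypothetical); each item is [value, cas]."""

    def __init__(self):
        self._items = {}

    def insert(self, doc_id, value):
        # Helper for the example: create an item with an initial CAS of 1.
        self._items[doc_id] = [value, 1]
        return 1

    def get(self, doc_id):
        if doc_id not in self._items:
            raise DocumentDoesNotExist(doc_id)
        value, cas = self._items[doc_id]
        return value, cas

    def replace(self, doc_id, new_value, old_cas=0):
        if doc_id not in self._items:
            raise DocumentDoesNotExist(doc_id)
        item = self._items[doc_id]
        # old_cas == 0 means "no CAS supplied": skip the comparison.
        if old_cas != 0 and old_cas != item[1]:
            raise CasMismatch(doc_id)
        item[0] = new_value
        item[1] += 1  # assumption: a counter; real CAS values are opaque
        return item[1]
```

A caller would read the item with `get`, keep the returned CAS, and pass it back as `old_cas` when calling `replace`.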

      Demonstration

      The following demonstrates why CAS is needed and how the server handles it. A typical use case for CAS is adding a new field to an existing document. At the application level, this requires the following steps:

      1. Read entire document.

      2. Perform modification locally.

      3. Store new document to server.

      Assume the following two blocks of code are executing concurrently in different application instances:

      Table 1. CAS flow

      Thread #1
      >>> result = cb1.get('docid')
      >>> new_doc = result.value
      >>> new_doc['field1'] = 'value1'
      >>> cb1.replace('docid', new_doc)

      Thread #2
      >>> result = cb2.get('docid')
      >>> new_doc = result.value
      >>> new_doc['field2'] = 'value2'
      >>> cb2.replace('docid', new_doc)

      Retrieving the document again yields:

      >>> cb1.get('docid').value
      {u'field2': u'value2', u'a_field': u'a_value'}

      Note that field1 is not present, even though the application inserted it into the document. The replace on Thread #2 happened to run after the replace on Thread #1, but Thread #2’s get ran before Thread #1’s replace. The local version of the document on Thread #2 therefore did not contain field1 (Thread #1’s update was not yet stored on the server), so Thread #2’s replace overwrote the replace performed by Thread #1. The operations ran in this order:

      1. (#2): new_doc = get("docid").value

      2. (#1): new_doc = get("docid").value

      3. (#1): new_doc["field1"] = "value1"

      4. (#2): new_doc["field2"] = "value2"

      5. (#1): cb.replace("docid", new_doc)

      6. (#2): cb.replace("docid", new_doc)
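The interleaving above can be replayed without a cluster. The sketch below is a minimal simulation: the `server` dictionary and the `get` and `replace` helpers are hypothetical stand-ins for the server and the SDK calls, and neither helper checks a CAS.

```python
import copy

# Hypothetical stand-in for the server: document ID -> document body.
server = {"docid": {"a_field": "a_value"}}


def get(doc_id):
    # Each client receives its own private copy of the document.
    return copy.deepcopy(server[doc_id])


def replace(doc_id, doc):
    # No CAS check: the last writer always wins.
    server[doc_id] = copy.deepcopy(doc)


doc_2 = get("docid")         # step 1 (#2)
doc_1 = get("docid")         # step 2 (#1)
doc_1["field1"] = "value1"   # step 3 (#1)
doc_2["field2"] = "value2"   # step 4 (#2)
replace("docid", doc_1)      # step 5 (#1)
replace("docid", doc_2)      # step 6 (#2) overwrites step 5

final_doc = get("docid")
print(final_doc)  # {'a_field': 'a_value', 'field2': 'value2'}
```

Thread #2 wrote a copy that never contained field1, so the final document lacks it.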

      Using CAS - Example

      In the prior example, we saw that concurrent updates to the same document may result in some updates being lost. This is not because Couchbase itself has lost the updates, but because the application was unaware of newer changes made to the document and inadvertently overwrote them.

      Table 2. CAS flow

      Thread #1
      >>> result = cb1.get('docid')
      >>> new_doc = result.value
      >>> print new_doc
      {u'a_field': u'a_value'}
      >>> cur_cas = result.cas
      >>> print cur_cas
      272002471883283
      >>> new_doc['field1'] = 'value1'
      >>> new_result = cb1.replace(
      ...     'docid',
      ...     new_doc,
      ...     cas=cur_cas)
      Server’s CAS matches cur_cas. New CAS assigned
      >>> print new_result.cas
      195896137937427

      Thread #2
      >>> result = cb2.get('docid')
      >>> new_doc = result.value
      >>> print new_doc
      {u'a_field': u'a_value'}
      >>> cur_cas = result.cas
      >>> print cur_cas
      272002471883283
      >>> new_doc['field2'] = 'value2'
      >>> new_result = cb2.replace(
      ...     'docid',
      ...     new_doc,
      ...     cas=cur_cas)
      CAS on server differs: 195896137937427 vs 272002471883283!
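The same race can be replayed with a CAS check in place. This is a sketch under stated assumptions: the `server` dictionary, the `get` and `replace` helpers, and the `CasMismatch` exception are hypothetical, and the CAS is a simple counter purely for illustration (real CAS values are opaque).

```python
import copy


class CasMismatch(Exception):
    pass


# Hypothetical stand-in for the server: document ID -> (body, cas).
server = {"docid": ({"a_field": "a_value"}, 1)}


def get(doc_id):
    value, cas = server[doc_id]
    return copy.deepcopy(value), cas


def replace(doc_id, doc, cas):
    _, current_cas = server[doc_id]
    if cas != current_cas:
        raise CasMismatch(f"server has {current_cas}, client sent {cas}")
    new_cas = current_cas + 1  # assumption: counter used only for the demo
    server[doc_id] = (copy.deepcopy(doc), new_cas)
    return new_cas


# Both clients read the same version of the document.
doc_1, cas_1 = get("docid")
doc_2, cas_2 = get("docid")
doc_1["field1"] = "value1"
doc_2["field2"] = "value2"

replace("docid", doc_1, cas_1)  # succeeds and assigns a new CAS

try:
    replace("docid", doc_2, cas_2)  # cas_2 is now stale
    second_write_rejected = False
except CasMismatch:
    second_write_rejected = True
```

The second write is rejected instead of silently discarding field1, which gives the second client the chance to re-read the document and reapply its change.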

      Handling CAS errors

      If the item’s CAS has changed since the last operation performed by the current client (i.e. the document has been changed by another client), the CAS used by the application is considered stale. If a stale CAS is sent to the server (via one of the mutation commands, as above), the server will reply with an error, and the Couchbase SDK will accordingly return this error to the application (either via return code or exception, depending on the language).

      How to handle this error depends on the application logic. If the application simply wants to set a property within the document (one that does not depend on other properties within the document), it can retry the read-update cycle: retrieve the item again (and thus get the new CAS), perform the local modification, and then upload the change to the server. For example, if a document represents a user and the application is updating a single field (the code below increments a visitCount field), the method to update this information may look like this:

      int
      casLoop(const couchbase::collection& collection, const std::string& doc_id, int max_retries = 10)
      {
          for (int i = 0; i < max_retries; i++) {
              // Get the current document contents
              auto [get_err, get_res] = collection.get(doc_id).get();
      
              if (get_err) {
                  fmt::println("Got an error during get: {}", get_err);
                  return 1;
              }
      
              // Get and modify the content
              auto content = get_res.content_as<tao::json::value>();
              content["visitCount"] = content["visitCount"].get_unsigned() + 1;
      
              // Try to replace the document, using CAS
              auto [replace_err, replace_res] = collection.replace(doc_id, content, couchbase::replace_options().cas(get_res.cas())).get();
              if (replace_err) {
                  // Check if the error returned is a cas mismatch, if it is, we retry
                  if (replace_err.ec() == couchbase::errc::common::cas_mismatch) {
                      continue;
                  }
                  // Something else went wrong - fast fail
                  fmt::println("Something else went wrong during replace: {}", replace_err);
                  return 1;
              }
              // Succeeded - we're done
              return 0;
          }
          fmt::println("Maxed out our retries");
          return 1;
      }

      Sometimes more logic is needed when performing updates. For example, two properties may be mutually exclusive: only one or the other can exist, but not both. In that case the rule has to be re-applied to the freshly read document on every retry.
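As an illustration of the mutually exclusive case, the sketch below re-applies the rule on every attempt of the retry loop. Everything in it is hypothetical: `ToyCollection` is an in-memory stand-in rather than the SDK's collection type, and the field names passed to `set_exclusive` are chosen by the caller.

```python
import copy


class CasMismatch(Exception):
    pass


class ToyCollection:
    """Hypothetical in-memory stand-in: document ID -> (body, cas)."""

    def __init__(self):
        self._items = {}

    def upsert(self, doc_id, value):
        _, cas = self._items.get(doc_id, (None, 0))
        self._items[doc_id] = (copy.deepcopy(value), cas + 1)

    def get(self, doc_id):
        value, cas = self._items[doc_id]
        return copy.deepcopy(value), cas

    def replace(self, doc_id, value, cas):
        _, current = self._items[doc_id]
        if cas != current:
            raise CasMismatch(doc_id)
        self._items[doc_id] = (copy.deepcopy(value), current + 1)


def set_exclusive(collection, doc_id, field, value, excluded_field, max_retries=10):
    """Set `field` and drop `excluded_field`, retrying on CAS mismatch."""
    for _ in range(max_retries):
        doc, cas = collection.get(doc_id)
        # Apply the rule to the fresh copy on every attempt, because another
        # client may have added the excluded field since the last read.
        doc.pop(excluded_field, None)
        doc[field] = value
        try:
            collection.replace(doc_id, doc, cas)
            return True
        except CasMismatch:
            continue
    return False
```

Because the document is re-read inside the loop, a competing write that reintroduces the excluded field is handled on the next attempt rather than being merged blindly.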

      Performance considerations

      CAS operations incur no additional overhead. The server returns a CAS value with every operation anyway, and comparing CAS values at the server is a simple integer comparison whose cost is negligible.

      CAS value format

      The CAS value should be treated as an opaque object at the application level. No assumptions should be made with respect to how the value is changed (for example, it is wrong to assume that it is a simple counter value). In the SDK, the CAS is represented as a 64-bit integer for efficient copying but should otherwise be treated as an opaque 8-byte buffer.

      Pessimistic locking

      While CAS is the recommended way to perform locking and concurrency control, Couchbase also offers explicit locking. When a document is locked, attempts to mutate it without supplying the correct CAS will fail.

      Documents can be locked using the get-and-lock operation and unlocked either explicitly using the unlock operation or implicitly by mutating the document with a valid CAS. While a document is locked, it may be retrieved but not modified without using the correct CAS value. When a locked document is retrieved, the server will return an invalid CAS value, preventing mutations of that document.

      This handy table shows various behaviors while an item is locked:

      Table 3. Behavior of various operations on a locked item (Operation: Result)

      • get-and-lock: Locked error.

      • get: Always succeeds, but with an invalid CAS value returned (so it cannot be used as an input to subsequent mutations).

      • unlock with bad/missing CAS value: Locked error.

      • unlock with correct CAS: Item is unlocked. It can now be locked again and/or accessed as usual.

      • Mutate with bad/missing CAS value: CasMismatch error.

      • Mutate with correct CAS value: Mutation is performed and item is unlocked. It can now be locked again and/or accessed as usual.
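The behaviors in Table 3 can be modelled with a small state machine. This is only a sketch of the table, not of the server: the `LockableItem` class, the exception names, and the `INVALID_CAS` sentinel are hypothetical, the CAS is a counter for brevity, the model assumes that locking assigns a new CAS, and lock expiry is not modelled.

```python
class Locked(Exception):
    pass


class CasMismatch(Exception):
    pass


INVALID_CAS = -1  # hypothetical sentinel for "CAS not usable for mutations"


class LockableItem:
    def __init__(self, value):
        self.value = value
        self.cas = 1
        self.locked = False

    def _next_cas(self):
        self.cas += 1  # assumption: counter; real CAS values are opaque
        return self.cas

    def get_and_lock(self):
        if self.locked:
            raise Locked()
        self.locked = True
        return self.value, self._next_cas()

    def get(self):
        # Reads always succeed, but a locked item hides its real CAS.
        return self.value, (INVALID_CAS if self.locked else self.cas)

    def unlock(self, cas=None):
        # Unlocking an item that is not locked is outside Table 3;
        # this sketch treats it as a no-op.
        if self.locked and cas != self.cas:
            raise Locked()
        self.locked = False

    def replace(self, value, cas=None):
        # A locked item always requires the correct CAS; an unlocked item
        # only checks the CAS when the caller supplies one.
        must_match = self.locked or cas is not None
        if must_match and cas != self.cas:
            raise CasMismatch()
        self.value = value
        self.locked = False  # a successful mutation releases the lock
        return self._next_cas()
```

Walking an item through get-and-lock, a failed mutation without the CAS, and a successful mutation with it reproduces the rows of the table in order.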

      A document can be locked for a maximum of 30 seconds, after which the server will unlock it. This is to prevent misbehaving applications from blocking access to documents inadvertently. You can modify the time the lock is held for (though it can be no longer than 30 seconds).

      Setting a lock greater than 30 seconds will cause Couchbase Server to set the lock duration at the Server’s default value, which is 15 seconds.
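The rule above is easy to misread, so here is a minimal sketch of it. The function and constant names are hypothetical, and the sketch covers only the case described in the text (requests above the maximum); it says nothing about other inputs such as zero or negative values.

```python
MAX_LOCK_SECONDS = 30      # maximum lock time stated above
DEFAULT_LOCK_SECONDS = 15  # server default stated above


def effective_lock_seconds(requested_seconds):
    """Return the lock duration the server would apply, per the rule above."""
    if requested_seconds > MAX_LOCK_SECONDS:
        # An over-long request is not clamped to the maximum;
        # it falls back to the default.
        return DEFAULT_LOCK_SECONDS
    return requested_seconds
```

Note that asking for 60 seconds therefore yields a shorter lock (15 seconds) than asking for 30.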

      Be sure to keep note of the CAS value returned when locking a document. You will need it when unlocking or mutating the document. The following block shows how to use the lock and unlock operations.

      auto [get_err, get_res] = collection.get_and_lock("key", std::chrono::seconds(10)).get();
      if (get_err) {
          fmt::println("Got an error during get_and_lock: {}", get_err);
      } else {
          auto unlock_err = collection.unlock("key", get_res.cas()).get();
          if (unlock_err) {
              fmt::println("Got an error during unlock: {}", unlock_err);
          } else {
              // Successfully unlocked
          }
      }

      If the item has already been locked, a further get-and-lock attempt fails with an error (the locked error listed in Table 3). This means that the operation cannot be executed for now, but may succeed later on, once the lock has been released or has expired.