Handling Exceptions and Other Errors with the Python SDK in Couchbase

All Couchbase exceptions are derived from CouchbaseError and can be caught with an except clause. You may catch specific exceptions for specific handling, or you may catch CouchbaseError and handle exceptions based on their status codes. Note that a single error may belong to multiple categories.
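As a minimal sketch of the two catching styles described above, the following uses small stand-in classes instead of the real SDK so that it runs on its own. The class names mirror those used in this section; NotFoundError, the fetch and describe helpers, and the rc value are assumptions made for illustration.

```python
# Stand-in hierarchy for illustration only; in an application these
# classes would come from the SDK rather than being defined here.
class CouchbaseError(Exception):
    def __init__(self, message, rc=0):
        super().__init__(message)
        self.rc = rc  # status code; 0 when no code applies


class CouchbaseDataError(CouchbaseError):
    pass


class NotFoundError(CouchbaseDataError):  # assumed name
    pass


def fetch(store, key):
    """Hypothetical lookup that raises NotFoundError for a missing key."""
    if key not in store:
        # The rc value here is a placeholder, not a real status code.
        raise NotFoundError("no such document: " + key, rc=13)
    return store[key]


def describe(store, key):
    try:
        return "found: {0}".format(fetch(store, key))
    except NotFoundError:
        # Specific handling for one kind of failure
        return "missing"
    except CouchbaseError as e:
        # Catch-all for every other SDK error, keyed on the status code
        return "failed with status code {0}".format(e.rc)
```

The more specific except clause has to come first; otherwise the CouchbaseError clause would also catch the not-found case.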

CouchbaseDataError

This error category is raised when there is a data logic error (for example, a missing document ID). Handling of this error depends on application logic (for example, either perform corrective action to insert the document, or return an error up the stack indicating the specified resource does not exist).

ArgumentError

This type of exception is thrown during argument/input validation. It indicates that one or more arguments passed to a method are invalid. You should determine how and why the application is passing invalid input to the SDK.

CouchbaseNetworkError and CouchbaseTransientError

This error category indicates a connectivity issue between the SDK and the Couchbase Cluster. This error might be a result of a temporary condition or a systematic infrastructure failure.

TimeoutError and NetworkError are subclasses of these error types.

ObjectThreadError

This error is thrown when access is detected from multiple concurrent threads.

Anatomy of an Exception Object

The Python SDK makes full use of the fact that a Python exception is just an object. In addition to catching the exception itself, the exception object may be analyzed for more information regarding the failure, and in the case of batched operations, may be inspected to determine which operations failed and which succeeded.

Applications may use the is_data property on a CouchbaseError instance to determine if the error is a negative reply from the server; the property will evaluate to True for situations where a key is not found, already exists, and so on. Likewise applications may use the is_network property to determine if the exception is a result of a potential network issue (and is thus not an issue with the data, but rather an issue concerning the connectivity between client and server).

All exceptions which are either data errors or network errors will contain a non-zero status code from the underlying C library; this status code may be obtained using the rc property of the exception. Exceptions which contain an rc value of 0 are typically ArgumentErrors which are thrown when a method was supplied with an invalid parameter, or an invalid combination of parameters.
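The properties described above can drive a simple dispatch. The sketch below uses a stand-in exception class that only carries is_data, is_network and rc, so it runs without the SDK; the classify helper and the rc values used with it are hypothetical.

```python
class CouchbaseError(Exception):
    """Stand-in exposing the three properties described above."""

    def __init__(self, rc=0, is_data=False, is_network=False):
        super().__init__("status code {0}".format(rc))
        self.rc = rc
        self.is_data = is_data
        self.is_network = is_network


def classify(error):
    """Map an exception to a coarse handling decision (hypothetical helper)."""
    if error.is_data:
        return "data"      # negative reply from the server, e.g. key not found
    if error.is_network:
        return "network"   # connectivity problem between client and server
    if error.rc == 0:
        return "argument"  # typically invalid input; no C library status code
    return "other"
```

Because a single error may belong to more than one category, the order of the checks decides which handling wins; this sketch gives data errors priority.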

Handling Errors from Multi Operations

When executing multiple operations with one of the _multi methods (for example, Bucket.get_multi()), some of those operations may fail, in which case an exception is raised. Other operations in the same batch may still have succeeded, so you need to inspect the exception object with the CouchbaseError.split_results method to see which operations failed and which succeeded.

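from couchbase.exceptions import CouchbaseError

try:
    cb.get_multi(keys)
except CouchbaseError as e:
    # split_results() separates the operations that succeeded from those that failed
    ok, fail = e.split_results()
    for k, v in fail.items():
        print('Key {0} failed with error code {1}'.format(k, v.rc))
    for k, v in ok.items():
        print('Retrieved {0} with value {1}'.format(k, v.value))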

Converting Error Codes to Exception Objects

In cases where exceptions are not raised but the operation fails, you may receive an error code which is the status code returned by the C SDK. The Python SDK will not automatically convert the error code to an exception object for performance reasons. You can use the CouchbaseError.rc_to_exctype class method to get the exception class which would have been raised based on the error code. Note that an error code of 0 means success.
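One way to use such a lookup is a small helper that turns a non-zero status code back into an exception. The sketch below does not call the SDK: it defines a stand-in CouchbaseError with its own rc_to_exctype so that it runs on its own. The registry, the code 13, the fallback to the base class for unknown codes, and the raise_for_rc helper are all assumptions made for illustration.

```python
class CouchbaseError(Exception):
    # Hypothetical registry; the codes stored here are placeholders,
    # not the values defined by the C library.
    _by_code = {}

    @classmethod
    def rc_to_exctype(cls, rc):
        """Stand-in lookup: return the exception class for a status code."""
        return cls._by_code.get(rc, cls)


class NotFoundError(CouchbaseError):  # assumed name
    pass


CouchbaseError._by_code[13] = NotFoundError


def raise_for_rc(rc):
    """Raise the matching exception unless rc signals success (0)."""
    if rc == 0:
        return
    exc_type = CouchbaseError.rc_to_exctype(rc)
    raise exc_type("operation failed, rc={0}".format(rc))
```

Keeping the conversion in one helper lets the application decide where paying for exception construction is worthwhile.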

Conceptual Error Types

Data Errors

Data errors are errors returned by the server because a certain data condition was not met. Data errors typically have very clear corrective paths.

Document Does not Exist

If a document is not found, then it has either not yet been created or has since been deleted. This error is received on retrieval (get) operations, on replace operations (which require that the document already exists), and on remove operations.

If this error is received when attempting to retrieve a document, then the application should either create the item (if possible) or return an error to the user.

If this error is received when replacing a document, then it indicates an issue in the application state (perhaps you can raise an exception up the stack). If you do not care whether the document already exists, the upsert method may be used instead, which ignores this case.

If this error is received when removing a document, it may safely be ignored: not-found on remove essentially means the item is already removed.

The Not Found error is returned by the server.
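The retrieval and removal cases above can be sketched as follows. The example runs against an in-memory stand-in rather than a real bucket; FakeBucket, the NotFoundError name, and both helper functions are assumptions, and the stand-in's get returns the stored value directly instead of a result object.

```python
class NotFoundError(Exception):
    """Stand-in for the SDK's not-found error (name assumed)."""


class FakeBucket:
    """In-memory stand-in offering get, insert and remove."""

    def __init__(self):
        self.docs = {}

    def get(self, key):
        if key not in self.docs:
            raise NotFoundError(key)
        return self.docs[key]

    def insert(self, key, value):
        self.docs[key] = value

    def remove(self, key):
        if key not in self.docs:
            raise NotFoundError(key)
        del self.docs[key]


def get_or_create(bucket, key, default):
    """On retrieval, create the document when it is missing."""
    try:
        return bucket.get(key)
    except NotFoundError:
        bucket.insert(key, default)
        return default


def remove_if_present(bucket, key):
    """On removal, a missing document is already the desired end state."""
    try:
        bucket.remove(key)
        return True
    except NotFoundError:
        return False
```

remove_if_present reports whether a document was actually deleted, which callers are free to ignore.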

Document Already Exists

The insert operation requires that the document does not exist yet; it is intended to create a new unique record (think about inserting a "new user ID"). This error is returned by the server when a document already exists. Applications at this point should probably return an error up the stack to the user (when applicable); for example indicating that a new account could not be registered with the given user name, since it already exists.
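The account registration case could look roughly like this. The sketch uses an in-memory stand-in; FakeBucket, the KeyExistsError name, the UsernameTaken error, the register_user helper, and the "user::" key prefix are all assumptions made for illustration.

```python
class KeyExistsError(Exception):
    """Stand-in for the SDK's already-exists error (name assumed)."""


class UsernameTaken(Exception):
    """Hypothetical application-level error reported to the caller."""


class FakeBucket:
    """In-memory stand-in whose insert refuses to overwrite."""

    def __init__(self):
        self.docs = {}

    def insert(self, key, value):
        if key in self.docs:
            raise KeyExistsError(key)
        self.docs[key] = value


def register_user(bucket, username, profile):
    """Create a new account, translating the storage error for the caller."""
    try:
        bucket.insert("user::" + username, profile)
    except KeyExistsError:
        raise UsernameTaken(
            "user name '{0}' is already registered".format(username))
```

Translating the storage-level error into an application-level one keeps SDK details out of the code that talks to the user.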

CAS Mismatch

A CAS mismatch error is returned when an operation was executed with a CAS value (supplied by the application) and the CAS value passed differs from the CAS value on the server. The corrective course of action in this case is for the application to re-try the read-update cycle as explained in detail in the CAS documentation.
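A retried read-update cycle might be sketched as below. It runs against an in-memory stand-in that keeps one CAS value per document; FakeBucket, CasMismatchError, update_with_cas and the max_attempts limit are hypothetical names chosen for this example, and the stand-in's get returns a (value, cas) tuple rather than a result object.

```python
class CasMismatchError(Exception):
    """Hypothetical stand-in for the error raised on a CAS mismatch."""


class FakeBucket:
    """In-memory stand-in that tracks one CAS value per document."""

    def __init__(self):
        self.docs = {}  # key -> (value, cas)

    def get(self, key):
        return self.docs[key]

    def upsert(self, key, value):
        _, cas = self.docs.get(key, (None, 0))
        self.docs[key] = (value, cas + 1)

    def replace(self, key, value, cas):
        _, current = self.docs[key]
        if cas != current:
            raise CasMismatchError(key)
        self.docs[key] = (value, current + 1)


def update_with_cas(bucket, key, modify, max_attempts=5):
    """Re-run the read-update cycle until the replace succeeds."""
    for _ in range(max_attempts):
        value, cas = bucket.get(key)  # read the current value and CAS
        try:
            bucket.replace(key, modify(value), cas=cas)
            return True
        except CasMismatchError:
            continue  # another writer got in first; read again
    return False
```

Bounding the number of attempts keeps a heavily contended document from stalling the caller indefinitely.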

Document too Big

If the document content is larger than the maximum size of 20MB, the server responds with an error noting that it cannot store the document because it is too big. This error is not transient and must be raised up the stack, since it likely indicates an application error that creates overly large document contents.
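An application that wants to catch oversized documents before sending them could estimate the encoded size first. This is only a sketch: it assumes documents are stored as UTF-8 JSON and that the 20MB limit means 20 * 1024 * 1024 bytes, and the helper names are hypothetical.

```python
import json

# Assumption: the 20MB limit is interpreted as 20 * 1024 * 1024 bytes.
MAX_DOCUMENT_BYTES = 20 * 1024 * 1024


def encoded_size(document):
    """Size in bytes of the document when serialized as UTF-8 JSON."""
    return len(json.dumps(document).encode("utf-8"))


def fits_size_limit(document, limit=MAX_DOCUMENT_BYTES):
    """Return True when the serialized document is within the limit."""
    return encoded_size(document) <= limit
```

A check like this does not replace handling the server error, since the size actually stored depends on how the SDK serializes the value.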

Transient and Resource Errors

These errors may be received because of resource starvation.

Temporary Failure

This error is received when the server is temporarily in a state that prevents it from responding successfully, for example during mutations when the cluster node has run out of memory or is currently warming up. In the out-of-memory case, the node's disk and replication queues are full, and it must wait until items in those queues are stored and replicated before it can begin receiving new operations.

While this condition is rare, it may happen under massively concurrent writes from clients and a limited memory allocation on the server.

The short-term corrective action for this error is to throttle and slow down the application, giving Couchbase some time for pending operations to complete (or to complete warmup) before issuing new operations. The long-term corrective action is to increase memory capacity on the cluster, either by adding more RAM to each node, or by adding more nodes. The following example shows how to handle a temporary failure error with a linear backoff.

import time
from couchbase.exceptions import TemporaryFailError

MAX_RETRIES = 5
BASE_DELAY = 0.05  # seconds (50 milliseconds)

for attempt in range(1, MAX_RETRIES + 1):
    try:
        bucket.upsert(doc_id, document)
        break
    except TemporaryFailError:
        # Linear backoff: wait a little longer after each failed attempt
        time.sleep(BASE_DELAY * attempt)

Out Of Memory

This error indicates a severe condition on the server side and probably needs to be logged and/or signaled to monitoring systems. There is not much the application can do to mitigate the situation other than backing off significantly and waiting until the server-side memory shortage is resolved.