Handling Errors

    +

    Errors are inevitable. The developer’s job is to be prepared for whatever is likely to come up — and to try and be prepared for anything that conceivably could come up. Couchbase gives you a lot of flexibility, but it is recommended that you equip yourself with an understanding of the possibilities.

    As covered here, the Java SDK ships with three different APIs, allowing you to structure your application the way you want. That guide also covers how errors are actually returned (e.g. via Try, Future, or Mono) and handled, so this document will focus instead on specific errors, along with a broader look at error handling strategies.

    Key-Value Errors

    The KV Service exposes several common errors that can be encountered - both during development, and to be handled by the production app. Here we will cover some of the most common errors.

    If a particular key cannot be found it is raised as an DocumentNotFoundException:

    On the other hand if the key already exists and should not (e.g. on an insert) then it is raised as a DocumentExistsException:

    Concurrency

    Couchbase provides optimistic concurrency using CAS. Each document gets a CAS value on the server, which is changed on each mutation. When you get a document you automatically receive its CAS value, and when replacing the document, if you provide that CAS the server can check that the document has not been concurrently modified by another agent in-between. If it has, it returns CasMismatchException, and the most appropriate response is to simply retry it:

    Ambiguity

    There are situations with any distributed system in which it is simply impossible to know for sure if the operation completed successfully or not. Take this as an example: your application requests that a new document be created on Couchbase Server. This completes, but, just before the server can notify the client that it was successful, a network switch dies and the application’s connection to the server is lost. The client will timeout waiting for a response and will raise a TimeoutException, but it’s ambiguous to the app whether the operation succeeded or not.

    So TimeoutException is one ambiguous error, another is DurabilityAmbiguousException, which can returned when performing a durable operation. This similarly indicates that the operation may or may not have succeeded: though when using durability you are guaranteed that the operation will either have been applied to all replicas, or none.

    Given the inevitability of ambiguity, how is the application supposed to handle this?

    It really needs to be considered case-by-case, but the general strategy is to become certain if the operation succeeded or not, and to retry it if required.

    For instance, for inserts, they can simply be retried to see if they fail on DocumentExistsException, in which case the operation was successful:

    Non-Idempotent Operations

    Idempotent operations are those that can be applied multiple times and only have one effect. Repeatedly setting an email field is idempotent - increasing a counter by one is not.

    Some operations we can view as idempotent as they will fail with no effect after the first success - such as inserts.

    Idempotent operations are much easier to handle, as on ambiguous error results (DurabilityAmbiguousException and TimeoutException) the operation can simply be retried.

    Most key-value operations are idempotent. For those that aren’t, such as a Sub-Document arrayAppend call, or a counter increment, the application should, on an ambiguous result, first read the document to see if that change was applied.

    Exception Handling

    How to handle exceptions is unchanged from SDK 2. You should still use try/catch on the blocking APIs and the corresponding reactive/async methods on the other APIs. There have been changes made in the following areas:

    • Exception hierachy and naming.

    • Proactive retry where possible.

    Exception hierachy

    The exception hierachy is now flat and unified under a CouchbaseException. Each CouchbaseException has an associated ErrorContext which is populated with as much information as possible and then dumped alongside the stack trace if an error happens.

    Here is an example of the error context if a N1QL query is performed with an invalid syntax (i.e. select 1= from):

    Exception in thread "main" com.couchbase.client.core.error.ParsingFailedException: Parsing of the input failed {"completed":true,"coreId":1,"errors":[{"code":3000,"message":"syntax error - at from"}],"idempotent":false,"lastDispatchedFrom":"127.0.0.1:62253","lastDispatchedTo":"127.0.0.1:8093","requestId":3,"requestType":"QueryRequest","retried":11,"retryReasons":["ENDPOINT_TEMPORARILY_NOT_AVAILABLE","BUCKET_OPEN_IN_PROGRESS"],"service":{"operationId":"9111b961-e585-42f2-9cab-e1501da7a40b","statement":"select 1= from","type":"query"},"timeoutMs":75000,"timings":{"dispatchMicros":15599,"totalMicros":1641134}}

    The expectation is that the application catches the CouchbaseException and deals with it as an unexpected error (e.g. logging with subsequent bubbling up of the exception or failing). In addition to that, each method exposes exceptions that can be caught separately if needed. As an example, consider the javadoc for the Collection.get API:

       * @throws DocumentNotFoundException the given document id is not found in the collection.
       * @throws TimeoutException if the operation times out before getting a result.
       * @throws CouchbaseException for all other error reasons (acts as a base type and catch-all).

    These exceptions extend CouchbaseException, but both the TimeoutException and the DocumentNotFoundException can be caught individually if specific logic should be executed to handle them.

    Proactive Retry

    One reason why the APIs do not expose a long list of exceptions is that the SDK now retries as many operations as it can if it can do so safely. This depends on the type of operation (idempotent or not), in which state of processing it is (already dispatched or not), and what the actual response code is if it arrived already. As a result, many transient cases — such as locked documents, or temporary failure — are now retried by default and should less often impact applications. It also means, when migrating to the new SDK API, you may observe a longer period of time until an error is returned by default.

    Operations are retried by default as described above with the default BestEffortRetryStrategy. Like in SDK 2 you can configure fail-fast retry strategies to not retry certain or all operations. The RetryStrategy interface has been extended heavily in SDK 3 — please see the API documentation.

    Additional Resources

    Errors & Exception handling is an expansive topic. Here, we have covered examples of the kinds of exception scenarios that you are most likely to face. More fundamentally, you also need to weigh up concepts of durability.

    Diagnostic methods are available to check on the health of the cluster.

    Logging methods are dependent upon the platform and SDK used. We offer recommendations and practical examples.