Handling Exceptions and other Errors with the C (libcouchbase) SDK in Couchbase

An error is received if an operation could not be executed. Sometimes errors are received directly from the server while other times they are generated internally by the client. This page describes how errors are reported in the C SDK, and how some error types should be handled.

Error Codes

Most functions in the C SDK return an error code of type lcb_error_t. As an application developer, you should be prepared to handle non-successful error codes and treat them appropriately. If the library is not behaving as expected then your first action should be to determine if your application is properly handling (and checking) any error codes returned by the library. Remember that error codes are only significant when there is an error condition!

A successful operation is defined by the return code of LCB_SUCCESS, while any other code indicates an error condition. You can find a full list of error codes in the <libcouchbase/error.h> header.

New applications are advised to enable extended error codes by using the LCB_CNTL_DETAILED_ERRCODES setting (see Client Settings for the C (libcouchbase) SDK with Couchbase Server). Extended error codes are not enabled by default to avoid sending older applications error codes that they cannot handle.

When to Check for Errors, and What They Mean

Success and failure depend on the context. A successful return code for one of the data operation APIs (for example, lcb_store3()) does not mean the operation itself succeeded and the key was successfully stored. Rather, it means the key was successfully placed inside the library’s internal queue. The actual error code is delivered as the lcb_RESPBASE::rc parameter in the operation callback itself (that is, the callback installed with lcb_install_callback3()).

Non-successful error codes received in callbacks are usually a result of a negative server reply or a network connectivity issue.

If an error is returned from a scheduling function (e.g. lcb_get3()), it may be the result of bad input (see LCB_EIFINPUT). Scheduling functions may also return LCB_NOT_SUPPORTED or LCB_CLIENT_FEATURE_UNAVAILABLE, which indicate that the server or the client does not support the given API.

If a scheduling API returns anything but LCB_SUCCESS, the callback for that specific request will not be delivered. Conversely, it is guaranteed that the callback will always be delivered if the return code for the scheduling function is LCB_SUCCESS.

You can print a textual representation of the error by using lcb_strerror. This function is always guaranteed to return a valid string:

lcb_strerror(NULL, LCB_OPTIONS_CONFLICT);

The first argument to lcb_strerror is ignored and has never been used.

Data Errors

Data errors are errors returned by the server because a certain data condition was not met. Data errors typically have very clear corrective paths.

Document Does not Exist

If a document is not found, then it has either not yet been created or has since been deleted. This error is received on retrieval operations (getting a document), replace operations (replacing a document that must already exist), and remove operations (deleting a document).

If this error is received when attempting to retrieve a document, the application should either create the item (if possible) or return an error to the user.

If this error is received when replacing a document, then it indicates an issue in the application state (perhaps you can raise an exception up the stack). If you do not care whether the document already exists, the upsert operation may be used instead, which ignores this case.

If this error is received when removing a document, it may safely be ignored: not-found on remove essentially means the item is already removed.

The Not Found error is returned by the server.
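
The three cases above can be summarized as a small decision function. This is only a sketch of the reasoning, not SDK code: the op_kind and nf_action enums and the on_not_found() function are hypothetical names introduced for illustration. In a real callback, the trigger would be the library's not-found error code in the response's rc field.

```c
/* Hypothetical names for illustration; none of these come from libcouchbase. */
typedef enum { OP_GET, OP_REPLACE, OP_REMOVE } op_kind;
typedef enum { NF_CREATE_OR_REPORT, NF_RAISE_STATE_ERROR, NF_IGNORE } nf_action;

/* Decide what to do when the server reports that the document was not found. */
nf_action on_not_found(op_kind op)
{
    switch (op) {
    case OP_GET:
        return NF_CREATE_OR_REPORT;   /* create the item, or tell the user */
    case OP_REPLACE:
        return NF_RAISE_STATE_ERROR;  /* application state is inconsistent */
    case OP_REMOVE:
        return NF_IGNORE;             /* already gone, nothing left to do */
    }
    return NF_RAISE_STATE_ERROR;      /* unknown operation: fail loudly */
}
```

Keeping the decision in one place makes it harder for different call sites to treat the same error inconsistently.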

Document Already Exists

The insert operation requires that the document does not exist yet; it is intended to create a new unique record (think about inserting a "new user ID"). This error is returned by the server when a document already exists. Applications at this point should probably return an error up the stack to the user (when applicable); for example indicating that a new account could not be registered with the given user name, since it already exists.

CAS Mismatch

A CAS mismatch error is returned when an operation was executed with a CAS value (supplied by the application) and the CAS value passed differs from the CAS value on the server. The corrective course of action in this case is for the application to retry the read-update cycle as explained in detail in the CAS documentation.
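
The read-update cycle can be illustrated without a server. The sketch below is an assumption-laden model, not the SDK's API: fake_doc, fake_replace(), increment_with_cas(), and the interfering_writes_left counter are all hypothetical stand-ins that simulate a document whose CAS changes on every mutation and a second client that writes between our read and our write.

```c
#include <stdbool.h>
#include <stdint.h>

/* In-memory stand-in for a single server-side document (hypothetical). */
typedef struct {
    int value;
    uint64_t cas;   /* changes on every successful mutation */
} fake_doc;

/* Stand-in for a replace-with-CAS: fails if the caller's CAS is stale. */
bool fake_replace(fake_doc *doc, int new_value, uint64_t expected_cas)
{
    if (doc->cas != expected_cas) {
        return false;               /* CAS mismatch */
    }
    doc->value = new_value;
    doc->cas++;
    return true;
}

/* Demo only: how many times "another client" sneaks in a write. */
int interfering_writes_left = 0;

/* Read-update-write cycle: on CAS mismatch, re-read and try again.
 * Returns the number of attempts used, or 0 if every attempt failed. */
int increment_with_cas(fake_doc *doc, int delta, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int current = doc->value;   /* read the value and its CAS */
        uint64_t cas = doc->cas;

        if (interfering_writes_left > 0) {   /* simulated concurrent writer */
            interfering_writes_left--;
            fake_replace(doc, doc->value + 100, doc->cas);
        }

        if (fake_replace(doc, current + delta, cas)) {
            return attempt;         /* write accepted */
        }
        /* CAS mismatch: loop around and re-read the fresh value */
    }
    return 0;
}
```

Bounding the number of attempts keeps a heavily contended document from stalling the caller indefinitely; the caller decides what to do when 0 is returned.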

Document too Big

If the document content is larger than the maximum size of 20 MB, the server responds with an error noting that it cannot store the document because it is too big. This error is not transient and must be raised up the stack, since it likely indicates an application error that creates overly large documents.

Transient and Resource Errors

These errors may be received because of resource starvation.

Temporary Failure

This error is received when the server is temporarily in a state that does not allow it to respond successfully. For example, this can happen during mutations when the cluster node has run out of memory or is currently warming up. When the node's disk and replication queues are full, it must wait until items in those queues are stored and replicated before it can accept new operations.

While this condition is rare, it may happen under massively concurrent writes from clients and a limited memory allocation on the server.

The short-term corrective action for this error is to throttle and slow down the application, giving Couchbase some time for pending operations to complete (or to complete warmup) before issuing new operations. The long-term corrective action is to increase memory capacity on the cluster, either by adding more RAM to each node or by adding more nodes. The following example shows (in pseudo-code) how to handle a temporary failure error with a linear backoff.

MAX_RETRIES = 5
BASE_DELAY = 50 // milliseconds

current_attempts = 1
do {
    try {
        bucket.upsert(id, document)
        break
    } catch (TemporaryFailure error) {
        sleep(BASE_DELAY * current_attempts)
    }
} while (++current_attempts <= MAX_RETRIES) // make at most MAX_RETRIES attempts
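
The same linear backoff can be sketched in plain C. This is a minimal illustration under stated assumptions rather than SDK code: the status enum, retry_linear(), and the demo_* helpers are hypothetical names, the operation is passed in as a function pointer, and the "sleep" only records the requested delay so that the example runs instantly.

```c
/* Hypothetical status codes standing in for the SDK's error codes. */
typedef enum { ST_OK, ST_TMPFAIL, ST_OTHER } status;

typedef status (*operation_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned ms, void *ctx);

/* Runs op up to max_attempts times. After each temporary failure that will be
 * retried, waits base_delay_ms * attempt. Any other result returns at once. */
status retry_linear(operation_fn op, sleep_fn do_sleep, void *ctx,
                    unsigned max_attempts, unsigned base_delay_ms)
{
    status st = ST_TMPFAIL;
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        st = op(ctx);
        if (st != ST_TMPFAIL) {
            return st;
        }
        if (attempt < max_attempts) {
            do_sleep(base_delay_ms * attempt, ctx);   /* linear growth */
        }
    }
    return st;   /* still a temporary failure: report it to the caller */
}

/* Demo helpers: an operation that fails a fixed number of times, and a
 * sleep replacement that only accumulates the requested delays. */
typedef struct {
    unsigned failures_left;
    unsigned calls;
    unsigned total_sleep_ms;
} demo_state;

status demo_op(void *ctx)
{
    demo_state *s = ctx;
    s->calls++;
    if (s->failures_left > 0) {
        s->failures_left--;
        return ST_TMPFAIL;
    }
    return ST_OK;
}

void demo_sleep(unsigned ms, void *ctx)
{
    demo_state *s = ctx;
    s->total_sleep_ms += ms;
}
```

With two simulated failures, five allowed attempts, and a 50 ms base delay, the operation succeeds on the third call after waiting 50 ms and then 100 ms. A production version would replace demo_sleep with a real delay and demo_op with the actual store operation.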

Out Of Memory

This error indicates a severe condition on the server side and probably needs to be logged and/or signaled to monitoring systems. There is not much the application can do there to mitigate the situation other than backing off significantly and waiting until the server side memory shortage is resolved.

Inspecting HTTP (N1QL, MapReduce, FTS) Errors

Many of the SDK’s APIs (lcb_n1ql_query, lcb_view_query, lcb_fts_query) use HTTP internally. Unlike Key-Value based APIs which use a binary protocol and have a fixed internal format for errors, APIs using JSON-over-HTTP can report multiple errors anywhere during execution. The C SDK itself can only report a single top-level error code (i.e. the rc within the response structure).

In cases where an HTTP API failed with a non-successful (non-2xx) HTTP reply, the rc field will be set to LCB_HTTP_ERROR and the LCB_RESP_F_FINAL bit set in the rflags field. The actual cause may be determined using one or more of the following mechanisms within the response callback:

  • Inspecting the underlying lcb_RESPHTTP object for details.

    if (rv->rc == LCB_HTTP_ERROR && rv->htresp) {
        printf("Underlying HTTP failed with code %d\n", rv->htresp->htstatus);
        printf("Raw payload: %.*s\n", (int)rv->htresp->nbody, (const char *)rv->htresp->body);
    }

    rv may be lcb_RESPN1QL or lcb_RESPVIEWQUERY or other response types using a row-like API. Ensure that you verify the htresp pointer is not NULL before dereferencing it. Be aware that the response body may be empty or may contain only partial JSON.

  • Inspecting the row’s metadata. The metadata is exposed via the normal row fields (e.g. row, nrow) of the response structure, but only when the LCB_RESP_F_FINAL bit is set in the rflags field.

    if ((resp->rflags & LCB_RESP_F_FINAL) && resp->nrow) {
        printf("Metadata: %.*s\n", (int)resp->nrow, resp->row);
        json_parse(resp->nrow, resp->row);
    }

    The metadata is the raw JSON returned from the server-side API. It will be emptied out of any row contents (i.e. any actual result set) as they are dynamically parsed out from the stream. The metadata may contain errors, warnings, and other metrics which may be useful when debugging. Unlike the raw HTTP response, the metadata should always be valid JSON.

Note that any error codes other than LCB_HTTP_ERROR indicate that either the C SDK has handled and converted HTTP or metadata-reported errors, or that an error occurred at the transport or input validation layer. It may still be useful to inspect the raw HTTP response (if any) and/or the metadata as above in such situations.

Program Crashes and Pitfalls

If your application terminates abnormally while invoking a library function, you may have either encountered a bug or passed the library an invalid pointer. Keep in mind the following points:

  • The library is not thread safe. While you may use multiple lcb_t handles in different threads, you must never access the same handle from multiple threads without using external synchronization functions (such as mutexes).

  • The response structures within the callback are valid only in the scope of the callback function itself. This means you must copy the structure (and any contained keys and values) into another location in memory if you wish to use it outside the callback.

  • Callbacks will not be invoked if the scheduling function returns a failure status. This means that the following code will result in accessing uninitialized memory:

    struct myresult {
      char *value;
      lcb_error_t err;
    };
    static void get_callback(lcb_t instance, int cbtype, const lcb_RESPGET *resp)
    {
      struct myresult *mr = (struct myresult *)resp->cookie;
      mr->err = resp->rc;
      if (mr->err == LCB_SUCCESS) {
        /* Copy the value: resp is only valid inside the callback */
        mr->value = malloc(resp->nvalue + 1);
        memcpy(mr->value, resp->value, resp->nvalue);
        mr->value[resp->nvalue] = '\0';
      } else {
        mr->value = NULL;
      }
    }
    
    // Some lines later
    struct myresult mr;
    lcb_get3(instance, &mr, &cmd);
    lcb_wait(instance);
    if (mr.value) {
      // If lcb_get3() returned an error, this will be uninitialized access!
      // ...
    }
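
A safer shape for the pattern in the last point is to initialize the result structure before scheduling and to check the scheduling status before reading it. The sketch below shows that shape with hypothetical stand-ins so it can run without the library: status, fake_resp, fake_schedule_get(), and fetch() are not SDK names, and the fake scheduler invokes the callback immediately, whereas the real library would deliver it later (for example during lcb_wait()).

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the SDK pieces (hypothetical, for a runnable sketch only). */
typedef enum { ST_SUCCESS = 0, ST_BAD_INPUT = 1 } status;
typedef struct {
    status rc;
    const char *value;
    size_t nvalue;
    void *cookie;
} fake_resp;
typedef void (*get_cb)(const fake_resp *resp);

struct myresult {
    char *value;
    status err;
    int done;       /* set to 1 once the callback has run */
};

/* Callback: copy the value out, since resp is only valid inside the callback. */
void get_callback(const fake_resp *resp)
{
    struct myresult *mr = resp->cookie;
    mr->done = 1;
    mr->err = resp->rc;
    if (resp->rc == ST_SUCCESS) {
        mr->value = malloc(resp->nvalue + 1);
        if (mr->value != NULL) {
            memcpy(mr->value, resp->value, resp->nvalue);
            mr->value[resp->nvalue] = '\0';
        }
    }
}

/* Fake scheduler: rejects an empty key without ever invoking the callback. */
status fake_schedule_get(const char *key, void *cookie, get_cb cb)
{
    if (key == NULL || key[0] == '\0') {
        return ST_BAD_INPUT;
    }
    fake_resp resp = { ST_SUCCESS, "hello", 5, cookie };
    cb(&resp);
    return ST_SUCCESS;
}

/* Safe caller: initialize the result first, then check the scheduling status. */
struct myresult fetch(const char *key)
{
    struct myresult mr = { NULL, ST_SUCCESS, 0 };
    status rc = fake_schedule_get(key, &mr, get_callback);
    if (rc != ST_SUCCESS) {
        mr.err = rc;    /* the callback will never run; record the failure */
    }
    return mr;
}
```

Because mr starts out with value set to NULL, a failed scheduling call leaves the structure in a well-defined state, and the caller can test mr.err and mr.value without touching uninitialized memory.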

A crash can also be a result of a bug in the library. Sometimes the library will call abort when it detects an inconsistent state. If you think you have found a bug in the library you should file a bug in our issue tracker or contact Couchbase support. When filing a bug, please be sure to include the library version and any relevant code samples.