A newer version of this documentation is available.

View Latest

Collecting diagnostic information

When your application crashes, stack traces, core dumps, and logging information can help identify the root cause of the crash. These tools can help you diagnose issues yourself, or if you report an issue with the C SDK to Couchbase, provide crucial information to the Couchbase developers who investigate the issue.

Getting stack traces

To get a stack trace you need to have GDB installed on your machine and be familiar with its use. If you cannot run your application from within GDB, you can attach GDB to an existing process.

To generate a useful stack trace, you need to run your application with a debug build of the Couchbase C library because the released binaries are optimized. During optimization the compiler might rearrange the code, which can result in misleading source file and line number information in the stack trace.

To make a debug build of the C library, build the library yourself from source with the ‑‑enable‑debug option applied to the configure script, as shown in the following example:

mnunberg@mbp15 ~/Source/libcouchbase $ ./cmake/configure --enable-debug
Detected in-source build. Making 'build' directory
cmake ../ \
    "-DCMAKE_BUILD_TYPE=DEBUG" \
    "-DCMAKE_INSTALL_PREFIX=/usr/local" \
    "-GUnix Makefiles" \
# SNIP

Ensure debugging symbols for the C client are installed. If you are using a binary distribution, use your package manager to install the libcouchbase-dbg or libcouchbase-debuginfo package. If you are using a version of the C library built from source, ensure that ‑‑enable‑debug was passed to the configure script.

You should also build your application with debug symbols, though not strictly required.

If your application crashes within the C library, examine the stack trace information. The C library is asynchronous in its core, so almost everything happens from within the lcb_wait() function. Tracing actual state changes and events within the library is only possible by using logging.

The following example shows how to run an application from GDB and how to get a stack trace when it crashes:

mnunberg@mbp15 ~/Source/libcouchbase/build $ gdb --args ./bin/cbc cat foo -U couchbase://192.168.33.101/default
# SNIP
(gdb) r
Starting program: /Users/mnunberg/Source/libcouchbase/build/bin/cbc cat foo -U couchbase://192.168.33.101/default
Reading symbols for shared libraries .
# SNIP
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000020
0x00000001000bacb9 in try_read (ctx=0x100703760, server=0x100700660, ior=0x1007037a8) at /Users/mnunberg/Source/libcouchbase/src/mcserver/mcserver.c:198
198	        pl2->index++;
(gdb) thread apply all bt

Thread 1 (process 7377):
#0  0x00000001000bacb9 in try_read (ctx=0x100703760, server=0x100700660, ior=0x1007037a8) at /Users/mnunberg/Source/libcouchbase/src/mcserver/mcserver.c:198
#1  0x00000001000baa29 in on_read (ctx=0x100703760, nb=31) at /Users/mnunberg/Source/libcouchbase/src/mcserver/mcserver.c:299
#2  0x0000000100090935 in invoke_read_cb (ctx=0x100703760, nb=31) at /Users/mnunberg/Source/libcouchbase/src/lcbio/ctx.c:273
#3  0x0000000100090a23 in E_handler (sock=4, which=2, arg=0x100703760) at /Users/mnunberg/Source/libcouchbase/src/lcbio/ctx.c:290
#4  0x000000010030d422 in event_base_loop ()
#5  0x00000001003068e1 in lcb_io_run_event_loop (iops=0x100405e50) at /Users/mnunberg/Source/libcouchbase/plugins/io/libevent/plugin-libevent.c:202
#6  0x00000001000c1d4d in lcb_wait (instance=0x100405910) at /Users/mnunberg/Source/libcouchbase/src/wait.c:88
#7  0x0000000100003be9 in cbc::GetHandler::run (this=0x100801200) at /Users/mnunberg/Source/libcouchbase/tools/cbc.cc:317
#8  0x000000010000318f in cbc::Handler::execute (this=0x100801200, argc=4, argv=0x7fff5fbffac8) at /Users/mnunberg/Source/libcouchbase/tools/cbc.cc:219
#9  0x000000010000e93c in main (argc=4, argv=0x7fff5fbffac8) at /Users/mnunberg/Source/libcouchbase/tools/cbc.cc:1204

Detecting invalid memory accesses

Valgrind can help you detect issues in your application (or the library itself) related to accessing invalid (already freed or unallocated) memory or uninitialized memory regions. This information is useful when debugging application behavior before a crash because crashes are often caused by entering a normally impossible state induced by a previous write or read to an invalid memory location.

You can install Valgrind through your package manager.

Valgrind will slow down execution time significantly, often by a factor of 20x. To lessen this impact on an application that performs lots of load, temporarily reduce the amount of CPU it uses.

Valgrind typically does not work well with dynamic language interpreters such as Python, Ruby, or Perl due to the way these languages optimize memory allocation. To effectively use Valgrind with these languages, you often need to use a special debug build of the interpreter. To find out more about using Valgrind with interpreters, refer to the documentation for the language you are using.

The following example shows how to run an application with Valgrind. In the example, the operation appears to time out, but the real cause of the problem is uninitialized memory access that causes the client to incorrectly read the response from the server.

mnunberg@csure:/sources/libcouchbase/build$ valgrind ./bin/cbc cat foo -U couchbase://192.168.33.101/default
==29887== Memcheck, a memory error detector
==29887== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==29887== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==29887== Command: ./bin/cbc cat foo -U couchbase://192.168.33.101/default
==29887==
==29887== Conditional jump or move depends on uninitialised value(s)
==29887==    at 0x4E6D801: try_read (mcserver.c:221)
==29887==    by 0x4E6DD64: on_read (mcserver.c:294)
==29887==    by 0x4E4B8AA: invoke_read_cb (ctx.c:273)
==29887==    by 0x4E4B92B: E_handler (ctx.c:290)
==29887==    by 0x6E55EEB: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent_core-2.0.so.5.1.7)
==29887==    by 0x6C49610: lcb_io_run_event_loop (plugin-libevent.c:202)
==29887==    by 0x4E73D75: lcb_wait (wait.c:88)
==29887==    by 0x410964: cbc::GetHandler::run() (cbc.cc:317)
==29887==    by 0x41036F: cbc::Handler::execute(int, char**) (cbc.cc:219)
==29887==    by 0x414E03: main (cbc.cc:1204)
==29887==
==29887== Conditional jump or move depends on uninitialised value(s)
==29887==    at 0x4E4BF07: E_schedule (ctx.c:447)
==29887==    by 0x4E4BFD4: lcbio_ctx_schedule (ctx.c:471)
==29887==    by 0x4E4BA21: E_handler (ctx.c:320)
==29887==    by 0x6E55EEB: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent_core-2.0.so.5.1.7)
==29887==    by 0x6C49610: lcb_io_run_event_loop (plugin-libevent.c:202)
==29887==    by 0x4E73D75: lcb_wait (wait.c:88)
==29887==    by 0x410964: cbc::GetHandler::run() (cbc.cc:317)
==29887==    by 0x41036F: cbc::Handler::execute(int, char**) (cbc.cc:219)
==29887==    by 0x414E03: main (cbc.cc:1204)
==29887==
foo                  Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout (0x17)
==29887== Conditional jump or move depends on uninitialised value(s)
==29887==    at 0x4E4B4EB: lcbio_ctx_close_ex (ctx.c:156)
==29887==    by 0x4E4B640: lcbio_ctx_close (ctx.c:192)
==29887==    by 0x4E6EFB8: finalize_errored_ctx (mcserver.c:720)
==29887==    by 0x4E6EEE3: start_errored_ctx (mcserver.c:698)
==29887==    by 0x4E6ED98: mcserver_close (mcserver.c:646)
==29887==    by 0x4E6A943: lcb_destroy (instance.c:491)
==29887==    by 0x41020C: cbc::Handler::~Handler() (cbc.cc:208)
==29887==    by 0x42222D: cbc::GetHandler::~GetHandler() (in /sources/lcb-packet-ng/build/bin/cbc)
==29887==    by 0x422297: cbc::GetHandler::~GetHandler() (cbc-handlers.h:31)
==29887==    by 0x414E3D: main (cbc.cc:1221)
==29887==
==29887==
==29887== HEAP SUMMARY:
==29887==     in use at exit: 4,994 bytes in 20 blocks
==29887==   total heap usage: 602 allocs, 582 frees, 588,374 bytes allocated
==29887==
==29887== LEAK SUMMARY:
==29887==    definitely lost: 35 bytes in 1 blocks
==29887==    indirectly lost: 0 bytes in 0 blocks
==29887==      possibly lost: 28 bytes in 1 blocks
==29887==    still reachable: 4,931 bytes in 18 blocks
==29887==         suppressed: 0 bytes in 0 blocks
==29887== Rerun with --leak-check=full to see details of leaked memory
==29887==
==29887== For counts of detected and suppressed errors, rerun with: -v
==29887== Use --track-origins=yes to see where uninitialised values come from
==29887== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 10 from 6)

Getting core dumps

Core dumps might be generated when an application crashes (though on many systems core file generation is off by default). The core dump contains a memory dump of your application that can be inspected on a different system. This lets you inspect the state of the library to determine what the error was at the time of the crash.

Core dumps can only be analyzed if the binaries that generated them are accessible and have debugging symbols. Binaries in this sense includes the client library, the application, and any other shared libraries loaded by the application. This means that the platform (and distribution) that generated the core dump must be available and loadable (or at least reproducible) when analyzing the core dump.

A core file contains the memory contents of your application. Anyone who can read your core file can access potentially sensitive data that your application was processing at the time of the crash. If possible, have your application operate on sample data.

To get a core dump:

  1. Invoke ulimit -c unlimited in the same shell that your application will run (or a parent thereof) prior to the invocation of that application. The ulimit command instructs the kernel to generate core dump files for all subsequent processes that terminate abnormally and are executed within the current shell or any child shell.

  2. Ensure you are either using a binary installation of the client library (which was installed from an official Couchbase repository) or are using a source build with debug symbols enabled.

  3. When the application crashes, a core file is created. The exact location and name of the file is dependent on the specific kernel configuration. For more information about where core files are located, see http://stackoverflow.com/questions/2065912/core-dumped-but-core-file-is-not-in-current-directory.

  4. Compress the core file using gzip or a similar utility before submitting to support. Compressing the file makes the upload and download times quicker for analyzing the core file.

  5. Disable core dumps once completed. Core dumps are disabled by default on most Linux distributions because they may potentially write sensitive information to disk and many core dump files can quickly fill up a file system.

The following example shows how to enable and collect a core dump:

mnunberg@csure:/sources/libcouchbase/build$ ulimit -c unlimited
mnunberg@csure:/sources/libcouchbase/build$ ./bin/cbc cat foo -U couchbase://192.168.33.101
Aborted (core dumped)
# Showing the effects of compressing the core file:
mnunberg@csure:/sources/libcouchbase/build$ ls -lsh core
961K -rw------- 1 mnunberg mnunberg 1.3M Nov 21 12:53 core
mnunberg@csure:/sources/libcouchbase/build$ gzip core
mnunberg@csure:/sources/libcouchbase/build$ ls -lsh core.gz
136K -rw------- 1 mnunberg mnunberg 133K Nov 21 12:53 core.gz

Collecting log information

Logging is a lightweight way to collect state and diagnostic information on the library. You can use the library’s default logging (to standard error) by setting the LCB_LOGLEVEL environment variable. There are some undocumented methods of customizing the logging output or integrating them with your application. Refer to the source code of the library for more information on how this may be done. To learn how to set up logging in your application, see Setting up logging.

The following example shows how to run an application with logging and redirect the stderr output to a file:

mnunberg@csure:/sources/libcouchbase/build$ LCB_LOGLEVEL=5 ./bin/cbc cat foo 2&> log
mnunberg@csure:/sources/libcouchbase/build$ cat log
0ms [I0] {30280} [INFO] (instance - L:372) Version=2.4.4, Changeset=4410eebcd813844b6cd6f9c7eeb4ab3dfa2ab8ac
0ms [I0] {30280} [INFO] (instance - L:373) Effective connection string: http://localhost/default?. Bucket=default
0ms [I0] {30280} [DEBUG] (instance - L:63) Adding host localhost:8091 to initial HTTP bootstrap list
0ms [I0] {30280} [DEBUG] (instance - L:63) Adding host localhost:11210 to initial CCCP bootstrap list
0ms [I0] {30280} [DEBUG] (confmon - L:89) Preparing providers (this may be called multiple times)
0ms [I0] {30280} [DEBUG] (confmon - L:99) Provider FILE is DISABLED
0ms [I0] {30280} [DEBUG] (confmon - L:96) Provider CCCP is ENABLED
0ms [I0] {30280} [DEBUG] (confmon - L:96) Provider HTTP is ENABLED
0ms [I0] {30280} [DEBUG] (confmon - L:99) Provider MCRAW is DISABLED
0ms [I0] {30280} [TRACE] (confmon - L:284) Start refresh requested
0ms [I0] {30280} [TRACE] (confmon - L:264) Current provider is CCCP
1ms [I0] {30280} [INFO] (cccp - L:118) Requesting connection to node localhost:11210 for CCCP configuration
1ms [I0] {30280} [DEBUG] (lcbio_mgr - L:414) <localhost:11210> (HE=0x1927e70) Creating new connection because none are available in the pool
1ms [I0] {30280} [DEBUG] (lcbio_mgr - L:318) <localhost:11210> (HE=0x1927e70) Starting connection on I=0x19284c0
1ms [I0] {30280} [INFO] (connection - L:441) <localhost:11210> (SOCK=0x1928600) Starting. Timeout=2000000us
1ms [I0] {30280} [TRACE] (connection - L:258) <localhost:11210> (SOCK=0x1928600) Got event handler for new connection
1ms [I0] {30280} [TRACE] (connection - L:305) <localhost:11210> (SOCK=0x1928600) Scheduling asynchronous watch for socket.
1ms [I0] {30280} [TRACE] (connection - L:258) <localhost:11210> (SOCK=0x1928600) Got event handler for new connection
1ms [I0] {30280} [TRACE] (connection - L:321) <localhost:11210> (SOCK=0x1928600) connect() failed. os_error=111 [Connection refused]
1ms [I0] {30280} [TRACE] (connection - L:258) <localhost:11210> (SOCK=0x1928600) Got event handler for new connection
1ms [I0] {30280} [ERROR] (connection - L:118) <localhost:11210> (SOCK=0x1928600) Failed: lcb_err=0x2c, os_errno=111
1ms [I0] {30280} [DEBUG] (lcbio_mgr - L:272) <localhost:11210> (HE=0x1927e70) Received result for I=0x19284c0,C=(nil); E=0x2c
1ms [I0] {30280} [ERROR] (cccp - L:133) <NOHOST:NOPORT> Got I/O Error=0x2c
1ms [I0] {30280} [INFO] (confmon - L:202) Provider 'CCCP' failed
1ms [I0] {30280} [DEBUG] (confmon - L:229) Will try next provider in 0us
1ms [I0] {30280} [TRACE] (confmon - L:264) Current provider is HTTP
1ms [I0] {30280} [TRACE] (htconfig - L:395) Starting HTTP Configuration Provider 0x191e070
1ms [I0] {30280} [INFO] (connection - L:441) <localhost:8091> (SOCK=0x1928f90) Starting. Timeout=2000000us
2ms [I0] {30280} [TRACE] (connection - L:258) <localhost:8091> (SOCK=0x1928f90) Got event handler for new connection
2ms [I0] {30280} [TRACE] (connection - L:305) <localhost:8091> (SOCK=0x1928f90) Scheduling asynchronous watch for socket.
2ms [I0] {30280} [TRACE] (connection - L:258) <localhost:8091> (SOCK=0x1928f90) Got event handler for new connection
2ms [I0] {30280} [TRACE] (connection - L:321) <localhost:8091> (SOCK=0x1928f90) connect() failed. os_error=111 [Connection refused]
2ms [I0] {30280} [TRACE] (connection - L:258) <localhost:8091> (SOCK=0x1928f90) Got event handler for new connection
2ms [I0] {30280} [ERROR] (connection - L:118) <localhost:8091> (SOCK=0x1928f90) Failed: lcb_err=0x2c, os_errno=111
2ms [I0] {30280} [ERROR] (htconfig - L:341) Connection to REST API failed with code=0x2c (111)
2ms [I0] {30280} [INFO] (confmon - L:202) Provider 'HTTP' failed
2ms [I0] {30280} [TRACE] (confmon - L:219) Maximum provider reached. Resetting index
2ms [I0] {30280} [ERROR] (bootstrap - L:100) Failed to bootstrap client=0x191bc60. Code=0xa, Message=No more bootstrap providers remain
Operation failed with code 0x2c (The remote host refused the connection. Is the service up?)