Slow Operations Logging

    +
    Tracing information on slow operations can be found in the logs as threshold logging, orphan logging, and other span metrics. Change the settings to alter how much information you collect.

    To improve debuggability certain metrics are automatically measured and logged. These include slow queries, responses taking beyond a certain threshold, and orphaned responses.

    Threshold Logging Reporting

    Threshold logging is the recording of slow operations — useful for diagnosing when and where problems occur in a distributed environment.

    Configuring Threshold Logging

    To configure threshold logging with PHP SDK 3.x you will need to set the tracer configuration via the cluster connection string.

    Before setting this up, however, you must ensure that your log level is high enough to display tracing messages.

    ini_set("couchbase.log_level", "INFO");

    Next, add the tracer settings to the connection when initialising a cluster:

    $connectionString = "couchbase://127.0.0.1?" .
        "tracing_threshold_queue_flush_interval=3&" . /* every 3 seconds */
        "tracing_threshold_kv=0.01"; /* 10 milliseconds */
    
    $cluster = new Cluster($connectionString, $options);
    $bucket = $cluster->bucket("travel-sample");
    $collection = $bucket->defaultCollection();

    This will set our threshold for key-value operations to 10 milliseconds, and log the found operations every 3 seconds.

    Here is an example that shows upserting a document that causes longer processing on the server side.

    /*
     * Create a new document
     */
    $document = ["answer" => 42, "updated_at" => date("r")];
    $collection->upsert("foo", $document);
    
    /*
     * Replace the document with a big body and very small deadline which should trigger a 
     * client-side timeout, in which case the server response will be reported as orphan
     */
    $options = new UpsertOptions();
    $options->timeout(1);
    /*
     * Generate a document with 10M payload, that should be unfriendly to the compressing function
     * and longer to process on the server side
     */
    $document = ["noise" => base64_encode(random_bytes(15_000_000))];
    $numberOfTimeouts = 0;
    while (true) {
        try {
            $collection->upsert("foo", $document, $options);
        } catch (TimeoutException $e) {
            $numberOfTimeouts++;
            if ($numberOfTimeouts > 3) {
                break;
            }
        }
    }

    You should expect to see output in JSON format in the logs for operations over the specified threshold (prettified for readability):

    [cb,INFO] (tracer L:149 I:1734870558) Operations over threshold: {
       "count":5,
       "service":"kv",
       "top":[
          {
             "operation_name":"php/upsert",
             "total_us":1584965
          },
          {
             "last_local_address":"127.0.0.1:55956",
             "last_local_id":"3918037d7eb01092/c14128c7238454cb",
             "last_operation_id":"0x2",
             "last_remote_address":"127.0.0.1:11210",
             "operation_name":"upsert",
             "server_us":153489,
             "total_us":1479441
          },
          {
             "operation_name":"php/request_encoding",
             "total_us":78730
          },
          {
             "operation_name":"php/request_encoding",
             "total_us":75922
          },
          {
             "operation_name":"php/upsert",
             "total_us":10862
          }
       ]
    }

    The count represents the total amount of recorded items in each interval per service, which in this case is the KV service. You can also see more information about each item/operation in the top array.

    To see all the available options for the tracing settings refer to the Tracing Options section of this guide.

    Orphaned Response Reporting

    Orphan response reporting acts as an auxiliary tool to the tracing and metrics capabilities. It does not expose an external API to the application and is very focused on its feature set.

    The way it works is that every time a response is in the process of being completed, when the SDK detects that the original caller is not listening anymore (likely because of a timeout), it will send this “orphan” response to a reporting utility which then aggregates it. and at regular intervals logs them in a specific format.

    When the user then sees timeouts in their logs, they can go look at the output of the orphan reporter and correlate certain properties that aid debugging in production. For example, if a single node is slow but the rest of the cluster is responsive, this would be visible from orphan reporting.

    Configuring Orphan Logging

    Configuring orphan logging is very similar to how we configured the threshold logging, the only difference is the property passed to the cluster connection string:

    $connectionString = "couchbase://127.0.0.1?" .
        "tracing_orphaned_queue_flush_interval=5&"; /* every 5 seconds */
    
    $cluster = new Cluster($connectionString, $options);
    $bucket = $cluster->bucket("travel-sample");
    $collection = $bucket->defaultCollection();

    The same code example from the threshold logging section can be used to demonstrate orphaned responses.

    /*
     * Create a new document
     */
    $document = ["answer" => 42, "updated_at" => date("r")];
    $collection->upsert("foo", $document);
    
    /*
     * Replace the document with a big body and very small deadline which should trigger a 
     * client-side timeout, in which case the server response will be reported as orphan
     */
    $options = new UpsertOptions();
    $options->timeout(1);
    /*
     * Generate a document with 10M payload, that should be unfriendly to the compressing function
     * and longer to process on the server side
     */
    $document = ["noise" => base64_encode(random_bytes(15_000_000))];
    $numberOfTimeouts = 0;
    while (true) {
        try {
            $collection->upsert("foo", $document, $options);
        } catch (TimeoutException $e) {
            $numberOfTimeouts++;
            if ($numberOfTimeouts > 3) {
                break;
            }
        }
    }

    The expected output should be something similar to the log below:

    [cb,WARN] (tracer L:147 I:4044469160) Orphan responses observed: {
       "count":2,
       "service":"kv",
       "top":[
          {
             "last_local_address":"127.0.0.1:55653",
             "last_local_id":"fe4cb476e16ce16b/91c55f8ed6e3f1ae",
             "last_operation_id":"0x6",
             "last_remote_address":"127.0.0.1:11210",
             "operation_name":"upsert",
             "server_us":0,
             "total_us":476244
          },
          {
             "last_local_address":"127.0.0.1:55653",
             "last_local_id":"fe4cb476e16ce16b/91c55f8ed6e3f1ae",
             "last_operation_id":"0x9",
             "last_remote_address":"127.0.0.1:11210",
             "operation_name":"upsert",
             "server_us":0,
             "total_us":173575
          }
       ]
    }

    Tracing Options

    The table below describes all the allowed properties that can be set for tracing and their default values.

    Table 1. Allowed Tracer Properties
    Key Type Default Description

    enable_tracing

    bool

    true

    Activate/deactivate end-to-end tracing.

    tracing_orphaned_queue_size

    int

    128

    Size of orphaned spans queue in default tracer.

    tracing_orphaned_queue_flush_interval

    duration

    10.0

    Flush interval for orphaned spans queue in default tracer.

    tracing_threshold_queue_size

    int

    128

    Size of threshold queue in default tracer.

    tracing_threshold_queue_flush_interval

    duration

    10.0

    Flush interval for spans with total time over threshold in default tracer.

    tracing_threshold_kv

    duration

    0.5

    Minimum time for the tracing span of KV service to be considered by threshold tracer.

    tracing_threshold_query

    duration

    1.0

    Minimum time for the tracing span of N1QL service to be considered by threshold tracer.

    tracing_threshold_view

    duration

    1.0

    Minimum time for the tracing span of VIEW service to be considered by threshold tracer.

    tracing_threshold_search

    duration

    1.0

    Minimum time for the tracing span of FTS service to be considered by threshold tracer.

    tracing_threshold_analytics

    duration

    1.0

    Minimum time for the tracing span of ANALYTICS service to be considered by threshold tracer.

    The duration properties are given in seconds with fractions after floating point (e.g. "2.5" is 2 seconds and 500 milliseconds).