Orphaned Requests Logging

    In addition to request tracing and metrics reporting, logging orphaned requests provides additional insight into why an operation might have been cancelled (for example, because a timeout was applied by the caller).

    While tracing and metrics can also be consumed through external interfaces, getting information about orphaned requests only works through the built-in mechanisms.

    The way it works is that every time a response is in the process of being completed, when the SDK detects that the original caller is not listening anymore (beacuse the future was dropped), it will send this "orphaned" response to a reporting utility which aggregates all responses and in regular intervals logs them in a specific format.

    When you spot that an operation Future has been dropped (e.g. due to a caller-side timeout), you can look for the output of the OrphanReporter and correlate the information.

    Output Format

    Since orphans usually indicate a state that is not desirable, The OrphanReporter logs orphaned requests at the WARN level, which is going to be logged alongside all other SDK logs. By default, orphans will be aggregated and logged every 10 seconds, but the output will be skipped if there are no orphans to report. This makes sure that the log line will appear close to the point in time when the corresponding Future was dropped, while not spamming the log file if there is nothing to report. See the next section on how to customize this behavior.

    The actual body of the message consists of the text Orphaned responses observed:, followed by a compact JSON representation of the aggregated orphans. The following code snippet shows a prettified version of such a JSON blob:

    {
      "kv": {
        "total_count": 3,
        "top_requests": [
          {
            "last_server_duration_us": 8000,
            "total_server_duration_us": 8000,
            "operation_name": "Get",
            "last_local_id": "66388CF5BFCF7522/18CC8791579B567C",
            "operation_id": "0x3",
            "last_local_socket": "10.211.55.3:52450",
            "last_remote_socket": "10.112.180.101:11210"
          },
          {
            "last_server_duration_us": 1200,
            "total_server_duration_us": 1200,
            "operation_name": "Get",
            "last_local_id": "66388CF5BFCF7522/18CC8791579B567C",
            "operation_id": "0x1",
            "last_local_socket": "10.211.55.3:52450",
            "last_remote_socket": "10.112.180.101:11210"
          }
        ]
      }
    }

    The overall structure groups orphans by service (currently only kv), and for each service reports the total count of orphaned operations seen during the interval, along with the top sampled requests sorted by server duration (descending).

    Table 1. JSON Output Format Descriptions
    Property Description

    total_count

    Total number of orphaned operations observed during the reporting interval.

    top_requests

    Array of the top sampled orphaned requests, ordered by server duration (descending).

    last_server_duration_us

    The server duration in microseconds from the last dispatch, if present.

    total_server_duration_us

    The total accumulated server duration in microseconds across all dispatches.

    operation_name

    The name of the operation (e.g. Get, Upsert).

    last_local_id

    The connection ID associated with the last dispatch, if present.

    operation_id

    The operation ID (opaque value for Key/Value operations).

    last_local_socket

    The local socket address of the last dispatch, if present.

    last_remote_socket

    The remote socket address of the last dispatch, if present.

    If a field is not available, it will not be included in the output.

    Configuration

    The orphan logger can be configured through the OrphanReporterOptions in the ClusterOptions.

    The following properties can be configured:

    Table 2. OrphanReporterOptions Properties
    Property Default Description

    enabled

    true

    Whether the orphan reporter is enabled.

    reporter_interval

    10 seconds

    The interval at which accumulated orphans are emitted.

    sample_size

    10

    The maximum number of top orphaned requests to retain and report per interval.