Manage Couchbase Server Logging

The Autonomous Operator can be configured to manage certain aspects of Couchbase Server logging, and comes with tools for collecting Couchbase Server logs.

Overview

The Couchbase Server application records important events, and saves the details to a variety of log files. These logs are distinct from the logs that are generated by the Autonomous Operator itself.

Logging is performed continuously within each Couchbase Server container in a Couchbase deployment. When using persistent volumes — as is recommended for all production deployments — log files are written to either the default or logs volume.

Couchbase Server manages logging automatically and uses default settings for logging level, file size, and rotation/expiration. The one exception is audit.log, a special log file used to manage cluster security, which is handled separately from the other log files. For information on audit.log, refer to Configuring Audit Logging.

Configuring Audit Logging

Couchbase Server audit logging is a special type of logging that is not enabled by default. When audit logging is enabled, Couchbase Server begins recording information on who has performed what action, when, and how successfully.

Audit records are written as JSON documents to a default file, named audit.log, which is stored alongside all other Couchbase Server log files in the default or logs volume that is attached to the pod. After a specified period of time, or once the file reaches a specified size (whichever happens first), the file is closed and saved under a modified name that features a timestamp corresponding to the time of saving. A new, empty audit.log file is created and saved when a new audit event is generated.

There are two approaches you can choose when it comes to implementing audit logging: managed and automated.

Couchbase Server rotates audit logs, but never expires or deletes them. This is by design, as Couchbase Server intentionally has no facility to modify or delete an audit log file once it has been rotated. As a result, it is the explicit responsibility of the administrator to implement a policy for expiring and/or moving audit logs to a different storage location. Without active intervention, rotated audit logs will eventually consume all available storage, leading to node and cluster failures.

About Managed Audit Logging

Managed audit logging involves the administrator directly enabling audit logging in Couchbase Server, and requires that the administrator actively manage the resultant audit log files. Managed audit logging can be implemented after the Couchbase cluster has been successfully deployed by the Autonomous Operator. Once the cluster is deployed, the administrator can enable and configure audit logging directly through the Couchbase UI, CLI, or REST API. Refer to Manage Auditing in the Couchbase Server documentation for more information.

Once audit logging is enabled, it’s up to the administrator to manage the resultant audit.log files. It is expected that the administrator will implement an automated system for expiring and/or moving audit logs to a different storage location subject to a retention policy.
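
As a rough illustration of such a retention policy, the following shell sketch lists and removes rotated audit logs older than a given number of minutes. It is a minimal sketch under assumptions not stated above: the function names are hypothetical, rotated files are assumed to end in `-audit.log` (while the active file is named exactly `audit.log`), and the log directory is supplied by the caller. Replace `rm` with a copy or upload step if your policy is to archive rather than delete.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Couchbase Server or the Autonomous Operator.

# List rotated audit logs in a directory that are older than max_age_minutes.
# Assumption: rotated files end in "-audit.log"; the active "audit.log" never matches.
list_expired_audit_logs() {
  local log_dir="$1" max_age_minutes="$2"
  find "$log_dir" -maxdepth 1 -type f -name '*-audit.log' -mmin "+${max_age_minutes}"
}

# Remove every expired rotated audit log and print the name of each removed file.
purge_expired_audit_logs() {
  local log_dir="$1" max_age_minutes="$2" file
  list_expired_audit_logs "$log_dir" "$max_age_minutes" | while IFS= read -r file; do
    rm -f -- "$file" && printf 'removed %s\n' "$file"
  done
}
```

For example, `purge_expired_audit_logs /opt/couchbase/var/lib/couchbase/logs 1440` would target rotated files older than one day, assuming that path is where the log files live in your deployment.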

Configuring Automated Audit Logging

Automated audit logging requires that logs be written to a persistent volume (i.e. the Couchbase deployment’s default or logs volumes are backed by persistent storage). Fully-ephemeral clusters are not supported by this feature.

With automated audit logging, the Autonomous Operator handles the audit log configuration and can optionally manage the resultant audit log files. The audit logging configuration is specified in the CouchbaseCluster resource specification, which the Autonomous Operator uses to set up audit logging in Couchbase Server.

The required configuration parameters for enabling audit logging are described in the example below. Specified values represent the defaults for their respective fields unless otherwise noted in a callout. (The Autonomous Operator will set the default values for any fields that are not specified by the user.)

apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  logging:
    audit:
      enabled: true (1)
      disabledEvents: (2)
        - 8243
      disabledUsers: (3)
        - "@eventing/local"
        - "@cbq-engine/local"
      rotation:
        interval: "15m"
        size: "20Mi"
      garbageCollection:
        sidecar:
          enabled: true (4)
          image: "busybox:1.32.1" (5)
          age: "1h"
          interval: "20m"
          resources:
            requests:
              cpu:
              memory:

1	`couchbaseclusters.spec.logging.audit.enabled`: Setting this field to `true` enables audit logging on the Couchbase cluster. This field defaults to `false`. This is technically the only field that is required to configure audit logging.
2	`couchbaseclusters.spec.logging.audit.disabledEvents`: This field can be set to an array of one or more filterable event id integers that will be disabled for auditing purposes. This field normally defaults to an empty array.
3	`couchbaseclusters.spec.logging.audit.disabledUsers`: This field can be set to an array of one or more filterable user strings that will be disabled for auditing purposes. This field normally defaults to an empty array.
4	`couchbaseclusters.spec.logging.audit.garbageCollection.sidecar.enabled`: Setting this field to `true` enables garbage collection of rotated audit logs. This field defaults to `false`. This is technically the only field that is required to configure garbage collection of rotated audit logs. Note, however, that garbage collection can only be enabled if `couchbaseclusters.spec.logging.audit.enabled` is also set to `true`.
5	`couchbaseclusters.spec.logging.audit.garbageCollection.sidecar.image`: This is the base image of the sidecar helper container that will be added to the server pods for handling the cleanup for rotated logs. This sidecar is a standard Linux container that only needs to find and remove files of the appropriate name and age. Be aware that there are security concerns with using a standard Linux image, such as the potential for arbitrary shell execution and write-access to the volumes, which can potentially be exploited by a malicious image. In order to limit potential abuse, the garbage collection sidecar uses a sub-path to only mount the logs directory. In addition, the commands run by the garbage collector are not configurable, and the filenames of removed logs are printed to the garbage collector’s standard output.

After enabling automated audit logging, you should take care only to use the CouchbaseCluster resource specification for making further modifications to the audit logging configuration. Manual changes that are made to the configuration via the Couchbase UI, CLI, or REST API (such as changing the audit log directory) are not prevented by the Autonomous Operator, and can cause audit logging failures.
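
One way to keep later changes inside the CouchbaseCluster resource is to patch it with `kubectl` rather than editing settings in Couchbase Server. The sketch below builds a JSON merge patch for the rotation fields shown in the example configuration and applies it. The function names are hypothetical, and the cluster name and values are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for changing audit rotation through the CouchbaseCluster resource.

# Print a JSON merge patch that sets only spec.logging.audit.rotation.
audit_rotation_patch() {
  local interval="$1" size="$2"
  printf '{"spec":{"logging":{"audit":{"rotation":{"interval":"%s","size":"%s"}}}}}\n' \
    "$interval" "$size"
}

# Apply the patch to the named CouchbaseCluster in the current namespace.
apply_audit_rotation() {
  local cluster="$1" interval="$2" size="$3"
  kubectl patch couchbaseclusters "$cluster" --type merge \
    --patch "$(audit_rotation_patch "$interval" "$size")"
}
```

For example, `apply_audit_rotation cb-example 30m 50Mi` would request a 30 minute rotation interval and a 50Mi size limit. A merge patch leaves every field it does not mention unchanged.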

Changing the location of the audit log is not supported, as it would break the ability for the Autonomous Operator to forward audit logs.

Collecting Logs

The Autonomous Operator package is distributed with a support tool, cao, which can be used to collect logs from Couchbase Server deployments. The cao tool performs explicit logging, which means it captures a snapshot of the current logs at the time the tool is run. Explicit logging can be performed either for all nodes in the cluster, or for one or more individual nodes. The results are saved as zip files: each zip file contains the log data generated for an individual node.

Note, however, there are some limitations to be aware of when collecting logs with cao:

Only active, non-rotated log files are collected.
cao requires pods/exec permissions in order to execute log collection scripts and run the cbcollect_info command locally on each pod in a Couchbase Server deployment, which may not be desirable for security and performance reasons.

To avoid these limitations, you can choose to configure log forwarding as an alternative method for collecting logs.

Collecting Logs with `cao`

When run without any flags or options, the cao tool collects a filtered list of the Kubernetes resources associated with the Autonomous Operator in a given namespace. However, to also collect logs from Couchbase Server deployments, the --collectinfo flag is required.

When cao is run unscoped with the --collectinfo flag, it will look for logs from all Couchbase Server deployments that are managed by the Autonomous Operator. However, you can scope the command to a particular cluster in order to look for just the logs from that cluster.

Run the following command to begin the log collection process for the Couchbase Server deployment named cb-example:

$ cao collect-logs --collectinfo --couchbase-cluster cb-example

Note that no logs have been downloaded yet. Instead, an interactive log collection menu opens on the command line.

Detected resources for log collection:
┌────┬─────────────────┬──────┬──────────┐
│ ID │ Pod             │ Type │ Detached │ (1)
├────┼─────────────────┼──────┼──────────┤
│ 0  │ cb-example-0001 │ pod  │          │
│ 1  │ cb-example-0003 │ pod  │          │
│ 2  │ cb-example-0004 │ pod  │          │
└────┴─────────────────┴──────┴──────────┘
Please select volumes to collect [e.g. 1,2,5-6 or leave blank for all]: 1-2 (2)
Please select whether to enable log redaction [Y/n]: (3)
Server logs downloaded to the following files:
    cbinfo-default-cb-example-0002-20181016T131730+0100-redacted.zip (4)
    cbinfo-default-cb-example-0001-20181016T131730+0100-redacted.zip
Wrote cluster information to cbopinfo-20181016T131730+0100.tar.gz

1 The table lists the pods that have associated log volumes, along with the following information:

ID is an arbitrary identification number that is used for specifying log volumes in later steps.
Pod is the name of the pod associated with the log volume.
Type specifies whether the logs are from a running pod or from an orphaned (detached) persistentVolumeClaim.
If Type is persistentVolumeClaim, then the Detached field will display the timestamp of when the log volume was detected as orphaned from its pod.

Before attempting to collect logs from a detached PersistentVolumeClaim, please review About Collecting Logs from Detached Volumes.

2 Collect the logs from a limited set of log volumes by specifying a comma-separated list of `ID`s from the table. Leaving this field blank will collect the logs from all of the listed log volumes.

3 Specify whether partial redaction should be applied to the collected logs. As an added security measure, this field defaults to yes when left blank.

4 The logs are downloaded to the current working directory once collection is complete.

About Collecting Logs from Detached Volumes

The cao tool collects logs from all log volumes in a Couchbase Server deployment, even PersistentVolumeClaims (PVCs) that are detached from pods. Detached PVCs can occur more commonly when running ephemeral clusters.

When a detached PVC is encountered, cao will automatically create a temporary Couchbase Server pod, mount the log volume to it, and then run cbcollect_info to collect the logs. Once the logs have been downloaded, the Autonomous Operator will delete the temporary pod (but will not delete the PVC).

The cao tool uses a default Couchbase Server container image when creating the temporary pod. However, this container image may not match the version of the Couchbase Server container that the PVC was previously attached to. To avoid compatibility issues when collecting logs from detached PVCs, make sure to use the --server-image flag to specify a matching Couchbase Server container image when running cao. For a Couchbase Server deployment named cb-example, the command would resemble the following:

$ cao collect-logs --collectinfo --couchbase-cluster cb-example --server-image couchbase/server:7.1.3
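
If you are unsure which image to pass, one option is to read it from the CouchbaseCluster resource first. The sketch below assumes that the cluster's `spec.image` field still names the image the detached volume was last used with, which may not hold during or after an upgrade. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes spec.image matches the image the PVC was last attached to.

# Print the Couchbase Server image recorded in the CouchbaseCluster resource.
cluster_server_image() {
  local cluster="$1"
  kubectl get couchbaseclusters "$cluster" -o jsonpath='{.spec.image}'
}

# Run log collection for one cluster, passing the discovered image to cao.
collect_logs_with_matching_image() {
  local cluster="$1" image
  image="$(cluster_server_image "$cluster")" || return 1
  cao collect-logs --collectinfo --couchbase-cluster "$cluster" --server-image "$image"
}
```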

Detached PVCs can sometimes be caused by more serious issues, some of which may also cause cao to encounter errors while collecting and downloading logs from them. If you encounter such errors, you can try to work around the issue by manually collecting the logs.

Manually Collecting Logs from Detached Volumes

In most situations, the cao tool will allow logs to be collected and downloaded successfully. Even in cases where logs are found to exist on detached PersistentVolumeClaims (PVCs), cao will automatically attempt to collect them by creating temporary pods that mount the PVCs.

However, a case may arise where cao fails to collect logs from a detached PVC. For example, if a stateful service (such as Data, Index, or Analytics) were to crash, and then continue to crash each time the Autonomous Operator recovered the pod, cao would be unlikely to collect the Couchbase Server logs from the pod without manual intervention. In this hypothetical situation, the pod does not remain alive long enough for cao to perform normal log collection; cao also doesn’t see the PVC as detached, and therefore doesn’t automatically create a temporary pod to collect the logs as it normally would for a detached PVC.

The general process for manual log collection is as follows:

Pause the Autonomous Operator’s management of the Couchbase cluster by setting couchbaseclusters.spec.paused to true.

Pausing management of the Couchbase cluster serves two purposes. First, it prevents the Autonomous Operator from attempting to recover the malfunctioning pod after Kubernetes has killed it. Second, it allows you to create a temporary replacement pod and mount the detached PVCs to it without the Autonomous Operator interfering.

Create a temporary pod resource with the persistent volumes mounted. The basic template will look like the following:

apiVersion: v1
kind: Pod
metadata:
  name: cb-example-0005
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: couchbase-server
    image: couchbase/server:7.1.3 (1)
    command: ['/bin/sleep'] (2)
    args:
    - '86400'
    volumeMounts: (3)
    - mountPath: /opt/couchbase/var/lib/couchbase (4)
      name: pvc-couchbase-cb-example-0005-00-default
      subPath: default
    - mountPath: /opt/couchbase/etc (4)
      name: pvc-couchbase-cb-example-0005-00-default
      subPath: etc
    - mountPath: /mnt/data (5)
      name: pvc-couchbase-cb-example-0005-00-data
    - mountPath: /mnt/index (6)
      name: pvc-couchbase-cb-example-0005-00-index
    - mountPath: /mnt/analytics-00 (7)
      name: pvc-couchbase-cb-example-0005-00-analytics-00
    - mountPath: /mnt/analytics-01 (7)
      name: pvc-couchbase-cb-example-0005-00-analytics-01
  volumes:
    - name: pvc-couchbase-cb-example-0005-00-default (4)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-default
    - name: pvc-couchbase-cb-example-0005-00-data (5)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-data
    - name: pvc-couchbase-cb-example-0005-00-index (6)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-index
    - name: pvc-couchbase-cb-example-0005-00-analytics-00 (7)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-analytics-00
    - name: pvc-couchbase-cb-example-0005-00-analytics-01 (7)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-analytics-01

1	The pod contains only the Couchbase Server container. It’s necessary to run the Couchbase Server container image since it contains the necessary command-line tools, namely `cbcollect_info`.
2	The container entry point is modified to run `/bin/sleep` for `86400` seconds (a day) while logs are collected and downloaded.
3	Run the following command to get the list of PVCs associated with the malfunctioning Couchbase Server pod (in this case, `cb-example-0005`): `$ kubectl get pvc -lcouchbase_node=cb-example-0005` Any returned PVCs will need to be defined in the temporary pod’s `volumes`, and correctly mounted in the pod via `volumeMounts`. The `volumeMounts` names refer to their corresponding entries in `volumes`.
4	`pvc-couchbase-cb-example-0005-00-default`: The `default` PVC needs to be defined in the temporary pod’s `volumes`, and requires two mounts in `volumeMounts`: `subPath: default` must be mounted at `/opt/couchbase/var/lib/couchbase`, and `subPath: etc` must be mounted at `/opt/couchbase/etc`.
5	`pvc-couchbase-cb-example-0005-00-data`: A `data` PVC, if returned, needs to be defined in the temporary pod’s `volumes`, and requires a single mount in `volumeMounts` that must be mounted as `/mnt/data`.
6	`pvc-couchbase-cb-example-0005-00-index`: An `index` PVC, if returned, needs to be defined in the temporary pod’s `volumes`, and requires a single mount in `volumeMounts` that must be mounted as `/mnt/index`.
7	`pvc-couchbase-cb-example-0005-00-analytics-00`: An `analytics` PVC, if returned, needs to be defined in the temporary pod’s `volumes`, and requires a single mount in `volumeMounts` that must be mounted as `/mnt/analytics-00`. Note, however, that if multiple analytics PVCs are returned, they will have different numeric suffixes. Each unique `analytics` PVC that is returned needs to be defined in `volumes`, with each requiring its own individual mount in `volumeMounts`. For example: `pvc-couchbase-cb-example-0005-00-analytics-01` would be mounted as `/mnt/analytics-01`.
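
The mounting rules in the callouts above can be summarized as a small lookup. The sketch below maps a PVC name to the mount path and sub-path it needs. The function name is hypothetical, the output format is an illustrative choice, and only the four volume types described above are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<mountPath> <subPath>" for each mount a PVC needs.
# A subPath of "-" means the volume is mounted without a subPath.
mounts_for_pvc() {
  local pvc="$1"
  case "$pvc" in
    *-default)
      # The default volume is mounted twice, once per sub-path.
      echo "/opt/couchbase/var/lib/couchbase default"
      echo "/opt/couchbase/etc etc"
      ;;
    *-data)  echo "/mnt/data -" ;;
    *-index) echo "/mnt/index -" ;;
    *-analytics-[0-9][0-9])
      # Reuse the numeric suffix of the PVC name in the mount path.
      echo "/mnt/analytics-${pvc##*-analytics-} -"
      ;;
    *)
      echo "unrecognized PVC name: $pvc" >&2
      return 1
      ;;
  esac
}
```

Feeding each name returned by the `kubectl get pvc` command from callout 3 through this function gives a checklist of the `volumeMounts` entries the temporary pod needs.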

Once the temporary pod is created (with the PVCs correctly mounted), run cbcollect_info to collect the logs. A typical command would resemble the following:
```
$ kubectl exec -ti pod/cb-example-0005 -- /opt/couchbase/bin/cbcollect_info /tmp/cbinfo-default-cb-example-0005-$(date +%y%m%dT%H%M%S%z)
```
The pod/cb-example-0005 name refers to the name given to the temporary pod in the example configuration from the previous step. The convention for logs is cbinfo, namespace, pod name, timestamp.

Once the logs have finished being collected, download them to localhost.

$ kubectl cp default/cb-example-0005:/tmp/cbinfo-default-cb-example-0005-181005T154746+0100.zip .
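
Because the collection and download steps must agree on the file name, it can help to compute the name once and reuse it. The sketch below follows the cbinfo, namespace, pod name, timestamp convention and the `.zip` suffix used in the download command above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for running cbcollect_info and downloading the result.

# Print the archive path (without the .zip suffix) for a namespace, pod, and timestamp.
archive_base() {
  printf '/tmp/cbinfo-%s-%s-%s' "$1" "$2" "$3"
}

# Collect logs in the temporary pod, then copy the archive to the current directory.
collect_and_download() {
  local namespace="$1" pod="$2" stamp base
  stamp="$(date +%y%m%dT%H%M%S%z)"
  base="$(archive_base "$namespace" "$pod" "$stamp")"
  kubectl exec -n "$namespace" "pod/$pod" -- /opt/couchbase/bin/cbcollect_info "$base" &&
    kubectl cp "$namespace/$pod:$base.zip" .
}
```

For the example pod, this would be invoked as `collect_and_download default cb-example-0005`.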

After successfully downloading the logs, make sure to delete the temporary pod.
```
$ kubectl delete pod cb-example-0005
```
Resume the Autonomous Operator’s management of the Couchbase cluster either by removing couchbaseclusters.spec.paused or setting it to false.
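
The pause and resume steps that bracket this procedure can both be expressed as a merge patch on `couchbaseclusters.spec.paused`. The following is a minimal sketch with hypothetical function names. It sets the field explicitly to `true` or `false` rather than removing it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for pausing and resuming Operator management of a cluster.

# Print a JSON merge patch that sets spec.paused; reject anything but true/false.
paused_patch() {
  case "$1" in
    true|false) printf '{"spec":{"paused":%s}}\n' "$1" ;;
    *) echo "expected true or false, got: $1" >&2; return 1 ;;
  esac
}

# Apply the patch to the named CouchbaseCluster in the current namespace.
set_cluster_paused() {
  local cluster="$1" paused="$2" patch
  patch="$(paused_patch "$paused")" || return 1
  kubectl patch couchbaseclusters "$cluster" --type merge --patch "$patch"
}
```

For example, run `set_cluster_paused cb-example true` before creating the temporary pod, and `set_cluster_paused cb-example false` after deleting it.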