Manage Couchbase Server Logging
The Autonomous Operator can be configured to manage certain aspects of Couchbase Server logging, and comes with tools for collecting Couchbase Server logs.
Overview
The Couchbase Server application records important events, and saves the details to a variety of log files. These logs are distinct from the logs that are generated by the Autonomous Operator itself.
Logging is performed continuously within each Couchbase Server container in a Couchbase deployment.
When using persistent volumes (as is recommended for all production deployments), log files are written to either the default or logs volume.
Couchbase Server manages logging automatically and uses default settings for things like logging level, file size, and rotation/expiration.
The one exception is audit.log.
This is a special log file, used to manage cluster security, and is handled separately from the other log files.
For information on audit.log, refer to Configuring Audit Logging.
Configuring Audit Logging
Couchbase Server audit logging is a special type of logging that is not enabled by default. When audit logging is enabled, Couchbase Server begins recording information on who has performed what action, when, and how successfully.
Audit records are written as JSON documents to a default file, named audit.log, which is stored alongside all other Couchbase Server log files in the default or logs volume that is attached to the pod.
After a specified period of time, or once the file reaches a specified size (whichever happens first), the file is closed and saved under a modified name that features a timestamp corresponding to the time of saving.
A new, empty audit.log file is created and saved when a new audit event is generated.
There are two approaches you can choose when it comes to implementing audit logging: managed and automated.
Couchbase Server rotates audit logs, but never expires or deletes them. This is by design, as Couchbase Server intentionally has no facility to modify or delete an audit log file once it has been rotated. As a result, it is the explicit responsibility of the administrator to implement a policy for expiring and/or moving audit logs to a different storage location. Without active intervention, rotated audit logs will eventually consume all available storage, leading to node and cluster failures.
About Managed Audit Logging
Managed audit logging involves the administrator directly enabling audit logging in Couchbase Server, and requires that the administrator actively manage the resultant audit log files. Managed audit logging can be implemented after the Couchbase cluster has been successfully deployed by the Autonomous Operator. Once the cluster is deployed, the administrator can enable and configure audit logging directly through the Couchbase UI, CLI, or REST API. Refer to Manage Auditing in the Couchbase Server documentation for more information.
Once audit logging is enabled, it’s up to the administrator to manage the resultant audit.log files.
It is expected that the administrator will implement an automated system for expiring and/or moving audit logs to a different storage location subject to a retention policy.
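As a starting point for such a retention policy, the following is a minimal sketch of a helper that lists rotated audit logs older than a given age, so that they can be archived or deleted by a separate step. It assumes that rotated files end in `-audit.log` while the active file is named exactly `audit.log`; that naming pattern, and the function name `list_expired_audit_logs`, are assumptions for illustration and should be checked against the files in your own deployment.

```shell
# Sketch only: find rotated audit logs that are candidates for expiry.
# ASSUMPTION: rotated files end in "-audit.log"; the active file is "audit.log".

# list_expired_audit_logs DIR AGE_MINUTES
# Prints the rotated audit logs directly inside DIR whose last modification
# is more than AGE_MINUTES minutes ago, one path per line, sorted.
list_expired_audit_logs() {
  find "$1" -maxdepth 1 -type f -name '*-audit.log' -mmin "+$2" | sort
}
```

For example, `list_expired_audit_logs /opt/couchbase/var/lib/couchbase/logs 1440` would list rotated files older than one day. The helper only prints paths, which leaves the decision to move or delete them to the surrounding retention tooling.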
Configuring Automated Audit Logging
Automated audit logging requires that logs be written to a persistent volume (i.e. the Couchbase deployment’s default or logs volumes are backed by persistent storage).
Fully-ephemeral clusters are not supported by this feature.
Automated audit logging involves having the Autonomous Operator handle the audit log configuration and optionally manage the resultant audit log files.
An audit logging configuration can be specified in the CouchbaseCluster resource specification, allowing the Autonomous Operator to set up audit logging in Couchbase Server, and optionally manage the resultant audit log files.
The required configuration parameters for enabling audit logging are described in the example below. Specified values represent the defaults for their respective fields unless otherwise noted in a callout. (The Autonomous Operator will set the default values for any fields that are not specified by the user.)
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
spec:
  logging:
    audit:
      enabled: true # (1)
      disabledEvents: # (2)
      - 8243
      disabledUsers: # (3)
      - "@eventing/local"
      - "@cbq-engine/local"
      rotation:
        interval: "15m"
        size: "20Mi"
      garbageCollection:
        sidecar:
          enabled: true # (4)
          image: "busybox:1.32.1" # (5)
          age: "1h"
          interval: "20m"
          resources:
            requests:
              cpu:
              memory:
1 | couchbaseclusters.spec.logging.audit.enabled: Setting this field to true enables audit logging on the Couchbase cluster.
This field defaults to false.
This is technically the only field that is required to configure audit logging.
2 | couchbaseclusters.spec.logging.audit.disabledEvents: This field can be set to an array of one or more filterable event ID integers that will be disabled for auditing purposes.
This field normally defaults to an empty array.
3 | couchbaseclusters.spec.logging.audit.disabledUsers: This field can be set to an array of one or more filterable user strings that will be disabled for auditing purposes.
This field normally defaults to an empty array.
4 | couchbaseclusters.spec.logging.audit.garbageCollection.sidecar.enabled: Setting this field to true enables garbage collection of rotated audit logs.
This field defaults to false.
This is technically the only field that is required to configure garbage collection of rotated audit logs.
Note, however, that garbage collection can only be enabled if |
5 | couchbaseclusters.spec.logging.audit.garbageCollection.sidecar.image: This is the base image of the sidecar helper container that will be added to the server pods to handle the cleanup of rotated logs.
This sidecar is a standard Linux container that only needs to find and remove files of the appropriate name and age.
Be aware that there are security concerns with using a standard Linux image, such as the potential for arbitrary shell execution and write-access to the volumes, which can potentially be exploited by a malicious image. In order to limit potential abuse, the garbage collection sidecar uses a sub-path to only mount the logs directory. In addition, the commands run by the garbage collector are not configurable, and the filenames of removed logs are printed to the garbage collector’s standard output.
After enabling automated audit logging, you should take care only to use the CouchbaseCluster resource specification for making further modifications to the audit logging configuration.
Manual changes that are made to the configuration via the Couchbase UI, CLI, or REST API (such as changing the audit log directory) are not prevented by the Autonomous Operator, and can cause audit logging failures.
Changing the location of the audit log is not supported, as it would break the ability for the Autonomous Operator to forward audit logs.
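One way to keep changes flowing through the CouchbaseCluster resource is to patch the resource with `kubectl` rather than editing settings in Couchbase Server. The following is a minimal sketch under stated assumptions: the helper names `audit_patch` and `set_audit_logging` are hypothetical, the cluster name and namespace are placeholders, and only the `spec.logging.audit.enabled` field described above is touched.

```shell
# Sketch only: toggle audit logging through the CouchbaseCluster resource.
# The function names below are hypothetical helpers, not part of any tool.

# audit_patch true|false
# Prints a JSON merge patch that sets spec.logging.audit.enabled.
audit_patch() {
  case "$1" in
    true|false) ;;
    *) echo "audit_patch: expected true or false" >&2; return 1 ;;
  esac
  printf '{"spec":{"logging":{"audit":{"enabled":%s}}}}' "$1"
}

# set_audit_logging CLUSTER true|false [NAMESPACE]
# Applies the patch to the named CouchbaseCluster (namespace defaults to "default").
set_audit_logging() {
  patch="$(audit_patch "$2")" || return 1
  kubectl patch couchbasecluster "$1" --namespace "${3:-default}" \
    --type merge --patch "$patch"
}
```

For example, `set_audit_logging cb-example true` would enable audit logging on a cluster named cb-example. A merge patch leaves the other fields in the resource unchanged, so any rotation or garbage collection settings already present are preserved.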
Collecting Logs
The Autonomous Operator package is distributed with a support tool, cao, which can be used to collect logs from Couchbase Server deployments.
The cao tool performs explicit logging, which means it captures a snapshot of the current logs at the time the tool is run.
Explicit logging can either be performed for all nodes in the cluster, or for one or more individual nodes.
The results are saved as zip files: each zip file contains the log data generated for an individual node.
Note, however, there are some limitations to be aware of when collecting logs with cao:
- Only active, non-rotated log files are collected.
- cao requires pods/exec permissions in order to execute log collection scripts and run the cbcollect_info command locally on each pod in a Couchbase Server deployment, which may not be desirable for security and performance reasons.
To avoid these limitations, you can choose to configure log forwarding as an alternative method for collecting logs.
Collecting Logs with cao
When run without any flags or options, the cao tool collects a filtered list of the Kubernetes resources associated with the Autonomous Operator in a given namespace.
However, to also collect logs from Couchbase Server deployments, the --collectinfo flag is required.
When cao is run unscoped with the --collectinfo flag, it will look for logs from all Couchbase Server deployments that are managed by the Autonomous Operator.
However, you can scope the command to a particular cluster in order to look for just the logs from that cluster.
Run the following command to begin the log collection process for the Couchbase Server deployment named cb-example:
$ cao collect-logs --collectinfo --couchbase-cluster cb-example
Note that no logs have been downloaded yet. Instead, an interactive log collection menu opens on the command line.
Detected resources for log collection:
┌────┬─────────────────┬──────┬──────────┐
│ ID │ Pod             │ Type │ Detached │ (1)
├────┼─────────────────┼──────┼──────────┤
│ 0  │ cb-example-0001 │ pod  │          │
│ 1  │ cb-example-0003 │ pod  │          │
│ 2  │ cb-example-0004 │ pod  │          │
└────┴─────────────────┴──────┴──────────┘
Please select volumes to collect [e.g. 1,2,5-6 or leave blank for all]: 1-2 (2)
Please select whether to enable log redaction [Y/n]: (3)
Server logs downloaded to the following files:
cbinfo-default-cb-example-0002-20181016T131730+0100-redacted.zip (4)
cbinfo-default-cb-example-0001-20181016T131730+0100-redacted.zip
Wrote cluster information to cbopinfo-20181016T131730+0100.tar.gz
1 | The table lists the pods that have associated log volumes, along with the following information:
2 | Collect the logs from a limited set of log volumes by specifying a comma-separated list of IDs from the table. Leaving this field blank will collect the logs from all of the listed log volumes.
3 | Specify whether partial redaction should be applied to the collected logs.
As an added security measure, this field defaults to yes when left blank.
4 | The logs are downloaded to the current working directory once collection is complete.
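To make the selection syntax from the prompt concrete (for example `1,2,5-6`), the following sketch expands such a string into the individual IDs it denotes. This is an illustration of the notation only, not how cao itself parses the input; the function name `expand_selection` is hypothetical.

```shell
# Sketch only: expand a selection such as "1,2,5-6" into one ID per line.
# This illustrates the notation; it is NOT taken from cao's implementation.

# expand_selection SELECTION
# A blank selection prints nothing (the prompt treats blank as "all").
expand_selection() {
  if [ -z "$1" ]; then
    return 0
  fi
  # Split on commas, then expand each "start-end" range inclusively.
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r part; do
    case "$part" in
      *-*)
        start="${part%-*}"
        end="${part#*-}"
        i="$start"
        while [ "$i" -le "$end" ]; do
          printf '%s\n' "$i"
          i=$((i + 1))
        done
        ;;
      *)
        printf '%s\n' "$part"
        ;;
    esac
  done
}
```

Under this reading, the `1-2` entered in the example session selects the rows with IDs 1 and 2.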
Manually Collecting Logs from Detached Volumes
In most situations, the cao tool will allow logs to be collected and downloaded successfully.
Even in cases where logs are found to exist on detached PersistentVolumeClaims (PVCs), cao will automatically attempt to collect them by creating temporary pods that mount the PVCs.
However, a case may arise where cao fails to collect logs from a detached PVC.
For example, if a stateful service (such as Data, Index, or Analytics) were to crash, and then continue to crash each time the Autonomous Operator recovered the pod, cao would be unlikely to collect the Couchbase Server logs from the pod without manual intervention.
This is because, in this hypothetical situation, the pod does not remain alive long enough for cao to perform normal log collection; cao also doesn’t see the PVC as detached, and therefore doesn’t automatically create a temporary pod to collect the logs as it normally would for a detached PVC.
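Before resorting to manual collection, it can help to confirm that a pod really is restarting repeatedly. The following is a rough sketch that reads the restart count of a pod's first container with `kubectl` and compares it to a threshold. The helper names and the default threshold of 3 are arbitrary assumptions for illustration, not values used by the Autonomous Operator or cao.

```shell
# Sketch only: a rough check for a repeatedly restarting pod.
# Helper names and the default threshold are illustrative assumptions.

# restart_count POD [NAMESPACE]
# Prints the restart count of the pod's first container.
restart_count() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[0].restartCount}'
}

# looks_crash_looping COUNT [THRESHOLD]
# Succeeds when COUNT is at least THRESHOLD (default 3).
looks_crash_looping() {
  [ "$1" -ge "${2:-3}" ]
}
```

A check such as `looks_crash_looping "$(restart_count cb-example-0005)"` could then be used to decide whether the manual process is worth starting.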
The general process for manual log collection is as follows:
- Pause the Autonomous Operator’s management of the Couchbase cluster by setting couchbaseclusters.spec.paused to true.
  Pausing management of the Couchbase cluster serves two purposes. First, it prevents the Autonomous Operator from attempting to recover the malfunctioning pod after Kubernetes has killed it. Second, it allows you to create a temporary replacement pod and mount the detached PVCs to it without the Autonomous Operator interfering.
- Create a temporary pod resource with the persistent volumes mounted. The basic template will look like the following:
  apiVersion: v1
  kind: Pod
  metadata:
    name: cb-example-0005
    namespace: default
  spec:
    restartPolicy: Never
    containers:
    - name: couchbase-server
      image: couchbase/server:7.2.0 # (1)
      command: ["/bin/sleep"] # (2)
      args:
      - '86400'
      volumeMounts: # (3)
      - mountPath: /opt/couchbase/var/lib/couchbase # (4)
        name: pvc-couchbase-cb-example-0005-00-default
        subPath: default
      - mountPath: /opt/couchbase/etc # (4)
        name: pvc-couchbase-cb-example-0005-00-default
        subPath: etc
      - mountPath: /mnt/data # (5)
        name: pvc-couchbase-cb-example-0005-00-data
      - mountPath: /mnt/index # (6)
        name: pvc-couchbase-cb-example-0005-00-index
      - mountPath: /mnt/analytics-00 # (7)
        name: pvc-couchbase-cb-example-0005-00-analytics-00
      - mountPath: /mnt/analytics-01 # (7)
        name: pvc-couchbase-cb-example-0005-00-analytics-01
    volumes:
    - name: pvc-couchbase-cb-example-0005-00-default # (4)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-default
    - name: pvc-couchbase-cb-example-0005-00-data # (5)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-data
    - name: pvc-couchbase-cb-example-0005-00-index # (6)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-index
    - name: pvc-couchbase-cb-example-0005-00-analytics-00 # (7)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-analytics-00
    - name: pvc-couchbase-cb-example-0005-00-analytics-01 # (7)
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-analytics-01
  1 | The pod contains only the Couchbase Server container. It’s necessary to run the Couchbase Server container image since it contains the necessary command-line tools, namely cbcollect_info.
  2 | The container entry point is modified to run /bin/sleep for 86400 seconds (a day) while logs are collected and downloaded.
  3 | Run the following command to get the list of PVCs associated with the malfunctioning Couchbase Server pod (in this case, cb-example-0005):
  $ kubectl get pvc -l couchbase_node=cb-example-0005
  Any returned PVCs will need to be defined in the temporary pod’s volumes, and correctly mounted in the pod via volumeMounts. The volumeMounts names refer to their corresponding entries in volumes.
  4 | pvc-couchbase-cb-example-0005-00-default: The default PVC needs to be defined in the temporary pod’s volumes, and requires two mounts in volumeMounts:
  - subPath: default must be mounted at /opt/couchbase/var/lib/couchbase
  - subPath: etc must be mounted at /opt/couchbase/etc
  5 | pvc-couchbase-cb-example-0005-00-data: A data PVC, if returned, needs to be defined in the temporary pod’s volumes, and requires a single mount in volumeMounts that must be mounted as /mnt/data.
  6 | pvc-couchbase-cb-example-0005-00-index: An index PVC, if returned, needs to be defined in the temporary pod’s volumes, and requires a single mount in volumeMounts that must be mounted as /mnt/index.
  7 | pvc-couchbase-cb-example-0005-00-analytics-00: An analytics PVC, if returned, needs to be defined in the temporary pod’s volumes, and requires a single mount in volumeMounts that must be mounted as /mnt/analytics-00. Note, however, that if multiple analytics PVCs are returned, they will have different numeric suffixes. Each unique analytics PVC that is returned needs to be defined in volumes, with each requiring its own individual mount in volumeMounts. For example: pvc-couchbase-cb-example-0005-00-analytics-01 would be mounted as /mnt/analytics-01.
- Once the temporary pod is created (with the PVCs correctly mounted), run cbcollect_info to collect the logs. A typical command would resemble the following:
  $ kubectl exec -ti pod/cb-example-0005 -- /opt/couchbase/bin/cbcollect_info /tmp/cbinfo-default-cb-example-0005-$(date +%y%m%dT%H%M%S%z)
  The pod/cb-example-0005 name refers to the name given to the temporary pod in the example configuration from the previous step. The naming convention for the log archive is cbinfo, namespace, pod name, timestamp.
- Once the logs have finished being collected, download them to your local machine.
  $ kubectl cp default/cb-example-0005:/tmp/cbinfo-default-cb-example-0005-181005T154746+0100.zip .
- After successfully downloading the logs, make sure to delete the temporary pod.
  $ kubectl delete pod cb-example-0005
- Resume the Autonomous Operator’s management of the Couchbase cluster either by removing couchbaseclusters.spec.paused or setting it to false.
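The manual steps above can be strung together in a script. The following is a minimal sketch under stated assumptions, not a supported tool: the function names (`paused_patch`, `cbinfo_path`, `collect_detached_logs`) are hypothetical, the temporary pod manifest is assumed to already exist as a local file, and the archive path is given an explicit `.zip` suffix so that the same path can be used for both collection and download. It stops at the first failing step, so on failure the temporary pod and the paused setting would need to be cleaned up by hand.

```shell
# Sketch only: the manual collection steps as a script.
# All function names here are hypothetical helpers.

# paused_patch true|false
# Prints a JSON merge patch that sets spec.paused.
paused_patch() {
  printf '{"spec":{"paused":%s}}' "$1"
}

# cbinfo_path NAMESPACE POD TIMESTAMP
# Prints the archive path: cbinfo, namespace, pod name, timestamp.
cbinfo_path() {
  printf '/tmp/cbinfo-%s-%s-%s.zip' "$1" "$2" "$3"
}

# collect_detached_logs CLUSTER POD MANIFEST [NAMESPACE]
# The body runs in a subshell so "set -e" and variables do not leak to the caller.
collect_detached_logs() (
  set -e
  cluster="$1"; pod="$2"; manifest="$3"; namespace="${4:-default}"
  archive="$(cbinfo_path "$namespace" "$pod" "$(date +%y%m%dT%H%M%S%z)")"

  # Pause the Operator's management of the cluster.
  kubectl patch couchbasecluster "$cluster" -n "$namespace" \
    --type merge -p "$(paused_patch true)"
  # Create the temporary pod and wait for it to become ready.
  kubectl apply -n "$namespace" -f "$manifest"
  kubectl wait -n "$namespace" --for=condition=Ready "pod/$pod" --timeout=300s
  # Collect the logs inside the pod, then download the archive.
  kubectl exec -n "$namespace" "pod/$pod" -- \
    /opt/couchbase/bin/cbcollect_info "$archive"
  kubectl cp "$namespace/$pod:$archive" "./$(basename "$archive")"
  # Delete the temporary pod and resume management.
  kubectl delete pod -n "$namespace" "$pod"
  kubectl patch couchbasecluster "$cluster" -n "$namespace" \
    --type merge -p "$(paused_patch false)"
)
```

A call such as `collect_detached_logs cb-example cb-example-0005 temp-pod.yaml` would follow the same order as the list above, where `temp-pod.yaml` is a placeholder for a manifest like the template shown earlier.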