Logs and Troubleshooting
This section provides information about how to diagnose and troubleshoot problems with the Couchbase Operator or your deployment.
When troubleshooting the Couchbase Operator, it is important to rule out Kubernetes itself as the root cause of the problem you are experiencing. See the Kubernetes Troubleshooting Guide for information about debugging applications within a Kubernetes cluster.
The following topics are also helpful to understand when troubleshooting the Operator:
Full Deployment Logs
The Operator is distributed with a support tool, cbopinfo, which can automatically collect resources, logs, and events from the Kubernetes cluster for use in support cases. It can also collect logs from Couchbase Server instances via cbcollect_info. Please see the documentation.
Operator Logs
The Couchbase Operator generates logs that can help you troubleshoot your deployment. Using kubectl (or oc on OpenShift), you can print the Operator logs to stdout.
Get the name of the operator pod:
$ kubectl get po -lapp=couchbase-operator
NAME READY STATUS RESTARTS AGE
couchbase-operator-1917615544-h20bm 1/1 Running 0 20h
Get the operator logs:
$ kubectl logs couchbase-operator-1917615544-h20bm
time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main
You can even use the deployment to show the logs. Since there is only one instance of the Operator in the deployment, the underlying command will automatically select the correct pod:
$ kubectl logs deployment/couchbase-operator
On OpenShift, use oc instead. Get the name of the operator pod:
$ oc get po -lapp=couchbase-operator
NAME READY STATUS RESTARTS AGE
couchbase-operator-1917615544-h20bm 1/1 Running 0 20h
Get the operator logs:
$ oc logs couchbase-operator-1917615544-h20bm
time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main
You can even use the deployment to show the logs. Since there is only one instance of the Operator in the deployment, the underlying command will automatically select the correct pod:
$ oc logs deployment/couchbase-operator
Watch for the following messages, which indicate that the Operator is unable to reconcile your cluster into the desired state:
- Logs with level=error
- Operator is unable to get cluster state after N retries
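To scan a long log for these messages, you can pipe the output through a filter. The helper below is a minimal sketch; the function name operator_errors is hypothetical, and it simply matches the two patterns listed above case-insensitively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only Operator log lines that match the warning
# signs listed above (error-level logs, or repeated failures to read state).
operator_errors() {
  grep -E -i 'level=error|unable to get cluster state after [0-9]+ retries'
}

# Example (requires cluster access):
#   kubectl logs deployment/couchbase-operator | operator_errors
```

grep exits with status 1 when nothing matches, so an empty result with a non-zero status means none of these messages were found.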
Profiling the Operator
The Couchbase Operator serves profiling data on its default listenAddress, localhost:8080. You can access this endpoint by running a remote shell in the Operator pod or by forwarding the port to your local system.
Access goroutine stack backtraces via a shell:
$ kubectl exec -it couchbase-operator-599bcf47f-8wswh -- sh
$ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less
Access Go memory usage via a port forward:
$ kubectl port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
$ go tool pprof localhost:8080/debug/pprof/heap
(pprof) traces
On OpenShift, use oc instead. Access goroutine stack backtraces via a shell:
$ oc exec -it couchbase-operator-599bcf47f-8wswh -- sh
$ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less
Access Go memory usage via a port forward:
$ oc port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
$ go tool pprof localhost:8080/debug/pprof/heap
(pprof) traces
For additional details on the Go language pprof feature please read the official documentation.
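If you need to attach a goroutine dump to a support case, it can be convenient to save it to a file without an interactive shell. The sketch below assumes, as the examples above do, that wget is available inside the Operator image; the function name save_goroutine_dump and the KUBECTL override (for example KUBECTL=oc on OpenShift) are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Hypothetical: set KUBECTL=oc on OpenShift; defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Save the goroutine dump served on localhost:8080 inside the Operator pod
# to a local file, and print the file name.
save_goroutine_dump() {
  local pod="$1"
  local out="${2:-goroutines-${pod}-$(date +%Y%m%dT%H%M%S).txt}"
  # Run wget inside the pod against the pprof goroutine endpoint.
  "$KUBECTL" exec "$pod" -- wget -q -O- \
    'http://localhost:8080/debug/pprof/goroutine?debug=1' > "$out" || return 1
  echo "$out"
}

# Example (requires cluster access):
#   save_goroutine_dump couchbase-operator-599bcf47f-8wswh
```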
Couchbase Server Logs
In most situations the cbopinfo command will successfully collect and download logs. There are some cases where collection will fail, for example if a stateful service crashes repeatedly each time the Operator recovers the pod. In this situation the pod is not alive for long enough to collect logs, so we provide a method to collect logs manually.
The general log collection process is as follows:
- Pause the Operator for the cluster by setting spec.paused to true
- Create a temporary pod resource with the persistent volumes mounted
- Run the cbcollect_info command
- Download the logs from the pod
- Delete the temporary pod
- Unpause the Operator by unsetting spec.paused
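The first and last steps can be scripted with kubectl patch. This is a minimal sketch under assumptions: it assumes the cluster resource can be addressed as couchbasecluster and that the cluster is named cb-example (both hypothetical here), and it acts on the current namespace. Pausing uses a merge patch; unpausing uses a JSON patch that removes the field, which fails if spec.paused is not set.

```shell
#!/usr/bin/env bash
# Hypothetical: set KUBECTL=oc on OpenShift; defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Set spec.paused to true so the Operator stops reconciling the cluster.
pause_cluster() {
  "$KUBECTL" patch couchbasecluster "$1" --type merge \
    -p '{"spec":{"paused":true}}'
}

# Unset spec.paused by removing the field entirely.
unpause_cluster() {
  "$KUBECTL" patch couchbasecluster "$1" --type json \
    -p '[{"op":"remove","path":"/spec/paused"}]'
}

# Example (requires cluster access; cluster name is hypothetical):
#   pause_cluster cb-example
#   ... collect logs ...
#   unpause_cluster cb-example
```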
Creating a Temporary Pod
The basic template will look like the following:
---
apiVersion: v1
kind: Pod
metadata:
  name: cb-example-0005
  namespace: default
spec:
  # Do not restart the container; this pod only exists to collect logs.
  restartPolicy: Never
  containers:
  - name: couchbase-server
    image: couchbase/server:enterprise-6.0.1
    # Override the entry point so Couchbase Server is not started;
    # the container just sleeps for a day (86400 seconds).
    command: ['/bin/sleep']
    args:
    - '86400'
    volumeMounts:
    - mountPath: /opt/couchbase/var/lib/couchbase
      name: pvc-couchbase-cb-example-0005-00-default
      subPath: default
    - mountPath: /opt/couchbase/etc
      name: pvc-couchbase-cb-example-0005-00-default
      subPath: etc
    - mountPath: /mnt/data
      name: pvc-couchbase-cb-example-0005-00-data
  volumes:
  - name: pvc-couchbase-cb-example-0005-00-default
    persistentVolumeClaim:
      claimName: pvc-couchbase-cb-example-0005-00-default
  - name: pvc-couchbase-cb-example-0005-00-data
    persistentVolumeClaim:
      claimName: pvc-couchbase-cb-example-0005-00-data
The pod contains a single container running a Couchbase Server image, as this image contains all the necessary command line tools. We override the container entry point to run /bin/sleep for 86400 seconds (one day), which keeps the pod alive while logs are collected and downloaded.
The volumes that need to be defined for the pod can be determined by running the following command, assuming the pod you wish to collect from is cb-example-0005:
kubectl get pvc -lcouchbase_node=cb-example-0005
Any returned volumes need to be defined in volumes and correctly mounted in the pod via volumeMounts. Each volumeMounts name refers to its corresponding entry in volumes. The following documents the volumeMounts required for each entry in volumes, given the returned persistent volume claims:
- pvc-couchbase-cb-example-0005-00-default: The default persistent volume claim requires two volumeMounts. The default subPath must be mounted at /opt/couchbase/var/lib/couchbase. The etc subPath must be mounted at /opt/couchbase/etc.
- pvc-couchbase-cb-example-0005-00-data: If specified, the data persistent volume claim requires a single mount in volumeMounts, and must be mounted at /mnt/data.
- pvc-couchbase-cb-example-0005-00-index: If specified, the index persistent volume claim requires a single mount in volumeMounts, and must be mounted at /mnt/index.
- pvc-couchbase-cb-example-0005-00-analytics-00: If specified, analytics persistent volume claims require a single mount in volumeMounts per volume, and must be mounted at /mnt/analytics-00. If multiple analytics volumes are specified, they have different numeric suffixes; for example, pvc-couchbase-cb-example-0005-00-analytics-01 would be mounted at /mnt/analytics-01.
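The mapping rules above can be encoded in a small helper to check a hand-written pod template. This is a minimal sketch; the function name mounts_for_pvc and its output format ("mountPath subPath", with "-" meaning no subPath) are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the required "mountPath subPath" pairs for a
# persistent volume claim name, following the rules listed above.
# A subPath of "-" means the claim is mounted without a subPath.
mounts_for_pvc() {
  local pvc="$1"
  case "$pvc" in
    *-default)
      echo "/opt/couchbase/var/lib/couchbase default"
      echo "/opt/couchbase/etc etc"
      ;;
    *-data)
      echo "/mnt/data -"
      ;;
    *-index)
      echo "/mnt/index -"
      ;;
    *-analytics-[0-9][0-9])
      # Keep the numeric suffix, e.g. analytics-01 -> /mnt/analytics-01
      echo "/mnt/analytics-${pvc##*-analytics-} -"
      ;;
    *)
      echo "unrecognised claim: $pvc" >&2
      return 1
      ;;
  esac
}

# Example (requires cluster access):
#   kubectl get pvc -lcouchbase_node=cb-example-0005 \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
#     while read -r pvc; do mounts_for_pvc "$pvc"; done
```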
Collecting & Downloading Logs
See the documentation for cbcollect_info for the available options; a typical command to run is:
kubectl exec -ti pod/cb-example-0005 -- /opt/couchbase/bin/cbcollect_info /tmp/cbinfo-default-cb-example-0005-$(date +%y%m%dT%H%M%S%z)
The pod name refers to the name given to the pod in the template. By convention, the log file name is made up of cbinfo, the namespace, the pod name, and a timestamp, separated by hyphens.
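The naming convention can be captured in a helper so the same name is used for collection and download. This is a minimal sketch; the function name cbinfo_name is hypothetical, and it uses the same date format as the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a log file name following the
# cbinfo-<namespace>-<pod>-<timestamp> convention.
# An explicit timestamp may be passed as the third argument.
cbinfo_name() {
  local namespace="$1" pod="$2"
  local ts="${3:-$(date +%y%m%dT%H%M%S%z)}"
  echo "cbinfo-${namespace}-${pod}-${ts}"
}

# Example:
#   cbinfo_name default cb-example-0005 181005T154746+0100
#   -> cbinfo-default-cb-example-0005-181005T154746+0100
```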
Once collection is complete, download the logs to the local host:
kubectl cp default/cb-example-0005:/tmp/cbinfo-default-cb-example-0005-181005T154746+0100.zip .
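The download and the "Delete the temporary pod" step can be combined so the pod is only removed once the archive has been copied successfully. This is a minimal sketch; the function name download_and_cleanup and the KUBECTL override are hypothetical, and the third argument is the log name without the .zip extension, as used in the cbcollect_info command above.

```shell
#!/usr/bin/env bash
# Hypothetical: set KUBECTL=oc on OpenShift; defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Copy /tmp/<name>.zip out of the temporary pod, then delete the pod.
# The pod is kept if the copy fails, so collection can be retried.
download_and_cleanup() {
  local namespace="$1" pod="$2" name="$3"
  "$KUBECTL" cp "${namespace}/${pod}:/tmp/${name}.zip" "./${name}.zip" || return 1
  "$KUBECTL" delete pod "$pod" -n "$namespace"
}

# Example (requires cluster access):
#   download_and_cleanup default cb-example-0005 \
#     cbinfo-default-cb-example-0005-181005T154746+0100
```

Remember to unpause the Operator afterwards by unsetting spec.paused.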
See Also
Refer to the Couchbase Server Troubleshooting guide for additional information about reporting issues.