Logs and Troubleshooting

This section provides information about how to diagnose and troubleshoot problems with the Couchbase Operator or your deployment.

When troubleshooting the Couchbase Operator, it is important to rule out Kubernetes itself as the root cause of the problem you are experiencing. See the Kubernetes Troubleshooting Guide for information about debugging applications within a Kubernetes cluster.

The following topics are also helpful to understand when troubleshooting the Operator:

Full Deployment Logs

The Operator is distributed with a support tool that can automatically collect resources, logs and events from the Kubernetes cluster for use in support cases. It can also collect logs from Couchbase Server instances via cbcollect_info. See the support tool documentation for details.

Operator Logs

The Couchbase Operator generates logs that can help troubleshoot your deployment. Using kubectl or oc, you can choose to print the Operator logs to stdout.

On Kubernetes, get the name of the Operator pod:

$ kubectl get po -lapp=couchbase-operator
NAME                                  READY     STATUS    RESTARTS   AGE
couchbase-operator-1917615544-h20bm   1/1       Running   0          20h

Get the operator logs:

$ kubectl logs couchbase-operator-1917615544-h20bm
time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main

Because the deployment name is not auto-generated, you can also retrieve the logs through the deployment. The underlying command automatically selects a pod, and as the Operator deployment only allows a single pod, the correct one is always chosen:

$ kubectl logs deployment/couchbase-operator

On OpenShift, get the name of the Operator pod:

$ oc get po -lapp=couchbase-operator
NAME                                  READY     STATUS    RESTARTS   AGE
couchbase-operator-1917615544-h20bm   1/1       Running   0          20h

Get the operator logs:

$ oc logs couchbase-operator-1917615544-h20bm
time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main

Because the deployment name is not auto-generated, you can also retrieve the logs through the deployment. The underlying command automatically selects a pod, and as the Operator deployment only allows a single pod, the correct one is always chosen:

$ oc logs deployment/couchbase-operator

Watch for the following messages, which indicate that the Operator is unable to reconcile your cluster into the desired state:

  • Logs with level=error

  • Operator is unable to get cluster state after N retries
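To pick these messages out of a long log, the output can be filtered. The following is a minimal sketch rather than an official tool: the function name operator_problems is hypothetical, and the second pattern is an assumption based on the message wording above, so adjust it if your Operator version words the message differently.

```shell
# Keep only log lines that suggest the Operator cannot reconcile the cluster.
# Reads log text on stdin. The function name is illustrative only.
operator_problems() {
  # -i: the exact capitalisation of the message is an assumption.
  # "|| true" keeps the exit status zero when nothing matches.
  grep -i -E 'level=error|unable to get cluster state' || true
}

# Example (requires cluster access):
#   kubectl logs deployment/couchbase-operator | operator_problems
```

An empty result means neither pattern appeared in the log that was read.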

Profiling the Operator

The Couchbase Operator serves profiling data on its default listenAddress, localhost:8080. You can access this endpoint by running a remote shell or by forwarding the port to your local system.

On Kubernetes, access goroutine stack backtraces via a shell:

$ kubectl exec -it couchbase-operator-599bcf47f-8wswh -- sh
$ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less

Access Go memory usage via a port forward (the port-forward command keeps running, so run go tool pprof from a second terminal):

$ kubectl port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
$ go tool pprof localhost:8080/debug/pprof/heap
(pprof) traces

On OpenShift, access goroutine stack backtraces via a shell:

$ oc exec -it couchbase-operator-599bcf47f-8wswh -- sh
$ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less

Access Go memory usage via a port forward (the port-forward command keeps running, so run go tool pprof from a second terminal):

$ oc port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
$ go tool pprof localhost:8080/debug/pprof/heap
(pprof) traces

For additional details on the Go pprof feature, see the official Go documentation.

Couchbase Server Logs

In most situations the cbopinfo command can collect and download logs successfully. Collection can still fail in some cases, for example when a stateful service crashes repeatedly and the Operator continuously recovers the pod. In this situation the pod is not alive for long enough to collect logs, so we provide a method to collect them manually.

The general log collection process is as follows:

  1. Pause the Operator for the cluster by setting spec.paused to true

  2. Create a temporary pod resource with the persistent volumes mounted

  3. Run the cbcollect_info command

  4. Download the logs from the pod

  5. Delete the temporary pod

  6. Unpause the Operator by unsetting spec.paused
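Steps 1 and 6 can be sketched with kubectl patch. This is an illustration under stated assumptions, not the only way to edit the resource: the helper names paused_patch and set_paused are hypothetical, and the cluster resource is assumed to be a couchbasecluster named cb-example in the current namespace. A JSON merge patch with a null value removes the field, which corresponds to unsetting spec.paused.

```shell
# Build the JSON merge patch that sets or clears spec.paused.
# $1 is "true" to pause; any other value yields a patch that removes the field.
paused_patch() {
  if [ "$1" = "true" ]; then
    printf '{"spec":{"paused":true}}\n'
  else
    printf '{"spec":{"paused":null}}\n'
  fi
}

# Apply the patch to a CouchbaseCluster resource.
# $1 is the cluster name (assumed, e.g. cb-example); $2 is "true" or "false".
set_paused() {
  kubectl patch couchbasecluster "$1" --type merge -p "$(paused_patch "$2")"
}

# Example (requires cluster access):
#   set_paused cb-example true     # step 1: pause
#   set_paused cb-example false    # step 6: unpause
```

Editing the resource with kubectl edit achieves the same result; the patch form is simply easier to script.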

Creating a Temporary Pod

The basic template will look like the following:

---
apiVersion: v1
kind: Pod
metadata:
  name: cb-example-0005
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: couchbase-server
    image: couchbase/server:enterprise-5.5.2
    command:
    - '/bin/sleep'
    args:
    - '86400'
    volumeMounts:
    - mountPath: /opt/couchbase/var/lib/couchbase
      name: pvc-couchbase-cb-example-0005-00-default
      subPath: default
    - mountPath: /opt/couchbase/etc
      name: pvc-couchbase-cb-example-0005-00-default
      subPath: etc
    - mountPath: /mnt/data
      name: pvc-couchbase-cb-example-0005-00-data
  volumes:
    - name: pvc-couchbase-cb-example-0005-00-default
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-default
    - name: pvc-couchbase-cb-example-0005-00-data
      persistentVolumeClaim:
        claimName: pvc-couchbase-cb-example-0005-00-data

The pod contains a single container running a Couchbase Server image as this contains all the necessary command line tools. We modify the container entry point to run /bin/sleep for 86400 seconds (a day) while logs are collected and downloaded.

The associated volumes that need to be defined for the pod can be determined by running the following command, assuming the pod you wish to collect from is cb-example-0005:

$ kubectl get pvc -lcouchbase_node=cb-example-0005

Any returned volumes need to be defined in volumes and correctly mounted in the pod via volumeMounts. The names in volumeMounts refer to their corresponding entries in volumes. The following documents the volumeMounts required for each entry in volumes, given the returned persistent volume claims:

pvc-couchbase-cb-example-0005-00-default

The default persistent volume claim requires two volumeMounts. The default subPath must be mounted at /opt/couchbase/var/lib/couchbase. The etc subPath must be mounted at /opt/couchbase/etc.

pvc-couchbase-cb-example-0005-00-data

If specified, the data persistent volume claim requires a single mount in volumeMounts, and must be mounted at /mnt/data.

pvc-couchbase-cb-example-0005-00-index

If specified, the index persistent volume claim requires a single mount in volumeMounts, and must be mounted at /mnt/index.

pvc-couchbase-cb-example-0005-00-analytics-00

If specified, analytics persistent volume claims require a single mount in volumeMounts per volume. The first must be mounted at /mnt/analytics-00. If multiple analytics volumes are specified, they have different numeric suffixes, e.g. pvc-couchbase-cb-example-0005-00-analytics-01 would be mounted at /mnt/analytics-01.
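The mapping from claim name to mount point described above can be written down as a small lookup. This is a sketch for checking a pod template by hand, not part of the Operator: the function name mounts_for_pvc is hypothetical, and it assumes claim names end in the suffixes shown in this section.

```shell
# Print the required mounts for a persistent volume claim name, one per line,
# as "<mountPath> [subPath]". The function name is illustrative only.
mounts_for_pvc() {
  case "$1" in
    *-default)
      printf '/opt/couchbase/var/lib/couchbase default\n'
      printf '/opt/couchbase/etc etc\n' ;;
    *-data)
      printf '/mnt/data\n' ;;
    *-index)
      printf '/mnt/index\n' ;;
    *-analytics-[0-9][0-9])
      # ${1##*-} strips everything up to the last hyphen, leaving the suffix.
      printf '/mnt/analytics-%s\n' "${1##*-}" ;;
    *)
      echo "unrecognized claim name: $1" >&2
      return 1 ;;
  esac
}

# Example (requires cluster access):
#   for pvc in $(kubectl get pvc -lcouchbase_node=cb-example-0005 \
#                  -o jsonpath='{.items[*].metadata.name}'); do
#     echo "$pvc:"; mounts_for_pvc "$pvc"
#   done
```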

Collecting & Downloading Logs

See the cbcollect_info documentation for the full set of options. A typical command to run is:

$ kubectl exec -ti pod/cb-example-0005 -- /opt/couchbase/bin/cbcollect_info /tmp/cbinfo-default-cb-example-0005-$(date +%y%m%dT%H%M%S%z)

The pod name refers to the name given to the pod in the template. By convention, the log archive name is made up of cbinfo, the namespace, the pod name and a timestamp, separated by hyphens.

Once collection is complete, the logs can be downloaded to the local host:

$ kubectl cp default/cb-example-0005:/tmp/cbinfo-default-cb-example-0005-181005T154746+0100.zip .
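Because the archive name contains a timestamp, the collect and download commands must use exactly the same name. One way to guarantee this is to compute the name once and reuse it, as in the following sketch. The helper names cbinfo_name and collect_and_download are hypothetical, and passing an explicit .zip suffix to cbcollect_info is an assumption made so that the path given to kubectl cp is unambiguous.

```shell
# Build the archive base name: cbinfo-<namespace>-<pod>-<timestamp>.
# The timestamp is taken from the local host, as in the command above.
cbinfo_name() {
  printf 'cbinfo-%s-%s-%s\n' "$1" "$2" "$(date +%y%m%dT%H%M%S%z)"
}

# Collect logs inside the temporary pod, then copy the archive locally.
# $1 is the namespace, $2 is the temporary pod name.
collect_and_download() {
  ns=$1
  pod=$2
  name=$(cbinfo_name "$ns" "$pod")
  kubectl exec -n "$ns" "$pod" -- \
    /opt/couchbase/bin/cbcollect_info "/tmp/$name.zip" &&
    kubectl cp "$ns/$pod:/tmp/$name.zip" "./$name.zip"
}

# Example (requires cluster access):
#   collect_and_download default cb-example-0005
```

The copy only runs if collection succeeds, so a failed cbcollect_info run does not leave a misleading empty download.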

See Also

Refer to the Couchbase Server Troubleshooting guide for additional information about reporting issues.