Kubernetes Operator Troubleshooting

    If you run into issues with the Kubernetes Operator, you can troubleshoot by examining the logs and events that it generates.

    The Kubernetes Operator generates logs that can be used for auditing and troubleshooting purposes. This page describes logging that is specific to the Kubernetes Operator itself. For information about Couchbase cluster logging, refer to Manage Couchbase Server Logging.

    Overview

    The Kubernetes Operator generates logs that include information about itself and the various other Kubernetes components that make up the Operator deployment. These logs are distinct from the logs that are generated by the Couchbase Server application.

    This page provides information about how to collect and scrutinize logging information that is produced by the Kubernetes Operator. When troubleshooting the Kubernetes Operator, it is important to first rule out Kubernetes itself as the root cause of the problem. The Kubernetes Troubleshooting Guide contains a great deal of helpful information about debugging applications within a Kubernetes cluster.

    Familiarity with the Operator’s configuration settings can be helpful when troubleshooting the Kubernetes Operator.

    Collecting Kubernetes Operator Logs

    Using kubectl (Kubernetes) or oc (OpenShift), you can print the Kubernetes Operator logs to standard console output.

    On Kubernetes, start by getting the name of the Kubernetes Operator pod.

    $ kubectl get po -lapp=couchbase-operator
    NAME                                  READY     STATUS    RESTARTS   AGE
    couchbase-operator-1917615544-h20bm   1/1       Running   0          20h
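    If you script these steps, you can capture the pod name instead of copying it by hand. The following is a minimal sketch, assuming a Bash-compatible shell. operator_pod is a hypothetical helper (not part of the Operator) that uses the standard -o jsonpath output option to print only the name of the first pod labelled app=couchbase-operator. The CLI variable is likewise an illustrative convention: it defaults to kubectl, and setting CLI=oc runs the same helper against OpenShift.

```shell
# Hypothetical helper (not shipped with the Operator): print the name of the
# first pod labelled app=couchbase-operator.
# CLI defaults to kubectl; set CLI=oc to use the OpenShift client instead.
operator_pod() {
  "${CLI:-kubectl}" get pods -l app=couchbase-operator \
    -o jsonpath='{.items[0].metadata.name}'
}

# Example: capture the name once, then reuse it.
#   POD=$(operator_pod)
#   kubectl logs "$POD"
```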

    Use the pod name to get the logs.

    $ kubectl logs couchbase-operator-1917615544-h20bm
    time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
    time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
    time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
    time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
    time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
    time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main

    Alternatively, you can specify the Kubernetes Operator deployment to get the logs.

    $ kubectl logs deployment/couchbase-operator

    Because the deployment runs only one Kubernetes Operator pod, kubectl automatically selects that pod and prints its logs.

    On OpenShift, start by getting the name of the Kubernetes Operator pod.

    $ oc get po -lapp=couchbase-operator
    NAME                                  READY     STATUS    RESTARTS   AGE
    couchbase-operator-1917615544-h20bm   1/1       Running   0          20h

    Use the pod name to get the logs.

    $ oc logs couchbase-operator-1917615544-h20bm
    time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
    time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
    time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
    time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
    time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
    time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main

    Alternatively, you can specify the Kubernetes Operator deployment to get the logs.

    $ oc logs deployment/couchbase-operator

    Because the deployment runs only one Kubernetes Operator pod, oc automatically selects that pod and prints its logs.
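    The deployment form also accepts the standard logs flags, which help narrow down what you read. As a sketch, assuming a Bash-compatible shell and the same illustrative CLI variable (kubectl by default, oc on OpenShift), a hypothetical operator_logs wrapper can forward any extra flags unchanged:

```shell
# Hypothetical wrapper: print Operator logs through the deployment and pass
# any extra flags straight through to `kubectl logs` (or `oc logs`).
operator_logs() {
  "${CLI:-kubectl}" logs deployment/couchbase-operator "$@"
}

# Examples using standard logs flags:
#   operator_logs --since=1h        # only entries from the last hour
#   operator_logs --tail=100        # only the last 100 lines
#   CLI=oc operator_logs -f         # stream new entries on OpenShift
```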

    If you’re troubleshooting the Kubernetes Operator, watch for the following messages which indicate that the Operator is unable to reconcile a Couchbase cluster into a desired state:

    • Logs with level=error

    • Operator is unable to get cluster state after N retries
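    To pick these messages out of a long log, you can filter the output with grep. The sketch below is hypothetical: operator_errors is an illustrative name, and the pattern assumes the log lines contain the literal text level=error or unable to get cluster state, which may differ between Operator versions.

```shell
# Hypothetical filter: read Operator log lines on stdin and print only those
# that match the warning signs listed above (case-insensitive).
# The exact message wording is an assumption and may vary by Operator version.
operator_errors() {
  grep -E -i 'level=error|unable to get cluster state'
}

# Example:
#   kubectl logs deployment/couchbase-operator | operator_errors
```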

    Profiling the Kubernetes Operator

    For more advanced troubleshooting, the Kubernetes Operator supports the Go pprof profiler and serves profiling data on its default listen address, localhost:8080. You can access this endpoint either by opening a remote shell in the Operator pod or by forwarding the port to your local system.

    On Kubernetes, to access goroutine stack traces using a shell:

    $ kubectl exec -it couchbase-operator-599bcf47f-8wswh -- sh
    $ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less

    To access Go memory usage using a port forward:

    $ kubectl port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
    $ go tool pprof localhost:8080/debug/pprof/heap
    (pprof) traces

    On OpenShift, to access goroutine stack traces using a shell:

    $ oc exec -it couchbase-operator-599bcf47f-8wswh -- sh
    $ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less

    To access Go memory usage using a port forward:

    $ oc port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
    $ go tool pprof localhost:8080/debug/pprof/heap
    (pprof) traces
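    To keep profiling data for later analysis, you can save it to a file rather than paging through it. A minimal sketch, assuming a Bash-compatible shell, that the Operator image provides wget (as the shell examples above do), and that pprof listens on the default localhost:8080: the hypothetical save_goroutines helper runs wget through exec and writes a goroutine dump locally. It requests debug=2, which makes Go's pprof handler print a full stack trace for every goroutine instead of the grouped summary produced by debug=1.

```shell
# Hypothetical helper: save a goroutine dump from the Operator pod to a local
# file and print the file name.
# Assumes wget exists in the Operator image and pprof listens on localhost:8080.
save_goroutines() {
  local pod=$1
  local out=${2:-goroutines-$(date +%Y%m%dT%H%M%S).txt}
  # No -it: the command needs no terminal, and a TTY can alter captured output.
  # debug=2 prints a full stack trace for every goroutine.
  "${CLI:-kubectl}" exec "$pod" -- \
    wget -q -O- 'http://localhost:8080/debug/pprof/goroutine?debug=2' > "$out" &&
    echo "$out"
}

# Example:
#   save_goroutines couchbase-operator-599bcf47f-8wswh
# With the port forward above running, `go tool pprof -top` prints a
# non-interactive summary of the heap profile instead of opening a prompt:
#   go tool pprof -top localhost:8080/debug/pprof/heap
```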

    Kubernetes Events

    Kubernetes Events provide insights into what is happening inside a Kubernetes cluster. They record significant occurrences and changes in the state of resources, such as the creation, deletion, or failure of pods, nodes, services, and other Kubernetes objects.

    Events can be used to monitor changes in the cluster and are helpful when troubleshooting the Kubernetes Operator. However, Kubernetes retains them only for a limited time, one hour by default. You can use the Kubernetes Event Collector tool to collect and store events for longer periods of time.
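    Because events expire, it is worth inspecting them soon after a problem occurs. As a sketch, assuming a Bash-compatible shell and the same illustrative CLI variable (kubectl by default, oc on OpenShift), a hypothetical recent_events helper lists events sorted by creation time, so the sequence leading up to a failure reads from top to bottom. The namespace in the examples is a placeholder.

```shell
# Hypothetical helper: list events oldest first; extra flags are passed through.
recent_events() {
  "${CLI:-kubectl}" get events --sort-by=.metadata.creationTimestamp "$@"
}

# Examples (my-namespace is a placeholder):
#   recent_events -n my-namespace                  # a specific namespace
#   recent_events --field-selector type=Warning    # warnings only
```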

    The Kubernetes Event Collector watches for Kubernetes events within a namespace and stores them in a buffer that can be stashed. It can be deployed and configured using Helm:

    $ helm install event-collector charts/event-collector

    For more details about the tool and how to use it, refer to the GitHub repository README: https://github.com/couchbase/couchbase-k8s-event-collector