Configure CMOS for Kubernetes deployment

    • Developer Preview

      Tutorials are provided to demonstrate how a particular problem may be solved. Tutorials are accurate at the time of writing but rely heavily on third party software. The third party software is not directly supported by Couchbase. For further help in the event of a problem, contact the relevant software maintainer.

      Overview

      Couchbase Monitoring and Observability Stack (also known as CMOS) is a simple, out-of-the-box solution built using industry-standard tooling to observe the state of a running Couchbase cluster. CMOS can be deployed to monitor Couchbase clusters managed by the Couchbase Autonomous Operator (CAO) on Kubernetes.

      Deploy CMOS

      CMOS is deployed on Kubernetes using a standard set of resources such as Deployments and Services. The following sections describe how to deploy these standard objects and how to configure the services they provide.

      Prometheus Configuration

      The Prometheus configuration file is the standard way of specifying the configuration for the Prometheus server. This configuration can be externalized in a Kubernetes ConfigMap, which holds all the details, including credentials and the targets to scrape for metrics. By externalizing the Prometheus configuration into a ConfigMap, you do not have to rebuild the Prometheus image whenever you need to add or remove configuration: you simply update the ConfigMap and restart (or reload) the Prometheus pods to apply the new configuration.

      The following example file contains the default configuration needed for the CMOS Prometheus instance to work out of the box. Run the following command in a console to create it.

      mkdir -p ./prometheus/custom/alerting
      cat <<EOF >./prometheus/custom/prometheus-k8s.yml
      # This is a template file we use so we can substitute environment variables at launch
      global:
          scrape_interval: 30s
          evaluation_interval: 30s
        # scrape_timeout is set to the global default (10s).
      
        # Attach these labels to any time series or alerts when communicating with
        # external systems (federation, remote storage, Alertmanager).
          external_labels:
              monitor: couchbase-observability-stack
      
      # Load and evaluate rules in this file every 'evaluation_interval' seconds.
      rule_files:
        # All Couchbase default rules go here
          - /etc/prometheus/alerting/couchbase/*.yaml
          - /etc/prometheus/alerting/couchbase/*.yml
        # All custom rules can go here: relative to this file
          - alerting/*.yaml
          - alerting/*.yml
      
      alerting:
          alertmanagers:
              - scheme: http
          # tls_config:
          #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                path_prefix: /alertmanager/
          # Assumption is we always have AlertManager with Prometheus
                static_configs:
                    - targets:
                          - localhost:9093
          # Discover alert manager instances using K8s service discovery
          # kubernetes_sd_configs:
          #   - role: pod
          # relabel_configs:
          # - source_labels: [__meta_kubernetes_namespace]
          #   regex: monitoring
          #   action: keep
          # - source_labels: [__meta_kubernetes_pod_label_app]
          #   regex: prometheus
          #   action: keep
          # - source_labels: [__meta_kubernetes_pod_label_component]
          #   regex: alertmanager
          #   action: keep
          # - source_labels: [__meta_kubernetes_pod_container_port_number]
          #   regex:
          #   action: drop
      
      scrape_configs:
          - job_name: prometheus
            metrics_path: /prometheus/metrics
            static_configs:
                - targets: [localhost:9090]
      
          - job_name: couchbase-grafana
            file_sd_configs:
                - files:
                      - /etc/prometheus/couchbase/monitoring/*.json
                  refresh_interval: 30s
      
        # TODO: add unauthenticated endpoint
          - job_name: couchbase-cluster-monitor
            basic_auth:
                username: admin
                password: password
            metrics_path: /api/v1/_prometheus
          # For basic auth we cannot use file_sd
            static_configs:
                - targets: [localhost:7196]
      
        # Used for kubernetes deployment as we can discover the end points to scrape from the API
          - job_name: couchbase-kubernetes-pods
            # Server 7 requires authentication
            basic_auth:
                username: admin
                password: password
            kubernetes_sd_configs:
                - role: pod
            relabel_configs:
            # Scrape pods labelled with app=couchbase and then only port 8091 (server 7), 9091 (exporter) or 2020 (fluent bit)
                - source_labels: [__meta_kubernetes_pod_label_app]
                  action: keep
                  regex: couchbase
                - source_labels: [__meta_kubernetes_pod_container_port_number]
                  action: keep
                  regex: (8091|9091|2020)
                - action: labelmap
                  regex: __meta_kubernetes_pod_label_(.+)
                - source_labels: [__meta_kubernetes_namespace]
                  action: replace
                  target_label: kubernetes_namespace
                - source_labels: [__meta_kubernetes_pod_name]
                  action: replace
                  target_label: kubernetes_pod_name
                - source_labels: [__meta_kubernetes_pod_label_couchbase_cluster]
                  action: replace
                  target_label: cluster
      
        # Kube-state-metrics default service to scrape
          - job_name: kube-state-metrics
            static_configs:
                - targets: [kube-state-metrics:8080]
      EOF
      1 rule_files: Prometheus loads alerting rules from the locations listed under rule_files. You can extend the rules by adding rule files under the alerting/ directory; this path is relative to the Prometheus configuration file, so by default the complete path is /etc/prometheus/custom/alerting/. Refer to the Observability Stack section for the volume mounts. An illustrative custom rule file is sketched after this list.
      2 alerting: Alertmanager is shipped and enabled by default in CMOS. This section configures how Prometheus reaches the Alertmanager instances.
      3 scrape_configs: All the targets to scrape for metrics are defined here, including prometheus, couchbase-grafana, couchbase-cluster-monitor, couchbase-kubernetes-pods, and kube-state-metrics. The Couchbase pods are discovered using Kubernetes pod labels.
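
      For illustration, a custom rule file placed in that directory might look like the following. The alert name, expression, and threshold are hypothetical; also note that kubectl create configmap --from-file does not descend into subdirectories, so a custom rules directory typically needs its own ConfigMap and volume mount.

      cat <<EOF >./prometheus/custom/alerting/custom-rules.yaml
      # Hypothetical example rule: fires when a Couchbase scrape target stays down for 5 minutes.
      groups:
          - name: custom-couchbase-rules
            rules:
                - alert: CouchbaseScrapeTargetDown
                  expr: up{job="couchbase-kubernetes-pods"} == 0
                  for: 5m
                  labels:
                      severity: warning
                  annotations:
                      summary: A Couchbase scrape target has been down for more than 5 minutes.
      EOF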

      Run the following command to create the Prometheus ConfigMap from the configuration files:

      kubectl create configmap prometheus-config-cmos --from-file=./prometheus/custom/
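
      If you later change the configuration files, you can update the same ConfigMap in place and restart the Prometheus workload to apply the change. A minimal sketch, assuming the couchbase-grafana Deployment created later in this tutorial (the config-watcher sidecar described below can also reload Prometheus automatically when the mounted files change):

      kubectl create configmap prometheus-config-cmos --from-file=./prometheus/custom/ --dry-run=client -o yaml | kubectl apply -f -
      kubectl rollout restart deployment/couchbase-grafana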

      Observability Stack

      Kubernetes controls access to its resources using Role-Based Access Control (RBAC). In order to monitor the Couchbase cluster, the CMOS deployment must be able to discover and communicate with it. The example YAML below grants the necessary permissions; create it by running the following command in a Kubernetes console.

      cat <<EOF | kubectl apply -f -
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
         name: monitoring-endpoints-role
         labels:
             rbac.couchbase.observability.com/aggregate-to-monitoring: 'true'
      rules:
         - apiGroups: [''] (1)
           resources: [services, endpoints, pods, secrets]
           verbs: [get, list, watch]
         - apiGroups: [couchbase.com] (2)
           resources: [couchbaseclusters]
           verbs: [get, list, watch]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
         name: monitoring-role-binding (3)
      roleRef:
         kind: ClusterRole
         name: monitoring-endpoints-role
         apiGroup: rbac.authorization.k8s.io
      subjects:
         - kind: Group
           name: system:serviceaccounts
           apiGroup: rbac.authorization.k8s.io
      EOF

      In this configuration file, the ClusterRole is defined with the following permissions:

      1 Access to standard Kubernetes resources: CMOS requires get, list, and watch permissions on the services, endpoints, pods, and secrets resources.
      2 Couchbase Custom Resource Definition: CMOS requires get, list, and watch permissions on the couchbaseclusters resource.
      3 monitoring-role-binding: This ClusterRoleBinding grants the permissions defined in the ClusterRole to the service account that CMOS runs under (an optional verification sketch follows this list).
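
      As an optional sanity check, you can ask the API server whether a service account would be granted these permissions. A rough sketch, assuming CMOS runs under the default service account in the default namespace and that your own user is permitted to impersonate:

      kubectl auth can-i list pods --as=system:serviceaccount:default:default --as-group=system:serviceaccounts
      kubectl auth can-i watch couchbaseclusters.couchbase.com --as=system:serviceaccount:default:default --as-group=system:serviceaccounts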

      The actual CMOS workload runs as a Kubernetes Deployment along with other supporting services. Create it by running the following command in a Kubernetes console.

      cat <<EOF | kubectl apply -f -
      apiVersion: apps/v1
      kind: Deployment
      metadata:
         name: couchbase-grafana
      spec:
         selector:
             matchLabels:
                 run: couchbase-grafana
         replicas: 1
         template:
             metadata:
                 labels:
                     run: couchbase-grafana
             spec:
                 containers:
                     - name: couchbase-grafana
                       image: couchbase/observability-stack
                       ports:
                           - name: http
                             containerPort: 8080
                           - name: loki # So we can push logs to it
                             containerPort: 3100
                       env:
                           - name: KUBERNETES_DEPLOYMENT
                             value: 'true'
                           - name: ENABLE_LOG_TO_FILE
                             value: 'true'
                           - name: PROMETHEUS_CONFIG_FILE
                             value: /etc/prometheus/custom/prometheus-k8s.yml
                           - name: PROMETHEUS_CONFIG_TEMPLATE_FILE
                             value: ignore
                         # - name: DISABLE_LOKI
                         #   value: "true"
                       volumeMounts:
                           - name: prometheus-config-volume
                             mountPath: /etc/prometheus/custom # keep /etc/prometheus for any defaults
           # Now we watch for changes to the volumes and auto-reload the prometheus configuration if seen
                     - name: prometheus-config-watcher
                       image: weaveworks/watch:master-9199bf5
                       args: [-v, -t, -p=/etc/prometheus/custom, curl, -X, POST, --fail, -o, '-', -sS, http://localhost:8080/prometheus/-/reload]
                       volumeMounts:
                           - name: prometheus-config-volume
                             mountPath: /etc/prometheus/custom
                 volumes:
                     - name: prometheus-config-volume
                       configMap:
                           name: prometheus-config-cmos
      EOF
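
      Once applied, you can confirm that the Deployment has rolled out and that its pod is running. A quick check, assuming the default namespace:

      kubectl rollout status deployment/couchbase-grafana
      kubectl get pods -l run=couchbase-grafana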

      After the observability stack is deployed, we need a Service to access CMOS. Create it by running the following command in a Kubernetes console.

      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Service
      metadata:
         name: couchbase-grafana-http
         labels:
             run: couchbase-grafana
      spec:
         ports:
             - port: 8080 (1)
               protocol: TCP
         selector:
             run: couchbase-grafana
      EOF
      1 The observability monitoring service runs on port 8080 by default.
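
      If you want to reach CMOS before the Ingress described later is in place, a temporary port-forward to this Service works for local testing; CMOS is then available at http://localhost:8080.

      kubectl port-forward svc/couchbase-grafana-http 8080:8080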

      Create a Service for accessing Loki by running the following command in a Kubernetes console.

      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Service
      metadata:
         name: loki
         labels:
             run: couchbase-grafana
      spec:
         ports:
             - port: 3100
               protocol: TCP
         selector:
             run: couchbase-grafana
      EOF
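
      To confirm Loki is reachable through the new Service, you can port-forward it and query Loki's standard readiness endpoint; a quick check for local testing:

      kubectl port-forward svc/loki 3100:3100 &
      curl -s http://localhost:3100/ready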

      Deploy Couchbase

      The Couchbase Helm chart is used to deploy the Couchbase Autonomous Operator and a default configuration for the Couchbase Server pods.

      It is easy to get started with the default configuration values. However, if you want to modify any of them to meet your specific requirements, see the Couchbase Helm chart documentation.

      We provide an example that shows how to set up log forwarding to CMOS via Kubernetes annotations on the pods.

      Run the following command to create a Helm values file with custom values.

      cat <<EOF >custom-values.yaml
      cluster:
         logging:
             server:
                 enabled: true
                 sidecar:
                     image: couchbase/fluent-bit:1.2.2
         monitoring:
             prometheus:
                 enabled: false # We're using server 7 metrics directly
                 image: couchbase/exporter:1.0.6
         security:
             username: admin
             password: password (1)
         servers:
             # We use custom annotations to forward to CMOS Loki
             default:
                 size: 3
                 pod:
                     metadata:
                         annotations:
                             # Match all logs
                             fluentbit.couchbase.com/loki_match: "*"
                             # Send to this SVC
                             fluentbit.couchbase.com/loki_host: loki.default
                 volumeMounts:
                     default: couchbase
         volumeClaimTemplates:
             - metadata:
                   name: couchbase
               spec:
                   resources:
                       requests:
                           storage: 1Gi
      EOF
      1 We recommend specifying a stronger password.

      If you already have the Couchbase Operator deployed using Helm, or are considering a new deployment, the command below can be used with the custom values to enable CMOS. If the Operator was instead deployed using command-line tools, you have to update the existing resources using kubectl patch with the custom values mentioned above.

      The command below upgrades an already deployed Couchbase Operator or, if the Operator is not yet installed, installs it.

      Upgrades to an installed Operator should be handled with extreme caution: an invalid custom-values.yaml can cause issues with the Operator installation.

      In the command below, the chart's default values are overridden by the user-supplied values file (--values), which can in turn be overridden by individual --set parameters.

      helm repo add couchbase https://couchbase-partners.github.io/helm-charts/
      helm upgrade --install couchbase couchbase/couchbase-operator --set cluster.image=couchbase/server:7.0.2 --values=custom-values.yaml
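
      After the chart is installed, you can confirm that the Operator has created the cluster and that the Couchbase Server pods carry the app=couchbase label relied on by the Prometheus scrape configuration above. A quick check, assuming the default namespace:

      kubectl get couchbaseclusters
      kubectl get pods -l app=couchbase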

      Accessing CMOS

      Deploying Ingress

      In order to access the cluster, we set up a Kubernetes Ingress to forward traffic from localhost to the appropriate services.

      There are two aspects required here:

      • Provide an Ingress controller, which is Nginx in this case.

      • Set up Ingress to forward to our CMOS service.

      For a production system it is likely an Ingress controller will already be deployed with appropriate rules.

      Follow the NGINX Ingress Controller guide to set it up.
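
      As a rough sketch, the controller can typically be installed with Helm as follows; the repository URL and chart name follow the upstream ingress-nginx project, so check the linked guide for the options appropriate to your platform:

      helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace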

      Once the Ingress controller is installed and ready, the last step is to deploy the Ingress configuration shown below:

      # Ingress to forward to our web server including sub-paths: ideally we would forward only what we need, but for local testing we just send it all.
      cat <<EOF | kubectl apply -f -
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
         name: couchbase-ingress
         annotations:
             kubernetes.io/ingress.class: nginx
             nginx.ingress.kubernetes.io/rewrite-target: /
      spec:
         rules:
             - http:
                   paths:
                       - path: /
                         pathType: Prefix
                         backend:
                             service:
                                 name: couchbase-grafana-http
                                 port:
                                     number: 8080
      EOF
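
      Once applied, you can confirm the Ingress exists and see the address it has been assigned:

      kubectl get ingress couchbase-ingress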

      If everything is deployed properly, you should be able to access http://localhost (or whatever host your Ingress serves for your deployment).

      You should see a landing page that includes links to the documentation, the cluster manager service, and the various other CMOS services.

      Figure 1. CMOS landing image

      Add Couchbase Cluster

      You can add the Couchbase cluster to the CMOS dashboard using the “Add Cluster” option. Enter the Couchbase Server hostname, username, and password, and then click "Add Cluster".

      By default the username is “admin” and password is “password”. We recommend specifying a stronger password in the custom-values.yaml file during CMOS installation.

      Clear the “Add to Prometheus” checkbox, because Prometheus scraping is already configured using Kubernetes service discovery.

      Figure 2. Add cluster image

      As soon as you add a cluster, you will see a Grafana URL where you can view the inventory and metrics of your Couchbase Server clusters.

      Figure 3. Couchbase inventory image

      Prometheus Targets

      From the "Prometheus Targets" option, we can see the Prometheus targets and their details. For instance, we can filter the view to show all targets or only unhealthy ones. The state information shows which Prometheus targets are up, and the last scrape value shows how long ago each target's metrics were scraped.

      Figure 4. Prometheus target image

      While the pods are coming up, some targets may report as failing, but these will resolve once the pods are running.

      Figure 5. Prometheus target failing image

      Grafana Dashboard

      Grafana provides multiple dashboards for monitoring a Couchbase cluster out of the box. You can list all the dashboards using the dashboard search option, and you can create additional dashboards to suit your needs. The following are some of the out-of-the-box dashboards.

      Couchbase Cluster Overview Metrics

      The Couchbase cluster overview metrics dashboard is available in Grafana under the name single-cluster-overview. It displays a number of items, including the Couchbase nodes, the available buckets, version information, health check warnings, and which services are running.

      Figure 6. Couchbase cluster overview image

      Couchbase Node Overview Metrics

      The Couchbase node overview metrics dashboard is available in Grafana under the name node-overview. It displays, from a node perspective, resource utilization, version information, and health check warnings.

      Figure 7. Couchbase node overview image

      Alerts

      CMOS comes with pre-installed alert rules to monitor the Couchbase cluster. Navigate to the Alertmanager or Prometheus UI to check the alert rules and alerts. For more information, see the Prometheus and alerting configuration section.

      Figure 8. Alert rules image
      Figure 9. Alerts image

      Alertmanager

      Alertmanager is shipped and enabled by default in CMOS and is accessible via the “Alert Manager” option. You can view all the generated alerts in this dashboard.

      Figure 10. Alertmanager image

      Loki

      Loki, which is shipped with CMOS and integrated with Grafana, provides access to the logs of the various components. You can reach it in Grafana via Configuration > Data sources > Loki > Explore.

      Figure 11. Loki explore dashboard image

      From the Log browser, you can enter a custom Loki query or select appropriate labels to see the logs.

      Figure 12. Loki log browser image

      After that, select “Show logs” to view the logs. You can also build custom Grafana dashboards based on your needs.

      Figure 13. Loki logs image
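
      Outside Grafana, you can also query Loki directly over its HTTP API, which is what the Explore view uses under the hood. A rough sketch using a port-forward as before; the label selector is purely illustrative and should use whichever labels your log forwarder actually attaches:

      kubectl port-forward svc/loki 3100:3100 &
      # List the label names Loki has indexed, then run an example query over the default time range.
      curl -s http://localhost:3100/loki/api/v1/labels
      curl -G -s http://localhost:3100/loki/api/v1/query_range --data-urlencode 'query={job=~".+"} |= "error"'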