Release Notes for Couchbase Autonomous Operator 2.2

      Couchbase Autonomous Operator 2.2 is a significant release that expands support for auto-scaling Couchbase clusters, and adds several improvements in the areas of logging and security.

      Take a look at the What’s New page for a list of new features and improvements that are available in this release.

      Installation

      Upgrading to Autonomous Operator 2.2

      The steps required to upgrade to this release depend on which version of the Autonomous Operator you are upgrading from.

      • Version 1.x.x

        • There is no direct upgrade path from versions prior to 2.0.x. To upgrade from a 1.x.x release, you must first upgrade to 2.0.x, paying particular attention to supported Kubernetes platforms and Couchbase Server versions. Refer to the 2.0.x upgrade steps if upgrading from a 1.x.x release.

      • Version 2.0.x

        • Additional steps are required when upgrading from this version. Refer to Upgrading from Version 2.0.x below.

      • Version 2.1.x

        • There are no additional upgrade steps or considerations when upgrading from this version. You can follow the standard upgrade process.

      Upgrading from Version 2.0.x

      First, ensure that you are running compatible versions of Kubernetes and Couchbase Server before upgrading.

      TLS Requirements

      If you are not utilizing TLS, you can skip this section.

      The TLS requirements have been modified as of Autonomous Operator 2.1. To ease the migration from the legacy client bootstrap protocol (CCCP) to the newer version (GCCCP), the Autonomous Operator requires the Couchbase cluster's subject alternative names (SANs) to be updated. Consult the TLS tutorial for a full list of the required SANs, and the TLS rotation guide to prepare for the upgrade. Failure to perform this step will result in errors from the dynamic admission controller (DAC) once upgraded.

      Mandatory Couchbase Upgrade Cycle

      When upgrading from version 2.0.x, Couchbase clusters will undergo a mandatory upgrade cycle.

      Pod readiness checks were previously driven by an exec-based readiness probe. This was a security concern because it granted the Autonomous Operator pods/exec privileges, which may not be acceptable in highly regulated environments. As of Autonomous Operator 2.1, readiness checks are performed using readiness gates that rely exclusively on the Kubernetes API.

      You can use the couchbaseclusters.spec.rollingUpgrade configuration parameter to speed up this upgrade. To enable this feature while upgrading the Autonomous Operator, stop the old Operator, replace the CRDs, edit the couchbaseclusters.spec.rollingUpgrade field to enable bulk upgrades, then start the new Operator. Refer to Upgrade the Operator for further details.

      Release 2.2.4

      Couchbase Autonomous Operator 2.2.4 was released in July 2022.

      Fixed Issues

      Summary: Prior to Autonomous Operator 2.2.4, Couchbase Server 6.6.2 was the highest compatible version of Couchbase Server available for use with the Red Hat Marketplace. This release updates the list of compatible Couchbase Server builds to allow Marketplace users to take advantage of the security enhancements provided by Server 6.6.4 and higher.

      To address security-related issues, compatibility with Server versions 6.6.4 through 7.0.2 was added to the Operator Lifecycle Manager, allowing the use of Server 6.6.5 as the default in the Red Hat Marketplace.

      Release 2.2.3

      Couchbase Autonomous Operator 2.2.3 was released in March 2022.

      Common Vulnerabilities and Exposures

      This section lists common vulnerabilities and exposures that are fixed in this release.

      See Couchbase Alerts for the complete list of common vulnerabilities and exposures.

      Release 2.2.2

      Couchbase Autonomous Operator 2.2.2 was released in December 2021.

      New Features and Behavioral Changes

      Configurable Addressability Timeouts

      When using public networking with external-DNS or generic networking, the Operator polls new cluster members to ensure addressability and connectivity before allowing the member to be added to the cluster. Failure to do so would result in client errors and partial availability.

      This had a downside: polling DNS immediately could poison caching DNS servers with negative lookup results (negative caching).

      To mitigate this problem, a new field, couchbaseclusters.spec.networking.waitForAddressReachableDelay, has been introduced. This allows a delay to be configured between pod creation and polling, giving time for load balancers to be provisioned and DNS entries to propagate. It defaults to 2m, replacing the old behavior of immediate polling.
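
      As a minimal sketch (the cluster name is a placeholder, the 5m value is only an example, and only the relevant fields are shown), the delay can be set alongside the other networking options:

        apiVersion: couchbase.com/v2
        kind: CouchbaseCluster
        metadata:
          name: cb-example
        spec:
          networking:
            # Wait five minutes after pod creation before polling for
            # addressability, giving load balancers time to be provisioned
            # and DNS records time to propagate (the default is 2m).
            waitForAddressReachableDelay: 5m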

      Fixed Issues

      Summary: Backup and restore jobs may run with a security context. Previously, backup jobs ran with a security context (typically UID 1000, as advised by the documentation), while restore jobs did not have a security context associated with them; restores still worked because the container image's default UID was 1000. With newer backup images (1.1.1 and greater), the default UID of the container image has changed, so restores no longer work, as the restore user does not match the backup user.

      To remedy this, the security context is now attached to the restore job as well. This issue affects users running with a UID other than 1000 and using backup images 1.1.1 and greater.

      Release 2.2.1

      Couchbase Autonomous Operator 2.2.1 was released in August 2021.

      New Features and Behavioral Changes

      New Prometheus, Logging and Backup Defaults

      The new default images are couchbase/exporter:1.0.5 and couchbase/fluent-bit:1.0.4. The latest backup image is couchbase/operator-backup:1.1.1. These images feature functional and security updates.

      Existing clusters will continue to use the component versions they were provisioned with. If you update an existing Couchbase cluster to use these new images, the cluster will undergo a rolling upgrade to apply the change, so you should plan a maintenance window accordingly.

      Fixed Issues

      Summary: When using Istio with mTLS in STRICT mode, there is the possibility of a race condition that prevents traffic between the Operator and new server pods. Typically this issue is observed as an EOF error when the Operator uses the REST API of the new pods. In previous versions, the Istio annotations were not preserved. This has been remedied by ensuring the Istio annotations are maintained on all pods, allowing mTLS in STRICT mode for everything except the DAC, which requires PERMISSIVE mode in order to communicate over TLS with the Kubernetes API.

      Summary: A new pod annotation was added in 2.2.0 to record whether Couchbase Server was fully initialized or not, as Couchbase Server will not work as required by the Operator until initialization is performed. The intention was to remove a deadlock situation that the Operator was unable to recover from.

      Due to a combination of K8S-2273 and the annotation not being correctly checked, it was possible (for example, in the event of a network partition between the Operator and all pods in a Couchbase cluster) for the Operator to determine that all nodes were uninitialized and for them to be deleted.

      For any type of cluster, this risks service disruption. For an ephemeral cluster — without persistent storage backing — this would also result in total data loss.

      Summary: The default Couchbase Prometheus Exporter image has been updated to the most recent version, providing numerous stability fixes and enhanced security.

      Summary: The node selector defined for backups was only applied to the backup CronJob, and not to the restore Job. This behavior has been fixed so that restores happen on the same nodes as backups.

      Summary: A defect was discovered where view service ports were only enabled for Couchbase Server instances with the index service enabled, as opposed to the data service. This has now been fixed to enable the ports on data service instances.

      This defect only affects users of map-reduce views, and only when used over an exposed feature from a network outside of Kubernetes.

      Summary: The LDAP cache timeout configuration value was optional and had no default. As a result, it would not match the Couchbase Server default of 30000 and would cause constant updates. The CouchbaseCluster CRD has now been amended to provide this default and avoid the problem.

      Additionally, defaults have been provided for the LDAP port and LDAP group search depth, along with better range checking.

      Summary: A new Operator Backup image has been published that is tested against Couchbase Autonomous Operator 2.2.1 and later, with no changes required in the Autonomous Operator itself. We recommend updating to couchbase/operator-backup:1.1.1.

      Release 2.2.0

      Couchbase Autonomous Operator 2.2.0 was released in June 2021.

      Platform Support

      For the latest platform support information, refer to Prerequisites and System Requirements.

      New Platform Support

      This release adds support for the following platforms:

      • Open Source Kubernetes 1.20, 1.21

      • Red Hat OpenShift Container Platform 4.7

      This release also adds support for the following utilities:

      • Rancher

      Removed Platform Support

      This release drops support for the following platforms:

      • Open Source Kubernetes 1.15, 1.16

      • Red Hat OpenShift Container Platform 4.3

      New Features and Behavioral Changes

      Couchbase Cluster Auto-scaling

      The Autonomous Operator now supports Couchbase cluster auto-scaling for all Couchbase services. This means that Couchbase cluster server class configurations containing stateful services like Data and Index can now be configured to automatically scale in response to observed metrics. (Previously, only stateless deployments of the Query Service were supported.)

      To help users get started in production, an auto-scaling best practices guide has been provided. This guide discusses relevant metrics for scaling individual Couchbase Services, and includes recommended settings based on internal benchmark testing performed by Couchbase.

      About Auto-scaling Preview Mode

      When cluster auto-scaling was introduced in Autonomous Operator 2.1, only stateless configurations that included the Query Service and Ephemeral buckets were supported. However, a special preview mode could be enabled that allowed stateful server configurations other than query to be auto-scaled. This preview mode was enabled by setting couchbaseclusters.spec.enablePreviewScaling to true.

      In Autonomous Operator 2.2, cluster auto-scaling can be enabled for any service or configuration, whether stateful or stateless. As a result, couchbaseclusters.spec.enablePreviewScaling has been deprecated and is now ignored by the Autonomous Operator.
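
      For illustration, a hedged sketch of opting a stateful Data server class into auto-scaling; the server-class name is a placeholder, and autoscaleEnabled is assumed to be the per-server-class switch described in the auto-scaling documentation. A HorizontalPodAutoscaler can then target the CouchbaseAutoscaler resource that the Operator publishes for the server class.

        spec:
          servers:
          - name: data
            size: 3
            services:
            - data
            # Assumed field: lets an HPA drive the size of this server class
            # via the CouchbaseAutoscaler resource created by the Operator.
            autoscaleEnabled: true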

      Online Expansion of Persistent Volumes

      The couchbaseclusters.spec.enableOnlineVolumeExpansion field has been added to allow for the expansion of persistent volumes that are already in use by Couchbase clusters, thus removing the need for a rolling upgrade of cluster pods. The Autonomous Operator achieves this by working in conjunction with Kubernetes Persistent Volume Expansion to claim additional storage for running pods without any downtime.

      For additional information and requirements, refer to Online Volume Expansion.
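
      A minimal sketch; only the new field is shown, and the StorageClass used by the volume claim templates must itself allow volume expansion:

        spec:
          # Expand persistent volumes in place instead of rolling the cluster pods.
          enableOnlineVolumeExpansion: true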

      Log Forwarding and Audit Log Management

      The Autonomous Operator now supports log forwarding through the optional deployment of a third-party log processor. The log processor runs in a sidecar container on each Couchbase pod, reading the log files and forwarding them to standard console output. For this purpose, Couchbase supplies a default log processor image based on Fluent Bit.

      In addition, audit logging can now be configured via the CouchbaseCluster resource specification, thus allowing for automated audit logging via the Autonomous Operator.
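
      As a hedged sketch (the exact field layout should be verified against the logging documentation), the two features described above might be enabled on the CouchbaseCluster resource like this:

        spec:
          logging:
            server:
              # Deploy the Fluent Bit based sidecar that reads Couchbase log
              # files and forwards them to standard console output.
              enabled: true
            audit:
              # Have the Operator configure Couchbase Server audit logging.
              enabled: true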

      Improved Security Defaults

      The Couchbase Server default minimum TLS version is 1.0. This is insecure and easily compromised if a client downgrades to a version less than 1.2. Operator 2.2 makes TLS 1.2 the minimum by default, and will automatically update the TLS minimum version unless it is explicitly specified with the couchbaseclusters.spec.networking.tls.tlsMinimumVersion parameter.

      You can prevent this change from occurring when upgrading the Operator on existing clusters by setting couchbaseclusters.spec.networking.tls.tlsMinimumVersion to TLS1.0 after upgrading the CRDs, and before restarting the Operator.
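
      For example, to pin the legacy minimum on an existing cluster (or, conversely, to state the new default explicitly), set the parameter directly; this minimal sketch shows only the relevant fields:

        spec:
          networking:
            tls:
              # Keep the old minimum for clients that cannot negotiate TLS 1.2.
              # Omit the field (or set TLS1.2) to accept the new default.
              tlsMinimumVersion: TLS1.0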

      Improved TLS Certificate and Key Handling

      Couchbase Server requires that the TLS resources be named pkey.key and chain.pem; however, the majority of third-party certificate managers generate tls.key and tls.crt. This format is governed by the Kubernetes kubernetes.io/tls secret type. To provide integration with third-party providers, and to maintain backward compatibility with existing clusters, the Operator now provides secret shadowing.

      A new TLS source, couchbaseclusters.spec.networking.tls.secretSource, allows you to use kubernetes.io/tls type secrets to configure Couchbase Server. When using this new TLS source, a shadow copy of the TLS secret is created, with the key names changed to those hard-coded in Couchbase Server. The shadow secret can then be mounted and used inside Couchbase Server pods. Because the secret is shadowed, the Operator can reformat private keys, and therefore now supports PKCS#8-formatted private keys.
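
      A hedged sketch of consuming a kubernetes.io/tls secret produced by a certificate manager; the secret name is a placeholder, and serverSecretName is assumed to be the sub-field that names the secret, so check the TLS reference for the exact shape:

        spec:
          networking:
            tls:
              secretSource:
                # A kubernetes.io/tls secret (tls.key/tls.crt); the Operator
                # shadows it into the pkey.key/chain.pem layout that Couchbase
                # Server expects.
                serverSecretName: my-tls-secret   # placeholder secret name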

      Third-party certificate managers can be used with clusters that are configured with this new TLS source. Refer to the Using a Certificate Manager tutorial for an example.

      For existing clusters using the legacy couchbaseclusters.spec.networking.tls.static source, the Operator works as before — directly mounting and consuming the TLS secret without a shadow secret.

      Improved Server Group Support

      Prior to this release, server groups had to be enabled on cluster creation, and were immutable for the lifetime of the cluster. These restrictions have now been lifted.

      It is now possible to enable Couchbase Server groups, and Operator pod scheduling, while a cluster is running.

      It is also now possible to modify the list of server groups used for scheduling. Previously, if an availability zone suffered an outage, the Operator would continually attempt to recreate failed pods in the same availability zone to maintain an even balance across the requested set. While the availability zone remained unavailable, the Couchbase cluster would be undersized, potentially impacting its ability to service client requests. This release allows the list of availability zones to be modified to exclude the failed one, so that the Operator can scale the cluster back up to the correct size.

      All server group migration operations use a shortest-path algorithm in order to minimize disruption.

      Improved XDCR Connection Support

      Prior to this release, any updates to XDCR remote clusters would be silently ignored. In order to modify any setting, the remote cluster and all replications would have to be deleted and recreated from scratch.

      This release adds the ability to modify remote cluster identification and authentication settings. This provides the ability to rotate passwords and certificates on the remote cluster, or even replace the remote cluster entirely.

      Automatic Resource Allocation

      Kubernetes allows pods to reserve and limit compute resources. Resource reservation provides Kubernetes with the ability to fairly schedule pods so that they don’t compete for CPU and memory.

      The Operator now provides the ability to automatically manage pod CPU and memory resource requests with the couchbaseclusters.spec.autoResourceAllocation field. When in use, pods will have their resource requests automatically populated depending on the services enabled on that pod, and the individual Couchbase service quotas.

      Updating service quotas will cause the cluster to undergo an upgrade as pod resource requests are updated. A new dummy memory quota for the Query Service is introduced to allow management of memory resource requests, even though Couchbase Server itself does not provide the ability to constrain this service.
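
      A minimal sketch, assuming enabled is the switch beneath this field; with it set, the Operator derives pod resource requests from the per-service quotas shown alongside:

        spec:
          autoResourceAllocation:
            # Assumed field: populate pod resource requests from the
            # Couchbase service quotas defined below.
            enabled: true
          cluster:
            dataServiceMemoryQuota: 4Gi
            indexServiceMemoryQuota: 1Gi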

      Enhanced Couchbase Upgrade Behavior

      Prior to this release, the Operator allowed two types of upgrade: a one-pod-at-a-time rolling upgrade, and a whole-cluster upgrade.

      This release extends rolling upgrades to upgrade either a fixed number of pods or a percentage of the cluster size at a time. Both values can be set, and the Autonomous Operator will select whichever results in the fewer pods being upgraded at a time.

      Rolling upgrades can be configured with the couchbaseclusters.spec.rollingUpgrade parameter.
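
      As a sketch, with the two sub-field names (maxUpgradable and maxUpgradablePercent) assumed and to be verified against the upgrade reference:

        spec:
          rollingUpgrade:
            # Upgrade at most 2 pods, or 25% of the cluster, whichever is fewer.
            maxUpgradable: 2              # assumed field name
            maxUpgradablePercent: "25%"   # assumed field name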

      A whole-cluster "immediate" upgrade is now synonymous with a rolling upgrade set to upgrade 100% of the cluster at a time. Immediate upgrades may be removed at some point in the future, as they duplicate functionality.

      Enhanced Backup and Restore

      With the release of Autonomous Operator 2.2, the operator-backup container image has been enhanced to support a wider range of Couchbase Server and Autonomous Operator versions. As part of this update, the operator-backup image has switched to semantic versioning which is no longer based on Couchbase Server version.

      At the time of this writing, the new image — operator-backup:1.1.0 — is compatible with all versions of the Autonomous Operator that support managed backup and restore (version 2.0 and later).

      The new image is still based on the cbbackupmgr utility, therefore its ability to back up and restore data between different versions of Couchbase Server remains dependent on the compatibility of the underlying cbbackupmgr version. However, the operator-backup:1.1.0 image is based on cbbackupmgr 6.6.2, which is capable of backing up and restoring data from all versions of Couchbase Server that have ever been supported by the Autonomous Operator up to this point. When support for new Couchbase Server versions is added to the Autonomous Operator, support for backing up those versions will be added to the latest operator-backup image.

      Online Backup Volume Resizing

      While not a replacement for a properly implemented monitoring solution, it may be beneficial to have your backup storage grow with your requirements. To this end, the Autonomous Operator now supports online resizing of backup volumes.

      Backup volumes can be resized in one of two ways:

      1. Backup volumes can be manually resized by directly editing the couchbasebackups.spec.size field.

      2. Backup volumes can be automatically resized by the Autonomous Operator, thus keeping backups to the minimum size necessary to operate, helping to reduce cost while also supporting future expansion. In this mode of operation, the couchbasebackups.spec.autoScaling field controls the behavior.

      To use either of these resize capabilities, the underlying StorageClass associated with the backup volume must be configured to allow volume expansion.
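
      For illustration, a hedged CouchbaseBackup sketch; size and autoScaling are the fields named above, while the autoScaling sub-fields and the resource name shown here are assumptions to be checked against the backup resource reference:

        apiVersion: couchbase.com/v2
        kind: CouchbaseBackup
        metadata:
          name: my-backup
        spec:
          strategy: full_incremental
          size: 20Gi
          autoScaling:
            # Grow the volume as it approaches capacity, up to an upper bound.
            thresholdPercent: 20    # assumed sub-field
            incrementPercent: 20    # assumed sub-field
            limit: 100Gi            # assumed sub-field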

      General Backup and Restore Enhancements

      By default, backup and restore operations execute with only a single thread, which can sometimes lead to poor performance. To mitigate this, couchbasebackups.spec.threads and couchbasebackuprestores.spec.threads can now be specified to configure the number of concurrent cbbackupmgr clients to use when backing up or restoring data.

      This release also introduces filters that allow you to control exactly what is restored from a backup. In previous releases, restore operations were fairly inflexible in that the only option was to restore all data from a backup. Now you can choose to include or exclude specific buckets, restore data for a particular service, and even restore to a different bucket name when restoring documents. Refer to Additional Restore Options for more information.
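
      A hedged sketch showing the threads fields named above on both resource types (resource names are placeholders, and other fields are abbreviated):

        apiVersion: couchbase.com/v2
        kind: CouchbaseBackup
        metadata:
          name: my-backup
        spec:
          strategy: full_incremental
          # Run four concurrent cbbackupmgr clients during backups.
          threads: 4
        ---
        apiVersion: couchbase.com/v2
        kind: CouchbaseBackupRestore
        metadata:
          name: my-restore
        spec:
          backup: my-backup
          # Run four concurrent cbbackupmgr clients during the restore.
          threads: 4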

      Customizable Prometheus Metrics

      The latest version of the Couchbase Prometheus Exporter allows for certain customizations to exported metrics. The following customizations are currently supported:

      • Change the namespace, subsystem, name, and help text for each metric.

      • Enable and disable whether a metric is exported to Prometheus.

      For more information, refer to Customizing Metrics.

      Fixed Issues

      Summary: When using service types in both the console and exposed feature service templates, a logical problem meant that the deprecated service type field defaults overwrote those provided in the template. For example, when setting the template type to LoadBalancer, this may have ended up as NodePort due to the old defaults taking precedence. This has now been fixed so that the template will take precedence.

      Summary: Prior to backup images couchbase/operator-backup:6.5.0-XXX, couchbase/operator-backup:6.6.0-XXX, and couchbase/operator-backup:1.0.0, a bug meant that all backup traffic between the job and Couchbase Server was in plain text. While these backup images work with plain TLS and mTLS, they will not work with mandatory mTLS configurations, as cbbackupmgr does not support client certificate authentication. Autonomous Operator 2.2 will fall back to plain-text backup when using mandatory mTLS in order to facilitate functional backups.

      Summary: Istio in mTLS mode is unable to run using Job or CronJob resource types. This is due to the Envoy sidecar proxy not terminating when a backup or restore Job does, thus keeping the Pod alive and blocking further execution. This is remedied in Couchbase Operator Backup versions 1.0.0 and above by sending a termination signal to the Envoy sidecar on termination of the backup or restore Job.

      Summary: On dynamic platforms (GKE Autopilot being one example), where deployments can be rescheduled to better utilize system resources, a deadlock situation was detected. In this scenario, the Autonomous Operator could be terminated while a Couchbase Server pod was still being initialized; on restart the pod would appear healthy, yet Couchbase Server would refuse to respond to the Autonomous Operator. This has been remedied by annotating Couchbase Server pods once they are known to be fully initialized, so that pods known to be uninitialized can be safely terminated and re-created.

      Summary: A bug in the Autonomous Operator meant that when using LDAP groups, nesting could not be turned off. This is now fixed in this release.

      Summary: Prior to this release, setting the auto-compaction time window managed the wrong configuration settings, resulting in auto-compaction running at unexpected times. This has now been fixed to update the correct settings. Be aware that when upgrading to this release with non-default auto-compaction settings, the Operator will cease to erroneously manage index auto-compaction and will start managing global auto-compaction.

      Summary: A race condition existed when the Autonomous Operator was restarted while a backup Pod was still executing. The backup Pod was erroneously considered as part of a metadata update routine, causing a crash loop until the backup terminated. This has now been fixed by correctly filtering the pods considered for metadata updates to include only Couchbase Server instances.

      Known Issues

      Summary: Issues exist with the default Prometheus Exporter container image set by the Dynamic Admission Controller.

      Workaround: It is recommended that you manually set couchbaseclusters.spec.monitoring.prometheus.image to couchbase/exporter:1.0.5 in order to take advantage of the latest fixes. Note that changing the Couchbase Prometheus Exporter image for an existing Couchbase cluster will result in a rolling upgrade of the cluster pods. This does not cause any loss of service, but it may consume system resources; it is therefore recommended that this be done during a non-peak workload period.
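
      A minimal sketch of the recommended workaround, showing only the relevant CouchbaseCluster fields:

        spec:
          monitoring:
            prometheus:
              enabled: true
              image: couchbase/exporter:1.0.5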

      Other Notable Changes

      • The dynamic admission controller (DAC) now allows couchbaseclusters.spec.servers.volumeMounts.index to be configured for server classes that include the Search Service and that don’t include the Index Service. Previously, the DAC would reject configurations that specified the index volume mount unless the Index Service was also included in the server class.
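
      For illustration, a hedged sketch of a Search-only server class that the DAC now accepts with an index volume mount (names are placeholders):

        spec:
          servers:
          - name: search
            size: 2
            services:
            - search
            volumeMounts:
              default: couchbase   # placeholder volume claim template name
              # Previously rejected unless the Index Service was also enabled.
              index: couchbase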

      Feedback

      You can have a big impact on future versions of the Operator (and its documentation) by providing Couchbase with your direct feedback and observations. Please feel free to post your questions and comments to the Couchbase Forums.

      Licenses for Third-Party Components

      The complete list of licenses for Couchbase products is available on the Legal Agreements page. Couchbase is thankful to all of the individuals that have created these third-party components.