Release Notes for Couchbase Autonomous Operator 2.2
Couchbase Autonomous Operator 2.2 is a significant release that expands support for auto-scaling Couchbase clusters, and adds several improvements in the areas of logging and security.
Take a look at the What’s New page for a list of new features and improvements that are available in this release.
The steps required to upgrade to this release depend on which version of the Autonomous Operator you are upgrading from.
There is no direct upgrade path from versions prior to 2.0.x. To upgrade from a 1.x.x release, you must first upgrade to 2.0.x, paying particular attention to supported Kubernetes platforms and Couchbase Server versions. Refer to the 2.0.x upgrade steps if upgrading from a 1.x.x release.
Additional steps and considerations are required when upgrading from version 2.0.x. Please refer to Upgrading from Version 2.0.x.
There are no additional upgrade steps or considerations when upgrading from version 2.1.x. You may follow the standard upgrade process.
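The version-dependent upgrade paths described above can be summarized in a short Python sketch. The function name `upgrade_guidance` and the returned wording are illustrative only. Treating every version from 2.1 onward as a "standard upgrade" is an assumption drawn from the surrounding text, not a statement of the Operator's behavior.

```python
def upgrade_guidance(current_version: str) -> str:
    """Return the upgrade guidance for an installed Operator version.

    Hypothetical helper: the mapping mirrors the prose in these notes.
    """
    parts = current_version.split(".")
    major, minor = int(parts[0]), int(parts[1])

    if major < 2:
        # No direct path from 1.x.x: go through 2.0.x first.
        return "Upgrade to 2.0.x first, then upgrade to 2.2"
    if (major, minor) == (2, 0):
        # 2.0.x requires the TLS SAN changes and a mandatory upgrade cycle.
        return "Follow the additional steps in Upgrading from Version 2.0.x"
    # Assumption: 2.1.x and later need no extra steps.
    return "Follow the standard upgrade process"


if __name__ == "__main__":
    for version in ("1.2.2", "2.0.3", "2.1.0"):
        print(version, "->", upgrade_guidance(version))
```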
First, ensure that you are running compatible versions of Kubernetes and Couchbase Server before upgrading.
If you are not utilizing TLS, you can skip this section.
The TLS requirements have been modified as of Autonomous Operator 2.1. In order to ease the migration from legacy client bootstrap (CCCP) to the newest version (GCCCP), the Autonomous Operator requires Couchbase cluster subject alternative names (SANs) to be updated. Consult the TLS tutorial for a full list of all the required SANs, and the TLS rotation guide in order to prepare for upgrade. Failure to perform this step will result in errors from the dynamic admission controller (DAC) once upgraded.
When upgrading from version 2.0.x, Couchbase clusters will undergo a mandatory upgrade cycle.
Pod readiness checks were previously driven by an exec based readiness probe.
This was a security concern because it granted the Autonomous Operator pods/exec privileges, which may not be acceptable in highly regulated environments.
As of Autonomous Operator 2.1, readiness checks are performed using readiness gates that use the Kubernetes API exclusively.
You can use the couchbaseclusters.spec.rollingUpgrade configuration parameter to speed up this upgrade.
To enable this feature while upgrading the Autonomous Operator, stop the old Operator, replace the CRDs, edit the couchbaseclusters.spec.rollingUpgrade field to enable bulk upgrades, then start the new Operator.
Refer to Upgrade the Operator for further details.
Couchbase Autonomous Operator 2.2.1 was released in August 2021.
The new default images are
These images feature functional and security updates.
Existing clusters will continue to use the component versions they were provisioned with. If you update existing Couchbase clusters to use the new images, they will undergo a rolling upgrade, so you should plan a maintenance window accordingly.
Summary: When using Istio with mTLS in STRICT mode, there is the possibility of a race condition that prevents traffic between the operator and new server pods.
Typically this issue is observed as an
Summary: A new pod annotation was added in 2.2.0 to record whether Couchbase Server was fully initialized or not, as Couchbase Server will not work as required by the Operator until initialization is performed. The intention was to remove a deadlock situation that the Operator was unable to recover from.
Due to a combination of K8S-2273, and the label not being correctly checked, it is possible — in the event of a network partition between the Operator and all pods in a Couchbase cluster, for example — for the Operator to determine all nodes are uninitialized, and for them to be deleted.
For any type of cluster, this risks service disruption. For an ephemeral cluster — without persistent storage backing — this would also result in total data loss.
Summary: The Couchbase Prometheus Exporter image default has been updated to the most recent version. This provides numerous stability fixes and enhanced security.
Summary: The node selector defined for backups was only applied to the backup
Summary: A defect was discovered where view service ports were only enabled for Couchbase Server instances with the index service enabled, as opposed to the data service. This has now been fixed to enable the ports on data service instances.
This defect only affects users of map-reduce views, and only when used over an exposed feature from a network outside of Kubernetes.
Summary: The LDAP cache timeout configuration value was optional, and had no default.
As a result, an unset value would not equal the Couchbase Server default of 30000, resulting in constant updates.
Additionally, defaults have been provided for the LDAP port and LDAP group search depth, and range checking has been improved.
Couchbase Autonomous Operator 2.2.0 was released in June 2021.
For the latest platform support information, refer to Prerequisites and System Requirements.
This release adds support for the following platforms:
Open Source Kubernetes 1.20, 1.21
Red Hat OpenShift Container Platform 4.7
This release also adds support for the following utilities:
The Autonomous Operator now supports Couchbase cluster auto-scaling for all Couchbase services. This means that Couchbase cluster server class configurations containing stateful services like Data and Index can now be configured to automatically scale in response to observed metrics. (Previously, only stateless deployments of the Query Service were supported.)
To help users get started in production, an auto-scaling best practices guide has been provided. This guide discusses relevant metrics for scaling individual Couchbase Services, and includes recommended settings based on internal benchmark testing performed by Couchbase.
About Auto-scaling Preview Mode
When cluster auto-scaling was introduced in Autonomous Operator 2.1, only stateless configurations that included the Query Service and Ephemeral buckets were supported.
However, a special preview mode could be enabled that allowed stateful server configurations other than
In Autonomous Operator 2.2, cluster auto-scaling can be enabled for any service or configuration, whether stateful or stateless.
As a result,
Tutorial: Auto-scaling the Couchbase Query Service
Tutorial: Auto-scaling the Couchbase Data Service
Tutorial: Auto-scaling the Couchbase Index Service
Reference: CouchbaseAutoscaler Resource
Reference: Auto-scaling Lifecycle Events
couchbaseclusters.spec.enableOnlineVolumeExpansion has been added to allow for the expansion of persistent volumes that are already in use by Couchbase clusters, thus removing the need for a rolling upgrade of cluster pods.
The Autonomous Operator achieves this by working in conjunction with Kubernetes Persistent Volume Expansion to claim additional storage for running pods without any downtime.
For additional information and requirements, refer to Online Volume Expansion.
The Autonomous Operator now supports log forwarding through the optional deployment of a third-party log processor. The log processor runs in a sidecar container on each Couchbase pod, where it reads the log files and forwards them to standard console output. For this purpose, Couchbase supplies a default log processor image based on Fluent Bit.
The Couchbase Server default minimum TLS version is 1.0.
This is insecure and easily compromised if a client downgrades to a version less than 1.2.
Operator 2.2 makes TLS 1.2 the minimum by default, and will automatically update the TLS minimum version unless it is explicitly specified with the
You can prevent this change from occurring when upgrading the Operator on existing clusters by setting
TLS1.0 after upgrading the CRDs, and before restarting the Operator.
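Clients can enforce the same floor on their side, regardless of how the cluster is configured. The following is a minimal sketch using only Python's standard `ssl` module. The helper name `make_client_context` is hypothetical, and the sketch does not show how any particular Couchbase SDK configures TLS.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()
    # Reject handshakes that would negotiate TLS 1.0 or 1.1.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    print(make_client_context().minimum_version)
```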
Couchbase Server requires that the TLS resources be called
chain.pem, however the majority of 3rd-party certificate managers generate
This format is governed by the Kubernetes kubernetes.io/tls secret type.
To provide integration with 3rd-party providers, and to maintain backward compatibility with existing clusters, the Operator now provides secret shadowing.
A new TLS source, couchbaseclusters.spec.networking.tls.secretSource, allows you to use kubernetes.io/tls type secrets to configure Couchbase Server.
When using this new TLS source, a shadow copy of the TLS secret is created, with the key names changed to those hard coded in Couchbase Server.
The shadow secret can then be mounted and used inside Couchbase Server pods.
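Conceptually, shadowing amounts to copying the secret data under different key names. The sketch below illustrates the idea in Python and is not the Operator's implementation. The source keys `tls.crt` and `tls.key` are those defined by the kubernetes.io/tls secret type. The target name `chain.pem` comes from the text above, while `pkey.key` is an assumption made for this example.

```python
from typing import Dict

# kubernetes.io/tls key name -> file name expected by Couchbase Server.
# "pkey.key" is an assumption for illustration purposes.
KEY_MAP = {"tls.crt": "chain.pem", "tls.key": "pkey.key"}


def shadow_tls_secret(data: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return a copy of the secret data with the keys renamed."""
    missing = [key for key in KEY_MAP if key not in data]
    if missing:
        raise KeyError(f"TLS secret is missing keys: {missing}")
    # Only the mapped keys are copied; anything else is left behind.
    return {KEY_MAP[key]: data[key] for key in KEY_MAP}


if __name__ == "__main__":
    source = {"tls.crt": b"<certificate chain>", "tls.key": b"<private key>"}
    print(sorted(shadow_tls_secret(source)))
```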
As the secret is shadowed, the Operator can reformat private keys, and therefore now supports PKCS#8 formatted private keys.
3rd-party certificate managers can be used with clusters that are configured with this new TLS source. Refer to the tutorial Using a Certificate Manager for an example.
For existing clusters using the legacy couchbaseclusters.spec.networking.tls.static source, the Operator works as before, directly mounting and consuming the TLS secret without a shadow secret.
Prior to this release, server groups had to be enabled on cluster creation, and were immutable for the lifetime of the cluster. These restrictions have now been lifted.
It is now possible to enable Couchbase Server groups, and Operator pod scheduling, while a cluster is running.
It is also now possible to modify the list of server groups used for scheduling. Previously, if an availability zone suffered an outage, the Operator would continually attempt to recreate failed pods in the same availability zone to maintain even balance across the requested set. For as long as the outage continued, the Couchbase cluster would be undersized, potentially impacting its ability to service client requests. This release allows the availability zones to be modified to exclude the failed one, so that the Operator is able to scale the cluster back up to the correct size.
All server group migration operations use a shortest-path algorithm in order to minimize disruption.
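To illustrate why removing a failed zone from the list unblocks scaling, the following Python sketch places a new pod in the least-populated allowed server group. It is a simplified model of even balancing and not the Operator's actual scheduling or migration algorithm. The function name and zone names are hypothetical.

```python
from typing import Iterable, List


def pick_server_group(pod_groups: Iterable[str], allowed_groups: List[str]) -> str:
    """Choose the allowed server group that currently holds the fewest pods.

    pod_groups lists the group of every running pod; ties are broken
    alphabetically so the result is deterministic.
    """
    if not allowed_groups:
        raise ValueError("at least one server group is required")
    counts = {group: 0 for group in allowed_groups}
    for group in pod_groups:
        if group in counts:
            counts[group] += 1
    return min(sorted(counts), key=lambda group: counts[group])


if __name__ == "__main__":
    running = ["zone-a", "zone-a", "zone-b"]  # the pod in zone-c has failed
    # While zone-c is still listed, the replacement pod targets the failed zone.
    print(pick_server_group(running, ["zone-a", "zone-b", "zone-c"]))
    # With zone-c removed, the replacement pod goes to a healthy zone.
    print(pick_server_group(running, ["zone-a", "zone-b"]))
```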
Prior to this release, any updates to XDCR remote clusters would be silently ignored. In order to modify any setting, the remote cluster and all replications would have to be deleted and recreated from scratch.
This release adds the ability to modify remote cluster identification and authentication settings. This provides the ability to rotate passwords and certificates on the remote cluster, or even replace the remote cluster entirely.
Kubernetes allows pods to reserve and limit compute resources. Resource reservation provides Kubernetes with the ability to fairly schedule pods so that they don’t compete for CPU and memory.
The Operator now provides the ability to automatically manage pod CPU and memory resource requests with the
When in use, pods will have their resource requests automatically populated depending on the services enabled on that pod, and the individual Couchbase service quotas.
Updating quotas will cause the cluster to upgrade as pod resource requests are updated. A new dummy memory quota for the query service is introduced to allow management of memory resource requests, even though Couchbase itself does not provide the ability to constrain this service.
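As a rough illustration of how a memory request could be derived from service quotas, consider the Python sketch below. It simply sums the quotas of the services enabled on a pod and adds an optional overhead. The function name, the service names, and the `overhead_percent` parameter are all hypothetical, and the Operator's actual calculation may differ.

```python
from typing import Dict, Iterable


def memory_request_mib(enabled_services: Iterable[str],
                       quotas_mib: Dict[str, int],
                       overhead_percent: int = 0) -> int:
    """Sum the memory quotas (MiB) of the enabled services, plus overhead."""
    base = sum(quotas_mib[service] for service in enabled_services)
    # Integer arithmetic keeps the result in whole MiB.
    return base + base * overhead_percent // 100


if __name__ == "__main__":
    quotas = {"data": 1024, "index": 512, "query": 256}
    # A pod running only the data and index services ignores the query quota.
    print(memory_request_mib(["data", "index"], quotas, overhead_percent=25))
```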
Prior to this release, the Operator allowed two types of upgrade: a rolling upgrade of one pod at a time, and a whole cluster upgrade.
This release allows rolling upgrades to be extended to upgrade either a fixed number of pods, or a percentage of the cluster size. Both values can be set, and the Autonomous Operator will select the one that results in the fewest number of pods being upgraded at a time.
Rolling upgrades can be configured with the
A whole cluster "immediate" upgrade is now synonymous with a rolling upgrade set to upgrade 100% of the cluster at a time. Immediate upgrades may be removed at some point in the future as they duplicate functionality.
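The "fewest pods wins" rule can be sketched in a few lines of Python. The function and parameter names are hypothetical. Rounding the percentage down, never upgrading fewer than one pod, and defaulting to one pod at a time when nothing is set are assumptions made for this example rather than documented Operator behavior.

```python
from typing import Optional


def pods_per_step(cluster_size: int,
                  max_pods: Optional[int] = None,
                  max_percent: Optional[int] = None) -> int:
    """Return how many pods to upgrade at a time.

    When both limits are given, the smaller resulting count is used.
    """
    candidates = []
    if max_pods is not None:
        candidates.append(max_pods)
    if max_percent is not None:
        # Assumption: the percentage is rounded down to whole pods.
        candidates.append(cluster_size * max_percent // 100)
    if not candidates:
        return 1  # assumed default: one pod at a time
    # Clamp to the range [1, cluster_size].
    return max(1, min(min(candidates), cluster_size))


if __name__ == "__main__":
    print(pods_per_step(10, max_pods=3, max_percent=20))
    print(pods_per_step(10, max_percent=100))
```

Under these assumptions, a 10-pod cluster limited to 3 pods and 20% is upgraded 2 pods at a time, and a limit of 100% upgrades all 10 pods at once.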
With the release of Autonomous Operator 2.2, the operator-backup container image has been enhanced to support a wider range of Couchbase Server and Autonomous Operator versions.
As part of this update, the operator-backup image has switched to semantic versioning which is no longer based on Couchbase Server version.
The new image is still based on the
cbbackupmgr utility, therefore its ability to backup and restore data between different versions of Couchbase Server is still dependent on the compatibility of the underlying
operator-backup:1.1.0 image is based on
cbbackupmgr 6.6.2, which is capable of backing up and restoring data from all versions of Couchbase Server that have ever been supported by the Autonomous Operator up to this point.
When support for new Couchbase Server versions is added to the Autonomous Operator, support for backing up those versions will be added to the latest
While not a replacement for a properly implemented monitoring solution, it may be beneficial to have your backup storage grow with your requirements. To this end, the Autonomous Operator now supports online resizing of backup volumes.
Backup volumes can be resized in one of two ways:
Backup volumes can be automatically resized by the Autonomous Operator, thus keeping backups to the minimum size necessary to operate, helping to reduce cost while also supporting future expansion. In this mode of operation, the couchbasebackups.spec.autoScaling field controls the behavior.
In order to use any of these resize capabilities, the underlying StorageClass associated with the backup volume must be configured to allow volume expansion.
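The StorageClass requirement can be checked before relying on volume resizing. The Python sketch below inspects a StorageClass manifest that has already been loaded into a dictionary, for example from YAML or from the Kubernetes API. `allowVolumeExpansion` is the standard Kubernetes StorageClass field. The function name and the sample manifest are hypothetical.

```python
def allows_volume_expansion(storage_class: dict) -> bool:
    """Return True only if the StorageClass explicitly allows expansion."""
    # The field is optional; when it is absent, expansion is not allowed.
    return storage_class.get("allowVolumeExpansion") is True


if __name__ == "__main__":
    manifest = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "example-expandable"},  # hypothetical name
        "allowVolumeExpansion": True,
    }
    print(allows_volume_expansion(manifest))
```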
By default, backup and restore operations execute with only a single thread of execution, which can sometimes lead to poor performance.
To mitigate thread-related performance issues, couchbasebackuprestores.spec.threads can now be specified to configure the number of concurrent cbbackupmgr clients to use when backing up or restoring data.
This release also introduces filters that allow you to control exactly what is restored from a backup. In previous releases, restore operations were fairly inflexible in that the only option was to restore all data from a backup. Now you can choose to include or exclude specific buckets, restore data for a particular service, and even restore to a different bucket name when restoring documents. Refer to Additional Restore Options for more information.
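The combined effect of include, exclude, and rename filters can be modeled as follows. This Python sketch is purely illustrative: the function and parameter names are hypothetical and do not correspond to fields of the restore resource, and the precedence shown (include first, then exclude) is an assumption.

```python
from typing import Dict, Iterable, Optional


def plan_restore(backed_up: Iterable[str],
                 include: Optional[Iterable[str]] = None,
                 exclude: Optional[Iterable[str]] = None,
                 rename: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each bucket to restore onto its target bucket name."""
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude or [])
    rename = rename or {}

    plan = {}
    for bucket in backed_up:
        if include_set is not None and bucket not in include_set:
            continue  # not selected for restore
        if bucket in exclude_set:
            continue  # explicitly filtered out
        # Restore under a new name when one is given, else keep the original.
        plan[bucket] = rename.get(bucket, bucket)
    return plan


if __name__ == "__main__":
    buckets = ["orders", "sessions", "users"]
    print(plan_restore(buckets, exclude=["sessions"], rename={"users": "users_restored"}))
```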
The latest version of the Couchbase Prometheus Exporter allows for certain customizations to exported metrics. The following customizations are currently supported:
Change the namespace, subsystem, name, and help text for each metric.
Enable and disable whether a metric is exported to Prometheus.
For more information, refer to Customizing Metrics.
Summary: When using service types in both the console and exposed feature service templates, a logical problem meant that the deprecated service type field defaults overwrote those provided in the template.
For example, when setting the template type to
Summary: Prior to backup images
Summary: Istio in mTLS mode is unable to run using
Summary: On dynamic platforms (GKE Autopilot being one example), where deployments can be rescheduled to better utilize system resources, a deadlock situation was detected. In this scenario, the Autonomous Operator could be terminated while a Couchbase Server pod was still being initialized. On restart, the pod would appear healthy, but Couchbase Server would refuse to respond to the Autonomous Operator. This has been remedied by annotating Couchbase Server pods once they are known to be fully initialized. Pods that are known to be uninitialized can therefore be safely terminated and re-created.
Summary: A bug in the Autonomous Operator meant that when using LDAP groups, nesting could not be turned off. This is fixed in this release.
Summary: Prior to this release, setting the auto-compaction time window managed the wrong configuration settings, resulting in auto-compaction running at unexpected times. This has now been fixed to update the correct settings. Be aware that when upgrading to this release with non-default auto-compaction settings, the Autonomous Operator will cease to erroneously manage index auto-compaction, and start managing global auto-compaction.
Summary: A race condition existed where the Autonomous Operator was restarted and a backup
Summary: Issues exist with the default Prometheus Exporter container image set by the Dynamic Admission Controller.
Workaround: It is recommended that you manually set
The dynamic admission controller (DAC) now allows couchbaseclusters.spec.servers.volumeMounts.index to be configured for server classes that include the Search Service and that don’t include the Index Service. Previously, the DAC would reject configurations that specified the index volume mount unless the Index Service was also included in the server class.
You can have a big impact on future versions of the Operator (and its documentation) by providing Couchbase with your direct feedback and observations. Please feel free to post your questions and comments to the Couchbase Forums.
The complete list of licenses for Couchbase products is available on the Legal Agreements page. Couchbase is thankful to all of the individuals that have created these third-party components.