Release Notes for Couchbase Autonomous Operator 2.2

    Couchbase Autonomous Operator 2.2 is a significant release that expands support for auto-scaling Couchbase clusters, and adds several improvements in the areas of logging and security.

    Take a look at the What’s New page for a list of new features and improvements that are available in this release.

    Installation

    Upgrading to Autonomous Operator 2.2

    The steps needed to upgrade to this release depend on the version of the Autonomous Operator that you are upgrading from.

    • Version 1.x.x

      • There is no direct upgrade path from versions prior to 2.0.x. To upgrade from a 1.x.x release, you must first upgrade to 2.0.x, paying particular attention to supported Kubernetes platforms and Couchbase Server versions. Refer to the 2.0.x upgrade steps if upgrading from a 1.x.x release.

    • Version 2.0.x

      • Refer to the Upgrading from Version 2.0.x section below for the required steps.

    • Version 2.1.x

      • There are no additional upgrade steps or considerations when upgrading from this version. You can follow the standard upgrade process.
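
    The version-dependent paths above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name upgrade_steps and the wording of each step are assumptions made for this example, and are not part of the Autonomous Operator or its tooling.

```python
import re


def upgrade_steps(current_version: str) -> list:
    """Return the steps these notes describe for reaching 2.2 from current_version."""
    match = re.match(r"(\d+)\.(\d+)", current_version)
    if match is None:
        raise ValueError(f"unrecognized version: {current_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))

    if major < 2:
        # No direct path: a 1.x.x release must go through 2.0.x first,
        # after which the 2.0.x path applies.
        return ["upgrade to 2.0.x (see the 2.0.x upgrade steps)"] + upgrade_steps("2.0.0")
    if (major, minor) == (2, 0):
        return [
            "update TLS SANs (only if TLS is in use)",
            "plan for the mandatory Couchbase upgrade cycle",
            "follow the standard upgrade process",
        ]
    if (major, minor) == (2, 1):
        return ["follow the standard upgrade process"]
    raise ValueError(f"version {current_version} is not covered by these notes")


print(upgrade_steps("1.2.2"))
```

    Modeling the 1.x.x case as "reach 2.0.x, then continue as a 2.0.x user" mirrors the two-stage path described in the list.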

    Upgrading from Version 2.0.x

    First, ensure that you are running compatible versions of Kubernetes and Couchbase Server before upgrading.

    TLS Requirements

    If you are not utilizing TLS, you can skip this section.

    The TLS requirements have been modified as of Autonomous Operator 2.1. To ease the migration from the legacy client bootstrap (CCCP) to the newest version (GCCCP), the Autonomous Operator requires Couchbase cluster subject alternative names (SANs) to be updated. Consult the TLS tutorial for a full list of the required SANs, and the TLS rotation guide to prepare for the upgrade. Failure to perform this step will result in errors from the dynamic admission controller (DAC) once upgraded.

    Mandatory Couchbase Upgrade Cycle

    When upgrading from version 2.0.x, Couchbase clusters will undergo a mandatory upgrade cycle.

    Pod readiness checks were previously driven by an exec-based readiness probe. This was a security concern because it required granting the Autonomous Operator pods/exec privileges, which may not be acceptable in highly regulated environments. As of Autonomous Operator 2.1, readiness checks are performed using readiness gates that use the Kubernetes API exclusively.

    You can use the couchbaseclusters.spec.rollingUpgrade configuration parameter to speed up this upgrade. To enable this feature while upgrading the Autonomous Operator, stop the old Operator, replace the CRDs, edit the couchbaseclusters.spec.rollingUpgrade field to enable bulk upgrades, then start the new Operator. Refer to Upgrade the Operator for further details.

    Release 2.2.0

    Couchbase Autonomous Operator 2.2.0 was released in June 2021.

    Platform Support

    For the latest platform support information, refer to Prerequisites and System Requirements.

    New Platform Support

    This release adds support for the following platforms:

    • Open Source Kubernetes 1.20, 1.21

    • Red Hat OpenShift Container Platform 4.7

    This release also adds support for the following utilities:

    • Rancher

    Removed Platform Support

    This release drops support for the following platforms:

    • Open Source Kubernetes 1.15, 1.16

    • Red Hat OpenShift Container Platform 4.3

    New Features and Behavioral Changes

    Couchbase Cluster Auto-scaling

    The Autonomous Operator now supports Couchbase cluster auto-scaling for all Couchbase services. This means that Couchbase cluster server class configurations containing stateful services like Data and Index can now be configured to automatically scale in response to observed metrics. (Previously, only stateless deployments of the Query Service were supported.)

    To help users get started in production, an auto-scaling best practices guide has been provided. This guide discusses relevant metrics for scaling individual Couchbase Services, and includes recommended settings based on internal benchmark testing performed by Couchbase.

    About Auto-scaling Preview Mode

    When cluster auto-scaling was introduced in Autonomous Operator 2.1, only stateless configurations that included the Query Service and Ephemeral buckets were supported. However, a special preview mode could be enabled that allowed stateful server configurations other than query to be auto-scaled. This preview mode was enabled by setting couchbaseclusters.spec.enablePreviewScaling to true.

    In Autonomous Operator 2.2, cluster auto-scaling can be enabled for any service or configuration, whether stateful or stateless. As a result, couchbaseclusters.spec.enablePreviewScaling has been deprecated and is now ignored by the Autonomous Operator.

    Online Expansion of Persistent Volumes

    The couchbaseclusters.spec.enableOnlineVolumeExpansion field has been added to allow for the expansion of persistent volumes that are already in use by Couchbase clusters, thus removing the need for a rolling upgrade of cluster pods. The Autonomous Operator achieves this by working in conjunction with Kubernetes Persistent Volume Expansion to claim additional storage for running pods without any downtime.

    For additional information and requirements, refer to Online Volume Expansion.

    Log Forwarding and Audit Log Management

    The Autonomous Operator now supports log forwarding through the optional deployment of a third party log processor. The log processor runs in a sidecar container on each Couchbase pod, which then reads the log files and forwards them to standard console output. For this purpose, Couchbase supplies a default log processor image based on Fluent Bit.

    In addition, audit logging can now be configured via the CouchbaseCluster resource specification, thus allowing for automated audit logging via the Autonomous Operator.

    Improved Security Defaults

    The Couchbase Server default minimum TLS version is 1.0. This is insecure and easily compromised if a client downgrades to a version less than 1.2. Operator 2.2 makes TLS 1.2 the minimum by default, and will automatically update the TLS minimum version unless it is explicitly specified with the couchbaseclusters.spec.networking.tls.tlsMinimumVersion parameter.

    You can prevent this change from occurring when upgrading the Operator on existing clusters by setting couchbaseclusters.spec.networking.tls.tlsMinimumVersion to TLS1.0 after upgrading the CRDs, and before restarting the Operator.
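
    As a minimal sketch of that opt-out, the following Python snippet builds a JSON merge patch that pins the minimum version. The field path and the TLS1.0 value come from the text above; the helper name merge_patch is an assumption made for this example.

```python
import json


def merge_patch(field_path: str, value) -> dict:
    """Expand a dotted field path into the nested dict used by a JSON merge patch."""
    patch = value
    for key in reversed(field_path.split(".")):
        patch = {key: patch}
    return patch


# The leading "couchbaseclusters" in the documented path is the resource
# name, so the patch itself starts at "spec".
patch = merge_patch("spec.networking.tls.tlsMinimumVersion", "TLS1.0")
print(json.dumps(patch))
# prints {"spec": {"networking": {"tls": {"tlsMinimumVersion": "TLS1.0"}}}}
```

    The printed document could then be applied to your CouchbaseCluster resource with kubectl patch and the --type merge option, after the CRDs are upgraded and before the Operator is restarted.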

    Improved TLS Certificate and Key Handling

    Couchbase Server requires that the TLS resources be called pkey.key and chain.pem; however, the majority of 3rd-party certificate managers generate tls.key and tls.crt. The latter format is governed by the Kubernetes kubernetes.io/tls secret type. To provide integration with 3rd-party providers, and to maintain backward compatibility with existing clusters, the Operator now provides secret shadowing.

    A new TLS source — couchbaseclusters.spec.networking.tls.secretSource — allows you to use kubernetes.io/tls type secrets to configure Couchbase Server. When using this new TLS source, a shadow copy of the TLS secret is created, with the key names changed to those hard coded in Couchbase Server. The shadow secret can then be mounted and used inside Couchbase Server pods. As the secret is shadowed, the Operator can reformat private keys, and therefore now supports PKCS#8 formatted private keys.
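
    The key renaming at the heart of secret shadowing can be illustrated in a few lines of Python. This is a simplified model under stated assumptions, not the Operator's implementation: the function and constant names are invented for the example, and the private key reformatting mentioned above is not modeled.

```python
# Key names taken from the text above: kubernetes.io/tls secrets use
# tls.key / tls.crt, while Couchbase Server expects pkey.key / chain.pem.
SHADOW_KEY_NAMES = {"tls.key": "pkey.key", "tls.crt": "chain.pem"}


def shadow_secret_data(tls_secret_data: dict) -> dict:
    """Return a copy of the secret data with keys renamed for Couchbase Server."""
    missing = sorted(set(SHADOW_KEY_NAMES) - set(tls_secret_data))
    if missing:
        raise KeyError(f"secret is missing required keys: {missing}")
    return {new: tls_secret_data[old] for old, new in SHADOW_KEY_NAMES.items()}


# Placeholder values stand in for real PEM data.
source = {"tls.key": "<private key>", "tls.crt": "<certificate chain>"}
print(shadow_secret_data(source))
```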

    3rd-party certificate managers can be used with clusters that are configured with this new TLS source. Refer to the tutorial Using a Certificate Manager for an example.

    For existing clusters using the legacy couchbaseclusters.spec.networking.tls.static source, the Operator works as before — directly mounting and consuming the TLS secret without a shadow secret.

    Improved Server Group Support

    Prior to this release, server groups had to be enabled on cluster creation, and were immutable for the lifetime of the cluster. These restrictions have now been lifted.

    It is now possible to enable Couchbase Server groups, and Operator pod scheduling, while a cluster is running.

    It is also now possible to modify the list of server groups used for scheduling. Previously, if an availability zone suffered an outage, the Operator would continually attempt to recreate failed pods in the same availability zone in order to maintain an even balance across the requested set. For as long as the outage continued, the Couchbase cluster would be undersized, potentially impacting its ability to service client requests. This release allows the list of availability zones to be modified to exclude the failed one, so that the Operator is able to scale the cluster back up to the correct size.

    All server group migration operations use a shortest-path algorithm in order to minimize disruption.
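
    To see why editing the server group list unblocks recovery, consider a toy model of balanced scheduling. The Python sketch below assumes the scheduler simply picks the configured group with the fewest pods; the function name and zone names are hypothetical, and the Operator's actual algorithm is not described in these notes.

```python
from collections import Counter


def next_server_group(server_groups, existing_pod_groups):
    """Pick the configured group with the fewest pods (ties go to list order)."""
    if not server_groups:
        raise ValueError("at least one server group is required")
    counts = Counter(existing_pod_groups)
    return min(server_groups, key=lambda group: counts[group])


pods = ["zone-a", "zone-b"]  # the pod in zone-c was lost in the outage

# With zone-c still listed, a replacement pod is pinned to the failed zone.
print(next_server_group(["zone-a", "zone-b", "zone-c"], pods))  # zone-c

# With zone-c removed from the list, the pod can be scheduled elsewhere.
print(next_server_group(["zone-a", "zone-b"], pods))  # zone-a
```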

    Improved XDCR Connection Support

    Prior to this release, any updates to XDCR remote clusters would be silently ignored. In order to modify any setting, the remote cluster and all replications would have to be deleted and recreated from scratch.

    This release adds the ability to modify remote cluster identification and authentication settings. This provides the ability to rotate passwords and certificates on the remote cluster, or even replace the remote cluster entirely.

    Automatic Resource Allocation

    Kubernetes allows pods to reserve and limit compute resources. Resource reservation provides Kubernetes with the ability to fairly schedule pods so that they don’t compete for CPU and memory.

    The Operator now provides the ability to automatically manage pod CPU and memory resource requests with the couchbaseclusters.spec.autoResourceAllocation field. When in use, pods will have their resource requests automatically populated depending on the services enabled on that pod, and the individual Couchbase service quotas.

    Updating quotas will cause the cluster to upgrade as pod resource requests are updated. A new dummy memory quota for the query service is introduced to allow management of memory resource requests, even though Couchbase itself does not provide the ability to constrain this service.
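
    As a rough illustration of how a request could be derived from quotas, the Python sketch below assumes the memory request is simply the sum of the quotas for the services enabled on a pod. That formula, the function name, and the quota values are assumptions made for this example; the Operator may apply additional rules that are not covered here.

```python
def memory_request_mib(enabled_services, quotas_mib):
    """Sum the memory quotas (in MiB) of the services enabled on a pod."""
    return sum(quotas_mib[service] for service in enabled_services)


# Hypothetical quotas; "query" stands in for the new dummy query quota.
quotas = {"data": 1024, "index": 512, "query": 256}

print(memory_request_mib(["data", "index"], quotas))  # 1536
print(memory_request_mib(["query"], quotas))          # 256
```

    In this model, changing any quota changes the computed request, which is why a quota update leads to pods being replaced.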

    Enhanced Couchbase Upgrade Behavior

    Prior to this release, the Operator allowed two types of upgrade — a one pod at a time rolling upgrade, and a whole cluster upgrade.

    This release allows rolling upgrades to be extended to upgrade either a fixed number of pods, or a percentage of the cluster size. Both values can be set, and the Autonomous Operator will select the one that results in the smaller number of pods being upgraded at a time.

    Rolling upgrades can be configured with the couchbaseclusters.spec.rollingUpgrade parameter.

    A whole cluster "immediate" upgrade is now synonymous with a rolling upgrade set to upgrade 100% of the cluster at a time. Immediate upgrades may be removed at some point in the future because they duplicate this functionality.
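
    The selection rule for rolling upgrades can be sketched in Python as follows. The parameter names max_pods and max_percent are hypothetical stand-ins for the fixed and percentage settings, and both the round-down behavior and the minimum of one pod are assumptions made for this example.

```python
import math


def upgrade_batch_size(cluster_size, max_pods=None, max_percent=None):
    """Number of pods upgraded at once, given optional fixed and percentage caps."""
    limits = []
    if max_pods is not None:
        limits.append(max_pods)
    if max_percent is not None:
        # Assumption: fractional results round down.
        limits.append(math.floor(cluster_size * max_percent / 100))
    if not limits:
        return 1  # classic one-pod-at-a-time rolling upgrade
    # The smaller cap wins; keep the result between 1 and the cluster size.
    return max(1, min(min(limits), cluster_size))


print(upgrade_batch_size(10, max_pods=3, max_percent=50))  # 3
print(upgrade_batch_size(10, max_pods=5, max_percent=20))  # 2
print(upgrade_batch_size(4, max_percent=100))              # 4
```

    The last call corresponds to the "immediate" case: a 100% cap upgrades every pod in the cluster at once.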

    Enhanced Backup and Restore

    With the release of Autonomous Operator 2.2, the operator-backup container image has been enhanced to support a wider range of Couchbase Server and Autonomous Operator versions. As part of this update, the operator-backup image has switched to semantic versioning, which is no longer based on the Couchbase Server version.

    At the time of this writing, the new image — operator-backup:1.1.0 — is compatible with all versions of the Autonomous Operator that support managed backup and restore (version 2.0 and later).

    The new image is still based on the cbbackupmgr utility, so its ability to back up and restore data between different versions of Couchbase Server still depends on the compatibility of the underlying cbbackupmgr version. However, the operator-backup:1.1.0 image is based on cbbackupmgr 6.6.2, which is capable of backing up and restoring data from all versions of Couchbase Server that have ever been supported by the Autonomous Operator up to this point. When support for new Couchbase Server versions is added to the Autonomous Operator, support for backing up those versions will be added to the latest operator-backup image.

    Online Backup Volume Resizing

    While not a replacement for a properly implemented monitoring solution, it may be beneficial to have your backup storage grow with your requirements. To this end, the Autonomous Operator now supports online resizing of backup volumes.

    Backup volumes can be resized in one of two ways:

    1. Backup volumes can be manually resized by directly editing the couchbasebackups.spec.size field.

    2. Backup volumes can be automatically resized by the Autonomous Operator, thus keeping backups to the minimum size necessary to operate, helping to reduce cost while also supporting future expansion. In this mode of operation, the couchbasebackups.spec.autoScaling field controls the behavior.

    In order to use any of these resize capabilities, the underlying StorageClass associated with the backup volume must be configured to allow volume expansion.
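
    One way to check that prerequisite is to inspect the StorageClass manifest for the standard Kubernetes allowVolumeExpansion field. The Python sketch below is illustrative: the function name, the StorageClass name, and the provisioner are hypothetical, and the manifest is example input rather than real cluster output.

```python
import json


def allows_expansion(storage_class: dict) -> bool:
    """True if a StorageClass manifest has allowVolumeExpansion set to true."""
    return storage_class.get("allowVolumeExpansion") is True


# Example input, such as the output of: kubectl get storageclass <name> -o json
manifest = json.loads("""
{
  "apiVersion": "storage.k8s.io/v1",
  "kind": "StorageClass",
  "metadata": {"name": "example-expandable"},
  "provisioner": "example.com/hypothetical-provisioner",
  "allowVolumeExpansion": true
}
""")

print(allows_expansion(manifest))  # True
```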

    General Backup and Restore Enhancements

    By default, backup and restore operations execute with only a single thread of execution, which can sometimes lead to poor performance. To mitigate thread-related performance issues, couchbasebackups.spec.threads and couchbasebackuprestores.spec.threads can now be specified to configure the number of concurrent cbbackupmgr clients to use when backing up or restoring data.

    This release also introduces filters that allow you to control exactly what is restored from a backup. In previous releases, restore operations were fairly inflexible in that the only option was to restore all data from a backup. Now you can choose to include or exclude specific buckets, restore data for a particular service, and even restore to a different bucket name when restoring documents. Refer to Additional Restore Options for more information.

    Customizable Prometheus Metrics

    The latest version of the Couchbase Prometheus Exporter allows for certain customizations to exported metrics. The following customizations are currently supported:

    • Change the namespace, subsystem, name, and help text for each metric.

    • Enable and disable whether a metric is exported to Prometheus.

    For more information, refer to Customizing Metrics.
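
    For context on how the namespace, subsystem, and name relate to what Prometheus sees, the Python sketch below joins the non-empty parts with underscores. This mirrors a common convention in Prometheus client libraries; it is an assumption that the exporter composes names this way, and the values shown are hypothetical rather than actual exporter metrics.

```python
def full_metric_name(namespace: str, subsystem: str, name: str) -> str:
    """Join the non-empty parts with underscores; an empty name yields no metric."""
    if not name:
        return ""
    return "_".join(part for part in (namespace, subsystem, name) if part)


# Hypothetical values, not actual exporter metric names.
print(full_metric_name("acme", "bucket", "ops_total"))  # acme_bucket_ops_total
print(full_metric_name("", "bucket", "ops_total"))      # bucket_ops_total
```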

    Fixed Issues

    Issue Description

    Summary: When using service types in both the console and exposed feature service templates, a logical problem meant that the deprecated service type field defaults overwrote those provided in the template. For example, when setting the template type to LoadBalancer, this may have ended up as NodePort due to the old defaults taking precedence. This has now been fixed so that the template will take precedence.

    Summary: Prior to backup images couchbase/operator-backup:6.5.0-XXX, couchbase/operator-backup:6.6.0-XXX, and couchbase/operator-backup:1.0.0, a bug meant that all backup traffic between the job and Couchbase Server was in plain text.

    While these backup images work with plain TLS and mTLS, they do not work with mandatory mTLS configurations, because cbbackupmgr does not support client certificate authentication. Autonomous Operator 2.2 will fall back to plain text backup when using mandatory mTLS in order to facilitate functional backups.

    Summary: With Istio in mTLS mode, backup and restore operations that use the Job or CronJob resource types were unable to run to completion. The Envoy sidecar proxy did not terminate when a backup or restore Job finished, which kept the Pod alive and blocked further execution. This is remedied in Couchbase Operator Backup versions 1.0.0 and above by sending a termination signal to the Envoy sidecar when the backup or restore Job terminates.

    Summary: On dynamic platforms (GKE Autopilot being one example), where deployments can be rescheduled to better utilize system resources, we detected a deadlock situation. In this scenario, the Autonomous Operator could be terminated while a Couchbase Server pod was still being initialized. On restart, the pod would look healthy, but Couchbase Server would refuse to respond to the Autonomous Operator. This has been remedied by annotating Couchbase Server pods once they are known to be fully initialized. A pod without the annotation is known to be uninitialized, so it can be safely terminated and then re-created.
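
    The idea behind this fix can be modeled with a short Python sketch. The annotation key, pod names, and function name below are hypothetical, since these notes do not state the real annotation used by the Operator.

```python
# Hypothetical annotation key; the real key used by the Operator is not given here.
INITIALIZED_ANNOTATION = "example.com/initialized"


def pods_to_recreate(pods):
    """Return names of pods that were never marked as fully initialized."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if (pod["metadata"].get("annotations") or {}).get(INITIALIZED_ANNOTATION) != "true"
    ]


pods = [
    {"metadata": {"name": "cb-0000", "annotations": {INITIALIZED_ANNOTATION: "true"}}},
    {"metadata": {"name": "cb-0001"}},  # Operator restarted mid-initialization
]
print(pods_to_recreate(pods))  # ['cb-0001']
```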

    Summary: A bug in the Autonomous Operator meant that when using LDAP groups, nesting could not be turned off. This is fixed in this release.

    Summary: Prior to this release, setting the auto-compaction time window managed the wrong configuration settings, resulting in auto-compaction running at unexpected times. This has now been fixed to update the correct settings. Be aware that when you upgrade to this release with non-default auto-compaction settings, the Autonomous Operator will cease to erroneously manage index auto-compaction, and will start managing global auto-compaction.

    Summary: A race condition occurred when the Autonomous Operator was restarted while a backup Pod was still executing. The backup Pod was erroneously included in a metadata update routine, which caused a crash loop until the backup terminated. This has now been fixed by correctly filtering the pods considered for metadata updates to only include Couchbase Server instances.

    Other Notable Changes

    • The dynamic admission controller (DAC) now allows couchbaseclusters.spec.servers.volumeMounts.index to be configured for server classes that include the Search Service and that don’t include the Index Service. Previously, the DAC would reject configurations that specified the index volume mount unless the Index Service was also included in the server class.

    Feedback

    You can have a big impact on future versions of the Operator (and its documentation) by providing Couchbase with your direct feedback and observations. Please feel free to post your questions and comments to the Couchbase Forums.

    Licenses for Third-Party Components

    The complete list of licenses for Couchbase products is available on the Legal Agreements page. Couchbase is thankful to all of the individuals who have created these third-party components.