Release Notes for Couchbase Autonomous Operator 2.3


    Couchbase Autonomous Operator 2.3 is a significant release that expands support for Couchbase Server 7, providing full support for scopes and collections.

    Take a look at the What’s New page for a list of new features and improvements that are available in this release.

    Installation

    Upgrading to Autonomous Operator 2.3

    The steps needed to upgrade to this release depend on which version of the Autonomous Operator you are upgrading from.

    • Version 1.x.x

      • There is no direct upgrade path from versions prior to 2.0.x. To upgrade from a 1.x.x release, you must first upgrade to 2.0.x, paying particular attention to supported Kubernetes platforms and Couchbase Server versions. Refer to the 2.0.x upgrade steps if upgrading from a 1.x.x release.

    • Version 2.0.x

    • Version 2.1.x

      • There are no additional upgrade steps or considerations when upgrading from this version. You may follow the standard upgrade process.

    • Version 2.2.x

      • There are no additional upgrade steps or considerations when upgrading from this version. You may follow the standard upgrade process.

    Upgrading from Version 2.0.x

    First, ensure that you are running compatible versions of Kubernetes and Couchbase Server before upgrading.

    TLS Requirements

    If you are not utilizing TLS, you can skip this section.

    The TLS requirements changed in Autonomous Operator 2.1. To ease the migration from legacy client bootstrap (CCCP) to the newest version (GCCCP), the Autonomous Operator requires Couchbase cluster subject alternative names (SANs) to be updated. Consult the TLS tutorial for a full list of the required SANs, and the TLS rotation guide to prepare for the upgrade. Failure to perform this step will result in errors from the dynamic admission controller (DAC) once upgraded.

    Mandatory Couchbase Upgrade Cycle

    When upgrading from version 2.0.x, Couchbase clusters will undergo a mandatory upgrade cycle.

    Pod readiness checks were previously driven by an exec-based readiness probe. This was a security concern because it required granting the Autonomous Operator pods/exec privileges, which may not be acceptable in highly regulated environments. As of Autonomous Operator 2.1, readiness checks are performed using readiness gates that use the Kubernetes API exclusively.

    You can use the couchbaseclusters.spec.rollingUpgrade configuration parameter to speed up this upgrade. To enable this feature while upgrading the Autonomous Operator, stop the old Operator, replace the CRDs, edit the couchbaseclusters.spec.rollingUpgrade field to enable bulk upgrades, then start the new Operator. Refer to Upgrade the Operator for further details.
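
    As a rough sketch of the edit described above, the fragment below raises the number of pods that may be upgraded at once. The cluster name cb-example and the value 2 are illustrative assumptions, and the field should be checked against the CouchbaseCluster resource reference for your CRD version before it is applied.

    ```yaml
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example          # hypothetical cluster name
    spec:
      rollingUpgrade:
        # Assumed value: upgrade up to two pods at a time instead of one.
        maxUpgradable: 2
    ```

    A larger value shortens the upgrade but takes more pods out of service at once, so size it against the spare capacity of the cluster.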

    Upgrading from 2.0, 2.1, and 2.2

    Mandatory Couchbase Upgrade Cycle

    Some users will encounter a mandatory upgrade cycle when upgrading to this release.

    An upgrade cycle is a relatively heavyweight operation that requires all pods in the cluster to be replaced, and data to be transferred between the old and new pods. The time taken to perform this operation depends on network bandwidth, disk I/O, and the amount of data resident in the database. For large production databases, ensure that an adequate maintenance window is scheduled to minimize disruption to clients and other business-critical functions.

    For further information, read the Couchbase Upgrade concepts page.

    TLS Users

    In prior versions, Couchbase Server pod readiness checks were done over plain text port 8091. With the introduction of strict mode TLS in this release, all non-TLS ports, including 8091, can be disabled. The Operator has been updated to use port 18091 (Couchbase Server’s TLS admin port) when TLS management is enabled.

    Server Groups Users

    A pod upgrade is also required for users of the server groups feature.

    When the Operator was first released, the only Node label available across all platforms to allow explicit pod scheduling was failure-domain.beta.kubernetes.io/zone. This label has been deprecated since Kubernetes 1.17, in favor of topology.kubernetes.io/zone.

    To ensure correct operation of the Operator in the future, the decision has been taken to start using the new label as of Operator 2.3. This ensures that the Operator is compatible with all future Kubernetes releases in the event of the deprecated label being removed.
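
    For illustration, the sketch below shows the new label on a node, together with a cluster that schedules pods across zones by name. The node name, zone names, and cluster name are hypothetical. On most managed platforms the zone label is populated automatically rather than set by hand.

    ```yaml
    # Node carrying the zone label that the Operator uses as of 2.3.
    apiVersion: v1
    kind: Node
    metadata:
      name: node-1                              # hypothetical node name
      labels:
        topology.kubernetes.io/zone: zone-a     # hypothetical zone name
    ---
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example                          # hypothetical cluster name
    spec:
      # Server group names are matched against the zone label values on nodes.
      serverGroups:
      - zone-a
      - zone-b
    ```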

    Logging Users

    To control memory usage, memory buffer limits have been implemented for the logging sidecar. This change will affect all users of logging, and will require a cluster upgrade.

    Additionally, a bug in the audit cleanup script caused old logs to be deleted only if their age matched the specified time period exactly. This has been fixed so that logs whose age is equal to or greater than the time period are cleaned.

    Release 2.3.0

    Couchbase Autonomous Operator 2.3.0 was released in April 2022.

    New Features and Behavioral Changes

    Data Topology Synchronization

    This feature allows manually created buckets, scopes and collections to be synchronized as Kubernetes primitives and then managed.

    For more information, see the Data Topology Save, Restore and Synchronization concepts page.

    Data Topology Save and Restore

    This feature allows manually created buckets, scopes and collections to be saved, then restored to any cluster. Unlike synchronization, this feature supports partial save and restore, different merge policies, resource compaction and garbage collection.

    For more information, see the Data Topology Save, Restore and Synchronization concepts page.

    Known Issues

    Issue Description

    Summary: Kubernetes resource names are based on DNS, and may be up to ~250 characters in length. Prior to Couchbase Server 6.6.5 and 7.0.0, host names were used to generate a key for internal communication, and long names caused a buffer overflow.

    This was fixed in 6.6.5 (though not fully implemented for Eventing) and 7.0.0. When using long namespaces or cluster names, ensure that you use Couchbase Server 6.6.5 or later, or 7.0.0 or later.

    Summary: The restore process can determine what repository to use in a CouchbaseBackupRestore resource, and therefore the repo field should be entirely optional. However, the underlying API erroneously requires this field to be set.

    To avoid any errors, we recommend setting the repo field to an empty string until the issue is fixed in a future release.

    apiVersion: couchbase.com/v2
    kind: CouchbaseBackupRestore
    spec:
      repo: ""

    Summary: Due to a mismatch between the catalog version of our backup image and the version recommended by the certification tool, you will need to override the default backup image used for certification with OpenShift. The override can be provided via:

    $ cao certify -- -backup-image registry.connect.redhat.com/couchbase/operator-backup:1.3.0-2

    Summary: Currently, it is not possible to specify a raw IPv6 address for XDCR.
    This will be fixed in the next Operator release, which will include an update to the CRD. As a workaround to support XDCR to an IPv6 endpoint, you will need to remove the XDCR hostname validation rule from the CouchbaseCluster custom resource definition.

    Run the kubectl edit command:

    $ kubectl edit crd couchbaseclusters.couchbase.com

    Then, remove the pattern field from the YAML:

    xdcr:
      hostname:
        description: Hostname is the connection string to use to connect the remote cluster.
        pattern: ^((couchbase|couchbases|http|https)://)?[0-9a-zA-Z\-\.]+(:\d+)?(\?network=[^&]+)?$

    Note that removing the pattern field disables all validation on this field, so an invalid hostname will no longer be rejected. Contact the Couchbase Support team for further help on this workaround.

    Summary: Currently, you cannot set the storage class used for the artifact logs during certification. This means the default storage class is always used when running the cao certify command.
    Passing the -storage-class flag to the command only affects volumes created during testing, not the volume used to store test results.

    If you need to use a specific type of storage class for both artifacts and testing, please contact the Couchbase Support team for advice on a workaround.

    Release 2.3.0-beta1

    Couchbase Autonomous Operator 2.3.0-beta1 was released in October 2021.

    New Features and Behavioral Changes

    Logging Defaults

    The new default image for logging is couchbase/fluent-bit:1.1.1. This image features functional and security updates.

    Existing clusters will continue to use the logging version they were provisioned with. Should you wish to update existing Couchbase clusters to make use of this new image, then they will undergo a rolling upgrade to facilitate the update. You should plan a maintenance window accordingly.
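
    A minimal sketch of opting an existing cluster into the new image is shown below. The cluster name is hypothetical, and the field paths should be confirmed against the CouchbaseCluster resource reference. Applying the change triggers the rolling upgrade described above.

    ```yaml
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example          # hypothetical cluster name
    spec:
      logging:
        server:
          enabled: true
          sidecar:
            # New default logging image named in these release notes.
            image: couchbase/fluent-bit:1.1.1
    ```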

    Scopes and Collections Management

    The Operator can now fully manage scopes and collections within a bucket. Scopes and collections provide fine-grained access control and replication, and improved scalability.

    For further information, consult the Scopes and Collections Concepts documentation.
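
    As a hedged sketch of what managed scopes and collections can look like, the resources below link a bucket to one scope, and that scope to one collection. All resource names are hypothetical, and the exact schema should be taken from the Scopes and Collections documentation rather than from this fragment.

    ```yaml
    apiVersion: couchbase.com/v2
    kind: CouchbaseBucket
    metadata:
      name: bucket0             # hypothetical bucket name
    spec:
      scopes:
        managed: true           # the Operator reconciles scopes for this bucket
        resources:
        - kind: CouchbaseScope
          name: scope0
    ---
    apiVersion: couchbase.com/v2
    kind: CouchbaseScope
    metadata:
      name: scope0              # hypothetical scope name
    spec:
      collections:
        managed: true           # the Operator reconciles collections for this scope
        resources:
        - kind: CouchbaseCollection
          name: collection0
    ---
    apiVersion: couchbase.com/v2
    kind: CouchbaseCollection
    metadata:
      name: collection0         # hypothetical collection name
    ```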

    Backup and Restore Improvements

    Due to interface changes introduced in Couchbase Server 7.0, a new backup image (1.2.0) is the only supported version that will run with Operator 2.3. It continues to support operation with all supported Couchbase Server versions.

    Ensure that any backup jobs are upgraded to use the new image when moving to Operator 2.3.

    Backup 1.2.0 fully supports operation with Couchbase scopes and collections. Additional improvements include filtering of backup source data, which minimizes backup size and improves performance, and new options for filtering restore data, such as document key and value regular expressions.

    Removal of Admission Controller Mutation

    Prior to this release, the dynamic admission controller was utilized to provide some defaults to Couchbase custom resources. The vast majority of these defaults have already been migrated to native CRD defaulting. Mutation has now been fully removed to provide out-of-the-box compatibility with platforms like GKE Autopilot where mutation is prohibited.

    Some defaults could not be migrated and have been removed entirely:

    • Default file system groups for persistent volumes (does not affect Red Hat OCP). When using Couchbase with PVCs, the operator tries to reuse data where possible. In order for an old volume to be used by a new pod, the data needs to be read and written by the same group across all pods. Clusters that use backups are also affected, as these reuse PVCs across backup jobs.

      Previously the DAC provided dynamic defaults depending on the platform (Kubernetes/Red Hat OCP). While OCP should work without specifying the file system group, Kubernetes users will need to explicitly specify the group when using persistent volumes. Follow the existing persistent volume concepts documentation for guidance on configuration.

    • Backup and Prometheus images. These were dynamically populated by the DAC for ease of use, depending on platform. These fields must now be provided by the end user. Up-to-date images can be found on Docker Hub and the Red Hat container catalog. See the prerequisites documentation for compatible image versions.
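
    The fragment below sketches how the removed defaults might be set explicitly on a Kubernetes (non-OCP) cluster. The cluster name, the fsGroup value, and the exporter image tag are assumptions for illustration only. Take the actual values from the persistent volume concepts and prerequisites documentation.

    ```yaml
    apiVersion: couchbase.com/v2
    kind: CouchbaseCluster
    metadata:
      name: cb-example                          # hypothetical cluster name
    spec:
      securityContext:
        # Assumed group ID; must match the group that owns data on the volumes.
        fsGroup: 1000
      backup:
        managed: true
        # Backup image version named in these release notes.
        image: couchbase/operator-backup:1.2.0
      monitoring:
        prometheus:
          enabled: true
          # Assumed tag; check the prerequisites page for a compatible version.
          image: couchbase/exporter:1.0.5
    ```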

    Tooling Updates

    Previous releases came bundled with cbopcfg and cbopinfo binaries to aid installation and support requests respectively. While these binaries are still part of the distribution, the new cao binary features all the existing functionality of these tools, and extends the feature set to allow self-service platform certification. The existing cbopcfg and cbopinfo binaries are now deprecated and will be removed in a later release.

    You should update any tooling to use cao, or you can alias existing commands to use the new binary:

    $ alias cbopcfg='cao'
    $ alias cbopinfo='cao collect-logs'

    Fixed Issues

    Issue Description

    Summary: When recovering a pod that uses persistent volume storage, there is the possibility of a race condition when running the Couchbase Server pod’s initialization container. This occurs when the underlying storage provider returns an error, rather than a definitive answer as to whether a file exists. The error appears as if the file doesn’t exist, and so the container erroneously reinitializes its persistent storage and resets configuration, particularly storage path locations. The fix removes the condition check and replaces it with a non-destructive copy.

    Summary: A Couchbase cluster becomes inoperable when its internal replication stream names are longer than 200 characters. This occurs when the Operator is used with Couchbase Server version 6.6.3 and below, and is caused by runtime validation checks made by Couchbase Server against internal replication streams. The fix was made in Couchbase Server 6.6.4 and 7.0.x. In addition, the Couchbase Helm chart now shortens the default cluster name to prevent the creation of clusters with long names.

    Feedback

    You can have a big impact on future versions of the Operator (and its documentation) by providing Couchbase with your direct feedback and observations. Please feel free to post your questions and comments to the Couchbase Forums.

    Licenses for Third-Party Components

    The complete list of licenses for Couchbase products is available on the Legal Agreements page. Couchbase is thankful to all of the individuals who have created these third-party components.