March 23, 2025
A how-to guide on data topology synchronization with the Couchbase Autonomous Operator.

Overview

In the following guide, we’ll show you how to discover the configuration of a Couchbase cluster in the form of Kubernetes resources, and how the Autonomous Operator manages those resources.

Prerequisites

In your Couchbase cluster, load the travel-sample sample bucket. See Load the Sample Dataset to learn how to load a sample bucket.

On this cluster we’ll enable synchronization to discover the bucket and its scopes and collections in the form of Kubernetes resources.

For the purposes of this guide, we’ll be referring to an example cluster called cb-example. Replace the cluster name with your own if necessary.
[Image: the prerequisite travel-sample bucket]
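
Before moving on, it’s worth confirming that the cluster resource exists and is reachable. A minimal sanity check, assuming your cluster is also named cb-example:

console
$ kubectl get couchbasecluster cb-example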

Enabling Synchronization

To enable synchronization, we first have to disable bucket management on the cluster. This ensures that the existing configuration is not deleted from Couchbase Server when the backing Kubernetes resources are removed in the next step.

console
$ kubectl patch couchbasecluster cb-example --type=merge -p '{"spec":{"buckets":{"managed":false}}}'

The above command disables bucket management for a Couchbase cluster named cb-example.
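
If you want to confirm the patch was applied, you can read the field back; this is only a sanity check, not a required step:

console
$ kubectl get couchbasecluster cb-example -o jsonpath='{.spec.buckets.managed}'

This should now print false.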

Next, remove any resources that may conflict with those the Operator will generate.

console
$ for i in couchbasebuckets \
           couchbaseephemeralbuckets \
           couchbasememcachedbuckets \
           couchbasescopes \
           couchbasescopegroups \
           couchbasecollections \
           couchbasecollectiongroups
do
    kubectl delete $i --all
done

This command deletes all resources in the namespace that would be affected by a synchronization operation. You can replace --all with a label or field selector if you wish to be more selective, especially where multiple Couchbase clusters are running in the same namespace.
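
For example, if your bucket resources carry a label identifying their cluster (the cluster=cb-example label below is hypothetical; substitute whatever labels you actually use), the deletion could be scoped like this:

console
$ kubectl delete couchbasebuckets -l cluster=cb-example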

Data topology resources can be shared between clusters. If they are shared, deletion may affect another, unrelated cluster and result in data loss. For this reason, we recommend deploying only one Couchbase cluster per namespace.

Synchronizing

Once you have manually updated the data topology to the desired state, you can begin synchronizing it so that the Operator can manage it.

Because multiple Couchbase clusters can run in the same namespace, there is a danger that a resource created by synchronization is erroneously picked up by another cluster. For this reason, the Operator requires a label selector, which is used both to label the generated buckets and to select them for inclusion in the cluster being synchronized. It is your responsibility to ensure that any other clusters in the namespace use a unique bucket label selector that will not be affected by this synchronization operation.
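
One simple convention, which we suggest here rather than something the Operator mandates, is to key each cluster’s selector on its own name, so that no two clusters in the namespace can match the same buckets. For a hypothetical second cluster named cb-other, that might look like:

console
$ kubectl patch couchbasecluster cb-other --type merge -p '{"spec":{"buckets":{"selector":{"matchLabels":{"cluster":"cb-other"}}}}}'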

Synchronization is triggered by first setting a label selector, then triggering the operation:

console
$ kubectl patch couchbasecluster cb-example --type merge -p '{"spec":{"buckets":{"selector":{"matchLabels":{"foo":"bar"}}}}}'
$ kubectl patch couchbasecluster cb-example --type merge -p '{"spec":{"buckets":{"synchronize":true}}}'

After the above commands, the Operator begins the synchronization process. This may take some time, depending on the number of resources to synchronize.

The synchronization operation proceeds as follows:

  1. Couchbase Server is polled for all buckets, scopes, and collections.

  2. Kubernetes resources are generated for those Couchbase resources.

    1. Buckets are labeled as defined by the provided label selector, so they will be considered by this cluster only (you can verify this with the command shown after this list).

  3. Kubernetes resources are created and persisted.

  4. The Operator reports the status in the cluster conditions.
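
Once resources begin to appear, you can verify that the generated buckets carry the expected labels. Using the foo=bar selector from the earlier patch (substitute your own labels):

console
$ kubectl get couchbasebuckets -l foo=bar --show-labels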

Once synchronization has been triggered, you should not make any more manual adjustments to the data topology. Doing so may result in a conflict between what is expected and what has already been generated and committed. If you do encounter a conflict, restart the process from the Enabling Synchronization stage to remove the conflicting resources.

To check for completion, you can wait until the Synchronized condition is reported:

console
$ kubectl wait --for=condition=Synchronized couchbasecluster/cb-example

Once the synchronization is complete, you’ll see the following output:

Result
console
couchbasecluster.couchbase.com/cb-example condition met
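
Note that kubectl wait gives up after 30 seconds by default. For large topologies (see the note on throttling below), you may want to allow more time, for example:

console
$ kubectl wait --for=condition=Synchronized couchbasecluster/cb-example --timeout=30m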

Additionally, you can check whether the synchronization succeeded:

console
$ kubectl describe couchbasecluster/cb-example | grep Synchronized -B 5
Result
console
Last Transition Time:  2022-04-06T09:09:29Z
Last Update Time:      2022-04-06T09:09:29Z
Message:               Data topology synchronized and ready to be managed
Reason:                SynchronizationComplete
Status:                True
Type:                  Synchronized

From the above output, we can see that the synchronization has completed.

To verify this, we can check the couchbasebuckets, couchbasescopes, and couchbasecollections resources in the Kubernetes cluster.

console
$ kubectl get couchbasebuckets
Result
console
NAME                                                                      MEMORY QUOTA   REPLICAS   IO PRIORITY   EVICTION POLICY   CONFLICT RESOLUTION   AGE
bucket-b9982f53695b5568909c430adb35b7081ce806f03e4595271617f98e02e9860f   100Mi          1          low           valueOnly         seqno                 4m25s
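
The same check applies to the other synchronized resource kinds; the exact names in your output will differ, as they are generated by the Operator:

console
$ kubectl get couchbasescopes
$ kubectl get couchbasecollections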

Unlike save and restore, synchronization does not optimize the data topology Kubernetes resources. In a worst-case scenario, where thousands of scopes and collections are in use, you can expect synchronization to take several minutes, because requests to the Kubernetes API are throttled to ensure fair use.

Managing Synchronized Resources

You must ensure synchronization has completed successfully before switching to managed mode. Failure to do so may result in backing resources not being created, and in data loss.

Now that we have confirmed synchronization has completed successfully, we can switch the cluster’s bucket management on:

console
$ kubectl patch couchbasecluster cb-example --type=merge -p '{"spec":{"buckets":{"synchronize":false,"managed":true}}}'

From this point onward, any managed resources that are deleted or modified manually will be recovered, and any additional resources that are added will be deleted, as per the usual operation of the Operator.
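
As a final sanity check, you can print the buckets stanza and confirm that managed is now true and synchronize is false:

console
$ kubectl get couchbasecluster cb-example -o jsonpath='{.spec.buckets}'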

Scenarios

Oops Scenario

Let’s check the scopes of the travel-sample bucket first. In the image below, we can see that there are a total of 7 scopes in this bucket.

[Image: the seven scopes of the travel-sample bucket]
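
If you prefer the command line to the UI, you can also list the backing scope resources directly; as with buckets, their names are generated by the Operator:

console
$ kubectl get couchbasescopes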

Let’s assume a scenario where we deleted one of the scopes (tenant_agent_00) by mistake. If the resource was synchronized, the Operator should be able to recover it.

[Image: dropping the tenant_agent_00 scope]

In the image above, we delete the tenant_agent_00 scope; after some time, the Operator should recover it.

[Image: the scope after deletion]

To see whether the resources have been recovered, we can look at the events of the CouchbaseCluster resource.

console
$ kubectl describe couchbaseclusters cb-example
Result
console
Events:
  Type    Reason                            Age   From   Message
  ----    ------                            ----  ----   -------
  Normal  NewMemberAdded                    52m          New member cb-example-0000 added to cluster
  Normal  NewMemberAdded                    52m          New member cb-example-0001 added to cluster
  Normal  NewMemberAdded                    52m          New member cb-example-0002 added to cluster
  Normal  RebalanceStarted                  52m          A rebalance has been started to balance data across the cluster
  Normal  RebalanceCompleted                52m          A rebalance has completed
  Normal  EventScopesAndCollectionsUpdated  38m          Scopes and collections updated for bucket travel-sample

Here, we have copied only the events section from the output of the above command. The last event says that scopes and collections were updated for the travel-sample bucket.

This means the Operator has recovered the resource, and we can verify this in the Couchbase UI.
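
Alternatively, you can watch for the recovery from the command line by filtering the cluster’s events; the field selector and grep pattern below are illustrative:

console
$ kubectl get events --field-selector involvedObject.name=cb-example | grep ScopesAndCollections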

[Image: the recovered tenant_agent_00 scope]