If you do not specify any durability requirements, the server will respond with a success message if the data has been acknowledged and processed in the managed cache.
Because persistence and replication are asynchronous tasks and happen eventually, there is a time gap where a node failure can lead to data loss. This time gap occurs exactly when data has neither been replicated to another node nor persisted to disk.
In general, for most use cases this time gap is totally acceptable (since it is usually very small), but for the important mutation operations we need stronger guarantees. You can make changes such as to acknowledge only once the data has been replicated or persisted, which by default makes the whole system slower. However, Couchbase Server has another way to handle those situations:
The SDK will perform the normal mutation without durability requirements.
If the server returns with a successful response, the client will start polling.
The client polls all the affected (more in a bit on this) nodes for this mutation until either the desired state is reached or it can’t be reached for a reason.
In the successful case, the operation will also complete towards the application layer, and in the failure case the client will error out the operation, leaving the user to decide what’s next.
The following code will make sure that the document has been persisted on the active node for this document ID and also replicated to one of the configured replicas:
cas, err := myBucket.UpsertDura(docToStore, valueToStore, 1, 1)
In this example, the client will poll two nodes until completion: the active node for this document and also the configured replica. If any of the constraints cannot be fulfilled, an error will be returned.
If you wonder why in the failure case we put the burden on you to figure out what’s next: the SDK has no idea of your SLAs or intents to what to do when the operation fails. Sometimes it might be fine to proceed and log the error, in other cases you may want sophisticated retry mechanisms where the SDK can guide you with functionality, but not the actual execution semantics.
Couchbase Server 4.0 introduces a new feature that allows the SDK to be more accurate during the observe poll cycle, especially in the concurrent and failover cases. Instead of using the CAS to verify mutations, it uses sequence numbers and partition UUIDs.
They are automatically enabled by default when available and will be used for any durability operations performed.
Couchbase Server is widely recognized for its excellent and predictable performance. One of the reasons for that is its managed cache, which allows it to return a response very quickly without taking replication or persistence latency into account.
If you need to make sure data is replicated and/or persisted. your network or disk performance will be the dominant factor. If you need high throughput and durability requirements, make sure (and measure) to have fast disks (SSD) and/or fast network.
Because more than one node in general is involved and more round trips are needed, think about realistic timeouts you want to set and measure them in production. All timeouts you set on the blocking API need to take the original mutation and all subsequent polls into account until the durability requirement is met.