Distributed Transactions from the Go SDK

A practical guide to using Couchbase’s distributed ACID transactions, via the Go API.

This document presents a practical HOWTO on using Couchbase transactions, following on from our transactions documentation.

Requirements

Couchbase Server 7.0.0 or above.
Couchbase Go SDK 2.4.0 or above.
NTP should be configured so that the clocks of the Couchbase cluster nodes are in sync.
The application, if it is using extended attributes (XATTRs), must avoid using the XATTR field txn, which is reserved for Couchbase use.

If using a single-node cluster (for example, during development), note that the default number of replicas for a newly created bucket is 1. If this is left at the default, all Key-Value writes performed with durability will fail with ErrDurabilityImpossible. In turn, this will cause all transactions (which perform all Key-Value writes durably) to fail. The number of replicas can be changed (to 0, for a single-node cluster) via the GUI or command line. If the bucket already exists, the server needs to be rebalanced for the setting to take effect.
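The arithmetic behind the failure can be sketched as follows. This is a simplified model, assuming that the Majority level needs more than half of the vBucket's copies (active plus replicas) to hold the write, and that each copy lives on a distinct node; `majority` and `durabilityPossible` are hypothetical helpers, not part of gocb.

```go
package main

import "fmt"

// majority returns how many copies (active + replicas) must hold a write
// to satisfy the Majority durability level: more than half of them.
// Simplified model for illustration only.
func majority(replicas int) int {
	copies := replicas + 1
	return copies/2 + 1
}

// durabilityPossible reports whether a cluster with the given number of
// data nodes can place enough copies to meet Majority. Each copy lives on a
// distinct node, so at most `nodes` copies can exist.
func durabilityPossible(nodes, replicas int) bool {
	available := replicas + 1
	if available > nodes {
		available = nodes
	}
	return available >= majority(replicas)
}

func main() {
	fmt.Println(durabilityPossible(1, 1)) // false: single node, default 1 replica
	fmt.Println(durabilityPossible(1, 0)) // true: single node, replicas set to 0
	fmt.Println(durabilityPossible(3, 1)) // true
}
```

With one replica there are two copies and a majority of two is two, which a single node cannot provide; with zero replicas a majority of one copy is trivially met.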

Getting Started

Couchbase transactions require no additional components or services to be configured.

Initializing Transactions

The starting point is the Transactions object. It is effectively a singleton belonging to a Cluster object: internally, Transactions is created on gocb.Connect(..) and its lifetime is bound to the parent Cluster object. Multiple calls to cluster.Transactions() will yield the same Transactions object, because the Transactions object performs automated background processes that should not be duplicated.

// Initialize the Couchbase cluster
opts := gocb.ClusterOptions{
	Authenticator: gocb.PasswordAuthenticator{
		Username: "Administrator",
		Password: "password",
	},
}

cluster, err := gocb.Connect("localhost", opts)
if err != nil {
	panic(err)
}

bucket := cluster.Bucket("travel-sample")

scope := bucket.Scope("inventory")
collection := scope.Collection("airport")

transactions := cluster.Transactions()

Configuration

Transactions can optionally be globally configured at the point of creating the Cluster object:

opts := gocb.ClusterOptions{
	Authenticator: gocb.PasswordAuthenticator{
		Username: "Administrator",
		Password: "password",
	},
	TransactionsConfig: gocb.TransactionsConfig{
		DurabilityLevel: gocb.DurabilityLevelPersistToMajority,
	},
}

The default configuration will perform all writes with the durability setting Majority, ensuring that each write is available in-memory on the majority of replicas before the transaction continues. There are two higher durability settings available that will additionally wait for all mutations to be written to physical storage on either the active or the majority of replicas, before continuing. This further increases safety, at a cost of additional latency.

A level of None is present but its use is discouraged and unsupported. If durability is set to None, then ACID semantics are not guaranteed.

Creating a Transaction

A core idea of Couchbase transactions is that an application supplies the logic for the transaction inside a lambda, including any conditional logic required, and the transaction is then automatically committed. If a transient error occurs, such as a temporary conflict with another transaction, then the transaction will roll back what has been done so far and run the lambda again. The application does not have to do these retries and error handling itself.

Each run of the lambda is called an attempt, inside an overall transaction.

result, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	// The lambda gets passed an AttemptContext object, which permits getting, inserting,
	// removing and replacing documents, and performing N1QL queries.

	// ... Your transaction logic here ...

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	log.Printf("%+v", err)
}

The lambda gets passed a TransactionAttemptContext object, generally referred to as ctx here.

Since the lambda may be rerun multiple times, it is important that it does not contain any side effects. In particular, you should never perform regular operations on a Collection, such as collection.Insert(), inside the lambda. Such operations may be performed multiple times, and will not be performed transactionally. Instead such operations must be done through the ctx object, e.g. ctx.Insert().
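The hazard is easy to reproduce with a toy model that does not use gocb. In the sketch below, `directWrites` stands in for a regular `collection.Insert()` and `attemptCtx.Insert` for a write made through ctx; all names are hypothetical. Because the first attempt is retried, the direct write happens twice, while the staged write is discarded on rollback and applied only once, at commit.

```go
package main

import "fmt"

// store stands in for the database; a plain collection.Insert() would
// write here immediately. Toy model only.
type store struct{ directWrites int }

// attemptCtx stands in for the per-attempt context: its writes are only
// staged, and are dropped if the attempt is rolled back.
type attemptCtx struct{ staged []string }

func (c *attemptCtx) Insert(id string) { c.staged = append(c.staged, id) }

// run executes the lambda twice, failing the first attempt to simulate a
// transient conflict, and returns the writes that were committed.
func run(lambda func(ctx *attemptCtx)) []string {
	for attempt := 1; ; attempt++ {
		ctx := &attemptCtx{}
		lambda(ctx)
		if attempt == 1 {
			continue // rollback: ctx.staged is simply discarded
		}
		return ctx.staged // commit: staged writes are applied once
	}
}

func main() {
	db := &store{}
	committed := run(func(ctx *attemptCtx) {
		db.directWrites++   // side effect: repeated on every attempt
		ctx.Insert("doc-a") // transactional: only the final attempt counts
	})
	fmt.Println(db.directWrites, committed) // 2 [doc-a]
}
```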

Examples

A code example is worth a thousand words, so here is a quick summary of the main transaction operations. They are described in more detail below.

scope := cluster.Bucket("travel-sample").Scope("inventory")

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	// Inserting a doc:
	_, err := ctx.Insert(collection, "doc-a", map[string]interface{}{})
	if err != nil {
		return err
	}

	// Getting documents:
	_, err = ctx.Get(collection, "doc-a")
	// Use err != nil && !errors.Is(err, gocb.ErrDocumentNotFound) if the document may or may not exist
	if err != nil {
		return err
	}

	// Replacing a doc:
	docB, err := ctx.Get(collection, "doc-b")
	if err != nil {
		return err
	}

	var content map[string]interface{}
	err = docB.Content(&content)
	if err != nil {
		return err
	}
	content["transactions"] = "are awesome"
	_, err = ctx.Replace(docB, content)
	if err != nil {
		return err
	}

	// Removing a doc:
	docC, err := ctx.Get(collection, "doc-c")
	if err != nil {
		return err
	}

	err = ctx.Remove(docC)
	if err != nil {
		return err
	}

	// Performing a SELECT N1QL query against a scope:
	qr, err := ctx.Query("SELECT * FROM hotel WHERE country = $1", &gocb.TransactionQueryOptions{
		PositionalParameters: []interface{}{"United Kingdom"},
		Scope:                scope,
	})
	if err != nil {
		return err
	}

	type hotel struct {
		Name string `json:"name"`
	}

	var hotels []hotel
	for qr.Next() {
		var h hotel
		err = qr.Row(&h)
		if err != nil {
			return err
		}

		hotels = append(hotels, h)
	}

	// Performing an UPDATE N1QL query on multiple documents, in the `inventory` scope:
	_, err = ctx.Query("UPDATE route SET airlineid = $1 WHERE airline = $2", &gocb.TransactionQueryOptions{
		PositionalParameters: []interface{}{"airline_137", "AF"},
		Scope:                scope,
	})
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
var ambigErr gocb.TransactionCommitAmbiguousError
if errors.As(err, &ambigErr) {
	log.Println("Transaction possibly committed")

	log.Printf("%+v", ambigErr)
	return
}
var failedErr gocb.TransactionFailedError
if errors.As(err, &failedErr) {
	log.Println("Transaction did not reach commit point")

	log.Printf("%+v", failedErr)
	return
}
if err != nil {
	panic(err)
}

Transaction Mechanics

While this document is focussed on presenting how transactions are used at the API level, it is useful to have a high-level understanding of the mechanics. Reading this section is completely optional.

Recall that the application-provided lambda (containing the transaction logic) may be run multiple times by Couchbase transactions. Each such run is called an attempt inside the overall transaction.

Active Transaction Record Entries

The first mechanic is that each of these attempts adds an entry to a metadata document in the Couchbase cluster. These metadata documents:

Are named Active Transaction Records, or ATRs.
Are created and maintained automatically.
Begin with "_txn:atr-".
Each contain entries for multiple attempts.
Are viewable, but should not be modified externally.

Each such ATR entry stores some metadata and, crucially, whether the attempt has committed or not. In this way, the entry acts as the single point of truth for the transaction, which is essential for providing an 'atomic commit' during reads.

Staged Mutations

The second mechanic is that mutating a document inside a transaction does not directly change the body of the document. Instead, the post-transaction version of the document is staged alongside the document (technically in its extended attributes (XATTRs)). In this way, all changes are invisible to all parts of the Couchbase Data Platform until the commit point is reached.

These staged document changes effectively act as a lock against other transactions trying to modify the document, preventing write-write conflicts.

Cleanup

There are safety mechanisms to ensure that leftover staged changes from a failed transaction cannot block live transactions indefinitely. These include an asynchronous cleanup process that is started with the first transaction and scans the relevant collections for expired transactions created by any application.

Note that if an application is not running, then this cleanup is also not running.

The cleanup process is detailed below in Asynchronous Cleanup.

Committing

The attempt is committed only once the lambda has successfully run to conclusion. Committing updates the ATR entry, which transactional actors use as a signal to read the post-transaction version of a document from its XATTRs. Hence, updating the ATR entry is an 'atomic commit' switch for the transaction.

After this commit point is reached, the individual documents will be committed (or "unstaged"). This provides an eventually consistent commit for non-transactional actors.
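The interplay between the ATR entry and staged changes can be pictured with a toy model. Everything here is illustrative: `atrEntry`, `readTransactional`, `readNonTransactional` and `unstage` are hypothetical names, and the real protocol stores far more metadata.

```go
package main

import "fmt"

// atrEntry models an attempt's entry in an Active Transaction Record.
type atrEntry struct{ committed bool }

// doc models a document with a staged post-transaction value. Toy model only.
type doc struct {
	body   string
	staged *string   // post-transaction value, nil if nothing is staged
	atr    *atrEntry // entry of the attempt that staged the change
}

// readTransactional returns the staged value as soon as the ATR entry says
// the attempt committed: flipping that one flag is the atomic commit.
func readTransactional(d *doc) string {
	if d.staged != nil && d.atr.committed {
		return *d.staged
	}
	return d.body
}

// readNonTransactional only ever sees the body.
func readNonTransactional(d *doc) string { return d.body }

// unstage copies a committed staged value into the body and clears it.
func unstage(d *doc) {
	if d.staged != nil && d.atr.committed {
		d.body, d.staged = *d.staged, nil
	}
}

func main() {
	v2 := "v2"
	entry := &atrEntry{}
	d := &doc{body: "v1", staged: &v2, atr: entry}

	entry.committed = true                                     // commit point
	fmt.Println(readTransactional(d), readNonTransactional(d)) // v2 v1
	unstage(d)
	fmt.Println(readTransactional(d), readNonTransactional(d)) // v2 v2
}
```

Between the commit point and unstaging, transactional readers already see the new value while non-transactional readers still see the old body, which is the eventual consistency described above.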

Key-Value Mutations

Replacing

Replacing a document requires a ctx.Get() call first. This is necessary so that the transaction can check that the document is not involved in another transaction. If it is, then the SDK will handle this at the ctx.Replace() point. Generally, this involves rolling back what has been done so far, and retrying the lambda.

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	doc, err := ctx.Get(collection, "replace-doc")
	if err != nil {
		return err
	}

	var content map[string]interface{}
	err = doc.Content(&content)
	if err != nil {
		return err
	}
	content["transactions"] = "are awesome"

	_, err = ctx.Replace(doc, content)
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

Removing

As with replaces, removing a document requires a ctx.Get() call first.

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	doc, err := ctx.Get(collection, "remove-doc")
	if err != nil {
		return err
	}

	err = ctx.Remove(doc)
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

Inserting

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	_, err := ctx.Insert(collection, "insert-doc", map[string]interface{}{})
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

Key-Value Reads

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	doc, err := ctx.Get(collection, "get-doc")
	if err != nil {
		return err
	}

	var content interface{}
	err = doc.Content(&content)
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

Getting a document with Key-Value can return ErrDocumentNotFound. This error can be ignored if you are unsure whether the document exists, or if its absence does not matter:

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	doc, err := ctx.Get(collection, "get-doc")
	if err != nil && !errors.Is(err, gocb.ErrDocumentNotFound) {
		return err
	}

	fmt.Println(doc != nil)

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

If ErrDocumentNotFound is not ignored, then Get will cause the transaction to fail with TransactionFailedError (after rolling back any changes, of course). ErrDocumentNotFound is one of very few errors that the SDK will allow you to ignore: the SDK internally tracks the state of the transaction and will not allow illegal operations to continue.

Gets will 'read your own writes', e.g. this will succeed:

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	_, err := ctx.Insert(collection, "ownwritesdoc", map[string]interface{}{})
	if err != nil {
		return err
	}

	doc, err := ctx.Get(collection, "ownwritesdoc")
	if err != nil {
		return err
	}

	var content interface{}
	err = doc.Content(&content)
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}
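Read-your-own-writes can be pictured as the attempt consulting its own staged writes before the stored data. The following toy model is not gocb's implementation; `attempt`, `newAttempt` and their methods are hypothetical, and the sketch only captures that lookup order.

```go
package main

import "fmt"

// attempt models a transaction attempt with a private set of staged writes.
type attempt struct {
	store  map[string]string // committed data, visible to everyone
	staged map[string]string // this attempt's uncommitted writes
}

func newAttempt(store map[string]string) *attempt {
	return &attempt{store: store, staged: map[string]string{}}
}

// Insert stages a write; the shared store is untouched until commit.
func (a *attempt) Insert(id, value string) { a.staged[id] = value }

// Get checks the attempt's own staged writes first, then the store.
func (a *attempt) Get(id string) (string, bool) {
	if v, ok := a.staged[id]; ok {
		return v, true
	}
	v, ok := a.store[id]
	return v, ok
}

func main() {
	store := map[string]string{}
	a := newAttempt(store)
	a.Insert("ownwritesdoc", "{}")

	_, seenInside := a.Get("ownwritesdoc")
	_, seenOutside := store["ownwritesdoc"]
	fmt.Println(seenInside, seenOutside) // true false
}
```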

N1QL Queries

As of Couchbase Server 7.0, N1QL queries may be used inside the transaction lambda, freely mixed with Key-Value operations.

BEGIN TRANSACTION

There are two ways to initiate a transaction with Couchbase 7.0: via the SDK, and via the query service directly using BEGIN TRANSACTION. The latter is intended for those using query via the REST API, or using the query workbench in the UI. Application writers are strongly recommended to use the SDK instead, which provides these benefits:

It automatically handles errors and retrying.
It allows Key-Value operations and N1QL queries to be freely mixed.
It takes care of issuing BEGIN TRANSACTION, END TRANSACTION, COMMIT and ROLLBACK automatically. These become an implementation detail and you should not use these statements inside the lambda.

Supported N1QL

The majority of N1QL DML statements are permitted within a transaction. Specifically: INSERT, UPSERT, DELETE, UPDATE, MERGE and SELECT are supported.

DDL statements, such as CREATE INDEX, are not.

Using N1QL

If you already use N1QL from the Go SDK, then its use in transactions is very similar. ctx.Query() returns a TransactionQueryResult, which is similar to QueryResult, and takes most of the same options. The main difference is that TransactionQueryResult does not stream results. This means that it has no Err or Close functions, and that result sets are buffered in memory, allowing the SDK to read and handle any errors that occur on the stream before returning a result or error.

Inside the lambda, however, you must take care to use ctx.Query() rather than cluster.Query() or scope.Query().

An example of selecting some rows from the travel-sample bucket:

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	qr, err := ctx.Query("SELECT * FROM `travel-sample`.inventory.hotel WHERE country = $1", &gocb.TransactionQueryOptions{
		PositionalParameters: []interface{}{"United Kingdom"},
	})
	if err != nil {
		return err
	}

	type hotel struct {
		Name string `json:"name"`
	}

	var hotels []hotel
	for qr.Next() {
		var h hotel
		err = qr.Row(&h)
		if err != nil {
			return err
		}

		hotels = append(hotels, h)
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

Rather than specifying the full "`travel-sample`.inventory.hotel" name each time, it is easier to pass a reference to the inventory Scope:

bucket := cluster.Bucket("travel-sample")
scope := bucket.Scope("inventory")
_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	qr, err := ctx.Query("SELECT * FROM hotel WHERE country = $1", &gocb.TransactionQueryOptions{
		PositionalParameters: []interface{}{"United Kingdom"},
		Scope:                scope,
	})
	if err != nil {
		return err
	}

	type hotel struct {
		Name string `json:"name"`
	}

	var hotels []hotel
	for qr.Next() {
		var h hotel
		err = qr.Row(&h)
		if err != nil {
			return err
		}

		hotels = append(hotels, h)
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

An example using a Scope for an UPDATE operation:

bucket := cluster.Bucket("travel-sample")
scope := bucket.Scope("inventory")
_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	qr, err := ctx.Query("UPDATE hotel SET price = $1 WHERE url LIKE $2 AND country = $3", &gocb.TransactionQueryOptions{
		PositionalParameters: []interface{}{99.99, "http://marriot%", "United Kingdom"},
		Scope:                scope,
	})
	if err != nil {
		return err
	}

	meta, err := qr.MetaData()
	if err != nil {
		return err
	}

	if meta.Metrics.MutationCount != 1 {
		panic("Should have received 1 mutation")
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

And an example combining SELECTs and UPDATEs. It’s possible to call regular Go functions from the lambda, as shown here, permitting complex logic to be performed. Just remember that since the lambda may be called multiple times, so may the function.

bucket := cluster.Bucket("travel-sample")
scope := bucket.Scope("inventory")
_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	// Find all hotels of the chain
	qr, err := ctx.Query("SELECT reviews FROM hotel WHERE url LIKE $1 AND country = $2", &gocb.TransactionQueryOptions{
		PositionalParameters: []interface{}{"http://marriot%", "United Kingdom"},
		Scope:                scope,
	})
	if err != nil {
		return err
	}

	// This function (not provided here) will use a trained machine learning model to provide a
	// suitable price based on recent customer reviews
	updatedPrice := priceFromRecentReviews(qr)

	_, err = ctx.Query("UPDATE hotel SET price = $1 WHERE url LIKE $2 AND country = $3", &gocb.TransactionQueryOptions{
		PositionalParameters: []interface{}{updatedPrice, "http://marriot%", "United Kingdom"},
		Scope:                scope,
	})
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

Read Your Own Writes

As with Key-Value operations, N1QL queries support Read Your Own Writes.

This example shows inserting a document and then selecting it again.

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	_, err := ctx.Query("INSERT INTO `default` VALUES ('doc', {'hello':'world'})", nil) // (1)
	if err != nil {
		return err
	}

	st := "SELECT `default`.* FROM `default` WHERE META().id = 'doc'" // (2)
	qr, err := ctx.Query(st, nil)
	if err != nil {
		return err
	}

	meta, err := qr.MetaData()
	if err != nil {
		return err
	}

	if meta.Metrics.ResultCount != 1 {
		panic("Should have received 1 result")
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

(1) The inserted document is only staged at this point, as the transaction has not yet committed. Other transactions, and other non-transactional actors, will not be able to see this staged insert yet.
(2) But the SELECT can, as we are reading a mutation staged inside the same transaction.

Mixing Key-Value and N1QL

Key-Value operations and queries can be freely intermixed, and will interact with each other as you would expect.

In this example we insert a document with Key-Value, and read it with a SELECT.

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
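	// Insert into the default collection of the `default` bucket, which the SELECT below queries
	defaultCollection := cluster.Bucket("default").DefaultCollection()
	_, err := ctx.Insert(defaultCollection, "queryRyow", map[string]interface{}{"hello": "world"}) // (1)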
	if err != nil {
		return err
	}

	st := "SELECT `default`.* FROM `default` WHERE META().id = 'queryRyow'" // (2)
	qr, err := ctx.Query(st, nil)
	if err != nil {
		return err
	}

	meta, err := qr.MetaData()
	if err != nil {
		return err
	}

	if meta.Metrics.ResultCount != 1 {
		panic("Should have received 1 result")
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

(1) As with the 'Read Your Own Writes' example, here the insert is only staged, and so it is not visible to other transactions or non-transactional actors.
(2) But the SELECT can view it, as the insert was in the same transaction.

Query Options

Query options can be provided via TransactionQueryOptions, which provides a subset of the options in the Go SDK’s QueryOptions.

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	_, err := ctx.Query("INSERT INTO `default` VALUES ('queryOpts', {'hello':'world'})", &gocb.TransactionQueryOptions{
		Profile: gocb.QueryProfileModeTimings,
	})
	if err != nil {
		return err
	}

	// There is no commit call: returning nil lets the transaction commit automatically
	return nil
}, nil)
if err != nil {
	panic(err)
}

The supported options are:

PositionalParameters
NamedParameters
ScanConsistency
FlexIndex
ClientContextID
ScanWait
ScanCap
PipelineBatch
PipelineCap
Profile
Readonly
Raw

See the QueryOptions documentation for details on these.

Query Concurrency

Only one query statement will be performed by the query service at a time. Non-blocking mechanisms can be used to perform multiple concurrent query statements, but this may result internally in some added network traffic due to retries, and is unlikely to provide any increased performance.

Query Performance Advice

This section is optional reading, and only for those looking to maximize transactions performance.

After the first query statement in a transaction, subsequent Key-Value operations in the lambda are converted into N1QL and executed by the query service rather than the Key-Value data service. The operation will behave identically, and this implementation detail can largely be ignored, except for these two caveats:

These converted Key-Value operations are likely to be slightly slower, as the query service is optimized for statements involving multiple documents. Those looking for the maximum possible performance are recommended to put Key-Value operations before the first query in the lambda, if possible.
Those using non-blocking mechanisms to achieve concurrency should be aware that the converted Key-Value operations are subject to the same parallelism restrictions mentioned above; that is, they will not be executed in parallel by the query service.
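The routing rule behind this advice can be summarised in a few lines. In this toy model (all names hypothetical, not gocb types), an attempt switches permanently into query mode at its first query, after which Key-Value operations are handled by the query service; ordering Key-Value operations first keeps them on the Key-Value path.

```go
package main

import "fmt"

// attemptRouter models which service handles each operation in an attempt.
type attemptRouter struct {
	queryMode bool
	log       []string
}

// KV records a Key-Value operation, routed according to the current mode.
func (a *attemptRouter) KV(op string) {
	service := "kv"
	if a.queryMode {
		service = "query" // converted into a N1QL statement
	}
	a.log = append(a.log, op+"->"+service)
}

// Query records a query and switches the attempt into query mode for good.
func (a *attemptRouter) Query(stmt string) {
	a.queryMode = true
	a.log = append(a.log, stmt+"->query")
}

func main() {
	a := &attemptRouter{}
	a.KV("get") // before the first query: Key-Value service
	a.Query("SELECT")
	a.KV("replace")    // after it: query service
	fmt.Println(a.log) // [get->kv SELECT->query replace->query]
}
```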

Query with KV Roles

To execute a key-value operation within a transaction, users must have the relevant Administrative or Data RBAC roles, and permissions on the relevant buckets, scopes, and collections.

Similarly, to run a query statement within a transaction, users must have the relevant Administrative or Query & Index RBAC roles, and permissions on the relevant buckets, scopes and collections.

Refer to Roles for details.

Query Mode

When a transaction executes a query statement, the transaction enters query mode, which means that the query is executed with the user’s query permissions. Any key-value operations which are executed by the transaction after the query statement are also executed with the user’s query permissions. These may or may not be different to the user’s data permissions; if they are different, you may get unexpected results.

Committing

Committing is automatic: if no errors are returned, the transaction will be committed.

As soon as the transaction is committed, all its changes will be atomically visible to reads from other transactions. The changes will also be committed (or "unstaged") so they are visible to non-transactional actors, in an eventually consistent fashion.

Commit is final: after the transaction is committed, it cannot be rolled back, and no further operations are allowed on it.

An asynchronous cleanup process ensures that once the transaction reaches the commit point, it will be fully committed - even if the application crashes.

A Full Transaction Example

Let’s pull together everything so far into a more real-world example of a transaction.

This example simulates a simple Massively Multiplayer Online game, and includes documents representing:

Players, with experience points and levels;
Monsters, with hitpoints, and the number of experience points a player earns from their death.

In this example, the player is dealing damage to the monster. The player’s client has sent this instruction to a central server, where we’re going to record that action. We’re going to do this in a transaction, as we don’t want a situation where the monster is killed, but we fail to update the player’s document with the earned experience.

(Though this is just a demo - in reality, the game would likely live with the small risk and limited impact of this, rather than pay the performance cost for using a transaction.)

func playerHitsMonster(damage int, playerID, monsterID string) {
	type monster struct {
		Hitpoints            int `json:"hitpoints"`
		ExperienceWhenKilled int `json:"experience_when_killed"`
	}

	type player struct {
		Experience int `json:"experience"`
		Level      int `json:"level"`
	}

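	// initTransactions (not shown) supplies a connected cluster and the collection holding the game documents
	initTransactions(func(cluster *gocb.Cluster, collection *gocb.Collection) {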
		_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
			monsterDoc, err := ctx.Get(collection, monsterID)
			if err != nil {
				return err
			}
			playerDoc, err := ctx.Get(collection, playerID)
			if err != nil {
				return err
			}

			var monsterContent monster
			if err := monsterDoc.Content(&monsterContent); err != nil {
				return err
			}

			monsterNewHitPoints := monsterContent.Hitpoints - damage

			if monsterNewHitPoints <= 0 {
				// Monster is killed. The remove is just for demoing, and a more realistic
				// example would set a "dead" flag or similar.
				err = ctx.Remove(monsterDoc)
				if err != nil {
					return err
				}

				var playerContent player
				if err := playerDoc.Content(&playerContent); err != nil {
					return err
				}

				// The player earns experience for killing the monster
				playerNewExperience := playerContent.Experience + monsterContent.ExperienceWhenKilled
				// calculateLevelForExperience is application logic, not shown here
				playerNewLevel := calculateLevelForExperience(playerNewExperience)

				playerContent.Experience = playerNewExperience
				playerContent.Level = playerNewLevel

				_, err = ctx.Replace(playerDoc, playerContent)
				if err != nil {
					return err
				}
			} else {
				// Monster is damaged but still alive
				monsterContent.Hitpoints = monsterNewHitPoints

				_, err = ctx.Replace(monsterDoc, monsterContent)
				if err != nil {
					return err
				}
			}

			return nil
		}, nil)
		var transactionFailedErr gocb.TransactionFailedError
		if errors.As(err, &transactionFailedErr) {
			// The operation failed. Both the monster and the player will be untouched.

			// Situations that can cause this would include either the monster
			// or player not existing (as get is used), or a persistent
			// failure to be able to commit the transaction, for example on
			// prolonged node failure.
			return
		}

		if err != nil {
			panic(err)
		}
	})
}

Concurrency with Non-Transactional Writes

This release of transactions for Couchbase requires a degree of co-operation from the application. Specifically, the application should ensure that non-transactional writes are never done concurrently with transactional writes, on the same document.

This requirement ensures that the strong Key-Value performance of Couchbase is not compromised. A key philosophy of our transactions is that you 'pay only for what you use'.

If two such writes do conflict then the transactional write will 'win', overwriting the non-transactional write.

Note this only applies to writes. Any non-transactional reads concurrent with transactions are fine, and are at a Read Committed level.

Rollback

If an error is returned, either by the application from the lambda or by the transaction internally, then that attempt is rolled back. The transaction logic may or may not be retried, depending on the error.

If the transaction is not retried, then it will return a TransactionFailedError, and its Unwrap function can be used for more details on the failure.

The application can use this to signal why it triggered a rollback, as follows:

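// customer (a struct with a Balance field) and costOfItem are application
// definitions, not shown here.
var ErrBalanceInsufficient = errors.New("insufficient funds")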

_, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	doc, err := ctx.Get(collection, "customer-name")
	if err != nil {
		return err
	}

	var cust customer
	err = doc.Content(&cust)
	if err != nil {
		return err
	}

	if cust.Balance < costOfItem {
		return ErrBalanceInsufficient
	}
	// else continue transaction

	return nil
}, nil)
var ambigErr gocb.TransactionCommitAmbiguousError
if errors.As(err, &ambigErr) {
	// This error can only be returned at the commit point, after the
	// ErrBalanceInsufficient check has passed, so there is no need to
	// check for ErrBalanceInsufficient here.
	fmt.Println("Transaction possibly committed")
	fmt.Printf("%+v", ambigErr)
	return
}

var transactionFailedErr gocb.TransactionFailedError
if errors.As(err, &transactionFailedErr) {
	if errors.Is(transactionFailedErr, ErrBalanceInsufficient) {
		// Re-raise the error
		panic(transactionFailedErr)
	} else {
		fmt.Println("Transaction did not reach commit point")
		fmt.Printf("%+v", transactionFailedErr)
	}
	return
}

After a transaction is rolled back, it cannot be committed and no further operations are allowed on it.

Error Handling

As discussed previously, Couchbase transactions will attempt to resolve many errors for you, through a combination of retrying individual operations and the application’s lambda. This includes some transient server errors, and conflicts with other transactions.

But there are situations that cannot be resolved, and total failure is indicated to the application via errors. These errors include:

Any error returned by your transaction lambda, either deliberately or through an application logic bug.
Attempting to insert a document that already exists.
Attempting to remove or replace a document that does not exist.
Calling ctx.Get() on a document key that does not exist.

Once one of these errors occurs, the current attempt is irrevocably failed (though the transaction may retry the lambda). It is not possible for the application to catch the failure and continue. Once a failure has occurred, all other operations tried in this attempt (including commit) will instantly fail.

Transactions, as they are multi-stage and multi-document, also have a concept of partial success or failure. This is signalled to the application through the TransactionResult.UnstagingComplete field, described later.

There are three errors that Couchbase transactions can return to the application: TransactionFailedError, TransactionExpiredError and TransactionCommitAmbiguousError.

TransactionFailedError and TransactionExpiredError

The transaction definitely did not reach the commit point. TransactionFailedError indicates a fast-failure whereas TransactionExpiredError indicates that retries were made until the expiration point was reached, but this distinction is not normally important to the application and generally TransactionExpiredError does not need to be handled individually.

Either way, an attempt will have been made to rollback all changes. This attempt may or may not have been successful, but the results of this will have no impact on the protocol or other actors. No changes from the transaction will be visible (presently with the potential and temporary exception of staged inserts being visible to non-transactional actors, as discussed under Inserting).

Handling: Generally, debugging exactly why a given transaction failed requires review of the logs, so it is suggested that the application log these on failure (see Logging). The application may want to try the transaction again later. Alternatively, if transaction completion time is not a priority, then transaction expiration times (which default to 15 seconds) can be extended across the board through TransactionsConfig.

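cluster, err := gocb.Connect("localhost", gocb.ClusterOptions{
	// Authenticator and other options as shown earlier
	TransactionsConfig: gocb.TransactionsConfig{
		// Raise the default 15-second expiration time for all transactions
		Timeout: 120 * time.Second,
	},
})
if err != nil {
	panic(err)
}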

This will allow the protocol more time to get past any transient failures (for example, those caused by a cluster rebalance). The tradeoff to consider with longer expiration times is that documents staged by a transaction are effectively locked against modification by other transactions until the expiration time has been exceeded.

Note that expiration is not guaranteed to be followed precisely. For example, if the application were to do a long blocking operation inside the lambda (which should be avoided), then expiration can only trigger after this finishes. Similarly, if the transaction attempts a key-value operation close to the expiration time, and that key-value operation times out, then the expiration time may be exceeded.

TransactionCommitAmbiguousError

As discussed previously, each transaction has a 'single point of truth' that is updated atomically to reflect whether it is committed.

However, it is not always possible for the protocol to become 100% certain that the operation was successful before the transaction expires. That is, the operation may have completed successfully on the cluster, or may succeed soon, but the protocol is unable to determine this (whether due to a transient network failure or some other reason). This matters because the transaction may or may not have reached the commit point; that is, it may have succeeded or failed.

Couchbase transactions will raise TransactionCommitAmbiguousError to indicate this state. It should be rare to receive this error.

If the transaction had in fact successfully reached the commit point, then the transaction will be fully completed ("unstaged") by the asynchronous cleanup process at some point in the future. With default settings this will usually be within a minute, but whatever underlying fault has caused the TransactionCommitAmbiguousError may lead to it taking longer.

If the transaction had not in fact reached the commit point, then the asynchronous cleanup process will instead attempt to roll it back at some point in the future. If it is unable to, any staged metadata from the transaction will not be visible and will not cause problems (for example, there are safety mechanisms to ensure it will not block writes to these documents for long).

Handling: This error can be challenging for an application to handle. As with TransactionFailedError, it is recommended that the application at least writes any logs from the transaction, for future debugging. It may wish to retry the transaction at a later point, or globally extend transaction expiration times to give the protocol additional time to resolve the ambiguity.

TransactionResult.UnstagingComplete

This boolean flag indicates whether all documents were able to be unstaged (committed).

For most use cases it is not an issue if it is false. All transactional actors will still see all the changes from this transaction, as though it had committed fully. The cleanup process is asynchronously working to complete the commit, so that it will be fully visible to non-transactional actors.

The flag is provided for those rare use-cases where the application requires the commit to be fully visible to non-transactional actors, before it may continue. In this situation the application can raise an error here, or poll all documents involved until they reflect the mutations.

If you regularly see this flag set to false, consider increasing the transaction expiration time to reduce the possibility that the transaction times out during the commit.

Full Error Handling Example

Pulling all of the above together, this is the suggested best practice for error handling:

result, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	// ... transactional code here ...
	return nil
}, nil)
var ambigErr gocb.TransactionCommitAmbiguousError
if errors.As(err, &ambigErr) {
	fmt.Println("Transaction returned TransactionCommitAmbiguous and may have succeeded")

	// Of course, the application will want to use its own logging rather
	// than fmt.Printf
	fmt.Printf("%+v\n", ambigErr)
	return
}
var transactionFailedErr gocb.TransactionFailedError
if errors.As(err, &transactionFailedErr) {
	// The transaction definitely did not reach commit point
	fmt.Println("Transaction failed with TransactionFailed")
	fmt.Printf("%+v\n", transactionFailedErr)
	return
}
if err != nil {
	panic(err)
}

// The transaction definitely reached the commit point. Unstaging
// the individual documents may or may not have completed
if !result.UnstagingComplete {
	// In rare cases, the application may require the commit to have
	// completed.  (Recall that the asynchronous cleanup process is
	// still working to complete the commit.)
	// The next step is application-dependent.
}

Asynchronous Cleanup

Transactions will try to clean up after themselves in the event of failures. However, there are situations that inevitably create failed, or 'lost', transactions, such as an application crash.

This requires an asynchronous cleanup task, described in this section.

Calling Connect spawns a background cleanup task, whose job it is to periodically scan for expired transactions and clean them up. It does this by scanning a subset of the Active Transaction Record (ATR) transaction metadata documents, for each metadata collection used by any transactions. As you’ll recall from earlier, an entry for each transaction attempt exists in one of these documents. They are removed during cleanup or at some time after successful completion.

Until a metadata collection has been registered (either from config or by running a transaction), the background cleanup task does no work and so is very lightweight.

The default settings are tuned to find expired transactions reasonably quickly, while creating negligible impact from the background reads required by the scanning process. Specifically, with default settings it will generally find expired transactions within 60 seconds and use fewer than 20 reads per second. This is unlikely to impact performance on any cluster, but the settings may be tuned as desired.

All applications connected to the same cluster and running transactions will share in the cleanup, via a low-touch communication protocol on the "_txn:client-record" metadata document that will be created in each metadata collection used during transactions. This document is visible, but should not be modified externally, as it is maintained automatically. All ATRs on a metadata collection will be distributed between all cleanup clients, so increasing the number of applications will not increase the reads required for scanning.

An application may clean up transactions created by another application.

It is important to understand that if an application is not running, then cleanup is not running. This is particularly relevant to developers running unit tests or similar.

If this is an issue, then the deployment may want to consider running a simple application at all times that just calls Connect, to guarantee that cleanup is running. When an application is used solely for cleanup, it must register any collections to monitor via the CleanupCollections config option; otherwise the cleanup task will not do any work. Only the registered collections will be monitored.

Configuring Cleanup

Setting	Default	Description
`CleanupWindow`	60 seconds	This determines how long a cleanup 'run' is; that is, how frequently this client will check its subset of ATR documents. It is perfectly valid for the application to change this setting, which is at a conservative default. Decreasing this will cause expired transactions to be found more swiftly (generally, within this cleanup window), with the tradeoff of increasing the number of reads per second used for the scanning process.
`DisableLostAttemptCleanup`	false	Controls the thread that takes part in the distributed cleanup process described above, which cleans up expired transactions created by any client. It is strongly recommended that it is left enabled.
`DisableClientAttemptCleanup`	false	Controls the thread that cleans up transactions created just by this client. The client will preferentially aim to send any transactions it creates to this thread, leaving transactions for the distributed cleanup process only when it is forced to (for example, on an application crash). It is strongly recommended that it is left enabled.
`CleanupCollections`	`[]TransactionKeyspace{}`	The set of additional collections that the lost transactions cleanup task will monitor.

Logging

To aid troubleshooting, raise the log level on the SDK.

Please see the Go SDK logging documentation for details.
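As a starting point, gocb ships stdio loggers that can be installed globally with gocb.SetLogger. VerboseStdioLogger logs down to trace level, which helps when investigating a failed transaction but is noisy in production; DefaultStdioLogger is the quieter alternative.

```go
package main

import "github.com/couchbase/gocb/v2"

func main() {
	// Install gocb's verbose stdio logger before connecting, so that
	// transaction logs are captured from the start.
	gocb.SetLogger(gocb.VerboseStdioLogger())

	// ... gocb.Connect(...) and transactional code here ...
}
```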

Custom Metadata Collections

As described earlier, transactions automatically create and use metadata documents. By default, these are created in the default collection of the bucket of the first mutated document in the transaction. Optionally, you can instead use a custom collection to store the metadata documents. Most users will not need this functionality and can continue to use the default behavior. It is provided for these use cases:

The metadata documents contain, for documents involved in each transaction, the document’s key and the name of the bucket, scope and collection it exists on. In some deployments this may be sensitive data.
You wish to remove the default collections. Before doing this, you should ensure that all existing transactions using metadata documents in the default collections have finished.

Usage

Custom metadata collections are enabled with:

cluster, err := gocb.Connect("localhost", gocb.ClusterOptions{
	TransactionsConfig: gocb.TransactionsConfig{
		MetadataCollection: &gocb.TransactionKeyspace{
			BucketName:     "travel-sample",
			ScopeName:      "transactions",
			CollectionName: "metadata",
		},
	},
})

When specified:

Any transactions created from this Transactions object will create and use metadata in that collection.
The asynchronous cleanup started by this Transactions object will be looking for expired transactions only in this collection, unless additional CleanupCollections are provided or a transaction explicitly overrides the metadata collection.

You need to ensure that this application has RBAC data read and write privileges on the collection, and you should not delete the collection subsequently, as doing so can interfere with existing transactions. You can use an existing collection or create a new one.

Custom metadata collections can also be provided at the transaction level itself:

metaCollection := cluster.Bucket("travel-sample").Scope("transactions").Collection("other-metadata")
result, err := cluster.Transactions().Run(func(ctx *gocb.TransactionAttemptContext) error {
	// ... transactional code here ...
	return nil
}, &gocb.TransactionOptions{
	MetadataCollection: metaCollection,
})

This will override any metadata collection that has been provided at the Transactions level.
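The precedence rules in this section can be summarized as a small stdlib-only model. The Keyspace type and resolveMetadata function are hypothetical illustrations, not gocb APIs: a transaction-level collection wins, then the Transactions-level MetadataCollection, and otherwise the default collection of the bucket holding the first mutated document.

```go
package main

import "fmt"

// Keyspace is a hypothetical stand-in for a bucket/scope/collection triple.
type Keyspace struct {
	Bucket, Scope, Collection string
}

// resolveMetadata returns where a transaction's metadata would be written;
// a nil pointer means "not configured at that level".
func resolveMetadata(txnLevel, configLevel *Keyspace, firstMutatedBucket string) Keyspace {
	switch {
	case txnLevel != nil:
		return *txnLevel // TransactionOptions.MetadataCollection
	case configLevel != nil:
		return *configLevel // TransactionsConfig.MetadataCollection
	default:
		// Default collection of the first mutated document's bucket.
		return Keyspace{firstMutatedBucket, "_default", "_default"}
	}
}

func main() {
	cfg := &Keyspace{"travel-sample", "transactions", "metadata"}
	txn := &Keyspace{"travel-sample", "transactions", "other-metadata"}
	fmt.Println(resolveMetadata(txn, cfg, "travel-sample"))
	fmt.Println(resolveMetadata(nil, cfg, "travel-sample"))
	fmt.Println(resolveMetadata(nil, nil, "travel-sample"))
}
```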

Further Reading