Non-JSON Documents

  • concept
January 19, 2025
+ 12
Binary formats & Transcoders.
Couchbase is a document database and functions best when document contents are JSON. Nevertheless Couchbase may be used to store non-JSON data for various use cases.

Non-JSON formats may be more efficient in terms of memory and processing power (for example, if storing only flat strings, JSON adds an additional syntactical overhead of two bytes per string). Non-JSON documents may be desirable if migrating a legacy application, which is using a customized binary format.

Note that only JSON documents can be accessed using Query and Search. Limited support for non-JSON documents is available for MapReduce views. Additionally, non-JSON documents will not be accessible using the Web UI (the contents will be shown in their Base64 equivalent).

Using non-JSON documents

It’s important to note that a JSON document can also refer to a simple integer (42), string ("hello"), array ([1,2,3]), boolean (true, false) and the JSON null value. Nevertheless if your application requires a non-JSON format, the SDK may still support it natively.

If there is no native support for your format, you can write a transcoder which handles the encoding and decoding of your documents to and from the server.

Common flags

Every item (record) in Couchbase contains metadata stored along with it in the server. One of the metadata fields is a 32 bit "flag" value.

Couchbase SDKs accept native object types (integers, strings, arrays, dictionaries) as valid inputs for a Document and internally convert them to JSON before sending them to the server to be stored. When the SDK serializes the Document, it notes the type of serialization performed (JSON) and sends a corresponding type code along with the serialized document to the server to be stored. This type code is stored in the flags field within the item’s metadata.

Later, when retrieving the document from the server, the SDK checks the type code which informs it about the type of serialization used to encode the document, and thus how the SDK should de-serialize the document.

By default, SDKs will only accept document types which can serialize to JSON and will only serialize them to JSON. However applications can configure the SDK to use non-JSON serialization and accept other types of inputs as documents. Note that using non-JSON serialization will prevent the document from being accessible via Query and MapReduce.

The table below shows the built-in formats available in most SDKs. The name column displays the format name; the native type column displays the native language type which is used as input and output for the document, the description contains the properties of such a type, and the flags value contains the actual code used for the format. The format code is discussed more in depth below.

Table 1. Built-in format types
Name Native Type Description Flags value (see below)

JSON

Dictionary, Array, Number, String, Integer

Default serialization. Serializes document to JSON. Document can be used with Query

0x02 << 24

UTF-8

Unicode or String

Indicates the document is a UTF-8 string. This may be more space-efficient than JSON for documents that are a simple string, as JSON requires strings to be encapsulated by quotes. Using this serialization format may save two bytes for each string value

0x04 << 24

RAW

ByteArray, buffer, etc.

Indicates this value is a raw sequence of bytes. It is the simplest encoding form and indicates that the application will process and interpret its contents as it sees fit.

0x03 << 24

PRIVATE

SDK/Language dependent

This indicates that a language-specific serialization format is being used. The serialization format depends on the language (for example, Pickle for Python, Marshal for Ruby, Java Serialization for Java, etc). Using this format will make your documents inaccessible from other-language SDKs.

0x01 << 24