Managing documents

Both Couchbase Server and Elasticsearch enable you to flexibly model data by using JSON documents to represent your application objects. These documents generally contain all the information about a data entity, and can be as complex as you choose. You can use nested data and structures such as arrays and hashes to provide additional information about your object. You also do not need to perform schema migrations for JSON documents. Instead you can flexibly write new data fields or values to represent new information in your documents.

Be aware that Elasticsearch automatically creates a mapping which defines what is searchable in your documents. So how you design your documents for search is influenced by Elasticsearch mappings. This guide is not intended to be an exhaustive description on how to model documents for use in Elasticsearch or how to modifying mapping in Elasticsearch.

There are a few document design considerations you should have in mind if you are going to use Couchbase Server with Elasticsearch. For instance, imagine you have a document to represent a product as follows:

Imagine your inventory system changes and you want to change your product SKU so that it is an integer. With the flexibility of JSON, you can update the schema for your product to look like this in Couchbase Server:

By design, there may be some cases where you do not want to update all your products to have integer SKUs. This may be an intentional design choice if your application logic does not depend on it, or you may want to change your application logic to handle both string and integer SKUs.

When you use Elasticsearch, it will generate a default mapping used to index items; this mapping describes how particular fields should be indexed. By default, Elasticsearch builds a mapping based on documents transmitted to it and it will make assumptions and generalizations about all documents based on the initial documents it receives. Each default mapping will assume specific data types for each field. Be aware that even though you can change a field type for a document in Couchbase Server, it can cause problems for Elasticsearch if you do not also update the default mapping. If you change the field type in Couchbase Server, Elasticsearch will not index the new documents containing the different field type and you will not receive the document ID as a search query. In this case, because we change the SKU from a string to an integer, all the products we add to our system with the integer SKU will not be indexed by Elasticsearch using the default mapping.

To resolve this, whenever you update a document schema you probably want to update all documents which contain that field. In our example, we update the SKU field so that all documents have integers for SKUs. Elasticsearch will receive all updates to existing products and then index them. For more general information about data modeling for Couchbase Server, see the Modeling Data section in the Couchbase Developer Guide,.

Understanding arrays in Elasticsearch

If you want to store an array of objects in a JSON document, be aware that you may need to provide your own specific mapping for Elasticsearch in order to achieve the results you expect from it. For example, imagine you have an object with two items that appear:

{
    "object1" : [
        {
            "name" : "blue",
            "count" : 4
        },
        {
            "name" : "green",
      "count" : 6
     }
    ]
}

If you search for a name set to blue and a count greater than 5, you will get this document in your index, even though the count associated with blue is 4 not 6. This is most likely not what you expected since the two conditions are satisfied by two different objects in the array. To handle these types of scenarios, you need to provide your own nested mapping for Elasticsearch.

Enabling document expiration

Time to Live, also known as TTL, is the time until a document expires in Couchbase Server. By default, all documents will have a TTL of 0, which indicates the document will be kept indefinitely. When you create your own application, you can provide specific TTLs such as 30 seconds. As part of normal maintenance operations, Couchbase Server will periodically remove all items with expirations that have passed. By default this maintenance process runs every 60 minutes.

Be aware that if you use TTLs in Couchbase Server that this expiration will not be propagated with a document to Elasticsearch. Instead Couchbase Server will send information about the document deletion when the maintenance process runs on the 60- minute interval. This may cause some problems because the document ID can appear in an Elasticsearch index, but the document may no longer be available from Couchbase Server.

To mitigate this problem, you can manually enable the expiration field used by Elasticsearch, _ttl. When you do this, Couchbase Server can propagate the document TTL to Elasticsearch, and then Elasticsearch will remove the item from an index when TTL expires. Because there is some time lag between the servers during replication, indexing, and maintenance processes, document expiration will not occur at the same exact time for each server. However if you use this approach it can help reduce items in an Elasticsearch result set which are no longer available from Couchbase.