Indexing issues

If you encounter indexing issues, such as items from Couchbase failing to be indexed or unexpected items appearing in your search results, check the following items and apply the described fixes:

Change settings for initial indexing

If you have an existing Couchbase data bucket with a large number of documents already in production, these documents are transferred to Elasticsearch in bulk. This typically works with the default Elasticsearch settings; however, you can change some Elasticsearch settings so that indexing completes more quickly.

The Elasticsearch refresh_interval setting controls how frequently the engine makes newly indexed items available to search. During an initial bulk load of documents from Couchbase, you can increase this interval, or disable refresh entirely, and accept delayed access to newly indexed items in exchange for faster overall indexing. For more information about enabling and disabling this setting, see Elasticsearch Guide, Indices Update.
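
As a minimal sketch of this trade-off, the following Python builds the index-settings update requests that would disable refresh before a bulk load and restore it afterwards. The host localhost:9200, the index name my_index, and the helper name refresh_interval_request are assumptions for illustration; "-1" disables refresh and "1s" is the Elasticsearch default. The requests are only constructed here, not sent.

```python
import json
import urllib.request


def refresh_interval_request(base_url, index, interval):
    """Build (but do not send) a PUT request that updates refresh_interval.

    base_url and index are placeholders; substitute your own cluster and index.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed values for illustration only.
disable = refresh_interval_request("http://localhost:9200", "my_index", "-1")
restore = refresh_interval_request("http://localhost:9200", "my_index", "1s")

# To apply a change, send the request, for example:
#     urllib.request.urlopen(disable)
```

Restoring the interval after the bulk load matters: while refresh is disabled, newly indexed items do not appear in search results.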

Check Elasticsearch mappings

When you send documents to Elasticsearch, it automatically generates a mapping that contains rules for indexing fields. You can also provide your own mapping or update the generated one. Be aware that the default mapping includes assumptions about the data types and data structures in your documents. Based on these assumptions, Elasticsearch may omit a document from the index. For instance, objects within an array may not be indexed as you expect.

For general information about expected data structures for Elasticsearch, see Elasticsearch, Mapping, Types and related sections.
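
One way to see why objects within an array can behave unexpectedly: under default dynamic mapping, Elasticsearch indexes an array of objects as flattened multi-value fields, so the pairing between values inside each object is lost unless the field is mapped as a nested type. The sketch below only imitates that flattening in plain Python to illustrate the effect. The function name and sample data are hypothetical, and this is not Elasticsearch code.

```python
def flatten_object_array(field, objects):
    """Imitate how an array of objects is flattened into multi-value fields."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


# Hypothetical document fragment: two comments, each with an author and a rating.
comments = [
    {"author": "alice", "rating": 5},
    {"author": "bob", "rating": 1},
]

flat = flatten_object_array("comments", comments)
# flat == {"comments.author": ["alice", "bob"], "comments.rating": [5, 1]}
# A query for author "alice" AND rating 1 can now match this document,
# even though no single comment has that combination.
```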

Check your documents

Validate that your documents are well-formed JSON. Elasticsearch cannot index documents that are not valid JSON; for instance, .jpg files and other forms of binary data cannot be indexed. When the Couchbase Plug-in for Elasticsearch encounters items that are binary data, it logs an error message.
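
A document can be checked before it is stored in Couchbase. The following is a minimal sketch, assuming documents arrive as raw bytes and that an indexable document is a JSON object; the function name is_valid_json_document is hypothetical and is not part of the plug-in.

```python
import json


def is_valid_json_document(raw):
    """Return True if raw bytes decode as UTF-8 and parse as a JSON object."""
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Binary data (such as image bytes) or malformed JSON ends up here.
        return False
    # Assumption: only JSON objects count as indexable documents.
    return isinstance(parsed, dict)


well_formed = is_valid_json_document(b'{"name": "Pale Ale", "abv": 5.2}')
binary_data = is_valid_json_document(b"\xff\xd8\xff\xe0")  # first bytes of a JPEG
truncated = is_valid_json_document(b'{"name": ')
```

Here well_formed is True, while binary_data and truncated are both False.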

If you change the type of a field in your documents after Elasticsearch has indexed them, Elasticsearch may omit the changed documents from the index.
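
To spot such a change before it reaches the index, you can compare a new version of a document against one that was already indexed. This is a rough sketch that compares only top-level fields by their Python type after JSON parsing; the function name and sample documents are hypothetical, and some differences it reports (for example, integer versus float) may still be accepted by Elasticsearch.

```python
def changed_field_types(old_doc, new_doc):
    """Return {field: (old_type, new_type)} for shared fields whose type differs."""
    changes = {}
    for field in old_doc.keys() & new_doc.keys():
        old_type = type(old_doc[field]).__name__
        new_type = type(new_doc[field]).__name__
        if old_type != new_type:
            changes[field] = (old_type, new_type)
    return changes


# Hypothetical documents: "abv" was a number and is now a string.
indexed_doc = {"name": "Pale Ale", "abv": 5.2}
updated_doc = {"name": "Pale Ale", "abv": "5.2%"}

changes = changed_field_types(indexed_doc, updated_doc)
# changes == {"abv": ("float", "str")}
```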