Default Analyzers

March 23, 2025
Use an analyzer to filter and modify search strings to improve matches for search results.

Analyzers contain:

  • Character filters
  • Tokenizers
  • Token filters

When you create a type mapping, you can choose one of the default analyzers for it, or create your own.
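As a rough illustration of where this choice might appear, the following sketch builds a fragment of an index definition as a Python dictionary. The `default_analyzer` key names are assumed to mirror the Search index JSON definition; the `hotel` type name and the `type_mapping` helper are hypothetical. Verify the structure against an index definition exported from your own cluster before relying on it.

```python
import json


def type_mapping(analyzer: str) -> dict:
    """Build a minimal type mapping that uses the given default analyzer.

    Hypothetical helper: the key names are assumptions modelled on the
    Search index definition JSON, not a confirmed schema.
    """
    return {"enabled": True, "dynamic": True, "default_analyzer": analyzer}


mapping = {
    "default_analyzer": "standard",   # assumed index-wide default
    "types": {
        "hotel": type_mapping("en"),  # hypothetical type name
    },
}

print(json.dumps(mapping, indent=2))
```

Here the `hotel` type mapping would use the English analyzer, while anything that does not set its own analyzer would fall back to `standard`.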

The following default analyzer options are available:

Analyzer Description

inherit

If you set an analyzer to inherit, the Search index component uses the default analyzer set for the index.
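The inherit behavior can be pictured as a simple fallback. The sketch below is only a model of that rule: the `resolve_analyzer` function and the `INHERIT` constant are hypothetical names, not part of any Couchbase SDK or API.

```python
INHERIT = "inherit"  # hypothetical sentinel matching the option name above


def resolve_analyzer(component_analyzer: str, index_default: str) -> str:
    """Return the analyzer a component effectively uses.

    A component set to "inherit" falls back to the index default;
    any other value is used as-is.
    """
    if component_analyzer == INHERIT:
        return index_default
    return component_analyzer


print(resolve_analyzer("inherit", "standard"))  # standard
print(resolve_analyzer("fr", "standard"))       # fr
```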

Arabic - ar

An Arabic language analyzer.

Chinese, Japanese, and Korean - cjk

An analyzer designed for the Chinese, Japanese, and Korean languages.

Kurdish - ckb

A Kurdish language analyzer.

Danish - da

A Danish language analyzer.

German - de

A German language analyzer.

English - en

An English language analyzer.

Castilian Spanish - es

A Castilian Spanish language analyzer.

Persian - fa

A Persian language analyzer.

Finnish - fi

A Finnish language analyzer.

French - fr

A French language analyzer.

Hebrew - he

A Hebrew language analyzer.

Hindi - hi

A Hindi language analyzer.

Croatian - hr

A Croatian language analyzer.

Hungarian - hu

A Hungarian language analyzer.

Italian - it

An Italian language analyzer.

keyword

The keyword analyzer turns input into a single token. It forces exact matches and preserves whitespace characters like spaces.

For example, the keyword analyzer turns an input of Couchbase Server into a single token: Couchbase Server.
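The keyword behavior described above can be approximated in a few lines of Python. This is an emulation for illustration only, not the Search Service's implementation; `keyword_analyze` is a hypothetical name.

```python
from typing import List


def keyword_analyze(text: str) -> List[str]:
    """Emulate the keyword analyzer: the whole input becomes one token.

    Whitespace and case are left untouched, so only an exact match
    on the full string would succeed.
    """
    return [text]


print(keyword_analyze("Couchbase Server"))  # ['Couchbase Server']
```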

Dutch - nl

A Dutch language analyzer.

Norwegian - no

A Norwegian language analyzer.

Portuguese - pt

A Portuguese language analyzer.

Romanian - ro

A Romanian language analyzer.

Russian - ru

A Russian language analyzer.

simple

The simple analyzer splits input into tokens made up of consecutive letter characters. It removes non-letter characters, such as punctuation and numbers, and uses them as the boundaries between tokens.

For example, the simple analyzer turns an input of Couchbase Server into two tokens: Couchbase and Server.
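A letter-based split like the one described for the simple analyzer can be sketched with a regular expression. This is an approximation for illustration, with the hypothetical name `simple_analyze`. Case is left unchanged to match the example above; the actual analyzer may apply further normalization.

```python
import re
from typing import List

# One or more Unicode letters: word characters minus digits and underscore.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def simple_analyze(text: str) -> List[str]:
    """Emulate a letter-based tokenizer.

    Every run of letters is a token; digits, punctuation, and
    whitespace are dropped and act as token boundaries.
    """
    return LETTER_RUN.findall(text)


print(simple_analyze("Couchbase Server"))  # ['Couchbase', 'Server']
print(simple_analyze("v7.6 is out!"))      # ['v', 'is', 'out']
```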

standard

The standard analyzer uses the unicode tokenizer with the to_lower and stop_en token filters.

For example, the standard analyzer turns an input of The name is Couchbase Server into three tokens: name, couchbase, and server.
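The pipeline described for the standard analyzer (tokenize, lowercase, remove English stop words) can be sketched as follows. This is a simplified emulation: the `\w+` pattern only approximates the unicode tokenizer, and `STOP_EN` is a small illustrative subset, not the actual `stop_en` word list.

```python
import re
from typing import List

# Illustrative subset only; the real stop_en list is longer.
STOP_EN = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}

# Rough stand-in for the unicode tokenizer: runs of word characters.
WORD = re.compile(r"\w+")


def standard_analyze(text: str) -> List[str]:
    """Emulate the standard analyzer in three steps."""
    tokens = WORD.findall(text)                    # 1. tokenize
    lowered = [t.lower() for t in tokens]          # 2. to_lower
    return [t for t in lowered if t not in STOP_EN]  # 3. stop_en


print(standard_analyze("The name is Couchbase Server"))
# ['name', 'couchbase', 'server']
```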

Swedish - sv

A Swedish language analyzer.

Turkish - tr

A Turkish language analyzer.

web

The web analyzer finds email addresses, URLs, Twitter usernames, and hashtags in its input and turns them into tokens.

For example, the web analyzer turns an input of Send #Couchbase to example@gmail.com into four tokens: send, #Couchbase, to, and example@gmail.com.
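The web analyzer's pattern recognition can be approximated with a single regular expression that tries the special token types before falling back to plain words. This is a sketch under assumptions: the patterns are deliberately loose, `web_analyze` is a hypothetical name, and lowercasing only plain words is a choice made to reproduce the example above rather than documented behavior.

```python
import re
from typing import List

TOKEN = re.compile(
    r"""
    (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)   # example@gmail.com
  | (?P<url>https?://[^\s]+)                  # http(s) URLs
  | (?P<hashtag>\#\w+)                        # #Couchbase
  | (?P<mention>@\w+)                         # @username
  | (?P<word>\w+)                             # any other word
    """,
    re.VERBOSE,
)


def web_analyze(text: str) -> List[str]:
    """Emulate web-style tokenization.

    Email addresses, URLs, hashtags, and usernames are kept whole;
    plain words are lowercased (an assumption, see the lead-in).
    """
    tokens = []
    for match in TOKEN.finditer(text):
        token = match.group()
        tokens.append(token.lower() if match.lastgroup == "word" else token)
    return tokens


print(web_analyze("Send #Couchbase to example@gmail.com"))
# ['send', '#Couchbase', 'to', 'example@gmail.com']
```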