Create a Custom Tokenizer
- Capella Operational
- how-to
Create a custom tokenizer with the Couchbase Capella UI to change how the Search Service creates tokens for matching Search index content to a Search query.
Prerequisites
- You have the Search Service enabled on a node in your operational cluster. For more information about how to change Services on your operational cluster, see Modify a Paid Cluster.
- You have logged in to the Couchbase Capella UI.
- You have created, or started to create, an index in Advanced Mode Editing.
- You have created, or started to create, a custom analyzer in your Search index.
Procedure
To create a new custom tokenizer with the Capella UI in Advanced Mode:
- On the Operational Clusters page, select the operational cluster where you want to work with the Search Service.
- Go to .
- Do one of the following:
  - To work with an existing Search index, click the name of the index where you want to create a custom tokenizer.
  - To create a new Search index, click Create Search Index.
- Make sure to select Enable Advanced Options.
- Expand Global Index Settings.
- Do one of the following:
  - To create a new custom analyzer with a new tokenizer, click Add Custom Analyzer.
  - To add a new custom tokenizer to an existing analyzer, expand the Default Analyzer list, and next to your custom analyzer, click Edit.
- Click Add Custom Tokenizer.
- In the Tokenizer Name field, enter a name for the tokenizer.
- In the Type list, select a tokenizer type.
- Configure your tokenizer based on your chosen tokenizer type.
You can create two types of custom tokenizers:

Tokenizer Type | Description |
---|---|
regexp | The tokenizer uses any input that matches the regular expression to create new tokens. |
exception | The tokenizer removes any input that matches the regular expression, and creates tokens from the remaining input. You can choose another tokenizer to apply to the remaining input. |
Create a Regular Expression Tokenizer
To create a regular expression tokenizer with the Capella UI:
- In the Type list, select regexp.
- In the Regular Expression field, enter the regular expression to use to split input into tokens.
  For example, the regular expression `\b\w+\b` creates tokens based on the word boundaries and word characters found in the input.
- Click Add Custom Tokenizer.
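To see how a regular expression tokenizer treats input, you can approximate it with Python's `re` module. This is only a sketch: the Search Service uses its own regular expression engine, so the supported dialect may differ slightly from Python's.

```python
import re

def regexp_tokenize(pattern: str, text: str) -> list[str]:
    """Approximation of a regexp tokenizer: every non-overlapping
    match of the pattern becomes a token."""
    return re.findall(pattern, text)

# The \b\w+\b pattern from the example above splits on word boundaries.
tokens = regexp_tokenize(r"\b\w+\b", "blue-green Couchbase, v7.2!")
print(tokens)  # ['blue', 'green', 'Couchbase', 'v7', '2']
```

Note that punctuation and hyphens fall outside the word-boundary pattern, so they never appear in the resulting tokens.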
Create an Exception Custom Tokenizer
To create an exception custom tokenizer with the Capella UI in Advanced Mode:
- In the Type list, select exception.
- In the Regular Expressions field, enter one or more regular expressions to use to remove content from your input. Separate multiple regular expression patterns with a comma (,).
- In the Tokenizer for Remaining Input list, select a tokenizer to apply to your input after removing any content that matches your provided Regular Expressions.
  For more information about the available tokenizers, see Default Tokenizers.
- Click Add Custom Tokenizer.
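The exception tokenizer's two-stage behavior can be sketched in Python: first strip anything that matches the exception patterns, then tokenize whatever remains. The pattern and the whitespace-split stand-in for the remaining-input tokenizer are illustrative assumptions, not the Search Service's actual implementation.

```python
import re

def exception_tokenize(patterns: list[str], text: str) -> list[str]:
    """Approximation of an exception tokenizer: remove content matching
    any pattern, then tokenize the remaining input (here, a simple
    whitespace split stands in for the chosen remaining-input tokenizer)."""
    for pattern in patterns:
        text = re.sub(pattern, " ", text)  # remove matching content
    return text.split()

# Example: strip email addresses before tokenizing the rest of the input.
tokens = exception_tokenize([r"\S+@\S+"], "contact admin@example.com for help")
print(tokens)  # ['contact', 'for', 'help']
```

Because the matched content is removed before tokenization, tokens like `admin` or `example` never reach the index, which is useful for excluding markup or identifiers from search results.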
Next Steps
After you create a custom tokenizer, you can use it with a custom analyzer.
To continue customizing your Search index, you can also:
To run a search and test the contents of your Search index, see Run A Simple Search with the Capella UI.