Create a Custom Tokenizer

    Create a custom tokenizer with the Couchbase Server Web Console to change how the Search Service creates tokens for matching Search index content to a Search query.

    Prerequisites

    Procedure

    You can create two types of custom tokenizers:

    Tokenizer Type        Description

    Regular expression    The tokenizer uses any input that matches the regular expression to create new tokens.

    Exception             The tokenizer removes any input that matches the regular expression, and creates tokens from the remaining input. You can choose another tokenizer to apply to the remaining input.

    Create a Regular Expression Tokenizer

    To create a regular expression tokenizer with the Couchbase Server Web Console:

    1. Go to Search.

    2. Click the Search index where you want to create a custom tokenizer.

    3. Click Edit.

    4. Expand Customize Index > Custom Filters.

    5. Click Add Tokenizer.

    6. In the Name field, enter a name for the custom tokenizer.

    7. In the Type field, select regexp.

    8. In the Regular Expression field, enter the regular expression that matches the input to turn into tokens.

    9. Click Save.
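
    To preview which tokens a pattern might produce before you enter it in the Regular Expression field, a minimal sketch such as the following can help. It models the behavior described above (each match becomes a token) with Python's re module. It does not call the Search Service, and the function name regexp_tokenize and the sample text are hypothetical. Python's regular expression syntax can also differ from the syntax the Search Service accepts, so treat the output as an approximation.

```python
import re


def regexp_tokenize(text, pattern):
    """Return every non-empty match of `pattern` in `text` as a token.

    Illustrative model only: this is not the Search Service implementation.
    """
    return [m.group(0) for m in re.finditer(pattern, text) if m.group(0)]


# Hypothetical pattern: runs of letters or runs of digits become tokens.
tokens = regexp_tokenize("order-42 shipped to user_7", r"[A-Za-z]+|[0-9]+")
print(tokens)  # ['order', '42', 'shipped', 'to', 'user', '7']
```

    Any input the pattern does not match, such as the hyphen and underscore in the sample text, produces no token.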

    Create an Exception Custom Tokenizer

    To create an exception custom tokenizer with the Couchbase Server Web Console:

    1. Go to Search.

    2. Do one of the following:

    3. Click the Search index where you want to create a custom tokenizer.

    4. Click Edit.

    5. Expand Customize Index > Custom Filters.

    6. Click Add Tokenizer.

    7. In the Name field, enter a name for the custom tokenizer.

    8. In the Type field, select exception.

    9. In the Exception Patterns field, enter a regular expression that matches the content to remove from input.

    10. To add the regular expression to the list of exception patterns, click Add.

    11. (Optional) To add more regular expressions to the list of exception patterns, repeat the previous two steps.

    12. In the Tokenizer for Remaining Input field, select a tokenizer to apply to input after removing any content that matches an exception pattern.

      For more information about the available tokenizers, see Default Tokenizers.

    13. Click Save.
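
    The values collected in the steps above can be pictured as a small settings object. The following sketch gathers them into a Python dictionary and prints it as JSON. The key names (type, exceptions, tokenizer), the tokenizer name email_exception, the sample pattern, and the choice of unicode for the remaining input are all assumptions made for illustration. This page does not confirm how the Search Service stores a custom tokenizer, so compare the output with your own index definition before you rely on it.

```python
import json


def exception_tokenizer(patterns, remaining="unicode"):
    """Build a hypothetical settings object for an exception custom tokenizer."""
    if not patterns:
        raise ValueError("at least one exception pattern is required")
    return {
        "type": "exception",           # value selected in the Type field
        "exceptions": list(patterns),  # Exception Patterns list (assumed key name)
        "tokenizer": remaining,        # Tokenizer for Remaining Input (assumed key name)
    }


# Hypothetical tokenizer name and pattern, for illustration only.
definition = {
    "email_exception": exception_tokenizer([r"[a-z0-9.]+@[a-z0-9.]+"], "unicode")
}
print(json.dumps(definition, indent=2))
```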