Create a Custom Tokenizer

    Create a custom tokenizer with the Couchbase Capella UI to change how the Search Service creates tokens for matching Search index content to a Search query.

    Prerequisites

    • You have the Search Service enabled on a node in your operational cluster. For more information about how to change Services on your operational cluster, see Modify a Paid Cluster.

    • You have logged in to the Couchbase Capella UI.

    • You have started to create or already created a Search index in Advanced Mode.

    • You have already created or started to create a custom analyzer in your Search index.

    Procedure

    To create a new custom tokenizer with the Capella UI in Advanced Mode:

    1. On the Operational Clusters page, select the operational cluster where you want to work with the Search Service.

    2. Go to Data Tools > Search.

    3. Do one of the following:

      1. To work with an existing Search index, click the name of the index where you want to create a custom tokenizer.

      2. To create a new Search index, click Create Search Index.

    4. Make sure to select Enable Advanced Options.

    5. Expand Global Index Settings.

    6. Do one of the following:

      1. To create a new custom analyzer with a new tokenizer, click Add Custom Analyzer.

      2. To add a new custom tokenizer to use with an existing analyzer, expand the Default Analyzer list, and next to your custom analyzer, click Edit.

    7. Click Add Custom Tokenizer.

    8. In the Tokenizer Name field, enter a name for the tokenizer.

    9. In the Type list, select a tokenizer type.

    10. Configure your tokenizer based on your chosen tokenizer type.

    You can create 2 types of custom tokenizers:

    • Regular expression: The tokenizer uses any input that matches the regular expression to create new tokens.

    • Exception: The tokenizer removes any input that matches the regular expression and creates tokens from the remaining input. You can choose another tokenizer to apply to the remaining input.

    Create a Regular Expression Tokenizer

    To create a regular expression tokenizer with the Capella UI:

    1. In the Type list, select regexp.

    2. In the Regular Expression field, enter the regular expression to use to split input into tokens.

      For example, the regular expression \b\w+\b creates a token from each run of word characters between word boundaries in the input.

    3. Click Add Custom Tokenizer.
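The behavior of a regular expression tokenizer can be illustrated outside the UI. The following is a minimal Python sketch, not the Search Service's implementation: it emits one token for every substring of the input that matches the pattern, using the same \b\w+\b example from the step above.

```python
import re

def regexp_tokenize(text, pattern):
    """Emit a token for every substring of text that matches pattern."""
    return [match.group(0) for match in re.finditer(pattern, text)]

# \b\w+\b matches each run of word characters between word boundaries,
# so punctuation and whitespace never appear inside a token.
tokens = regexp_tokenize("full-text search, v7.6", r"\b\w+\b")
print(tokens)  # ['full', 'text', 'search', 'v7', '6']
```

Note that "full-text" splits into two tokens because the hyphen is not a word character; choose a pattern that matches the token shapes you want to keep, not the separators.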

    Create an Exception Custom Tokenizer

    To create an exception custom tokenizer with the Capella UI in Advanced Mode:

    1. In the Type list, select exception.

    2. In the Regular Expressions field, enter 1 or more regular expressions to use to remove content from your input. Separate multiple regular expression patterns with a comma (,).

    3. In the Tokenizer for Remaining Input list, select a tokenizer to apply to your input after removing any content that matches your provided Regular Expressions.

      For more information about the available tokenizers, see Default Tokenizers.

    4. Click Add Custom Tokenizer.
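An exception tokenizer works in two passes: first it strips everything that matches the exception patterns, then it hands the remaining text to a second tokenizer. The following Python sketch illustrates that flow under simplifying assumptions (the helper names and the plain word tokenizer are illustrative, not part of the Search Service):

```python
import re

def exception_tokenize(text, exception_patterns, remaining_tokenizer):
    """Remove anything matching the exception patterns, then tokenize the rest."""
    # Join the patterns into one alternation, mirroring a comma-separated list.
    combined = "|".join(f"(?:{p})" for p in exception_patterns)
    remaining = re.sub(combined, " ", text)
    return remaining_tokenizer(remaining)

# A stand-in for the "Tokenizer for Remaining Input": split on word characters.
word_tokenizer = lambda s: re.findall(r"\w+", s)

# The URL matches the exception pattern, so it never becomes tokens.
tokens = exception_tokenize("visit https://example.com for docs",
                            [r"https?://\S+"], word_tokenizer)
print(tokens)  # ['visit', 'for', 'docs']
```

This is why the UI asks for a second tokenizer in step 3: the exception patterns only decide what to discard, and the remaining input still needs a tokenization rule of its own.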

    Next Steps

    After you create a custom tokenizer, you can use it with a custom analyzer.

    To continue customizing your Search index, you can also create other custom analyzer components.

    To run a search and test the contents of your Search index, see Run A Simple Search with the Capella UI.