Pattern-Matching Functions

reference

Pattern-matching functions allow you to find regular expression patterns in strings or attributes. Regular expressions can formally represent various string search patterns using different special characters to indicate wildcards, positional characters, repetition, optional or mandatory sequences of letters, etc. SQL++ functions are available to find matching patterns, find position of matching pattern, or replace a pattern with a new string.

For more information on all supported REGEX patterns, see https://golang.org/pkg/regexp/syntax.

From Couchbase Server 5.0, SQL++ supports regular expressions supported by The Go Programming Language version 1.8.

REGEXP_CONTAINS(`expression`, `pattern`)

This function has an alias REGEX_CONTAINS().

Arguments

expression: String, or any SQL++ expression that evaluates to a string.
pattern: String representing a supported regular expression.

Return Value

Returns TRUE if the string value contains any sequence that matches the regular expression pattern.

Example

To try the examples in this section, set the query context to the inventory scope in the travel sample dataset. For more information, see Query Context.

Query

SELECT name
FROM landmark
WHERE REGEXP_CONTAINS(name, "In+.*")
LIMIT 5;

Results

[
  {
    "name": "Beijing Inn"
  },
  {
    "name": "Sportsman Inn"
  },
  {
    "name": "In-N-Out Burger"
  },
  {
    "name": "Mel's Drive-In"
  },
  {
    "name": "Inverness Castle"
  }
]

REGEXP_LIKE(`expression`, `pattern`)

This function has an alias REGEX_LIKE().

Arguments

expression: String, or any SQL++ expression that evaluates to a string.
pattern: String representing a supported regular expression.

Return Value

Returns TRUE if the string value exactly matches the regular expression pattern.

Example

To try the examples in this section, set the query context to the inventory scope in the travel sample dataset. For more information, see Query Context.

Query

SELECT name
FROM landmark
WHERE REGEXP_LIKE(name, "In+.*")
LIMIT 5;

Results

[
  {
    "name": "In-N-Out Burger"
  },
  {
    "name": "Inverness Castle"
  },
  {
    "name": "Inverness Museum & Art Gallery"
  },
  {
    "name": "Inverness Botanic Gardens"
  },
  {
    "name": "International Petroleum Exchange"
  }
]

REGEXP_MATCHES(`expression`, `pattern`)

Arguments

expression: String, or any SQL++ expression that evaluates to a string.
pattern: String representing a supported regular expression.

Return Value

Returns an array of all substrings matching the expression pattern within the input string expression. Returns an empty array if no match is found.

Examples

REGEXP_MATCHES() Example 1

The following query finds all words beginning with upper or lower case B.

Query

SELECT REGEXP_MATCHES("So, 'twas better Betty Botter bought a bit of better butter",
                      "\\b[Bb]\\w+"); (1)

1	The backslash that introduces an escape sequence in the regular expression must itself be escaped by another backslash in the SQL++ query. So `\b` (word boundary) must be entered as `\\b` and `\w` (word character) must be entered as `\\w`.

Results

[
  {
    "$1": [
      "better",
      "Betty",
      "Botter",
      "bought",
      "bit",
      "better",
      "butter"
    ]
  }
]

REGEXP_MATCHES() Example 2

The following query finds sequences of two words beginning with upper or lower case B.

Query

SELECT REGEXP_MATCHES("So, 'twas better Betty Botter bought a bit of better butter",
                      "\\b[Bb]\\w+ \\b[Bb]\\w+");

Results

[
  {
    "$1": [
      "better Betty",
      "Botter bought", (1)
      "better butter"
    ]
  }
]

1	Note that `Betty Botter` is not found in this example, because `Betty` has already been found by the first match.

REGEXP_POSITION(`expression`, `pattern`)

This function has an alias REGEX_POSITION().

Arguments

expression: String, or any SQL++ expression that evaluates to a string.
pattern: String representing a supported regular expression.

Return Value

Returns first position of the occurrence of the regular expression pattern within the input string expression. Returns -1 if no match is found. Position counting starts from zero.

Example

To try the examples in this section, set the query context to the inventory scope in the travel sample dataset. For more information, see Query Context.

The following query finds positions of first occurrence of vowels in each word of the name attribute.

Query

SELECT name, ARRAY REGEXP_POSITION(x, "[aeiou]") FOR x IN TOKENS(name) END
FROM hotel
LIMIT 2;

Results

[
  {
    "$1": [
      1,
      1,
      1
    ],
    "name": "Medway Youth Hostel"
  },
  {
    "$1": [
      2,
      1,
      1
    ],
    "name": "The Balmoral Guesthouse"
  }
]

Note that the order of tokens in the second result may be different.

REGEXP_REPLACE(`expression`, `pattern`, `repl` [, `n`])

This function has an alias REGEX_REPLACE().

Arguments

expression: String, or any SQL++ expression that evaluates to a string.
pattern: String representing a supported regular expression.
repl: String, or any SQL++ expression that evaluates to a string.
n: [Optional] The maximum number of times to find and replace the matching pattern.

Return Value

Returns new string with occurrences of pattern replaced with repl. If n is given, at the most n replacements are performed. If n is not provided, all matching occurrences are replaced.

Examples

To try the examples in this section, set the query context to the inventory scope in the travel sample dataset. For more information, see Query Context.

REGEXP_REPLACE() Example 1

Query

SELECT REGEXP_REPLACE("Sql++ is sql for NoSql", "[sS][qQ][lL]", "SQL"),
       REGEXP_REPLACE("Winning innings Inn", "[Ii]n+", "Hotel", 6),
       REGEXP_REPLACE("Winning innings Inn", "[IiNn]+g", upper("inning"), 2);

Results

[
  {
    "$1": "SQL++ is SQL for NoSQL",
    "$2": "WHotelHotelg HotelHotelgs Hotel",
    "$3": "WINNING INNINGs Inn"
  }
]

REGEXP_REPLACE() Example 2

In this example, the query retrieves first 4 documents and replaces the pattern of repeating n with emphasized NNNN.

Query

SELECT name, REGEXP_REPLACE(name, "n+", "NNNN") as new_name
FROM airline
LIMIT 4;

Results

[
  {
    "name": "40-Mile Air",
    "new_name": "40-Mile Air"
  },
  {
    "name": "Texas Wings",
    "new_name": "Texas WiNNNNgs"
  },
  {
    "name": "Atifly",
    "new_name": "Atifly"
  },
  {
    "name": "Jc royal.britannica",
    "new_name": "Jc royal.britaNNNNica"
  }
]

REGEXP_SPLIT(`expression`, `pattern`)

Arguments

expression: String, or any SQL++ expression that evaluates to a string.
pattern: String representing a supported regular expression.

Return Value

Returns an array of all the substrings created by splitting the input string expression at each occurrence of the expression pattern. Returns an empty array if no match is found.

Example

Query

SELECT REGEXP_SPLIT("C:\\Program Files\\couchbase\\server\\bin", "[\\\\]") AS Windows, (1)
REGEXP_SPLIT("/opt/couchbase/bin", "/") AS Unix;

1	The regular expression `[\\\\]` matches the escaped backslash `\\`.

Results

[
  {
    "Unix": [
      "", (1)
      "opt",
      "couchbase",
      "bin"
    ],
    "Windows": [
      "C:",
      "Program Files",
      "couchbase",
      "server",
      "bin"
    ]
  }
]

1	The `REGEXP_SPLIT` function returns any zero-length matches that occur at the start of the expression string, except when the split pattern is zero-length. Otherwise, it returns any zero-length matches immediately after a previous match.

Aliases

Some pattern-matching functions have an alias whose name begins with REGEX_.

REGEX_CONTAINS() is an alias for REGEXP_CONTAINS().
REGEX_LIKE() is an alias for REGEXP_LIKE().
REGEX_POSITION() is an alias for REGEXP_POSITION().
REGEX_REPLACE() is an alias for REGEXP_REPLACE().

Pattern-Matching Functions

REGEXP_CONTAINS(expression, pattern)

Arguments

Return Value

Example

REGEXP_LIKE(expression, pattern)

Arguments

Return Value

Example

REGEXP_MATCHES(expression, pattern)

Arguments

Return Value

Examples

REGEXP_POSITION(expression, pattern)

Arguments

Return Value

Example

REGEXP_REPLACE(expression, pattern, repl [, n])

Arguments

Return Value

Examples

REGEXP_SPLIT(expression, pattern)

Arguments

Return Value

Example

Aliases

REGEXP_CONTAINS(`expression`, `pattern`)

REGEXP_LIKE(`expression`, `pattern`)

REGEXP_MATCHES(`expression`, `pattern`)

REGEXP_POSITION(`expression`, `pattern`)

REGEXP_REPLACE(`expression`, `pattern`, `repl` [, `n`])

REGEXP_SPLIT(`expression`, `pattern`)