You are viewing the documentation for a prerelease version.

View Latest

XDCR Filtering Expressions

XDCR filtering expressions allow a document to be included in or excluded from a filtered replication, based on the document’s fields and values.

Understanding Filtering Expressions

XDCR Advanced Filtering expressions, applied to the documents within a specified source bucket, allow case-sensitive matches to be made on:

  • id and xattrs values, within the document’s metadata.

  • Field-names and values, within the document’s data, nested to any degree.

Every document on which a match is successfully made is included in the filtered replication. Other documents are not included.

XDCR filtering expressions constitute a subset of N1QL expressions, with some additions.

Sections on this page list the available expressions, provide examples of their usage, and provide links to documentation for the comparable N1QL expressions. A list of reserved words is provided. Additionally, relationships between expressions are listed in in BNF (Backus-Naur Form) notation.

Note that most examples assume the travel-sample bucket to be the source of the filtered replication. For information on installing this bucket, see Install Sample Buckets.

XDCR Advanced Filtering can be managed by means of Couchbase Web Console (see Filter a Replication), the CLI (see xdcr-replicate), and the REST API (see Creating XDCR Replications).

Pattern Matching

Pattern matching is supported with the following syntax:

REGEXP_CONTAINS (expression, pattern)

For example, the following expression specifies country as the expression, and "France" as the pattern:

REGEXP_CONTAINS(country, "France")

If travel-sample is used as the source bucket, every document that contains a field country whose value is "France" is replicated to the target. For example, this would include the document airline_1191:

{
  "callsign": "REUNION",
  "country": "France",
  "iata": "UU",
  "icao": "REU",
  "id": 1191,
  "name": "Air Austral",
  "type": "airline"
}

Metadata Access

Each document contains metadata, as well as data. Within the metadata, a document’s id and extended attributes can be accessed by means of the reserved word META, used as part of the the pattern-matching expression.

For example, the following expression seeks a match on any document the value of whose metadata id field contains the substring "airline_10":

REGEXP_CONTAINS(META().id, "airline_10")

This would match a number of documents from travel-sample, including airline_10, whose metadata is as follows:

{
  "meta": {
    "id": "airline_10",
    "rev": "1-159642cc11a000000000000002000000",
    "expiration": 0,
    "flags": 33554432,
    "type": "json"
  },
  "xattrs": {}
}

Extended Attributes are optionally used to define application-specific metadata. Their field name is xattrs. In the metadata example shown above, no value exists for the field. If an extended attribute were fully defined — for example, a color field, assigned the value blue — the appearance within the metadata would be as follows:

  "xattrs": {
    "color": "blue"
  }

The extended attribute’s value could then be filtered and matched with the following expression:

REGEXP_CONTAINS(META().xattrs.color, "blue")

For information on extended attributes, see Extended Attributes.

Lookahead

XDCR Advanced Filtering is supported by lookahead. Lookahead is used to specify a pattern, in pattern matching; and permits a match when one specified character or character-sequence:

  • Is followed by another specified character or character-sequence. This is termed positive lookahead.

    The following syntax is used:

    char1 ( ?= char2 )

    The specified char1 must therefore be located and, for a successful match to be obtained, must be followed by the specified char2.

  • Is not followed by another specified character or character-sequence. This is termed negative lookahead.

    The following syntax is used:

    char1 ( ?! char2 )

    The specified char1 must therefore be located, and, for a successful match to be obtained, must not be followed by the specified char2.

Lookahead might therefore be used on the travel-sample bucket, to filter documents whose metadata id contains airport rather than airline, or vice versa. For example, the following expression uses positive lookahead to specify that the metadata id that begins with air should be immediately followed by the character p; thereby returning documents whose metadata id value is airport, but not airline:

REGEXP_CONTAINS(META().id, "^air(?=p)")

For information on the expression ^, see XDCR Regular Expressions.

Alternatively, negative lookahead might be used to specify that the metadata id that begins with air should not be followed by the character l; thereby returning documents whose metadata id value is airport, but not airline:

REGEXP_CONTAINS(META().id, "^air(?!l)")

XDCR Pattern Matching: Comparison with N1QL

The function REGEXP_CONTAINS is also supported by N1QL. Note, however, that N1QL does not support forward lookahead.

N1QL supports a number of pattern-matching functions in addition to REGEXP_CONTAINS: see Pattern-matching Functions.

META is one of a group of reserved words used by XDCR Advanced Filtering. For details, see Reserved Words, below. META is also one of a larger group of reserved words used by N1QL. For details, see the page for N1QL Reserved Words. Note that in N1QL, META provides access to a wider range of extended attributes.

Checking for Existence

The existence of a field can be checked for, by means of the Collection Operator EXISTS. The syntax is as follows:

EXISTS ( expression )

For example:

EXISTS(country)

This returns every document that contains a country field. This would therefore include airline_10:

{
  "callsign": "MILE-AIR",
  "country": "United States",
  "iata": "Q5",
  "icao": "MLA",
  "id": 10,
  "name": "40-Mile Air",
  "type": "airline"
}

XDCR Collection Operator: Comparison with N1QL

XDCR Advanced Filtering provides the single Collection Operator, EXISTS; which determines whether or not a specified field exists in the body of a document. N1QL uses EXISTS on subclauses; and provides a variety of additional Collection Operators. For details, see the N1QL page for Collection Operators.

Using Logical Operators

XDCR Advanced Filtering provides the Logical Operators AND, OR, and NOT. See the Filtering Expression BNF, below, for the syntactic possibilities of these operators.

AND can be used to add a required condition to an expression. For example:

REGEXP_CONTAINS(country, "France") AND airportname = "La Teste De Buch"

This provides a successful match on any document whose country value is "France", and whose aiportname value is "La Test De Buch". This would therefore include the following document in the replication:

{
  "airportname": "La Teste De Buch",
  "city": "Arcachon",
  "country": "France",
  "faa": "XAC",
  "geo": {
    "alt": 49,
    "lat": 44.59639,
    "lon": -1.110833
  },
  "icao": "LFCH",
  "id": 1283,
  "type": "airport",
  "tz": "Europe/Paris"
}

OR can be used to add an alternative condition to an expression. For example:

REGEXP_CONTAINS(country, "France") OR country = "United States"

This provides a successful match on any document whose country value is either "France" or "United States".

NOT can be prepended to a condition, to allow a successful match only when the condition is false. For example:

REGEXP_CONTAINS(country, "France") AND NOT airportname = "La Teste De Buch"

This provides a successful match on every document that contains the country value "France", and contains an airportname value that is not "La Teste De Buch".

All words used as Logical Operators for XDCR Advanced Filtering are reserved. For details, see Reserved Words, below.

XDCR Logical Operators: Comparison with N1QL

N1QL provides the same Logical Operators as does XDCR Advanced Filtering. For details, see the N1QL page for Logical Operators.

Using Comparison Operators

XDCR Advanced Filtering provides the following Comparison Operators:

= and ==

!= and <>

>

>=

<

<=

IS NULL

IS NOT NULL

IS MISSING

IS NOT MISSING

As this indicates, the tests for equality and inequality are each provided in two versions, to ensure compatibility with different languages.

Note that for purposes of comparison, each field within a document is categorized as one of the following:

  • Has a non-null value

  • Is specified as null

  • Is missing a non-null value, and is not specified as null

Therefore:

  • IS NULL is successfully matched with null.

  • IS NOT NULL is successfully matched when the field either has a value, or is missing a value.

  • IS MISSING is successfully matched when neither null nor a value is present.

  • IS NOT MISSING is successfully matched when either null or a value is present.

For example:

REGEXP_CONTAINS(country, "France") AND name != "40-Mile Air"

This provides a successful match with every document whose country value is "France", and whose name value is not "40-Mile Air". This would include airline_1191:

{
  "callsign": "REUNION",
  "country": "France",
  "iata": "UU",
  "icao": "REU",
  "id": 1191,
  "name": "Air Austral",
  "type": "airline"
}

The following, additional example tests for a null icao field, on documents whose country value is United States:

REGEXP_CONTAINS(country, "United States") AND icao IS NULL

This returns a number of matches, one of which is airport_4079:

{
  "airportname": "Orlando",
  "city": "Orlando",
  "country": "United States",
  "faa": "DWS",
  "geo": {
    "alt": 340,
    "lat": 28.398,
    "lon": -81.57
  },
  "icao": null,
  "id": 4079,
  "type": "airport",
  "tz": "America/New_York"
}

All words used in Comparison Operators for XDCR Advanced Filtering are reserved. For details, see Reserved Words, below.

XDCR Comparison Operators: Comparison with N1QL

The Comparison Operators provided by XDCR Advanced Filtering are a subset of those provided by N1QL. For details, see the N1QL page for Comparison Operators.

Selecting Fields and Elements

XDCR Advanced Filtering provides operators for Field Selection and Element Selection.

Field Selection

The Field Selection Operator is the period: . This allows a child-field, within a parent-field, to be specified. Note that the Field Selection Operator was used to specify the metadata id field, in Metadata Access, above.

The following example uses the Field Selection Operator to obtain a match on any document that contains a country field with a value of "United States", and also has an alt field, within the value of its geo field, with a value that is greater than or equal to 6813.

REGEXP_CONTAINS(country, "United States") AND geo.alt >= 6813

This returns a number of matches, including airport_4084:

{
  "airportname": "Telluride",
  "city": "Telluride",
  "country": "United States",
  "faa": "TEX",
  "geo": {
    "alt": 9078,
    "lat": 37.953759,
    "lon": -107.90848
  },
  "icao": "KTEX",
  "id": 4084,
  "type": "airport",
  "tz": "America/Denver"
}

When non-standard alphanumeric characters have been used in field-naming (for example, space or bracket characters), the field, when referenced in an advanced filtering expression, should be escaped with backticks: eg, field.`the field name`.

Element Selection

The Element Selection Operator, which is provided for use on arrays, takes the form [ n ], where n is an array-position.

For example, the following provides a successful match when a document whose airline value is "AA" also contains a schedule array, whose initial member has a field flight with a value of "AA679":

REGEXP_CONTAINS(airline, "AA") AND schedule[0].flight = "AA679"

This produces a match on document route_5784:

{
  "airline": "AA",
  "airlineid": "airline_24",
  "destinationairport": "PHL",
  "distance": 153.59665185566308,
  "equipment": "E90 DH3 319",
  "id": 5784,
  "schedule": [{
    "day": 0,
    "flight": "AA679",
    "utc": "22:01:00"
  }, {
    "day": 0,
    "flight": "AA253",
    "utc": "22:29:00"
  }, {
    "day": 1,
      .
      .
      .
  }, {
    "day": 6,
    "flight": "AA661",
    "utc": "16:35:00"
  }],
  "sourceairport": "LGA",
  "stops": 0,
  "type": "route"
}

XDCR Selection Operators: Comparison with N1QL

The Field Selection Operator is used in N1QL with additional functionality: nested expressions support is provided. The Element Selection Operator is also used in N1QL with additional functionality: negative indexing of arrays and the * operator are supported. N1QL also provides array slicing that allows the building of sub-slices of arrays. For information, see the N1QL page for Nested Operators and Expressions.

Using Arithmetic Operators

XDCR Advanced Filtering is supported by the following arithmetic operators:

Operator Description

+

Add values.

-

Subtract right value from left value.

*

Multiply values.

/

Divide left value by right value.

%

Modulo. Divide left value by right value and return the remainder.

NOTE: Modulo is an integer operator and will use only the integer part of each value.

-value

Negate value.

For additional details on syntax and return values, see the N1QL page for Arithmetic Operators.

For example, the following filtering expression provides a successful match on documents whose country value is United States and whose id is an integer value that returns a value of less than or equal to 3, when modulo 7 is applied:

REGEXP_CONTAINS(country, "United States") AND id % 7 <=  3

XDCR Arithmetic Operators: Comparison with N1QL

N1QL provides the same arithmetic operators, with the same semantics and return values. See the N1QL page for Arithmetic Operators.

Number Functions

XDCR Advanced Filtering is supported by Number Functions. These functions are the same as those provided by N1QL. However, the following N1QL Number Functions do not apply to XDCR Advanced Filtering:

  • RANDOM

  • SIGN

  • TRUNC

For information on all other functions, see the N1QL page for Number Functions.

Handling Dates

XDCR Advanced Filtering provides a basic DATE function that allows user to compose filtering expressions based on dates.

Dates can be specified in RFC3339 format. Dates can also be specified in the following ISO-8601 formats:

ISO-8601 Format Example

Date

2019-01-25

Date and time in UTC

2019-01-25T18:40:37Z

For example:

REGEXP_CONTAINS(event_type, "birthday") AND DATE(date) >= DATE("2019-01-25")

This seeks a match on any document whose date value is equal to or later than "2019-01-25". If created, the following document provides a successful match:

{
  "event_type": "birthday",
  "venue": "white hart hotel, salisbury, uk",
  "session": "afternoon",
  "dining_preference": "buffet",
  "number_of_guests": 25,
  "date": "2019-01-25"
}

Note that times are supported by utilizing the following RFC-3339 format:

DATE(transaction.time) < DATE(2018-01-01T12:00Z)

XDCR Date Operators: Comparison with N1QL

N1QL date functions are not supported by XDCR Advanced Filtering. For information on N1QL date functions, see the N1QL page for Date Functions.

Reserved Words

Reserved Words are words used syntactically by XDCR Advanced Filtering. If these words have used as identifiers in JSON documents, in order to be referenced in filtering expressions, they must be escaped, by means of backticks. The complete list of words is as follows:

AND

EXISTS

FALSE

IF

IS

META

MISSING

NOT

NULL

OR

TRUE

Filtering Expression BNF

The relationships between available expressions for XDCR Advanced Filtering are expressed in the following table, in Backus-Naur Form.

Expression Is Equal To

FilterExpression

( "(" FilterExpression ")" { "AND" FilterExpression } { "OR" FilterExpression } ) | InnerExpression { "AND" FilterExpression }

InnerExpression

AndCondition { "OR" AndCondition }

AndCondition

Condition { "AND" Condition }

Condition

( [ "NOT" ] Condition ) | Operand

Operand

BooleanExpr | ( LHS ( CheckOp | ( CompareOp RHS) ) )

BooleanExpr

Boolean | BooleanFuncExpr

LHS

ConstFuncExpr | Boolean | Field | Value

RHS

ConstFuncExpr | Boolean | Value | Field

CompareOp

"=" | "==" | "<>" | "!=" | ">" | ">=" | "<" | "⇐"

CheckOp

( "IS" [ "NOT" ] ( NULL | MISSING ) )

Field

{ @"-" } OnePath { "." OnePath } { MathOp MathValue }

OnePath

( PathFuncExpression | StringType ){ ArrayIndex }

StringType

@String | @Ident | @RawString | @Char

ArrayIndex

"[" @Int "]"

Value

@String

ConstFuncExpr

ConstFuncNoArg | ConstFuncOneArg | ConstFuncTwoArgs

ConstFuncNoArg

ConstFuncNoArgName "(" ")"

ConstFuncNoArgName

"PI" | "E"

ConstFuncOneArg

ConstFuncOneArgName "(" ConstFuncArgument ")"

ConstFuncOneArgName

"ABS" | "ACOS"…​

ConstFuncTwoArgs

ConstFuncTwoArgsName "(" ConstFuncArgument "," ConstFuncArgument ")"

ConstFuncTwoArgsName

"ATAN2" | "POW"

ConstFuncArgument

Field | Value | ConstFuncExpr

ConstFuncArgumentRHS

Value

PathFuncExpression

OnePathFuncNoArg

OnePathFuncNoArg

OnePathFuncNoArgName "(" ")"

MathOp

@"+" | @"-" | @"*" | @"/" | @"%"

MathValue

@Int | @Float

OnePathFuncNoArgName

"META"

BooleanFuncExpr

BooleanFuncTwoArgs | ExistsClause

BooleanFuncTwoArgs

BooleanFuncTwoArgsName "(" ConstFuncArgument "," ConstFuncArgumentRHS ")"

BooleanFuncTwoArgsName

"REGEXP_CONTAINS"

ExistsClause

( "EXISTS" "(" Field ")" )