XDCR Advanced Filtering

XDCR Advanced Filtering allows specified subsets of documents to be replicated from the source bucket.

Understanding XDCR Advanced Filtering

XDCR filtering allows a document to be included in or excluded from a filtered replication, based on the document’s fields and values.

Case-sensitive matches can be made on:

  • id and xattrs values, within the document’s metadata.

  • Field-names and values, within the document’s data, nested to any degree.

Every document on which a match is successfully made is included in the filtered replication. Other documents are not included.

Match-requirements are specified by means of:

  • Regular Expressions. These can be used to specify case-sensitive character-matches, and thereby determine whether a field-name or value may entitle a document to be included in a replication. See the reference information provided in XDCR Regular Expressions.

  • Filtering Expressions. These allow comparisons and calculations to be made on the fields and values identified by means of regular expressions: based on the results, a document either is or is not included in a replication. See the reference information provided in XDCR Filtering Expressions.

Note that fields and values on which matches are to be made should typically be kept immutable. If the fields or values of a given document are changed after a replication has started (see Filter-Expression Editing, below), the document may no longer meet the criterion for replication, and so may go unreplicated in its new form. However, because it formerly met the criterion and was duly replicated, it may already reside in its previous form on the target cluster. In consequence, a single document would be maintained with a different value on each cluster.
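The divergence described above can be illustrated with a small simulation. This is only a sketch: Python's re module stands in for XDCR's filtering engine, the buckets are plain dictionaries, and the helper names (matches, replicate_mutation) and the type values are assumptions made for the example.

```python
import re

# Assumed filter: the "type" value contains "air" immediately followed by "l".
PATTERN = re.compile(r"air(?=l)")

def matches(doc):
    """Return True if the document's type field satisfies the filter."""
    return PATTERN.search(str(doc.get("type", ""))) is not None

def replicate_mutation(doc_id, doc, target):
    """Copy a new or changed document to the target only if it passes the filter."""
    if matches(doc):
        target[doc_id] = dict(doc)

source = {"airline_10": {"type": "airline"}}
target = {}

# The document matches when first seen, so it is copied to the target.
replicate_mutation("airline_10", source["airline_10"], target)

# The filtered field is later changed on the source; the new form no longer matches.
source["airline_10"]["type"] = "airport"
replicate_mutation("airline_10", source["airline_10"], target)

# The target still holds the previous form, so the two clusters now disagree.
diverged = source["airline_10"] != target["airline_10"]
print(diverged)  # True
```

Keeping the filtered fields immutable avoids this state, because a document that matched once continues to match.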

This page explains XDCR Advanced Filtering at a conceptual level. For the practical steps involved, see Filter a Replication. See also the information provided in the XDCR Advanced Filtering Reference.

No Filter Applied

When no filter is applied, all documents in the specified source bucket are replicated to the specified target bucket. For example:

Figure 1. Replication with no filter applied

The replication R specifies as its source Source Bucket, on the Source Cluster; and specifies as its target Target Bucket, on the Target Cluster. The replication specifies no filter.

When it starts, the replication examines the documents in the source bucket, which are airline_10 and airport_8835. Since no filter is applied, both documents are suitable for replication, and are duly replicated to the Target Bucket.

Filter Applied

When a filter is applied, only those documents whose fields or values provide a successful match are included in the replication. For example:

Figure 2. Replication with filter applied

The replication R specifies as its source Source Bucket, on the Source Cluster; and specifies as its target Target Bucket, on the Target Cluster. The replication specifies a filter: this requires that a document have a type field, whose value is a string that contains the substring air, and that this be followed by the substring l. For details on this kind of expression (referred to as positive lookahead), see the reference provided for XDCR Filtering Expressions.

When it starts, the replication examines the documents in the source bucket. The document airline_10 has a type field whose value provides a successful match; therefore, the document is replicated. The document airport_8835 does have a type field, but its value does not contain a string that provides a successful match; therefore, the document is not replicated.
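The matching behavior in this example can be sketched in Python. The snippet is an illustration only: it uses Python's re module rather than XDCR's own expression syntax (see the reference for that), and the type values airline and airport are assumed from the document names.

```python
import re

# Positive lookahead: match "air" only when it is immediately followed by "l".
FILTER = re.compile(r"air(?=l)")

def passes_filter(doc):
    """Return True if the document has a string type field that matches."""
    type_value = doc.get("type")
    return isinstance(type_value, str) and FILTER.search(type_value) is not None

# Assumed contents of the source bucket.
source_bucket = {
    "airline_10": {"type": "airline"},
    "airport_8835": {"type": "airport"},
}

replicated = sorted(
    doc_id for doc_id, doc in source_bucket.items() if passes_filter(doc)
)
print(replicated)  # ['airline_10']
```

The lookahead does not consume the l, so the matched text is air in both cases; only the character that follows decides whether the match succeeds.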

Multiple Filters Applied

Multiple filters can be applied in either of two ways:

  • By means of ORing, within a single replication. This allows a document to be replicated if any one of the specified filters makes a successful match. For information, see the XDCR Advanced Filtering Reference.

  • By means of individual or multiple ORed filters, specified across multiple replications. For example:

Figure 3. Replication with multiple filters applied simultaneously

The replication R1 specifies as its source Source Bucket, on the Source Cluster; and specifies as its target Target Bucket 1, on Target Cluster 1. The replication specifies a filter: as in the previous example, this requires that a document have a type field, whose value is a string that contains the substring air, and that this be followed by the substring l.

When it starts, the replication examines the documents in the source bucket. The document airline_10 has a type field whose value provides a successful match; therefore, the document is replicated to Target Bucket 1. The document airport_8835 does have a type field, but its value does not contain a string that provides a successful match; therefore, the document is not replicated.

Like R1, the replication R2 specifies as its source Source Bucket, on the Source Cluster. However, it specifies as its target Target Bucket 2, on Target Cluster 2. The replication specifies a filter: this requires that a document have a type field, whose value is a string that contains the substring air, and that this be followed by the substring p.

The document airport_8835 has a type field whose value provides a successful match; therefore, the document is replicated to Target Bucket 2. The document airline_10 does have a type field, but its value does not contain a string that provides a successful match; therefore, the document is not replicated.

Thus, each of the two documents in the source is replicated to one, distinct target bucket, on its own target cluster. Note that many variants of this example can be designed, including replicating the contents of a single source bucket to multiple target buckets on a single target cluster.
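Both approaches can be modeled in a few lines of Python. This is a hedged sketch: the replications dictionary, the route helper, and the type values are hypothetical, and Python regular expressions stand in for XDCR filter expressions.

```python
import re

# Assumed contents of the source bucket.
source_bucket = {
    "airline_10": {"type": "airline"},
    "airport_8835": {"type": "airport"},
}

# Hypothetical replication definitions: name -> (target bucket, filter pattern).
replications = {
    "R1": ("Target Bucket 1", re.compile(r"air(?=l)")),
    "R2": ("Target Bucket 2", re.compile(r"air(?=p)")),
}

def route(source, definitions):
    """Return, for each target bucket, the IDs of the documents it receives."""
    targets = {target: [] for target, _ in definitions.values()}
    for doc_id, doc in sorted(source.items()):
        for target, pattern in definitions.values():
            if pattern.search(str(doc.get("type", ""))):
                targets[target].append(doc_id)
    return targets

routing = route(source_bucket, replications)
print(routing)
# {'Target Bucket 1': ['airline_10'], 'Target Bucket 2': ['airport_8835']}

# ORing within a single replication: a document qualifies if any filter matches.
ored_filters = [re.compile(r"air(?=l)"), re.compile(r"air(?=p)")]
replicated_by_or = sorted(
    doc_id
    for doc_id, doc in source_bucket.items()
    if any(f.search(str(doc.get("type", ""))) for f in ored_filters)
)
print(replicated_by_or)  # ['airline_10', 'airport_8835']
```

Each replication evaluates its own filter independently, so a document could be sent to several targets if more than one filter matched it.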

Filter-Expression Editing

The filter-expressions defined for a particular replication can be edited after their initial definition and use. This allows a single replication to employ multiple different filters and filter-combinations, sequentially.

Note that once a document has been replicated, it can only be removed from the target by being removed from the source. Therefore, if a replication’s filter-expression is changed, although it changes the criterion whereby documents are to be replicated in future, it does not affect the presence of those documents already replicated to the target according to the old criterion. If the intention is to populate the target only with documents that meet the new criterion, those documents on the target that do not meet the criterion must either be manually removed, or removed by means of flushing: see XDCR Bucket Flush, for details.
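As an illustration of the manual clean-up, the following sketch lists the documents on a target that no longer satisfy an edited filter. The function name stale_on_target and the dictionary standing in for the target bucket are assumptions for the example; the sketch only identifies candidates and does not show how documents are removed from a real bucket.

```python
import re

def stale_on_target(target_docs, new_pattern):
    """Return the IDs of target documents whose type field fails the new filter."""
    regex = re.compile(new_pattern)
    return sorted(
        doc_id
        for doc_id, doc in target_docs.items()
        if not regex.search(str(doc.get("type", "")))
    )

# Assumed state: airline_10 was replicated under the old filter, air(?=l).
target_bucket = {"airline_10": {"type": "airline"}}

# The filter is edited to air(?=p); airline_10 remains on the target regardless.
to_remove = stale_on_target(target_bucket, r"air(?=p)")
print(to_remove)  # ['airline_10']
```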

Note also that a replication considers all documents in the source bucket only during its initial process; afterwards, it considers only mutations as candidates for replication. See XDCR Process, for details. Two options are therefore available, which determine how a replication proceeds after its filter-expression has been edited:

  • Restart. The current instance of the replication is ended, and a new instance is started, with the new filtering criterion. This causes a new running of the replication’s initial process, whereby all documents in the source bucket are examined. In consequence, documents that already meet the new filtering criterion, but were not replicated according to the old filtering criterion, and have not been mutated, are determined to be candidates for replication. This is the default.

  • Continue. The current instance of the replication continues, with the new filtering criterion. The replication’s initial process is not re-run. Therefore, documents that already meet the new filtering criterion, but were not replicated according to the old filtering criterion, and have not been mutated, are not replicated — unless they are mutated subsequently.

For example, it might be desirable to modify the replication shown above in Figure 2 — which searches for the string air, followed by the string l — without deleting and recreating the replication. The possible results are shown below.

Restart

In the following illustration, the filter-expression used in Figure 2 is changed, to search for the string air, followed by the string p. The restart option is specified.

Figure 4. Filter-expression edited, with restart option

In its original version, R1, the replication had identified, during its initial process, the document airline_10, which was duly replicated to the target bucket. The original filter-expression is edited, so that the replication becomes R1a; and the replication is restarted. During its initial process, it examines all documents in the source bucket; finding no match on airline_10, but finding a match on airport_8835, which is duly replicated to the target bucket.

Subsequently, R1a will examine all mutations, and will replicate those on which it achieves a successful match.

Continue

In the following illustration, the filter-expression used in Figure 2 is again changed to search for the string air, followed by the string p. This time, the continue option is specified.

Figure 5. Filter-expression edited, with continue option

In its original version, R1, the replication had identified, during its initial process, the document airline_10, which was duly replicated to the target bucket. The original filter-expression is edited, so that the replication becomes R1a; and the replication is continued. There is no repetition of the initial process: therefore, the existing documents airline_10 and airport_8835 are not re-examined; and no replication occurs.

Subsequently, R1a will examine all mutations, and will replicate those on which it achieves a successful match. This is illustrated as follows:

Figure 6. Mutation recognized with continue option

The new document airline_8838 is added to the source bucket, and is examined by R1a. A successful match is made, and airline_8838 is duly replicated to the target bucket.
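The difference between the restart and continue options can be captured in a toy model. This sketch is not Couchbase's implementation: the Replication class, its method names, the restart flag, and the document airport_9000 are all hypothetical, and buckets are modeled as dictionaries.

```python
import re

class Replication:
    """Toy model of a filtered replication, for illustration only."""

    def __init__(self, source, pattern):
        self.source = source
        self.target = {}
        self.pattern = re.compile(pattern)
        self.initial_process()

    def _matches(self, doc):
        return self.pattern.search(str(doc.get("type", ""))) is not None

    def initial_process(self):
        """Examine every document currently in the source bucket."""
        for doc_id, doc in self.source.items():
            if self._matches(doc):
                self.target[doc_id] = dict(doc)

    def on_mutation(self, doc_id, doc):
        """Examine a single new or changed document."""
        self.source[doc_id] = doc
        if self._matches(doc):
            self.target[doc_id] = dict(doc)

    def edit_filter(self, pattern, restart=True):
        """Change the filter; re-run the initial process only on restart."""
        self.pattern = re.compile(pattern)
        if restart:
            self.initial_process()

def make_source():
    # Assumed contents of the source bucket.
    return {"airline_10": {"type": "airline"}, "airport_8835": {"type": "airport"}}

# Restart: existing documents are re-examined against the new filter.
restarted = Replication(make_source(), r"air(?=l)")
restarted.edit_filter(r"air(?=p)", restart=True)
print(sorted(restarted.target))  # ['airline_10', 'airport_8835']

# Continue: existing, unmutated documents are not re-examined.
continued = Replication(make_source(), r"air(?=l)")
continued.edit_filter(r"air(?=p)", restart=False)
print(sorted(continued.target))  # ['airline_10']

# A later mutation is checked against the new filter in either mode.
continued.on_mutation("airport_9000", {"type": "airport"})
print(sorted(continued.target))  # ['airline_10', 'airport_9000']
```

In both cases airline_10 stays on the target, since editing a filter never removes documents that were already replicated.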