Data Enrichment

Goal: A legacy document set contains attributes whose format makes them difficult to search on. To correct this search deficiency, new searchable attributes are added to each document. These new attributes are related to, and can be calculated from, the original attributes. On any mutation (a document creation or modification), the new attributes should also be created (or updated).

Implementation: Create a JavaScript function that contains an OnUpdate handler. The handler listens for mutations (data changes) within a specified source bucket. When any document within the bucket is created or modified, the handler executes a user-defined routine. In this example, if the created or altered document contains two specifically named fields containing IP addresses (corresponding respectively to the beginning and end of an address range), the handler routine converts each of the IP addresses to an integer and upserts them as new fields in the document.

  • Case 1: A new document is created in a specified, target bucket: this new document is identical to the old, except that it has two new additional fields, which contain integers that correspond to the IP addresses. The original document, in the source bucket, is not changed.

  • Case 2: The original document, in the source bucket, is mutated (or changed) to have two additional fields, which contain integers that correspond to the IP addresses.

In the 6.5 release, the handler code can directly mutate (or write back) to the source bucket. Case 2 demonstrates this ability.
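The difference between the two cases can be tried outside Couchbase. The sketch below is only an illustration: it uses plain JavaScript objects as hypothetical stand-ins for the source and target buckets, and the helper names (ipToInt, enrich, runCase1, runCase2) are assumptions made for this example, not part of the Eventing API.

```javascript
// Hypothetical helper: dotted-quad IPv4 string -> integer
// (A x 256^3 + B x 256^2 + C x 256 + D).
function ipToInt(ip) {
  const parts = ip.split('.').map(function (p) { return parseInt(p, 10); });
  return parts[0] * 16777216 + parts[1] * 65536 + parts[2] * 256 + parts[3];
}

// Return an enriched copy of the document; the input object is left untouched.
function enrich(doc) {
  return Object.assign({}, doc, {
    ip_num_start: ipToInt(doc.ip_start),
    ip_num_end: ipToInt(doc.ip_end)
  });
}

// Case 1: the enriched copy is stored in a separate "target" object.
function runCase1(source, target, id) {
  target[id] = enrich(source[id]);
}

// Case 2: the enriched copy replaces the document in "source" itself.
function runCase2(source, id) {
  source[id] = enrich(source[id]);
}

// Plain objects standing in for buckets (an assumption for this sketch only).
const sourceBucket = {
  SampleDocument: { country: 'AD', ip_start: '5.62.60.1', ip_end: '5.62.60.9' }
};
const targetBucket = {};

runCase1(sourceBucket, targetBucket, 'SampleDocument');
console.log(targetBucket.SampleDocument.ip_num_start);      // 87964673
console.log('ip_num_start' in sourceBucket.SampleDocument); // false

runCase2(sourceBucket, 'SampleDocument');
console.log(sourceBucket.SampleDocument.ip_num_end);        // 87964681
```

In Case 1 the source object never gains the new fields, whereas in Case 2 the source object itself is replaced by the enriched version.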

Preparations (Common)

For this example, three buckets, 'source', 'target', and 'metadata', are required (note that the metadata bucket for Eventing can be shared with other Eventing functions). For Case 1 we will use both the 'source' and 'target' buckets. For Case 2 we will only use the 'source' bucket. Create all three buckets with a minimum size of 100MB.

For steps to create buckets, see Create a Bucket.

The 'metadata' bucket is for the sole use of the Eventing system. Do not add, modify, or delete documents in this bucket. In addition, do not drop, flush, or delete the bucket while you have any deployed Eventing functions.

Procedure (Case 1):

  1. Access the Couchbase Web Console > Buckets page and click the Documents link of the source bucket.

    • You should see no user records.

    • Click Add Document in the upper right banner.

    • In the Add Document dialog, specify the name SampleDocument as the New Document ID.

    • Click Save.

    • In the Edit Document dialog, the following text is displayed:

      {
      "click": "to edit",
      "with JSON": "there are no reserved field names"
      }
    • Replace the above text with the following JSON document by copying and pasting it:

      {
        "country": "AD",
        "ip_start": "5.62.60.1",
        "ip_end": "5.62.60.9"
      }
    • Click Save.

  2. From the Couchbase Web Console > Eventing page, click ADD FUNCTION, to add a new Function. The ADD FUNCTION dialog appears.

  3. In the ADD FUNCTION dialog, for individual Function elements provide the below information:

    • For the Source Bucket drop-down, select source.

    • For the Metadata Bucket drop-down, select metadata.

    • Enter case_1_enrich_ips as the name of the Function you are creating in the Function Name text-box.

    • [Optional Step] Enter the text "On mutation create a new document in a different bucket with additional fields" in the Description text-box.

    • For the Settings option, use the default values.

    • For the Bindings option, add two bindings.

      • For the first binding, select "bucket alias", specify src as the "alias name" of the bucket, and select source as the associated bucket, and select "read only".

      • For the second binding, select "bucket alias", specify tgt as the "alias name" of the bucket, select target as the associated bucket, and select "read and write".

    • After configuring your settings your screen should look like:

      enrichcase1 01 settings
  4. After providing all the required information in the ADD FUNCTION dialog, click Next: Add Code. The case_1_enrich_ips dialog appears.

    • The case_1_enrich_ips dialog initially contains a placeholder code block. You will substitute your actual case_1_enrich_ips code in this block.

      enrichcase1 02 editor with default
    • Copy the following Function, and paste it in the placeholder code block of case_1_enrich_ips dialog.

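      function OnUpdate(doc, meta) {
        log('document', doc);
        // Skip documents that do not carry both ends of the address range.
        if (!doc["ip_start"] || !doc["ip_end"]) return;
        doc["ip_num_start"] = get_numip_first_3_octets(doc["ip_start"]);
        doc["ip_num_end"]   = get_numip_first_3_octets(doc["ip_end"]);
        // Write the enriched copy to the target bucket through the tgt alias.
        tgt[meta.id]=doc;
      }
      // Despite its name, this helper folds all four octets into one integer.
      function get_numip_first_3_octets(ip) {
        var return_val = 0;
        if (ip) {
          var parts = ip.split('.');
          //IP Number = A x (256*256*256) + B x (256*256) + C x 256 + D
          return_val = (parts[0]*(256*256*256)) + (parts[1]*(256*256)) + (parts[2]*256) + parseInt(parts[3], 10);
        }
        // Return 0 when no address was supplied.
        return return_val;
      }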

      After pasting, the screen appears as displayed below:

      enrichcase1 03 editor with code
    • Click Save.

    • To return to the Eventing screen, click the '< back to Eventing' link (below the editor) or click the Eventing tab.

  5. The OnUpdate routine specifies that when a change occurs to data within the source bucket, the routine get_numip_first_3_octets is run on each document that contains ip_start and ip_end. A new document is created in the target bucket whose ID and data are based on those of the source document, but with the addition of the ip_num_start and ip_num_end fields, which contain the numeric values returned by get_numip_first_3_octets. The get_numip_first_3_octets routine splits the IP address into its four octets, weights each octet by the appropriate power of 256, and adds the results together to form a single integer, which it returns.

  6. From the Eventing screen, click Deploy.

    • In the Confirm Deploy Function dialog, select Everything from the Feed boundary option.

    • Click Deploy Function.

  7. The Eventing function is deployed and starts running within a few seconds. From this point, the defined Function is executed on all existing documents and, more importantly, on all subsequent mutations.

  8. To check the results of the deployed Eventing Function, access the Couchbase Web Console > Buckets page and click the Documents link of the target bucket.

    • Edit the document and you will see a duplicate of the source document, but with two new calculated fields, as follows:

      {
        "country": "AD",
        "ip_end": "5.62.60.9",
        "ip_start": "5.62.60.1",
        "ip_num_start": 87964673,
        "ip_num_end": 87964681
      }
    • Click Cancel to close the editor.

  9. Because our Eventing Function is deployed, it will continue to process all new mutations. Let’s test this out.

  10. Access the Couchbase Web Console > Buckets page and click the Documents link of the source bucket.

    • You should see one user record (the one we entered at the beginning of this procedure).

    • Click Add Document in the upper right banner.

    • In the Add Document dialog, specify the name AnotherSampleDocument as the New Document ID.

    • Click Save.

    • In the Edit Document dialog, the following text is displayed:

      {
      "click": "to edit",
      "with JSON": "there are no reserved field names"
      }
    • Replace the above text with the following JSON document by copying and pasting it:

      {
        "country": "RU",
        "ip_start": "7.12.60.1",
        "ip_end": "7.62.60.9"
      }
    • Click Save.

  11. To check the results (which the deployed Eventing Function updated in real time), access the Couchbase Web Console > Buckets page and click the Documents link of the target bucket.

    • Edit the newly created document and you will see a duplicate of the source document, but with two new calculated fields, as follows:

      {
        "country": "RU",
        "ip_end": "7.62.60.9",
        "ip_start": "7.12.60.1",
        "ip_num_start": 118242305,
        "ip_num_end": 121519113
      }
    • Click Cancel to close the editor.
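The enriched fields exist to make range searches practical. The sketch below illustrates why, under stated assumptions: ipToInt and inRange are hypothetical helper names introduced for this example, and the document literal simply mirrors the enriched SampleDocument shown above. It runs in any JavaScript environment and does not touch Couchbase.

```javascript
// Hypothetical helper mirroring the conversion used in this example.
function ipToInt(ip) {
  const parts = ip.split('.').map(function (p) { return parseInt(p, 10); });
  return parts[0] * 16777216 + parts[1] * 65536 + parts[2] * 256 + parts[3];
}

// The enriched SampleDocument as it appears in the target bucket.
const enriched = {
  country: 'AD',
  ip_start: '5.62.60.1',
  ip_end: '5.62.60.9',
  ip_num_start: 87964673,
  ip_num_end: 87964681
};

// Hypothetical lookup: does the address fall inside the document's range?
function inRange(doc, ip) {
  const n = ipToInt(ip);
  return n >= doc.ip_num_start && n <= doc.ip_num_end;
}

console.log(inRange(enriched, '5.62.60.5')); // true
console.log(inRange(enriched, '5.62.61.1')); // false

// Comparing the original strings orders addresses lexicographically,
// so "10" sorts before "9" even though 10 is the larger number.
console.log('5.62.60.10' < '5.62.60.9');                   // true (misleading)
console.log(ipToInt('5.62.60.10') < ipToInt('5.62.60.9')); // false (correct)
```

A range test on the integer fields reduces to two numeric comparisons, which is the kind of predicate the original string fields cannot support reliably.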

Procedure (Case 2):

  1. IMPORTANT: undeploy the Eventing Function case_1_enrich_ips if it is running. Access the Couchbase Web Console > Eventing page and click the function name case_1_enrich_ips.

    • Click Undeploy.

    • Click Undeploy Function to confirm.

  2. We assume that the two documents from Case 1 above exist in the 'source' bucket. If they do not, create them in the 'source' bucket.

    • Access the Couchbase Web Console > Buckets page and click the Documents link of the source bucket.

    • You should see two user records (as previously created above).

      {
        "country": "AD",
        "ip_start": "5.62.60.1",
        "ip_end": "5.62.60.9"
      }
      {
        "country": "RU",
        "ip_start": "7.12.60.1",
        "ip_end": "7.62.60.9"
      }
  3. From the Couchbase Web Console > Eventing page, click ADD FUNCTION, to add a new Function. The ADD FUNCTION dialog appears.

  4. In the ADD FUNCTION dialog, for individual Function elements provide the below information:

    • For the Source Bucket drop-down, select source.

    • For the Metadata Bucket drop-down, select metadata.

    • Enter case_2_enrich_ips as the name of the Function you are creating in the Function Name text-box.

    • [Optional Step] Enter the text "On mutation enrich the mutated document in the same bucket with additional fields" in the Description text-box.

    • For the Settings option, use the default values.

    • For the Bindings option, add one binding.

      • For the first binding, select the "bucket alias", specify src as the "alias name" of the bucket, and select source as the associated bucket, and select "read and write".

    • After configuring your settings your screen should look like:

      enrichcase2 01 settings
  5. After providing all the required information in the ADD FUNCTION dialog, click Next: Add Code. The case_2_enrich_ips dialog appears.

    • The case_2_enrich_ips dialog initially contains a placeholder code block. You will substitute your actual case_2_enrich_ips code in this block.

      enrichcase2 02 editor with default
    • Copy the following Function, and paste it in the placeholder code block of case_2_enrich_ips dialog.

      function OnUpdate(doc, meta) {
        log('document', doc);
        // Skip documents that do not carry both ends of the address range.
        if (!doc["ip_start"] || !doc["ip_end"]) return;
        doc["ip_num_start"] = get_numip_first_3_octets(doc["ip_start"]);
        doc["ip_num_end"]   = get_numip_first_3_octets(doc["ip_end"]);
        // !!! write back to the source bucket !!!
        src[meta.id]=doc;
      }
      // Despite its name, this helper folds all four octets into one integer.
      function get_numip_first_3_octets(ip) {
        var return_val = 0;
        if (ip) {
          var parts = ip.split('.');
          //IP Number = A x (256*256*256) + B x (256*256) + C x 256 + D
          return_val = (parts[0]*(256*256*256)) + (parts[1]*(256*256)) + (parts[2]*256) + parseInt(parts[3], 10);
        }
        // Return 0 when no address was supplied.
        return return_val;
      }

      After pasting, the screen appears as displayed below:

      enrichcase2 03 editor with code
    • Click Save.

    • To return to the Eventing screen, click the '< back to Eventing' link (below the editor) or click the Eventing tab.

  6. The OnUpdate routine specifies that when a change occurs to data within the source bucket, the routine get_numip_first_3_octets is run on each document that contains ip_start and ip_end. No new document is created in this case: the mutated document itself is written back to the source bucket with the addition of the ip_num_start and ip_num_end fields, which contain the numeric values returned by get_numip_first_3_octets. The get_numip_first_3_octets routine splits the IP address into its four octets, weights each octet by the appropriate power of 256, and adds the results together to form a single integer, which it returns.

  7. From the Eventing screen, click Deploy.

    • In the Confirm Deploy Function dialog, select Everything from the Feed boundary option.

    • Click Deploy Function.

  8. The Eventing function is deployed and starts running within a few seconds. From this point, the defined Function is executed on all existing documents and, more importantly, on all subsequent mutations.

  9. To check the results of the deployed Eventing Function, access the Couchbase Web Console > Buckets page and click the Documents link of the source bucket.

    • Edit "SampleDocument". It will have been enriched with two new calculated fields:

      {
        "country": "AD",
        "ip_end": "5.62.60.9",
        "ip_start": "5.62.60.1",
        "ip_num_start": 87964673,
        "ip_num_end": 87964681
      }
    • Edit "AnotherSampleDocument". It will also have been enriched with two new calculated fields:

      {
        "country": "RU",
        "ip_end": "7.62.60.9",
        "ip_start": "7.12.60.1",
        "ip_num_start": 118242305,
        "ip_num_end": 121519113
      }
    • Click Cancel to close the editor.

  10. Because our Eventing Function is deployed, it will continue to process all new mutations. Let’s test this out.

  11. Access the Couchbase Web Console > Buckets page and click the Documents link of the source bucket.

    • Edit "AnotherSampleDocument" again, but this time change "ip_start" to "6.12.60.1":

      {
        "country": "RU",
        "ip_end": "7.62.60.9",
        "ip_start": "6.12.60.1",
        "ip_num_start": 118242305,
        "ip_num_end": 121519113
      }
    • Click Save to update the document and close the editor.

    • Edit "AnotherSampleDocument" once more and see that "ip_num_start" was recalculated in real time, from 118242305 to 101465089:

      {
        "country": "RU",
        "ip_end": "7.62.60.9",
        "ip_start": "6.12.60.1",
        "ip_num_start": 101465089,
        "ip_num_end": 121519113
      }
    • Click Cancel to close the editor.
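The recalculated value can be checked by hand. The sketch below breaks the conversion into its per-octet terms for the old and new "ip_start" values. The helper names octetTerms and sum are assumptions made for this illustration and are not part of the Eventing function.

```javascript
// Hypothetical helper: return the weighted term contributed by each octet.
function octetTerms(ip) {
  const weights = [16777216, 65536, 256, 1]; // 256^3, 256^2, 256^1, 256^0
  return ip.split('.').map(function (part, i) {
    return parseInt(part, 10) * weights[i];
  });
}

// Add the weighted terms to obtain the integer form of the address.
function sum(terms) {
  return terms.reduce(function (acc, t) { return acc + t; }, 0);
}

const before = octetTerms('7.12.60.1'); // [117440512, 786432, 15360, 1]
const after = octetTerms('6.12.60.1');  // [100663296, 786432, 15360, 1]

console.log(sum(before));               // 118242305
console.log(sum(after));                // 101465089
console.log(sum(before) - sum(after));  // 16777216, that is 256^3
```

Only the first octet changed, and it changed by one, so the two integers differ by exactly 256^3. The sketch uses multiplication rather than bit shifts because JavaScript bitwise operators work on signed 32-bit integers, so shifting a first octet of 128 or more left by 24 bits would produce a negative number.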

Cleanup (both Case 1 and Case 2):

Go to the Eventing portion of the UI and undeploy the Functions case_1_enrich_ips and case_2_enrich_ips. This removes the 1024 documents for each function from the 'metadata' bucket (visible in the Buckets view of the UI). Remember that you may only delete the 'metadata' bucket if there are no deployed Eventing functions.