Data Enrichment

Goal: A document contains attributes whose form makes them difficult to search on. Therefore, on the document’s creation or modification, a new attribute should be created to accompany each original attribute; this new attribute being instantiated with a value that directly corresponds to that of its associated, original attribute; but takes a different form, thereby becoming more supportive of search. Original attributes are subsequently retrievable based on successful searches performed on new attributes.

Implementation: Create a JavaScript function that contains an OnUpdate handler. The handler listens for data-changes within a specified, source bucket. When any document within the bucket is created or mutated, the handler executes a user-defined routine. In this example, if the created or mutated document contains two specifically named fields containing IP addresses (these respectively corresponding to the beginning and end of an address-range), the handler-routine converts each IP address to an integer. A new document is created in a specified, target bucket: this new document is identical to the old, except that it has two additional fields, which contain integers that correspond to the IP addresses. The original document, in the source bucket, is not changed.

Preparations

This example requires the creation of three buckets: metadata, target and source buckets.

Proceed as follows:

  1. Create target and metadata buckets. To create a bucket, refer to Create a Bucket. The target bucket contains documents that will be created by the Function. Don’t add any documents explicitly to this bucket.

  2. Follow Step 1. and create a source bucket. In the Source bucket screen:

    1. Click Add Document.

    2. In the Add Document window, specify the name SampleDocument as the New*Document ID*

    3. Click Save.

    4. In the Edit Document dialog, paste the following within the edit panel.

      {

      "country": "AD",

      "ip_end": "5.62.60.9",

      "ip_start": "5.62.60.1"

      }

    5. Click Save. This step concludes all required preparations.

Procedure

Proceed as follows:

  1. From the Couchbase Web Console > Eventing page, click ADD FUNCTION, to add a new Function.

  2. In the ADD FUNCTION dialog, for individual Function elements provide the below information:

    • For the Source Bucket drop-down, select the source option that was created for this purpose.

    • For the Metadata Bucket drop-down, select the metadata option that was created for this purpose.

    • Enter enrich_ip_nums as the name of the Function you are creating in the Function*Name* text-box.

    • Enter Enrich a document, converts IP Strings to Integers that are stored in new attributes, in the Description text-box.

    • For the Settings option, use the default values.

    • For the Bindings option, specify target as the name of the bucket, and specify tgt as the associated value.

  3. After providing all the required information in the ADD FUNCTION dialog, click Next: Add Code. The enrich_ip_nums dialog appears. The enrich_ip_nums dialog initially contains a placeholder code block. You will substitute your actual enrich_ip_nums code in this block.

    addfunctions ex1
  4. Copy the following Function, and paste it in the placeholder code block of the enrich_ip_nums dialog:

     function OnUpdate(doc, meta) {
      log('document', doc);
      doc["ip_num_start"] = get_numip_first_3_octets(doc["ip_start"]);
      doc["ip_num_end"]   = get_numip_first_3_octets(doc["ip_end"]);
      tgt[meta.id]=doc;
    }
    
    function get_numip_first_3_octets(ip)
    {
      var return_val = 0;
      if (ip)
      {
        var parts = ip.split('.');
    
        //IP Number = A x (256*256*256) + B x (256*256) + C x 256 + D
        return_val = (parts[0]*(256*256*256)) + (parts[1]*(256*256)) + (parts[2]*256) + parseInt(parts[3]);
        return return_val;
      }
    }

    After pasting, the screen appears as displayed below:

    enrich ip nums

    The OnUpdate routine specifies that when a change occurs to data within the bucket, the routine get_numip_first_3_octets is run on each document that contains ip_start and ip_end. A new document is created whose data and metadata are based on those of the document on which get_numip_first_3_octets is run; but with the addition of ip_num_start and ip_num_end data-fields, which contain the numeric values returned by get_numip_first_3_octets. The get_numip_first_3_octets routine splits the IP address, converts each fragment to a numeral, and adds the numerals together, to form a single value; which it returns.

  5. Click Save.

  6. To return to the Eventing screen, click Eventing and click on the newly created Function name. The Function enrich_ip_nums is listed as a defined Function.

    deploy enrich ip nums
  7. Click Deploy.

  8. From the Confirm Deploy Function dialog, click Deploy Function. From this point, the defined Function is executed on all existing documents and on subsequent mutations.

  9. To check results of the deployed Function, click the Documents tab.

  10. Select target bucket from the Bucket drop-down.As this shows, a version of SampleDocument has been added to the target bucket. It contains all the attributes of the original document, with the addition of ip_num_start and ip_num_end; which contain the numeric values that correspond to ip_start and ip_end, respectively. Additional documents added to the source bucket, which share the ip_start and ip_end attributes, will be similarly handled by the defined Function: creating such a document, and changing any attribute in such a document both cause the Function’s execution.