Troubleshooting and Best Practices

  • Capella Operational
      +

      Why do similar functions that I write seem to run slower in 7.0.0 than 6.6.2?

      The default number of workers per function was three (3) in 6.X and is now one (1) in 7.0.0. You can simply raise the number of workers to 3 to get back the expected performance.

      all upgrades will carry forward the configured number of workers in an Eventing Function so you don’t have to worry about a production system slowing down during an upgrade.

      Raising the worker counts should be down if you need higher levels of throughput, for example cURL functions access slow external REST endpoints need more workers to scale performance up (in this case you are IO bound and not CPU bound).

      What is the Security role "Eventing Full Admin" for?

      Prior to 7.0.0 Eventing always run as "Full Admin" this blocked some use cases and adoption as this role allowed creation of new users and the ability to escalate privilege sets. The new "Eventing Full Admin" role removes the capability of creating users and modifying credentials thus providing a bit more security.

      What happens when more Workers are allocated for a Function?

      Couchbase Server for a specific Function limits the maximum number of workers to 64 (note the default is 3 workers). This upper limit is configured for system optimization purposes. You cannot create a function with more than this upper bound.

      When deploying (or resuming a paused function) a threshold is dynamically calculated based on node’s resources and if the number of workers exceeds this calculation, then the system automatically generates a warning message but does not prevent the Function deployment. An example follows:

      There are 104 eventing workers configured to run on 24 cores. A sizing exercise is recommended.

      Typically you shouldn’t configure more than 4 × the number of physical cores (or 2 × the number of vCPUS) across all your Eventing functions. If you have a high throughput for every Eventing Function for best performance the total number of workers should not exceed the number of physical cores. If you wish to support a high number of curl() calls to slow REST endpoints (> 20ms+) you may need to define more workers to increase parallelism.

      When should developers use the try-catch block in Eventing Functions?

      As a best practice, while writing the Eventing Function’s JavaScript code, for basic error handling and debugging operations, it is recommended that application developers use the try-catch block.

      Before deployment, Couchbase Server verifies the Eventing Function’s code. Only valid Functions get deployed. Using the log() option within a try-catch block(s), you can record errors. These error logs get stored in the Eventing function’s application log file. Note the Eventing function’s application log file on disk is specific to the node that processed the mutation and is not global across the cluster. By default, JavaScript runtime errors get stored in the system logs. Unlike system logs, troubleshooting and debugging operations are easy when you use the try-catch block and application log() options.

      During runtime, Application logs, by default, do not capture any Eventing Function code exceptions. To log exceptions, it is recommended to encapsulate your code in a try-catch block.

      A sample try-catch block is provided for reference:

      function OnUpdate(doc, meta) {
          log('document', doc);
          try {
              var time_rand = random_gen();
              dst_col[meta.id + time_rand] = doc;
          } catch(e) {
              log(e);
          }
      }

      What are bucket alias considerations during a Function definition?

      Eventing Functions can trigger data mutations. To avoid a cyclic generation of data changes, ensure that you carefully consider the below aspects while specifying source and destination collections:

      Infinite recursion protection

      • Avoid infinite recursions. If you are using a series of Eventing Functions, ensure they do not cause cyclic mutations by triggering write operations that, in turn, trigger other Eventing Functions. For example the following design demonstrates an infinite recursion:

        functionA with source collectionA target collectionB aliased as same.

        function OnUpdate(doc, meta) {
            collectionB[meta.id] = {"status":"updated by functionA"};
        }

        functionB with source collectionB target collectionC aliased as same.

        function OnUpdate(doc, meta) {
            collectionC[meta.id] = {"status":"updated by functionB"};
        }

        functionC with source collectionC target collectionA aliased as same.

        function OnUpdate(doc, meta) {
            collectionA[meta.id] = {"status":"updated by functionC"};
        }

        In the example above a single mutation in "collectionA" will create an infinite loop updating a record in "collectionB", then in "collectionC", then back to "collectionA", over and over.

        One possible solution is to change the design above such that functionC updated collectionD (instead of collectionA) we would have no recursion as follows:

        functionC (modified to write to a different collection) with source collectionA target collectionD aliased as same.

        function OnUpdate(doc, meta) {
            collectionD[meta.id] = {"status":"updated by functionC"};
        }

        Another possible solution to the design above is to change the design is changes such that functionA performs a check to ensure that if functionC has operated on the document to cease any new mutations as follows:

        functionA (modified to stop recursion) with source collectionA target collectionB aliased as same.

        function OnUpdate(doc, meta) {
            if (doc["status"] == "updated by functionC") return;
            collectionB[meta.id] = {"status":"updated by functionA"};
        }
      • Although the Couchbase Server can flag simple infinite recursions a long chain of source and destination collections with a series of Eventing Functions, a complex infinite recursion condition may occur. The developer, carefully consider and avoid these cases.

      • As a best practice, ensure that collections to which the Eventing Function performs a write operation do not have other Eventing Functions configured for tracking data mutations.

      There is a special case of direct self-recursion, which is highly useful, when a Eventing Function chooses to create a Read-Write binding to its own source collection we can perform document enrichment operations. In this case the direct self-recursive mutations and detected and suppressed by the Eventing framework. However this capability is only supported for the aliased JavaScript map and is not supported for mutations generated via SQL++.

      Since the 6.5 release, the Eventing Function JavaScript code can directly mutate (or write back) to the source bucket (now in 7.0.0 the source collection), e.g. direct self-recursion.
      • For example the following design is taken from the Data Enrichment, Case: 2:

        functionDirectEnrich with source collectionA target collectionA aliased as 'src'

        function OnUpdate(doc, meta) {
          log('document', doc);
          doc["ip_num_start"] = get_numip_first_3_octets(doc["ip_start"]);
          doc["ip_num_end"]   = get_numip_first_3_octets(doc["ip_end"]);
          // !!! write back to the source collection !!!
          src[meta.id]=doc;
        }
        function get_numip_first_3_octets(ip) {
          var return_val = 0;
          if (ip) {
            var parts = ip.split('.');
            //IP Number = A x (256*256*256) + B x (256*256) + C x 256 + D
            return_val = (parts[0]*(256*256*256)) + (parts[1]*(256*256)) + (parts[2]*256) + parseInt(parts[3]);
            return return_val;
          }
        }

      Disabling infinite recursion checks

      Although not encouraged, in some cases a client may still require the creation of potentially recursive loops.

      For example trying to run ConvertBucketToCollections in a single bucket (as both source and target) will emit an 'ERR_INTER_FUNCTION_RECURSION' error.

      To allow recursion and thus the above Function to run a special configuration flag may be toggled via a REST API call to any Eventing node as follows:

      To disable recursion checks (requires admin privileges):

      curl -X POST -u Administrator:password http://192.168.1.5:8091/_p/event/api/v1/config -d '{"allow_interbucket_recursion":true}'

      To re-enable recursion checks (requires admin privileges):

      curl -X POST -u Administrator:password http://192.168.1.5:8091/_p/event/api/v1/config -d '{"allow_interbucket_recursion":false}'

      In the cluster, I notice a sharp increase in the Timeout Statistics. What are my next steps?

      When the Timeout Statistics shows a sharp increase, it may be due to two possible scenarios:

      • Increase in execution time: When the Eventing Function execution time increases, the Function execution latency gets affected, and this in turn, leads to Function backlog and failure conditions.

      • Script timeout value: When the script timeout attribute value is not correctly configured, then you encounter timeout conditions frequently.

      As a workaround, it is recommended to increase the script timeout value. Ensure that you configure the script timeout value after carefully evaluating the execution latency of the Function.

      As a best practice use a combination of try-catch block and the application log options. This way you can monitor, debug and troubleshoot errors during the Function execution.

      Why is it important that the Eventing Storage keyspace (metadata collection) be 100% memory resident?

      If the collection you chose to hold your meta data spills over to disk access is not 100% resident, your Eventing system can essentially stall and/or slow down by orders of magnitude and you can also experience failures and/or missed mutations.

      Always make sure that the memory quota on your metadata Eventing Storage keyspace (metadata collection) is sufficiently large to ensure a residency ratio of 100%. Additionally avoid using an Ephemeral bucket for your Eventing Storage keyspace (refer to next question for details).

      You should only use Buckets of type Couchbase for data persistence of the Eventing Storage or metadata (for details refer to next question).

      Can I use Ephemeral Buckets with Eventing?

      Yes, Ephemeral are fine for user data but not for the Eventing Storage (metadata collection).

      The source bucket and any bucket (or keyspace) bindings of your Eventing Function can be Ephemeral. However, the Eventing Storage keyspace (metadata collection) should always be persistent.

      The Eventing Storage keyspace must be in a Bucket of type Couchbase. If this keyspace is not persistent the Data Service, or KV, will evict timer and checkpoint documents on hitting quota and Eventing can lose track of both timers and mutations processed. Furthermore at any point, refrain from deleting the Eventing metadata collection. Also, ensure that your Eventing Function’s JavaScript code or other services do not perform a write or delete operation on the Eventing metadata collection.

      Eventing worked fine when application was first deployed but now I am getting LCB_ETMPFAIL failures.

      A low residency ratio for either the source or the destination collection (sometimes these two can be the same) can result in a system that’s unable to keep up with rate of mutations and internal logic’s required reads and writes to the data service.

      Watch the number of documents in your collections (source, Eventing Storage, and destination(s)) and in particular pay close attention to the change in the resident ratio. Typically, this could be due to growth in your overall data set.

      For example, a high velocity Eventing function that is processing in excess of 12K mutations/sec with a source or destination collection residency ratio of 100% can easily start to experience issues if the residency ratio drops below 18% (this percentage isn’t hard and fast and may vary based on a variety of factors such as the number of mutations acted on, the storage type, and so on).

      2020-03-13T11:46:32.383-07:00 [INFO] "Exception: " {"message":{"code":392,"desc": \
      "Temporary failure received from server. Try again later","name":"LCB_ETMPFAIL"}, \
      "stack":"Error\n    at OnUpdate (MyEventingFunction.js:177:25)"}

      The above error indicates that the system is under provisioned for the load. Under the hood, Eventing will try to access to the data store five (5) times with a 200ms pause between attempts. If all of the attempts fail, the Eventing Function, in this case MyEventingFunction, throws an LCB_ETMPFAIL message from libcouchbase. This is important to understand as trapping the above exception and retrying the same operation inside your Eventing Function will only exacerbate the issue and make things worse. Of course your Eventing Function can take other actions such as creating a notification.

      There are two solutions:

      1. The first solution is to increase the memory quota of the collection’s bucket in question (thus increasing the resident ration).

      2. The second solution is to add more Data nodes, faster disk IO, and more memory to eliminate the resource bottleneck.

      Always escape quotes in regular expressions in your Eventing Function.

      When using bare regular expressions you should always escape a single quote or a double quote with a backslash character. Although non-escaped quotes are legal in the JavaScript language they do not pass Eventing Service’s parser.

      mystring.match(/(\S+)[^=]=["']?((?:.(?!["']?\s+(?:\S+)[^=]=|[>"']))+.)["']?/g);

      The above bare regular expression should be written with the quotes escaped via the \ character.

      mystring.match(/(\S+)[^=]=[\"\']?((?:.(?![\"\']?\s+(?:\S+)[^=]=|[>\"\']))+.)[\"\']?/g);