Couchbase Server / MB-43446

Function with possible timers runs 300X slower with very high worker counts


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Fix Version/s: 7.0.0
    • Affects Version/s: 6.6.0, 6.6.1, 7.0.0-Beta1
    • Component/s: eventing
    • Triage: Untriaged
    • 1
    • Unknown

    Description

      Testing an advanced function (curl batching) on Enterprise Edition 6.6.1 build 9212, I came across an issue.  It seems this impacts 6.6.0 and 7.X (but NOT 6.5.1).

      I tracked it down to timers and simplified the Eventing code to the trivial test cases below.  Just load at least 600K docs into the source bucket.
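
      For reference, a minimal sketch of that load step using the Couchbase Node.js SDK (the bucket name "source", the document shape, and the credentials are assumptions for illustration, not from this report):

      const couchbase = require('couchbase');

      async function loadDocs() {
       // Connect to the cluster (adjust host and credentials for your environment)
       const cluster = await couchbase.connect('couchbase://127.0.0.1', {
        username: 'Administrator',
        password: 'password',
       });
       // Assumed source bucket name; use whatever bucket the Eventing function listens to
       const collection = cluster.bucket('source').defaultCollection();

       // Upsert 600K trivial documents so OnUpdate fires at least 600K times
       for (let i = 0; i < 600000; i++) {
        await collection.upsert('doc:' + i, { id: i, type: 'test' });
       }
       await cluster.close();
      }

      loadDocs().catch(console.error);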

      Note that I do get "sizing exercise warnings" in the UI on my system whenever I go over 48 workers, whether across the set of all functions or within just one function.

      Case 1) This runs at blazing speed with 3, 48 and 64 workers (on a 12 physical core Linux system)

      function timerCallback(context) {
       log('timerCallback',context);
      }
      function OnUpdate(doc, meta) {
       //if (false) createTimer(timerCallback, new Date(), "a1", {});
      }

      Case 2) The below (uncommented) also runs just fine, i.e. at blazing speed, with 3, 48, or 51 workers (on a 12 physical core system).

      However, it runs pig slow if I raise the worker count to 52, 56, or 64 (note: 64 is the max allowed for the function on my 12 physical core Debian 10 system).

      function timerCallback(context) {
       log('timerCallback',context);
      }
      function OnUpdate(doc, meta) {
       if (false) createTimer(timerCallback, new Date(), "a1", {});
      }

      The only difference is that it might be "possible" for this latter function to create Eventing timers, but of course it never does since the createTimer call is wrapped in "if (false) ...".

      This is obviously related to timers, but it is NOT related to the recent limitation of timers to just 128 vBuckets (implemented to improve scan times), as my 6.6.0 run had 2048 docs in the Eventing metadata bucket.

      Other

      In addition, and seemingly related: if I start, say, four (4) identical functions (as per Case 2, the uncommented version) with 24 workers each, then three (3) handlers run fast and complete the 600K data set, while one (1) handler just crawls along super slow even after the other three (3) have nothing left to process.

      I can even start six (6) functions as per the commented version (Case 1) with 60 workers each, and more than half will go into the degenerate slow behaviour; I only saw one finish fast.

      Obviously I am starting a ton of processes, but a 300X slowdown factor just doesn't make sense.

      Yes, we also get "sizing exercise warnings" in these "Other" tests.

       

      Attachments


        Activity

          People

            Assignee: Jon Strabala
            Reporter: Jon Strabala
            Votes: 0
            Watchers: 5
