  1. Couchbase Server
  2. MB-40498

Eventing is not retrying bucket ops failures like ETMPFAIL that can be retried



    • Bug
    • Resolution: Fixed
    • Critical
    • 7.0.0
    • 6.6.0
    • eventing
    • None
    • Untriaged
    • 1
    • Unknown


      I am seeing failures with many Eventing workers and 25M+ docs

      I created an Eventing function "test_update_2" (attached) with an alias "bdp_vardata" bound to a bucket called "crondata" (memory quota 7.9 GB). The function is configured with 64 workers and has the following source code:

      function OnUpdate(doc, meta) {
       var maxattempt = 2;
       for (var tries=1; tries<=maxattempt; tries++) {
         try {
           var doc = bdp_vardata[meta.id];   // read through the bucket alias
           doc.random = Math.random();
           bdp_vardata[meta.id] = doc;       // write back through the bucket alias
         } catch (e) {
           if (tries === maxattempt)         // only log once the last attempt has failed
             log("attempt "+ tries + " error occured during deletion :: ",
                 e, " for id ", meta.id);
         }
       }
      }

      The source bucket and the bucket being updated are both "crondata". In addition, there is a 100 MB Eventing metadata bucket named "metadata".


      I load 25,528,448 documents into crondata with keys like todelete01::100006 and data like:

       "type": "vbs_seed",
       "id": 100006,

      Once the Eventing function runs, all documents in bucket "crondata" will be enriched with a new field called "random":

       "type": "vbs_seed",
       "id": 100006,
       "random": 0.22187920300189878

      The single node server

      I run Eventing on my 12-core 2.1 GHz 64 MB Xeon:

      uname -a
      Linux couch01 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux

      /opt/couchbase/bin/couchbase-server -v
      Couchbase Server 6.6.0-7883 (EE)

      Configured with Eventing 256 MB RAM, Data 7900 MB RAM, and no other services.

      The Issue 

      The system processes about 7.6 million documents (mutations), and then I get LCB_ETMPFAIL errors:

      2020-07-15T18:52:46.795-07:00 [INFO] "attempt 2 error occured during deletion :: " {"message":{"code":392,"desc":"Temporary failure received from server. Try again later","name":"LCB_ETMPFAIL"},"stack":"Error\n at OnUpdate (test_update_2.js:10:35)"} " for id " "todelete22::63364"
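
      The logged error carries its name under "message.name", so a handler could in principle tell a temporary failure apart from other errors and retry only in that case. The following is a minimal sketch of that idea, not the fix that was applied to Eventing. The helpers isTemporaryFailure, updateWithRetry and makeFlakyBucket are hypothetical names introduced here, and the flaky bucket is a plain JavaScript stand-in for the "bdp_vardata" alias so the sketch can run outside Couchbase. It assumes the caught error has the same shape as the one in the log line above. The retries are immediate, with no delay between attempts; whether that would be enough under the load described in this report is not established here.

```javascript
// Hypothetical helper: true when the caught error has the shape seen in the
// log above, i.e. {"message": {"name": "LCB_ETMPFAIL", ...}}.
function isTemporaryFailure(e) {
  return Boolean(e && e.message && e.message.name === "LCB_ETMPFAIL");
}

// Hypothetical helper: same read-modify-write as OnUpdate, but it stops on the
// first success and retries only temporary failures.
// Returns the number of attempts used; rethrows the last error otherwise.
function updateWithRetry(bucket, id, maxAttempts) {
  for (var tries = 1; tries <= maxAttempts; tries++) {
    try {
      var doc = bucket[id];
      doc.random = Math.random();
      bucket[id] = doc;
      return tries;
    } catch (e) {
      if (!isTemporaryFailure(e) || tries === maxAttempts) {
        throw e;
      }
    }
  }
}

// Stand-in for the bucket alias (an assumption for illustration only):
// the first `failures` writes throw an error shaped like the logged one.
function makeFlakyBucket(failures) {
  var store = { "todelete01::100006": { type: "vbs_seed", id: 100006 } };
  var remaining = failures;
  return new Proxy(store, {
    set: function (target, key, value) {
      if (remaining > 0) {
        remaining--;
        throw {
          message: {
            code: 392,
            desc: "Temporary failure received from server. Try again later",
            name: "LCB_ETMPFAIL"
          }
        };
      }
      target[key] = value;
      return true;
    }
  });
}
```

      For example, updateWithRetry(makeFlakyBucket(2), "todelete01::100006", 5) fails twice, succeeds on the third attempt and returns 3, while a maxAttempts of 2 against the same bucket rethrows the LCB_ETMPFAIL error.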

      A workaround

      In the UI, simply pause and then resume the function every 7 million rows. This shows that Eventing can process the data with 64 workers, but something odd is happening: some sort of resource constraint is apparently not being honored.

      I believe I also have no issues if I reduce the number of workers from sixty-four (64) to just three (3).

      I have prepared a video showing exactly how it fails. Hopefully the video and the uploaded Eventing function will help track down the root cause.



        1. MB-40498_test.zip
          22 kB
        2. MB-40498.logs.tar.gz
          1.34 MB
        3. test_update_2.json
          1 kB




              chanabasappa.ghali Chanabasappa Ghali
              jon.strabala Jon Strabala



