Couchbase Server / MB-40114

Volume: Timers not getting fired for pause/resume


Details

    • Untriaged
    • 1
    • Unknown

    Description

      Build: 6.6.0-7834, passed on 6.6.0-7781

      Test steps leading up to the failure: data loading, pausing the handlers, rebalancing out 2 nodes, and resuming the handlers.

      [2020-06-23T16:31:10-07:00, sequoiatools/gideon:947169] kv --ops 1000000 --create 40 --get 60 --sizes 64 --hosts 172.23.104.16 --bucket bucket_op
       
      [2020-06-23T16:31:15-07:00, sequoiatools/eventing:6.5:c5aa8b] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op.json Administrator password pause true
      [2020-06-23T16:31:32-07:00, sequoiatools/eventing:6.5:bf2138] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op_timer.json Administrator password pause true
      [2020-06-23T16:31:49-07:00, sequoiatools/eventing:6.5:ca2e41] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op_curl.json Administrator password pause true
      [2020-06-23T16:32:06-07:00, sequoiatools/eventing:6.5:06bdf6] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op_sbm.json Administrator password pause true
      [2020-06-23T16:32:22-07:00, sequoiatools/eventing:6.5:2305f9] /eventing.py 172.23.104.19 8096 mad-hatter/n1ql_op.json Administrator password pause true
       
      [2020-06-23T16:33:13-07:00, sequoiatools/couchbase-cli:6.5:82fcb8] rebalance -c 172.23.104.16:8091 --server-remove 172.23.104.17,172.23.104.23 -u Administrator -p password
      [2020-06-23T16:48:02-07:00, sequoiatools/cmd:b74953] 60
       
      [2020-06-23T16:49:18-07:00, sequoiatools/eventing:6.5:9cb944] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op.json Administrator password resume true
      [2020-06-23T16:49:46-07:00, sequoiatools/eventing:6.5:706b91] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op_timer.json Administrator password resume true
      [2020-06-23T16:50:14-07:00, sequoiatools/eventing:6.5:96db2c] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op_curl.json Administrator password resume true
      [2020-06-23T16:50:42-07:00, sequoiatools/eventing:6.5:f74039] /eventing.py 172.23.104.19 8096 mad-hatter/bucket_op_sbm.json Administrator password resume true
      [2020-06-23T16:51:10-07:00, sequoiatools/eventing:6.5:b0c5e9] /eventing.py 172.23.104.19 8096 mad-hatter/n1ql_op.json Administrator password resume true
      [2020-06-23T16:51:38-07:00, sequoiatools/cmd:9578e3] 1800
      [2020-06-23T17:21:44-07:00, sequoiatools/eventing:6.5:6989aa] /eventing_validator.py 172.23.104.16 Administrator password bucket_op bucket_op_dst 600 60 True
      [2020-06-23T17:21:50-07:00, sequoiatools/eventing:6.5:4b3d85] /eventing_validator.py 172.23.104.16 Administrator password bucket_op timer_op_dst 1200 60 True
      Error occurred on container - sequoiatools/eventing:6.5:[/eventing_validator.py 172.23.104.16 Administrator password bucket_op timer_op_dst 1200 60 True]
       
       
      docker logs 4b3d85
      docker start 4b3d85
       
       
      No of docs in source and destination : Source Bucket(bucket_op) : 34908567, Destination Bucket(timer_op_dst) : 34908481
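The validator's failure comes down to a count comparison between `bucket_op` and `timer_op_dst`. A minimal sketch of that arithmetic, assuming the timer handler writes exactly one destination document per source document, so any shortfall means timers that never fired. `missing_docs` is a hypothetical helper, not code from `eventing_validator.py`.

```python
def missing_docs(source_count: int, destination_count: int) -> int:
    """Return how many source docs have no destination counterpart.

    Assumes a 1:1 mapping between source docs and destination docs
    (one timer per source doc, one destination write per fired timer).
    """
    if destination_count > source_count:
        raise ValueError("destination has more docs than source")
    return source_count - destination_count


# Counts reported by the validator in this run.
gap = missing_docs(34908567, 34908481)
print(f"missing docs: {gap}")  # missing docs: 86
```

The same gap of 86 shows up in the later document counts (37,285,528 vs 37,285,442).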

      Metadata docs

      select RAW count(0) from metadata where meta().id like 'eventing:%:cx%'
        Result: 85
       
       
      select RAW count(0) from metadata where meta().id like 'eventing:%:al%'
        Result: 72
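The two metadata counts above can also be collected from a script. A minimal standard-library sketch, assuming the Query service listens on its default port 8093 on the host you pass in and that the eventing metadata bucket is named `metadata`, as in the queries above. The helper names are hypothetical and are not part of the sequoia tooling.

```python
import base64
import json
import urllib.parse
import urllib.request


def metadata_count_statement(infix: str) -> str:
    """Build the count query for metadata docs keyed like eventing:%:<infix>%."""
    return f"SELECT RAW COUNT(0) FROM metadata WHERE META().id LIKE 'eventing:%:{infix}%'"


def count_from_response(body: str) -> int:
    """Extract the single number returned by a SELECT RAW COUNT(...) query."""
    return json.loads(body)["results"][0]


def run_count(host: str, user: str, password: str, infix: str) -> int:
    """POST the statement to the Query service REST endpoint (port 8093 assumed)."""
    data = urllib.parse.urlencode({"statement": metadata_count_statement(infix)}).encode()
    # Supplying data makes urllib send a form-encoded POST.
    req = urllib.request.Request(f"http://{host}:8093/query/service", data=data)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return count_from_response(resp.read().decode())
```

Usage would look like `run_count(query_host, "Administrator", "password", "cx")`, where `query_host` is any node running the Query service (an assumption; the report does not say which node was used).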

      Number of documents

      Source: 37,285,528

      Destination: 37,285,442

      Missing docs: 86


          Activity


            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.0-7872 contains eventing-ee commit 3b107a3 with commit message: MB-40114 : Handle the race between timer creation & scan when creation takes a long time due to load


            vikas.chaudhary Vikas Chaudhary added a comment - Not seen on 6.6.0-7872

            jeelan.poola Jeelan Poola added a comment - Vikas Chaudhary: one small change has been added that inserts a 200 ms delay when a KV op fails while creating timers, so that we do not add more load to an already overloaded system. This delay was already in place before the partitioning scheme changes, so it is a necessary change. Hence re-opening and resolving this issue. It would be great if you could re-run and verify one last time on a build with the change. Thanks a lot in advance!
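The change described above is a fixed-delay retry: wait 200 ms after a failed KV op instead of retrying immediately. A minimal Python sketch of that pattern, not the actual eventing-ee change. `KVError`, `retry_kv_op` and the `max_attempts` cap are hypothetical; the real retry loop may not be bounded at all.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class KVError(Exception):
    """Hypothetical stand-in for a failed KV op (e.g. a temporary failure under load)."""


def retry_kv_op(op: Callable[[], T], delay_s: float = 0.2, max_attempts: int = 10,
                sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, sleeping delay_s after each failure before trying again.

    Re-raises the last KVError once max_attempts (an assumed cap) is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except KVError:
            if attempt == max_attempts:
                raise
            # Back off so retries don't pile extra load onto an overloaded KV.
            sleep(delay_s)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the sketch testable without real waiting. An op that fails twice and then succeeds causes exactly two 0.2 s sleeps before its result is returned.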

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.0-7877 contains eventing-ee commit ad008ec with commit message: MB-40114 : Add a small delay before retrying KV op just like it happens in LcbRetryCommand()

            vikas.chaudhary Vikas Chaudhary added a comment - verified on 6.6.0-7877

            People

              jeelan.poola Jeelan Poola
              vikas.chaudhary Vikas Chaudhary