Couchbase Server: MB-36380

Lost wakeup can delay Flusher up to 10s

Details

    • Triaged
    • Unknown
    • KV-Engine Mad-Hatter GA

    Description

      Summary

      During investigation of SyncWrite persistTo performance, it was observed that the op/s for a single-threaded persist_majority workload would drop to zero for a number of seconds, recovering after 10s.

      Steps to Reproduce

      1. Start a two-node cluster using cluster_run:

        ./cluster_run -n2 --dont-rename
        ./cluster_connect -n2
        

      2. Drive with a SyncWrite workload (SyncWrites are not strictly necessary, but they make the bug very apparent, as a delay in the flusher causes op/s to drop to zero). See the attached sync_repl.py script:

        ./engines/ep/management/sync_repl.py localhost:12000 Administrator asdasd default loop_setD key value 100000 3
        

      3. Observe op/s on the UI.

      Expected Results

      Op/s should be broadly constant, with only small (sub-second) drops in op/s when compaction runs.

      Actual Results

      Op/s periodically drops to zero (see the attached op/s graph).

      During such "blips", "Slow operation" log messages were observed; each time, the "slow" operation took very close to 4 or 8 seconds:

      $ grep "Slow operation" logs/n_0/memcached.log.00000*
      logs/n_0/memcached.log.000000.txt:2019-10-08T09:51:14.753174+01:00 WARNING 48: Slow operation. {"cid":"[::1]:50067/b2034b5a","duration":"4003 ms","trace":"request=378423659125160:4003656 store=378423659139440:15 sync_write.ack_remote=378423660320419:0 sync_write.ack_local=378427662099819:0 sync_write.prepare=378423659147291:4002977 store=378427662773987:0","command":"SET","peer":"[::1]:50067","bucket":"default"}
      logs/n_0/memcached.log.000000.txt:2019-10-08T09:51:23.697168+01:00 WARNING 48: Slow operation. {"cid":"[::1]:50067/63159010","duration":"4005 ms","trace":"request=378432601417239:4005461 store=378432601426289:20 sync_write.ack_remote=378432603242164:0 sync_write.ack_local=378436606766192:0 sync_write.prepare=378432601435464:4005363 store=378436606870788:1","command":"SET","peer":"[::1]:50067","bucket":"default"}
      48: Slow operation. {"cid":"[::1]:50312/747df525","duration":"8005 ms","trace":"request=379458120957691:8005773 store=379458120963442:16 sync_write.ack_local=379458121688761:0 sync_write.ack_remote=379466126503925:0 sync_write.prepare=379458120968581:8005592 store=379466126719482:1","command":"SET","peer":"[::1]:50312","bucket":"default"}
      


          Activity

            drigby Dave Rigby added a comment -

            Note this is a lot more pronounced when operating on a single vBucket (as is the case in the above script).

            The problem is a race between the Flusher completing a flush iteration and deciding whether it needs to snooze, and a front-end thread notifying that there is a new mutation to be flushed:

            Flusher::step

                ...
                case State::Running:
                    flushVB();
                    if (_state == State::Running) {
                        double tosleep = computeMinSleepTime();
                        // *** notifyFlushEvent() called here ***
                        if (tosleep > 0) {
                            task->snooze(tosleep);
                        }
                ...

            • computeMinSleepTime primarily checks whether Flusher::pendingMutation is false; if so, the flusher can be put to sleep for a period (between 2 and 10 seconds, depending on when it last snoozed).
            • notifyFlushEvent does a compare-exchange on Flusher::pendingMutation; if successful (false -> true), it calls wake(). (A sketch of both sides follows below.)
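
            For illustration, here is a minimal sketch of the two sides described above. The class shape, names and backoff policy are assumptions made for clarity, not the actual kv_engine source:

                #include <algorithm>
                #include <atomic>

                class FlusherSketch {
                public:
                    // Flusher thread: if a mutation is pending, run again
                    // immediately; otherwise back off, doubling the sleep
                    // from 2s up to a 10s cap (assumed policy).
                    double computeMinSleepTime() {
                        bool expected = true;
                        if (pendingMutation.compare_exchange_strong(expected, false)) {
                            minSleepTime = 2.0; // work pending: reset backoff
                            return 0.0;         // ...and don't sleep
                        }
                        double tosleep = minSleepTime;
                        minSleepTime = std::min(minSleepTime * 2, 10.0);
                        return tosleep;
                    }

                    // Front-end thread: flag the new mutation, and wake the
                    // task only on the false -> true transition.
                    void notifyFlushEvent() {
                        bool expected = false;
                        if (pendingMutation.compare_exchange_strong(expected, true)) {
                            wake();
                        }
                    }

                private:
                    void wake() {} // stub: would wake the flusher's task
                    std::atomic<bool> pendingMutation{false};
                    double minSleepTime = 2.0;
                };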

            Consider the scenario where computeMinSleepTime has returned a non-zero value (i.e. no pending work currently), but, immediately before task->snooze() is called, notifyFlushEvent() is called by a front-end thread. That will wake up the task; however, the snooze immediately below will effectively cancel the wakeup, and the task will be put to sleep.

            If no additional notifyFlushEvent() calls occur, then the Flusher will not be run to actually flush the outstanding work until the snooze() expires, up to 10s.
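
            This is the classic lost-wakeup pattern. The usual remedy is to make the sleep re-check the pending flag under the same lock the notifier takes, e.g. via a condition-variable predicate, so a notification that races with the decision to sleep cannot be cancelled. A minimal standalone sketch of that idea (illustrative only, with invented names; not necessarily the shape of the eventual kv_engine fix):

                #include <chrono>
                #include <condition_variable>
                #include <mutex>

                class FlusherLike {
                public:
                    // Front-end thread: mark work pending under the lock,
                    // then wake the flusher.
                    void notifyFlushEvent() {
                        {
                            std::lock_guard<std::mutex> lh(m);
                            pendingMutation = true;
                        }
                        cv.notify_one();
                    }

                    // Flusher thread: sleep for up to `tosleep`, but only
                    // while no work is pending. wait_for re-checks the
                    // predicate under the mutex, so a notify that lands
                    // between "decide to sleep" and "sleep" is not lost.
                    void snooze(std::chrono::duration<double> tosleep) {
                        std::unique_lock<std::mutex> lh(m);
                        cv.wait_for(lh, tosleep, [this] { return pendingMutation; });
                        pendingMutation = false; // consume the notification
                    }

                private:
                    std::mutex m;
                    std::condition_variable cv;
                    bool pendingMutation = false;
                };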

            drigby Dave Rigby added a comment -

            Not yet confirmed, but this probably also affects 5.5.x (and earlier...)

            drigby Dave Rigby added a comment - edited

            With the proposed fix, op/s no longer drops to zero in the same test (see the attached op/s graph).


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-4529 contains kv_engine commit 793f7f7 with commit message:
            MB-36380 [2/2]: Fix missing Flusher wakeup

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-4529 contains kv_engine commit 9bde518 with commit message:
            MB-36380 [1/2]: Regression test for missing Flusher wakeup

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-1008 contains kv_engine commit 793f7f7 with commit message:
            MB-36380 [2/2]: Fix missing Flusher wakeup

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-1008 contains kv_engine commit 9bde518 with commit message:
            MB-36380 [1/2]: Regression test for missing Flusher wakeup

            ritesh.agarwal Ritesh Agarwal added a comment -

            Verified on a Mad-Hatter build; the doc ops no longer drop to 0 after running the steps from the description.

            People

              ritesh.agarwal Ritesh Agarwal
              drigby Dave Rigby