Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-21650

Changes for MB-20852 (set_vbucket is raceful) caused ~30% drop in KV throughput

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • Yes

    Description

      All cbc-pillowfight cases (read-heavy and mixed, 512 and 1024 bytes) indicate a significant drop in KV throughput after these changes:

      CHANGELOG for ep-engine
       
       * Commit: 66882e8dcd1c0952dc63f6608c04d6b29c39c86c (in build: 3441)
         Author: Dave Rigby
         MB-20852 [17/N]: Serialize VB state changes
         
         Background/Problem:
         
         MB-20852 exposed an issue with how VBucket state was persisted to disk
         - specifically that state can be persisted to disk out of order, and
         intermediate state changes could be dropped (not persisted at all).
         
         These problems are caused by the asynchronous tasks which are
         responsible for persisting state to disk -
         VBStatePersistTask{Low,High}. There are essentially two interrelated
         issues:
         
         1. We allow multiple tasks which persist VBState to exist concurrently
            (VBStatePersistTask{Low,High}, Flusher,
            VBSnapshotTask{Low,High}. Furthermore, multiple instances of the
            same task (e.g. VBStatePersistTask) can also exist concurrenlty.
         
         2. We do not maintain a queue of states to persist, instead we just
            mark that a given vbid's state is pending.
         
         (1) Can occur when a VBPersistTask has started to run on a BG writer
             thread (and has cleared the schedule_vbstate_persist flag), but
             then another scheduleVBStatePersist() call is made. As the
             schedule_vbstate_persist flag is clear, this second task is
             allowed to be created. There is then no guarantee which task will
             complete first (they could be scheduled to different OS writer
             threads, the first thread could be descheduled and the second one
             then runs and completes first).
         
             We could attempt to solve this by changing when
             schedule_vbstate_persist is cleared (say move it later in the
             persistVBState() function), but then the inverse problem is
             exposed - we may fail to schedule a second (different) state to be
             persisted if the current task is just finishing up (and will exit
             without persisting the now-outstanding work).
         
         (2) Presents a subtle problem relating to when the state of a VBucket
             is materialized. As we only record the vbid to persist (and not
             the state), by the time the VBucketPersistTask runs the actual
             state we /wanted/ to write may have moved forward. Even worse, the
             state could have "gone backwards" (as shown in the MB) if the
             state of the VBucket is read before the vbucket is deleted,
             meaning the task has a 'stale' view of the VBucket object (due to
             us using ref-counted pointers for VBucket objects).
         
             Additionally, not persisting all intermediate states makes
             debugging harder. We don't actually change VBucket state /that/
             often, and so having all the intermediate states a VBucket went
             through on disk is extremely valuable in debugging.
         
         Solution:
         
         Instead of having multiple different tasks which can persist state
         (and attempting to manage when they are created / when they run / what
         state they persist), we instead use a single task (the Flusher task)
         to persist state for a given vBucket. Note the Flusher *already*
         persists vbucket state if necessary during commit (see
         EventuallyPersistentStore::flushVBucket), so this path just adopts the
         Flusher as the canonical Task to perform vbstate persistence.
         
         To ensure that state is persisted even when there are no outstanding
         'normal' items in the vbucket's checkpoint queue, we use the new
         queue_op::set_vbucket_state meta-item to signify that a persist is
         pending (and to essentially make the pending items queue non-empty so
         the flusher will run).
         
         After this patch all the other Tasks which used to persist vbucket
         state are redundent - a subsequent patch will remove them.
         
         Change-Id: Ic44360a1dd14fb882ebfa6f28f4ccfe6127d17a8
         Reviewed-on: http://review.couchbase.org/69150
         Reviewed-by: Jim Walker 
         Tested-by: buildbot 
       
       
       * Commit: 941c24b6fb201686a445503c566c8fc9a865a5b7 (in build: 3441)
         Author: Dave Rigby
         MB-20852 [13/N]: Checkpoint: Add getNumMetaItems() method
         
         Add a new method to Checkpoint which returns the number of metaItems
         in the checkpoint.
         
         Initially this just returns a fixed value of 1 if the checkpoint is
         open, or 2 if closed, as this matches the current checkpoint meta item
         usage.
         
         However subsequent patches will modify this to track the count of how
         many meta items actually are in the checkpoint and hence allow us to
         determine the meta item count when n queue_op::set_vbstate is added.
         
         Change-Id: I411f2e97e16f9b11ac19a1b7165e4767d09f37d1
         Reviewed-on: http://review.couchbase.org/69018
         Reviewed-by: Trond Norbye 
         Tested-by: buildbot 
       
       
       * Commit: bed33dbde2a2acc56ebadac39b61c53c1ddbdcb2 (in build: 3441)
         Author: Dave Rigby
         MB-20852 [14/N]: Improve debug/logging in CheckpointManager
         
         Include information on the CheckpointCursors associated with a
         CheckpointManager by adding a operator<< for CheckpointCursor. Add a
         dump() method to CheckpointManager to assist in debugging their
         contents (e.g. from gdb).
         
         Also add some additional CheckpointManager/CouchKVStore logging.
         
         Change-Id: I8c3de5b5ec0e8e297db8530dee87ac0edd869a91
         Reviewed-on: http://review.couchbase.org/69021
         Reviewed-by: Trond Norbye 
         Tested-by: buildbot 
       
       
       * Commit: 3fa4aa0a7a27067fa55fa4658d2a8a126321829e (in build: 3441)
         Author: Dave Rigby
         MB-20852 [16/N]: Add queue_op::set_vbucket_state meta-item
         
         Add a new meta-item to queue_op enum - set_vbucket_state. This will be
         used to mark that a VBucket should have it's state persisted.
         
         Change-Id: I108befd70933962d262529ee230382e47c64e31a
         Reviewed-on: http://review.couchbase.org/69149
         Reviewed-by: Trond Norbye 
         Tested-by: buildbot 
       
       
       * Commit: ba75d06f48f29e85c9eda66434ce70aa36b81471 (in build: 3441)
         Author: Dave Rigby
         MB-20852 [15/N]: Accurately track meta items within checkpoints
         
         Instead of assuming that a Checkpoint only contains 1 (Open) or 2
         (Closed) meta-items, maintain a count of items within each Checkpoint,
         and track how many meta-items a CheckpointCursor has read.
         
         This allows us to support an arbitrary number of meta-items within a
         Checkpoint, and in any sequence. This feature will be used to add
         support for set_vbstate meta-items in a subsquent patches.
         
         Change-Id: I8fb3040cbe64e316aae1f693afee65001b2b4b17
         Reviewed-on: http://review.couchbase.org/69020
         Reviewed-by: Trond Norbye 
         Tested-by: buildbot 
      
      

      An example of workload:

      cbc-pillowfight --spec couchbase://172.23.96.117/bucket-1 --password password --batch-size 1000 --num-items 1000000 --num-threads 50 --num-cycles 40000 --min-size 512 --max-size 512 --set-pct 20
      

      It should be easy to navigate to the graphs and logs from ShowFast.

      Good run (build 3440)

      Bad run (build 3441)

      Attachments

        Issue Links

          For Gerrit Dashboard: MB-21650
          # Subject Branch Project Status CR V

          Activity

            People

              pavelpaulau Pavel Paulau (Inactive)
              pavelpaulau Pavel Paulau (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                PagerDuty