Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-29184

CouchRocks: R-0 Rebalance tests show very slow Backfill

    XMLWordPrintable

Details

    • Bug
    • Resolution: Won't Do
    • Major
    • backlog
    • master
    • couchbase-bucket
    • None

    Description

      The R-0 Swap Rebalance test http://perf.jenkins.couchbase.com/job/hera-hidd/245 shows a very slow rebalance progression (8% completed in approximately 9 hours, then I aborted the test).
      The baseline Couchstore test http://perf.jenkins.couchbase.com/job/hera-hidd/242 performs a full rebalance in approximately 3 hours.

      The test scenario is:

      • 3 KV nodes
      • 1 bucket, 1 replica
      • First: Load Phase of 2 billion items
      • Second: Access Phase (90% reads, 5% creates, 5% updated) + Swap Rebalance of 1 node (swap-out 172.23.96.117, swap-in 172.23.96.120)

      On the Producer (172.23.96.117), I see that the BackfillManagerTask is constantly very slow during the entire rebalance (from T14:35 to T23:29):

      2018-04-10T14:35:17.216210Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 1442 ms
      2018-04-10T14:35:18.522261Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 1306 ms
      2018-04-10T14:35:19.929839Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 1407 ms
      2018-04-10T14:35:21.315669Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 1385 ms
      2018-04-10T14:35:22.765742Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 1450 ms
      2018-04-10T14:35:24.195643Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 1429 ms
      2018-04-10T14:35:25.928889Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_0: 1733 ms
      ..
      2018-04-10T23:29:53.839544Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_2: 753 ms
      2018-04-10T23:29:54.535201Z WARNING (bucket-1) Slow runtime for 'Backfilling items for a DCP Connection' on thread auxIO_worker_2: 695 ms
      

      In 'kv_trace.json' (which contains the Phosphor event tracing), I see that the slow runtime of the BackfillManagerTask is caused by the slow runtime of the RocksDBKVStore::scan() function, e.g.:

      {"name":"BackFillManagerTask","cat":"ep-engine\/task","ph":"B","ts":50446059825,"pid":5677,"tid":5824,"args":{}},
      {"name":"scan","cat":"RocksDBKVStore","ph":"B","ts":50446059827,"pid":5677,"tid":5824,"args":{"vbid":120,"startSeqno":11628545}},
      {"name":"scan","cat":"RocksDBKVStore","ph":"E","ts":50447374056,"pid":5677,"tid":5824,"args":{}},
      {"name":"BackFillManagerTask","cat":"ep-engine\/task","ph":"E","ts":50447374058,"pid":5677,"tid":5824,"args":{}},
      

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            owend Daniel Owen
            paolo.cocchi Paolo Cocchi
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              PagerDuty