Couchbase Server / MB-23163

Rebalance deadlock with busy writer threads and no traffic

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • 5.0.0
    • 5.0.0
    • couchbase-bucket
    • None
    • Untriaged
    • Unknown

    Description

      3-node Spock cluster built from master on 3rd March 2017. I was running a heavy pillowfight workload together with a rebalance in order to profile the threads. I noticed that the rebalance had completely halted and stayed halted even after the client workload was killed: there was no DCP traffic (many successive DCP backoffs) and no disk activity, despite all 4 writer threads being pegged at 100% CPU.
      Logs captured while rebalance was running:
      https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-1.c.cb-googbench-101.internal.zip
      https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-2.c.cb-googbench-101.internal.zip
      https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-3.c.cb-googbench-101.internal.zip
      Also attached a perf trace of a writer thread at the time.

      A couple of questions:
      (i) why the (perceived) rebalance hang, i.e. why was there no forward progress?
      (ii) what were the writer threads doing at 100% CPU whilst there was no write traffic?

      Attachments

        1. perf-report
          732 kB
          David Haikney

            People

              dhaikney David Haikney (Inactive)
              dhaikney David Haikney (Inactive)
