Couchbase Server / MB-23163

Rebalance deadlock with busy writer threads and no traffic

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • 5.0.0
    • 5.0.0
    • couchbase-bucket
    • None
    • Untriaged
    • Unknown

    Description

      3-node Spock cluster built from master on 3rd March 2017. I was running a heavy pillowfight workload together with a rebalance in order to profile the threads. The rebalance halted completely. Even after the client workload was killed, there was no DCP traffic (many successive DCP backoffs) and no disk activity, yet all 4 writer threads were pegged at 100% CPU.
      Logs captured while rebalance was running:
      https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-1.c.cb-googbench-101.internal.zip
      https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-2.c.cb-googbench-101.internal.zip
      https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-3.c.cb-googbench-101.internal.zip
      A perf trace of a writer thread, captured at the same time, is also attached.

      A couple of questions:
      (i) Why did the rebalance (apparently) hang, i.e. why was there no forward progress?
      (ii) What were the writer threads doing at 100% CPU whilst there was no write traffic?

            People

              dhaikney David Haikney (Inactive)