Details
- Bug
- Resolution: Duplicate
- Major
- 5.0.0
- None
- Untriaged
- Unknown
Description
3-node Spock cluster built from master on 3rd March 2017. Was running a heavy pillowfight workload together with a rebalance in order to profile the threads. The rebalance halted completely and stayed halted even after the client workload was killed: there was no DCP traffic (many successive DCP backoffs) and no disk activity, yet all 4 writer threads were pegged at 100% CPU.
Logs captured while rebalance was running:
https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-1.c.cb-googbench-101.internal.zip
https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-2.c.cb-googbench-101.internal.zip
https://cb-engineering.s3.amazonaws.com/davidH/collectinfo-2017-03-03T162931-ns_1%40dhaikney-server-3.c.cb-googbench-101.internal.zip
Also attached a perf trace of a writer thread at the time.
A couple of questions:
(i) Why did the rebalance (apparently) hang, i.e. why was there no forward progress?
(ii) What were the writer threads doing at 100% CPU while there was no write traffic?
Issue Links
- duplicates
  - MB-22451 Rebalance occasionally gets stuck when adding a new node to the destination cluster (Closed)