Details
-
Bug
-
Resolution: Won't Fix
-
Critical
-
None
-
3.0, 4.0.0, 4.1.0
-
Security Level: Public
-
None
-
Untriaged
-
Yes
Description
Doing some load testing with 3.0 exposed a potentially serious performance issue during rebalance.
When a client (in this case libcouchbase via pillowfight) receives the first "not my vbucket", it took nearly 200ms before it was able to get a config via carrier publication that had the correct vbucket in it.
From the application's perspective, 2.5.1 had only a few 'ms' delay on each missed operation during rebalance, and with 3.0 it is receiving up to 200ms. We need further investigation, but it seems like a regression.
Logs from a pillowfight run:
1 15408ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
|
2 15408ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
|
3 15408ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f4318067b90 to retry queue. Try count=1
|
4 15408ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 14 ms
|
5 15408ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
|
6 15408ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
|
7 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
|
8 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
|
9 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
|
10 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
|
11 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
|
12 15408ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
|
13 15424ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f4318067b90 (S=31359). VBID=550
|
14 15426ms [I0] {3376} [INFO] (confmon - L:161) Not applying configuration received via CCCP. No changes detected
|
15 15426ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f4318070370 to retry queue. Try count=2
|
16 15426ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 29 ms
|
17 15460ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f4318070370 (S=31359). VBID=550
|
18 15461ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
|
19 15461ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
|
20 15461ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f43180f4c20 to retry queue. Try count=3
|
21 15461ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 44 ms
|
22 15461ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
|
23 15461ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
|
24 15461ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
|
25 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
|
26 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
|
27 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
|
28 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
|
29 15462ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
|
30 15507ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f43180f4c20 (S=31359). VBID=550
|
31 15509ms [I0] {3376} [INFO] (confmon - L:161) Not applying configuration received via CCCP. No changes detected
|
32 15509ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f4318067c10 to retry queue. Try count=4
|
33 15509ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 59 ms
|
34 15570ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f4318067c10 (S=31359). VBID=550
|
35 15572ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
|
36 15572ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
|
37 15572ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f43180f50d0 to retry queue. Try count=5
|
38 15572ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 74 ms
|
39 15572ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
|
40 15572ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
|
41 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
|
42 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
|
43 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
|
44 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
|
45 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
|
46 15572ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
|
47 15648ms [I0] {3376} [WARN] (server - L:125) <ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210> (SRV=0xc52090,IX=1) NOT_MY_VBUCKET. Packet=0x7f43180f50d0 (S=31359). VBID=550
|
48 15649ms [I0] {3376} [INFO] (confmon - L:169) Setting new configuration. Received via CCCP
|
49 15649ms [I0] {3376} [INFO] (bootstrap - L:135) Got async step callback..
|
50 15649ms [I0] {3376} [DEBUG] (retryq - L:305) Adding PKT=0x7f43180704c0 to retry queue. Try count=6
|
51 15649ms [I0] {3376} [TRACE] (retryq - L:167) Next tick in 89 ms
|
52 15649ms [I0] {3376} [DEBUG] (bootstrap - L:38) Instance configured!
|
53 15649ms [I0] {3376} [INFO] (newconfig - L:59) Config Diff: [ vBuckets Modified=1 ], [Sequence Changed=0]
|
54 15649ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-204-236-170-174.us-west-1.compute.amazonaws.com:11210 (0xc51100). OldIndex=0. NewIndex=0
|
55 15649ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-116-129.us-west-1.compute.amazonaws.com:11210 (0xc52090). OldIndex=1. NewIndex=1
|
56 15649ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-171-142.us-west-1.compute.amazonaws.com:11210 (0xc530a0). OldIndex=2. NewIndex=2
|
57 15650ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-176-213-125.us-west-1.compute.amazonaws.com:11210 (0xc540b0). OldIndex=3. NewIndex=3
|
58 15650ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-177-26-252.us-west-1.compute.amazonaws.com:11210 (0xc550c0). OldIndex=4. NewIndex=4
|
59 15650ms [I0] {3376} [INFO] (newconfig - L:148) Reusing server ec2-54-219-88-7.us-west-1.compute.amazonaws.com:11210 (0xc560d0). OldIndex=5. NewIndex=5
|
Attachments
Issue Links
- relates to
-
JCBC-896 Default backoff policy should backoff much more aggressively
- Resolved
For Gerrit Dashboard: MB-12268 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
41746,2 | MB-12268: Adding a debug log, when clusterConfig is updated | 3.0.1 | ep-engine | Status: MERGED | +2 | +1 |
41824,2 | MB-12268: Adding a debug log, when clusterConfig is updated | master | ep-engine | Status: MERGED | +2 | +1 |
41840,3 | MB-12268: Restore 'incorrect_master' functionality | master | libcouchbase | Status: MERGED | +2 | +1 |