Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: 4.0.0
- Security Level: Public
- Environment: CentOS 6.x
- Triage: Untriaged
- Unknown
Description
Build 4.0.0-2109. Found during manual testing.
1. C1 [.186] --> C2 [.188], existing default buckets, replication set up.
2. Rebalance-in .189 on C2.
3. In parallel, start load on C1.
Replication stopped with the error NOT_MY_VBUCKET:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default failed. err=map[xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_1:Fatal error when receiving responses from memcached in target cluster.]
This error was seen at 16:16:47 - Wed May 13, 2015
The pipeline was then constructed:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default started running. xdcr000 ns_1@127.0.0.1 16:17:00 - Wed May 13, 2015
4. Rebalance completed at 16:19:06 - Wed May 13, 2015
Rebalance completed successfully. ns_orchestrator001 ns_1@10.3.4.188 16:19:06 - Wed May 13, 2015
5. Although the pipeline was constructed at 16:17:00, for the next 10 minutes there was no replication between .186 and .188, until the next error message was reported on .186 at 16:27:04 - Wed May 13, 2015:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default failed. err=map[xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_1:Xmem is stuck] xdcr000 ns_1@127.0.0.1 16:27:04 - Wed May 13, 2015
6. Replication then restarted on .186 at:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default started running. xdcr000 ns_1@127.0.0.1 16:27:17 - Wed May 13, 2015
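For context on step 3's failure: NOT_MY_VBUCKET is the normal memcached response when a client sends an operation for a vbucket to a node that no longer owns it, which is exactly what happens while the rebalance-in of .189 moves vbuckets around on C2. The usual client-side handling is to refresh the cached vbucket map and retry against the new owner. A minimal sketch of that pattern (all names and the hashing shortcut are hypothetical; this is not goxdcr code):

```python
def vbucket_for_key(key: str, num_vbuckets: int = 1024) -> int:
    """Map a key to a vbucket id. Couchbase uses CRC32 of the key;
    a plain hash is used here purely for illustration."""
    return hash(key) % num_vbuckets

def send_with_retry(key, value, vbucket_map, fetch_fresh_map, send, max_retries=3):
    """Send a mutation; on NOT_MY_VBUCKET, refresh the map and retry.

    vbucket_map: dict vbucket_id -> node address (the client's cached map)
    fetch_fresh_map: callable returning the cluster's current map
    send: callable(node, key, value) -> "OK" or "NOT_MY_VBUCKET"
    """
    vb = vbucket_for_key(key)
    for _ in range(max_retries):
        status = send(vbucket_map[vb], key, value)
        if status != "NOT_MY_VBUCKET":
            return status
        # The vbucket moved (e.g. during the rebalance-in of .189): the
        # cached map is stale, so fetch the current map and retry against
        # the vbucket's new owner.
        vbucket_map = fetch_fresh_map()
    raise RuntimeError("vbucket map did not converge after retries")
```

Under this model a single NOT_MY_VBUCKET is recoverable; the question in this report is why it instead escalated to a fatal pipeline error.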
Questions
---------
1. Why is C1 reporting xmem stuck, even though the target cluster rebalance completed 8 minutes earlier?
2. Why does it take 10 minutes to report that Xmem is stuck?
Attaching cbcollect logs from .186 and .188.
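On question 2: a fixed 10-minute gap between pipeline start (16:17:00) and the stuck report (16:27:04) is consistent with interval-based health checking, where a sender is only declared stuck after several consecutive polls see no progress. The sketch below is purely illustrative with hypothetical parameters, not goxdcr's actual implementation or numbers:

```python
class StuckDetector:
    """Declare a sender stuck after `max_idle_checks` consecutive polls
    with no new acknowledgments. Worst-case detection latency is roughly
    check_interval_secs * max_idle_checks (60 s * 10 = 600 s here, i.e.
    10 minutes -- hypothetical values chosen to match the observed gap)."""

    def __init__(self, check_interval_secs=60, max_idle_checks=10):
        self.check_interval_secs = check_interval_secs
        self.max_idle_checks = max_idle_checks
        self.last_acked = 0
        self.idle_checks = 0

    def poll(self, acked_count):
        """Called once per check interval; returns True when stuck."""
        if acked_count > self.last_acked:
            # Progress was made since the last poll: reset the idle counter.
            self.last_acked = acked_count
            self.idle_checks = 0
            return False
        self.idle_checks += 1
        return self.idle_checks >= self.max_idle_checks
```

If detection works this way, the 10-minute delay would be the product of the check interval and the idle-check threshold rather than anything specific to the rebalance itself.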