Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: 4.0.0
- Security Level: Public
- Environment: CentOS 6.x
- Triage: Untriaged
- Unknown
Description
Build 4.0.0-2109. Found during manual testing.
1. C1 [.186] --> C2 [.188], existing default buckets, replication set up.
2. Rebalance-in .189 on C2.
3. In parallel, start load on C1.
Replication stopped with the error NOT_MY_VBUCKET:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default failed. err=map[xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_1:Fatal error when receiving responses from memcached in target cluster.]
This error was seen at 16:16:47 - Wed May 13, 2015
The pipeline was then constructed:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default started running. xdcr000 ns_1@127.0.0.1 16:17:00 - Wed May 13, 2015
4. Rebalance completed at 16:19:06 - Wed May 13, 2015
Rebalance completed successfully. ns_orchestrator001 ns_1@10.3.4.188 16:19:06 - Wed May 13, 2015
5. Although the pipeline was constructed at 16:17:00, for the next 10 minutes there was no replication between .186 and .188, until the next error message was reported on .186 at 16:27:04 - Wed May 13, 2015:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default failed. err=map[xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_1:Xmem is stuck] xdcr000 ns_1@127.0.0.1 16:27:04 - Wed May 13, 2015
6. Replication then restarted on .186 at:
Replication b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default started running. xdcr000 ns_1@127.0.0.1 16:27:17 - Wed May 13, 2015
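For context on step 3's failure: NOT_MY_VBUCKET is the normal memcached response when a client sends an operation for a vbucket to a node that no longer owns it, which is exactly what happens while the rebalance-in of .189 moves vbuckets around on C2. The usual client-side handling is to refresh the cached vbucket map and retry against the new owner. A minimal sketch of that pattern (all names and the hashing shortcut are hypothetical; this is not goxdcr code):

```python
def vbucket_for_key(key: str, num_vbuckets: int = 1024) -> int:
    """Map a key to a vbucket id. Couchbase uses CRC32 of the key;
    a plain hash is used here purely for illustration."""
    return hash(key) % num_vbuckets

def send_with_retry(key, value, vbucket_map, fetch_fresh_map, send, max_retries=3):
    """Send a mutation; on NOT_MY_VBUCKET, refresh the map and retry.

    vbucket_map: dict vbucket_id -> node address (the client's cached map)
    fetch_fresh_map: callable returning the cluster's current map
    send: callable(node, key, value) -> "OK" or "NOT_MY_VBUCKET"
    """
    vb = vbucket_for_key(key)
    for _ in range(max_retries):
        status = send(vbucket_map[vb], key, value)
        if status != "NOT_MY_VBUCKET":
            return status
        # The vbucket moved (e.g. during the rebalance-in of .189): the
        # cached map is stale, so fetch the current map and retry against
        # the vbucket's new owner.
        vbucket_map = fetch_fresh_map()
    raise RuntimeError("vbucket map did not converge after retries")
```

Under this model a single NOT_MY_VBUCKET is recoverable; the question in this report is why it instead escalated to a fatal pipeline error.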
Questions
---------
1. Why is C1 reporting xmem stuck, even though the target cluster rebalance completed 8 minutes earlier?
2. Why does it take 10 minutes to report that Xmem is stuck?
Attaching cbcollect logs from .186 and .188.
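On question 2: a fixed 10-minute gap between pipeline start (16:17:00) and the stuck report (16:27:04) is consistent with interval-based health checking, where a sender is only declared stuck after several consecutive polls see no progress. The sketch below is purely illustrative with hypothetical parameters, not goxdcr's actual implementation or numbers:

```python
class StuckDetector:
    """Declare a sender stuck after `max_idle_checks` consecutive polls
    with no new acknowledgments. Worst-case detection latency is roughly
    check_interval_secs * max_idle_checks (60 s * 10 = 600 s here, i.e.
    10 minutes -- hypothetical values chosen to match the observed gap)."""

    def __init__(self, check_interval_secs=60, max_idle_checks=10):
        self.check_interval_secs = check_interval_secs
        self.max_idle_checks = max_idle_checks
        self.last_acked = 0
        self.idle_checks = 0

    def poll(self, acked_count):
        """Called once per check interval; returns True when stuck."""
        if acked_count > self.last_acked:
            # Progress was made since the last poll: reset the idle counter.
            self.last_acked = acked_count
            self.idle_checks = 0
            return False
        self.idle_checks += 1
        return self.idle_checks >= self.max_idle_checks
```

If detection works this way, the 10-minute delay would be the product of the check interval and the idle-check threshold rather than anything specific to the rebalance itself.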