Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Major
2.0-beta
Security Level: Public
None
Description
I'm seeing this in the (reopening) logs of MB-4673, where it was the root cause of the rebalance failure. The main bug is in ns_server, because we expect backfills from replicas when the replica count is greater than 1.
Still, I think we should investigate a possible issue in ep-engine, or possibly an error in how ns_server ensures that replicas are built.
This happens after vbucket 285 is moved, when we establish the new replication chain. The second replica sees this:
[rebalance:info,2012-08-30T8:24:01.533,ns_1@10.3.121.94:<0.21104.17>:ebucketmigrator_srv:init:485]Some vbuckets were not yet ready to replicate from:
[285]
This means the open checkpoint for this vbucket was either missing or 0 on the first replica. Looking at the first replica, we indeed see this:
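The readiness condition described above can be modeled roughly as follows. This is a hypothetical Python sketch, not ns_server's actual Erlang code in ebucketmigrator_srv; the function name `not_ready_vbuckets` and its input shape (a map from vbucket id to open checkpoint id, with a missing entry meaning the vbucket is absent upstream) are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def not_ready_vbuckets(open_checkpoints: Dict[int, Optional[int]],
                       wanted: List[int]) -> List[int]:
    """Return the vbuckets in `wanted` that cannot be replicated from yet.

    Hypothetical model: a vbucket counts as not ready when its open
    checkpoint id is missing (None, or absent from the map, e.g. because
    the vbucket was deleted upstream) or equal to 0.
    """
    not_ready = []
    for vb in wanted:
        checkpoint = open_checkpoints.get(vb)
        if checkpoint is None or checkpoint == 0:
            not_ready.append(vb)
    return not_ready


# Hypothetical upstream state: 284 has open checkpoint 5, 285 is missing.
print(not_ready_vbuckets({284: 5}, [284, 285]))  # [285]
```

Under this model, vbucket 285 would be reported as not ready both if the first replica had deleted it and if its open checkpoint were 0, which matches the two possibilities named in the log interpretation.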
[ns_server:debug,2012-08-30T8:24:01.296,ns_1@10.3.121.98:<0.26188.21>:ebucketmigrator_srv:init:536]Reusing old upstream:
[
,
,
]
[rebalance:debug,2012-08-30T8:24:01.300,ns_1@10.3.121.98:<0.26188.21>:ebucketmigrator_srv:init:555]upstream_sender pid: <0.26189.21>
[rebalance:info,2012-08-30T8:24:01.301,ns_1@10.3.121.98:<0.26188.21>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 285
[ns_server:debug,2012-08-30T8:24:01.311,ns_1@10.3.121.98:<0.28617.6>:mc_connection:do_delete_vbucket:118]Notifying mc_couch_events of vbucket deletion: bucket-1/285
Because the first replica replicates from the new master, it cannot be ahead of the master on the open checkpoint. And because we're doing reliable replica building, we expect vbucket 285 on the first replica to be up to date when the new replication chain is built. In fact, the logs indicate that the first replica had closed checkpoint 4 when the vbucket filter was changed.