Description
I've run this two-node failure scenario:
- load 100k items into NodeA
- rebalance in NodeB
- crash NodeA during the rebalance
- fail over NodeA and continue rebalancing in NodeB
- Some data loss is expected, but stream requests only return an empty initial snapshot
To keep track of the failover tables, here are NodeA's vb0 stats before crashing:
{'failovers:vb_0:0:id': '53942919883496', 'failovers:vb_0:num_entries': '1', 'failovers:vb_0:0:seq': '0'} {'vb_0:purge_seqno': '0', 'vb_0:uuid': '53942919883496', 'vb_0:high_seqno': '173'}
After NodeA is rebalanced out, here are the stats on NodeB:
{'failovers:vb_0:0:id': '66362171126476', 'failovers:vb_0:num_entries': '1', 'failovers:vb_0:0:seq': '0'} {'vb_0:purge_seqno': '0', 'vb_0:uuid': '66362171126476', 'vb_0:high_seqno': '0'}
There is still data, although high_seqno has become 0.
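For reference, the discrepancy above can be checked mechanically. This is a minimal sketch that operates on the stat values quoted in this ticket (dicts copied verbatim, not live cluster calls); the helper name `vb0_regressed` is mine, not testrunner's:

```python
# Failover-log and vbucket-details stats for vb0, copied from this ticket.
node_a = {'failovers:vb_0:0:id': '53942919883496',
          'failovers:vb_0:num_entries': '1',
          'failovers:vb_0:0:seq': '0',
          'vb_0:purge_seqno': '0',
          'vb_0:uuid': '53942919883496',
          'vb_0:high_seqno': '173'}
node_b = {'failovers:vb_0:0:id': '66362171126476',
          'failovers:vb_0:num_entries': '1',
          'failovers:vb_0:0:seq': '0',
          'vb_0:purge_seqno': '0',
          'vb_0:uuid': '66362171126476',
          'vb_0:high_seqno': '0'}

def vb0_regressed(before, after):
    """True when vb_0 got a new failover UUID but its high seqno fell."""
    new_branch = before['vb_0:uuid'] != after['vb_0:uuid']
    seqno_lost = int(after['vb_0:high_seqno']) < int(before['vb_0:high_seqno'])
    return new_branch and seqno_lost

print(vb0_regressed(node_a, node_b))  # True for the stats above
```

A check along these lines is the kind of assertion the testrunner tests mentioned below could make after the rebalance completes.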
Attempting to stream from NodeB gives:
{'status': 0, 'body': '', 'opcode': 80} {'status': 0, 'opcode': 83, 'failover_log': [(66362171126476, 0)]} {'vbucket': 0, 'opcode': 86}
The behavior here is similar to MB-10947, except the reason for sending empty_item is that the CheckpointManager thinks the vbucket is still in the backfill phase.
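To make the symptom concrete, here is a small sketch that classifies the response trace above as an "empty initial snapshot": the stream request succeeds and the failover log starts at seqno 0, yet no mutations arrive. The opcode constants are assumptions inferred from the printed output (83 appears to be the stream-request response carrying the failover log; 87 for mutations is my guess and should be checked against the client's UPR/DCP opcode table):

```python
# Responses quoted above, as printed by the repro script.
responses = [
    {'status': 0, 'body': '', 'opcode': 80},
    {'status': 0, 'opcode': 83, 'failover_log': [(66362171126476, 0)]},
    {'vbucket': 0, 'opcode': 86},
]

# Hypothetical opcode constants, inferred from this trace.
STREAM_REQ = 83  # response carries the failover log on success
MUTATION = 87    # assumed mutation opcode; none appear in this trace

def empty_initial_snapshot(msgs):
    """True when the stream request succeeded, the failover log starts
    at seqno 0, and no mutation messages were delivered."""
    req = next(m for m in msgs if m['opcode'] == STREAM_REQ)
    ok = req['status'] == 0
    starts_at_zero = req['failover_log'][0][1] == 0
    no_mutations = not any(m['opcode'] == MUTATION for m in msgs)
    return ok and starts_at_zero and no_mutations

print(empty_initial_snapshot(responses))  # True for this trace
```

With 100k items loaded, a healthy stream would deliver mutations after the stream-request response; this trace ends without any, which is the data-loss symptom described above.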
I have a script to repro this, but I'm adding these tests to testrunner.