  1. Couchbase Server
  2. MB-10994

producer only returns snap_start after node crash during rebalance

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 3.0
    • 3.0
    • None
    • Security Level: Public
    • Untriaged
    • Unknown

    Description

      I ran this two-node failure scenario:

      • load 100k items to NodeA
      • rebalance in NodeB
      • crash NodeA during the rebalance
      • fail over NodeA and continue to rebalance in NodeB
      • Some data loss is expected, but stream requests return only the initial snapshot marker (snap_start)

      To keep track of the failover tables, here are the NodeA vb0 stats before the crash:

      {'failovers:vb_0:0:id': '53942919883496', 'failovers:vb_0:num_entries': '1', 'failovers:vb_0:0:seq': '0'}
      {'vb_0:purge_seqno': '0', 'vb_0:uuid': '53942919883496', 'vb_0:high_seqno': '173'}

      After NodeA is rebalanced out, here are the vb0 stats on NodeB:

      {'failovers:vb_0:0:id': '66362171126476', 'failovers:vb_0:num_entries': '1', 'failovers:vb_0:0:seq': '0'}
      {'vb_0:purge_seqno': '0', 'vb_0:uuid': '66362171126476', 'vb_0:high_seqno': '0'}

      NodeB still holds data, although high_seqno has become 0.
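The comparison between the two sets of stats can be sketched in Python. This is a minimal sketch, not part of the original repro script: the values are copied from the stats in this report, while `compare_vb_stats` is a hypothetical helper name and merging the two stat groups into one dict per node is an assumption made for convenience.

```python
# vb0 stats copied from this report; the two stat groups are merged
# into one dict per node (an assumption made for convenience).
node_a = {
    'failovers:vb_0:0:id': '53942919883496',
    'failovers:vb_0:num_entries': '1',
    'failovers:vb_0:0:seq': '0',
    'vb_0:purge_seqno': '0',
    'vb_0:uuid': '53942919883496',
    'vb_0:high_seqno': '173',
}
node_b = {
    'failovers:vb_0:0:id': '66362171126476',
    'failovers:vb_0:num_entries': '1',
    'failovers:vb_0:0:seq': '0',
    'vb_0:purge_seqno': '0',
    'vb_0:uuid': '66362171126476',
    'vb_0:high_seqno': '0',
}


def compare_vb_stats(before, after, vb=0):
    """Hypothetical helper: list the differences relevant to this bug."""
    findings = []
    uuid_key = 'vb_%d:uuid' % vb
    seqno_key = 'vb_%d:high_seqno' % vb

    # A new vbucket UUID means a new failover-table entry was created.
    if before[uuid_key] != after[uuid_key]:
        findings.append('uuid changed: %s -> %s'
                        % (before[uuid_key], after[uuid_key]))

    # Stats arrive as strings, so convert before comparing numerically.
    old_seqno = int(before[seqno_key])
    new_seqno = int(after[seqno_key])
    if new_seqno < old_seqno:
        findings.append('high_seqno regressed: %d -> %d'
                        % (old_seqno, new_seqno))
    return findings


findings = compare_vb_stats(node_a, node_b)
for finding in findings:
    print(finding)
```

Run against the values above, the helper flags both the new vbucket UUID and the high_seqno drop from 173 to 0.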

      Attempting to stream from NodeB gives:

      {'status': 0, 'body': '', 'opcode': 80}
      {'status': 0, 'opcode': 83, 'failover_log': [(66362171126476, 0)]}
      {'vbucket': 0, 'opcode': 86}
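To make the response sequence easier to read, the opcodes can be mapped to names. This is a minimal sketch: the opcode names are an assumption based on the UPR/DCP protocol numbering as I understand it (0x50 open connection, 0x53 stream request, 0x56 snapshot marker, 0x57 mutation), and `summarize` and `marker_without_items` are hypothetical helpers, not testrunner functions.

```python
# Assumed opcode names from the UPR/DCP protocol numbering;
# verify against the protocol documentation before relying on them.
OPCODE_NAMES = {
    80: 'open_connection',   # 0x50
    83: 'stream_request',    # 0x53
    86: 'snapshot_marker',   # 0x56
    87: 'mutation',          # 0x57
}

# Responses copied from this report.
responses = [
    {'status': 0, 'body': '', 'opcode': 80},
    {'status': 0, 'opcode': 83, 'failover_log': [(66362171126476, 0)]},
    {'vbucket': 0, 'opcode': 86},
]


def summarize(responses):
    """Translate each response's opcode into a readable name."""
    return [OPCODE_NAMES.get(r['opcode'], 'unknown(%d)' % r['opcode'])
            for r in responses]


def marker_without_items(responses):
    """True if a snapshot marker arrived but no mutation followed it."""
    names = summarize(responses)
    if 'snapshot_marker' not in names:
        return False
    first_marker = names.index('snapshot_marker')
    return 'mutation' not in names[first_marker + 1:]


print(summarize(responses))
print(marker_without_items(responses))
```

Under these assumptions, the stream opens and the request succeeds, but the snapshot marker is the last message received and no items follow it, which matches the symptom in the title.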

      The behavior is similar to MB-10947, except that here empty_item is sent because CheckpointManager thinks the vbucket is still in the backfill phase.

      I have a script that reproduces this, and I'm adding these scenarios to testrunner.

          People

            tommie Tommie McAfee (Inactive)