  1. Couchbase Server
  2. MB-60859

[BP 7.2.5][XDCR] Raceful handling for topology errors

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 7.2.5
    • Affects Version/s: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.2.0, Elixir, 7.1.3, 7.2.1, 7.1.5, 7.2.2, 7.1.6, 7.2.3
    • Component/s: XDCR
    • Triage: Untriaged
    • 1
    • Unknown

    Description

      Note: This was uncovered by MB-52547. It is normally not a concern, as explained below.

      As part of the collections work, XMEM now recycles objects so that they can be reused.

      https://github.com/couchbase/goxdcr/blob/e7e96f60ac5fdbf67f8af5d9706aa905dc74dc52/parts/xmem_nozzle.go#L2687-L2691

      When XMEM receives a topology change error, XDCR recycles the object but never evicts it from the buffer. As a result, the retry timer still fires and tries to resend the object.

      By this point, the object has already been recycled, so its opcode, document key, and other fields have already been cleared. The resend mechanism looks up the object's position in the buffer and tries to resend the request. This presents two issues:

      1. The request's opcode is no longer SET_WITH_META, but GET (0x00)
      2. The request's doc Key is nil
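      The two symptoms above can be reproduced with a minimal sketch. The `request` and `sendBuffer` types and the `recycleWithoutEvict` and `resend` functions are hypothetical stand-ins, not the goxdcr types; the SET_WITH_META opcode value (0xa2) is taken from the memcached binary protocol and is included only for illustration.

```go
package main

import "fmt"

// Opcode values from the memcached binary protocol (illustrative only).
const (
	opGet         uint8 = 0x00
	opSetWithMeta uint8 = 0xa2
)

// request is a hypothetical stand-in for the pooled request object.
type request struct {
	Opcode uint8
	Key    []byte
}

// reset clears the request so that a pool could hand it out again.
func (r *request) reset() {
	r.Opcode = 0
	r.Key = nil
}

// sendBuffer is a hypothetical stand-in for the buffer of in-flight requests.
type sendBuffer struct {
	slots []*request
}

// recycleWithoutEvict models the bug: the object is reset for reuse,
// but the buffer slot still points at it.
func recycleWithoutEvict(buf *sendBuffer, pos int) {
	buf.slots[pos].reset()
}

// resend models the retry path: it reads whatever sits at pos
// and returns the fields that would go on the wire.
func resend(buf *sendBuffer, pos int) (uint8, []byte) {
	r := buf.slots[pos]
	return r.Opcode, r.Key
}

func main() {
	buf := &sendBuffer{slots: []*request{{Opcode: opSetWithMeta, Key: []byte("doc-1")}}}
	recycleWithoutEvict(buf, 0)
	op, key := resend(buf, 0)
	// The zeroed opcode is indistinguishable from GET, and the key is gone.
	fmt.Printf("opcode=0x%02x keyIsNil=%t\n", op, key == nil)
}
```

      Because the zero value of the opcode field coincides with GET, the stale resend looks like a well-formed command type with no key rather than an obviously invalid packet.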

      Fortunately, under normal circumstances, this corrupted GET command is sent to the target node, which returns another NOT_MY_VBUCKET response. This repeats until the pipeline restarts (the retries are also a waste of resources). As a result, the corruption usually does not show up in production.

      In MB-52547, the issue occurs when the vBucket is actually valid and KV returns "NOT_BUCKET" instead. The resend then issues a completely empty GET command, which results in an odd error response.

      This needs to be fixed for correctness. Another MB will be opened to address the assumption that "NOT_BUCKET" is the same as "NOT_MY_VBUCKET".
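      One possible shape for such a fix, sketched under assumptions: evict the buffer slot before recycling the object, and have the resend path skip empty slots. All names here (`request`, `sendBuffer`, `evict`, `get`, `handleTopologyError`, `resend`) are hypothetical, and this is not the change that was actually merged.

```go
package main

import (
	"fmt"
	"sync"
)

// request is a hypothetical stand-in for the pooled request object.
type request struct {
	Opcode uint8
	Key    []byte
}

// sendBuffer is a hypothetical stand-in for the buffer of in-flight requests.
type sendBuffer struct {
	mu    sync.Mutex
	slots []*request
}

// evict removes and returns the request at pos, or nil if the slot is empty.
func (b *sendBuffer) evict(pos int) *request {
	b.mu.Lock()
	defer b.mu.Unlock()
	r := b.slots[pos]
	b.slots[pos] = nil
	return r
}

// get returns the request at pos; ok is false if the slot was evicted.
func (b *sendBuffer) get(pos int) (r *request, ok bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	r = b.slots[pos]
	return r, r != nil
}

// pool holds recycled request objects for reuse.
var pool = sync.Pool{New: func() any { return new(request) }}

// handleTopologyError evicts the slot first and recycles the object second,
// so the buffer never points at a cleared object.
func handleTopologyError(buf *sendBuffer, pos int) {
	r := buf.evict(pos)
	if r == nil {
		return // already evicted; nothing to recycle
	}
	r.Opcode = 0
	r.Key = nil
	pool.Put(r)
}

// resend reports whether a request was found and would have been resent.
func resend(buf *sendBuffer, pos int) bool {
	r, ok := buf.get(pos)
	if !ok {
		return false // slot was evicted; skip the retry
	}
	_ = r // a real sender would write r to the connection here
	return true
}

func main() {
	buf := &sendBuffer{slots: []*request{{Opcode: 0xa2, Key: []byte("doc-1")}}}
	fmt.Println("resent before error:", resend(buf, 0))
	handleTopologyError(buf, 0)
	fmt.Println("resent after error:", resend(buf, 0))
}
```

      This sketch only orders eviction before recycling. A real implementation would also need to ensure that a resend already holding the request cannot overlap with the recycle, for example by holding the lock across the send or by reference counting.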

          People

            ayush.nayyar Ayush Nayyar
            sumukh.bhat Sumukh Bhat