Details
-
Bug
-
Resolution: Fixed
-
Major
-
7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.2.0, Elixir, 7.1.3, 7.2.1, 7.1.5, 7.2.2, 7.1.6, 7.2.3
-
Untriaged
-
1
-
Unknown
Description
Note: This issue was uncovered by MB-52547. It is normally not a concern, as explained below.
As part of collections support, XMEM now recycles request objects so they can be reused.
When XMEM receives a topology change error, XDCR recycles the object but never evicts it from the buffer. This means that the retry timer will kick in and try to resend the object.
By this point, the object has already been recycled, so its opcode and document key have already been cleared. The resend mechanism retrieves the object from its position in the buffer and tries to resend the request. This presents two issues:
- The request's opcode is no longer SET_WITH_META, but GET (0x00)
- The request's doc Key is nil
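The failure mode above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not goxdcr's actual code: the `request` and `buffer` types, `recycle`, `resend`, and `staleResend` are all hypothetical names. Only the GET opcode value (0x00) comes from the ticket; the SET_WITH_META value is the one used in the memcached binary protocol and does not matter for the demonstration.

```go
package main

// Hypothetical, simplified stand-ins for illustration only.
const (
	opGet         byte = 0x00 // GET, also the zero value of the opcode field
	opSetWithMeta byte = 0xa2 // SET_WITH_META (exact value is incidental here)
)

// request is an assumed, stripped-down request object.
type request struct {
	Opcode byte
	Key    []byte
}

// recycle clears the request so that a pool could hand it out again.
func (r *request) recycle() {
	r.Opcode = 0
	r.Key = nil
}

// buffer holds pointers to in-flight requests, indexed by position.
type buffer struct {
	slots []*request
}

// resend returns what a retry timer would put on the wire for a position.
func (b *buffer) resend(pos int) (byte, []byte) {
	r := b.slots[pos]
	return r.Opcode, r.Key
}

// staleResend reproduces the faulty order: the request is recycled while
// the buffer still references it, and is then resent.
func staleResend() (byte, []byte) {
	req := &request{Opcode: opSetWithMeta, Key: []byte("doc-1")}
	buf := &buffer{slots: []*request{req}}
	req.recycle() // recycled, but never evicted from buf
	return buf.resend(0)
}
```

Calling `staleResend()` returns the zeroed opcode and a nil key, because the buffer and the pool share the same pointer and the clear happens underneath the buffer.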
Fortunately, under normal circumstances, this corrupted GET command is sent to the target node, which returns another NOT_MY_VBUCKET response. This repeats until the pipeline restarts, and each retry wastes resources. As a result, the corruption usually does not show up in production.
In MB-52547, the VBUCKET is actually valid and KV returns "NOT_BUCKET" instead. The resend then issues a completely empty GET command, which results in an odd error response.
This needs to be fixed for correctness. Another MB will be opened to address the assumption that "NOT_BUCKET" is the same as "NOT_MY_VBUCKET".
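One way to picture the direction of the fix (the merged change is titled "evict item from buffer for topology errors") is to empty the buffer slot before the object is cleared, so the retry path never sees a recycled request. The sketch below is an assumption about the general shape only; `mcRequest`, `slotBuffer`, `evict`, `pending`, and `onTopologyError` are hypothetical names and are not taken from goxdcr.

```go
package main

// mcRequest is an assumed, stripped-down request object.
type mcRequest struct {
	Opcode byte
	Key    []byte
}

// slotBuffer holds pointers to in-flight requests, indexed by position.
type slotBuffer struct {
	slots []*mcRequest
}

// evict empties the slot at pos and returns the request that was stored
// there, or nil if pos is out of range or the slot is already empty.
func (b *slotBuffer) evict(pos int) *mcRequest {
	if pos < 0 || pos >= len(b.slots) {
		return nil
	}
	r := b.slots[pos]
	b.slots[pos] = nil
	return r
}

// pending lists the positions a retry timer should still resend.
func (b *slotBuffer) pending() []int {
	var out []int
	for i, r := range b.slots {
		if r != nil {
			out = append(out, i)
		}
	}
	return out
}

// onTopologyError evicts first and only then clears the object for reuse,
// so no buffer slot is left pointing at a recycled request.
func onTopologyError(b *slotBuffer, pos int) {
	if r := b.evict(pos); r != nil {
		*r = mcRequest{} // safe to recycle: the buffer no longer references it
	}
}
```

With two requests buffered, calling `onTopologyError(b, 0)` leaves only position 1 in `b.pending()`, so the retry timer has nothing stale to resend.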
Attachments
For Gerrit Dashboard: MB-60859

| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 206011,2 | MB-60859: evict item from buffer for topology errors | neo | goxdcr | MERGED | +2 | +1 |