Details
- Bug
- Resolution: Fixed
- Critical
- 6.6.0
- Untriaged
- 1
- No
- KV Sprint 2020-June
Description
Summary
The ability to insert documents directly in a tombstone state, through the Sub-Document API, was added recently for 6.6 in MB-37374.
There's a bug where this operation sometimes results in an ERR_KEY_NOT_FOUND error being returned from memcached. Since I'm inserting the document (which does not already exist, in either tombstone or regular form), this error is unexpected.
Attached is a heisenbug.pcapng containing two packets (request and response) showing the bug in action.
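For anyone inspecting the pcap by hand, the following is a minimal sketch of how the status can be read out of a response header. It assumes the standard 24-byte memcached binary protocol response header (magic 0x81), where the opcode is byte 1, the status is the big-endian 16-bit value at bytes 6-7, and the opaque is at bytes 12-15. Key-not-found is status 0x0001 and Sub-Document multi-mutation is opcode 0xd1. The class and method names are hypothetical, and the header bytes in main are synthetic, not copied from the attached capture.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for reading fields from a memcached binary protocol
// response header. Not part of any SDK; names are illustrative only.
public class ResponseHeaderSketch {
    static final int HEADER_LENGTH = 24;
    static final int STATUS_KEY_NOT_FOUND = 0x0001;          // ERR_KEY_NOT_FOUND
    static final int OPCODE_SUBDOC_MULTI_MUTATION = 0xd1;

    private static void check(byte[] header) {
        if (header == null || header.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least 24 header bytes");
        }
    }

    // Opcode is the single byte at offset 1.
    static int opcode(byte[] header) {
        check(header);
        return header[1] & 0xff;
    }

    // Status is an unsigned big-endian 16-bit value at offset 6.
    static int status(byte[] header) {
        check(header);
        return ByteBuffer.wrap(header).getShort(6) & 0xffff;
    }

    // Opaque is echoed from the request, so it pairs a response with its request.
    static int opaque(byte[] header) {
        check(header);
        return ByteBuffer.wrap(header).getInt(12);
    }

    public static void main(String[] args) {
        // Synthetic header for illustration, not taken from heisenbug.pcapng.
        byte[] header = new byte[HEADER_LENGTH];
        header[0] = (byte) 0x81; // response magic
        header[1] = (byte) 0xd1; // Sub-Document multi-mutation
        header[7] = 0x01;        // status low byte: key not found

        boolean isSubdocMutation = opcode(header) == OPCODE_SUBDOC_MULTI_MUTATION;
        boolean isKeyNotFound = status(header) == STATUS_KEY_NOT_FOUND;
        System.out.println("subdoc multi-mutation: " + isSubdocMutation);
        System.out.println("key not found: " + isKeyNotFound);
    }
}
```

Matching on the opaque is a simple way to confirm that the failing response really belongs to the insert request and not to a neighbouring operation on the same connection.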
Impact
This effectively stops transactions working, or at least those that try to insert documents.
Replicating
I have a Windows cluster running 6.6.0 build 7785, as a single node locally on my dev machine, which currently replicates the issue 100% of the time when running the transactional test-suite. As far as I can tell, every test that does an insert hits the bug. The pcap above was taken from this cluster. The cluster has been failing in this way for 2-3 days now, and I think (I've been trying so many things on this bug that I'm no longer certain of anything...) that it was passing these same tests before that. The failure persists after restarting my machine. There is nothing interesting that I can see in the memcached logs. Let me know if there are any diagnostics I can supply from this machine.
I tried to replicate in a simpler form, using just SDK code. This Java code (https://gist.github.com/programmatix/a31778394b9f41db9233ed4112924427) creates a packet that is essentially identical to the failing one in the pcap: the only things that change, as far as I can see, are some UUIDs. But despite being sent to the same cluster that (currently) always fails for the test-suite, it always succeeds. This really puzzles me. The only thing transactions will be doing differently from this code is creating an ATR entry just prior to the insert, on the same vbucket as the insert. I'll try adding this to the replication code.
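Since the ATR entry lives on the same vbucket as the inserted document, a standalone reproduction needs a way to pick an ATR id that maps to the document's vbucket. The following is a minimal sketch of that check. It assumes the key-to-vbucket mapping used by the Couchbase SDKs (CRC32 of the key bytes, bits 16-30, masked by the vbucket count) and a bucket with 1024 vbuckets. The class name, method names, and example ids are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for checking whether two keys share a vbucket.
// Assumes numVBuckets is a power of two (for example 1024).
public class VBucketSketch {

    // Map a document key to its vbucket id.
    static int vbucketFor(String key, int numVBuckets) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // Keep bits 16-30 of the checksum, then mask down to the vbucket count.
        long hash = (crc.getValue() >> 16) & 0x7fff;
        return (int) hash & (numVBuckets - 1);
    }

    // True when both keys map to the same vbucket.
    static boolean sameVBucket(String a, String b, int numVBuckets) {
        return vbucketFor(a, numVBuckets) == vbucketFor(b, numVBuckets);
    }

    public static void main(String[] args) {
        int numVBuckets = 1024;           // assumed bucket configuration
        String docId = "example-doc-id";  // illustrative ids only
        String atrId = "example-atr-id";

        System.out.println("doc vbucket: " + vbucketFor(docId, numVBuckets));
        System.out.println("atr vbucket: " + vbucketFor(atrId, numVBuckets));
        System.out.println("same vbucket: " + sameVBucket(docId, atrId, numVBuckets));
    }
}
```

With a helper like this, the reproduction could try candidate ATR ids until one lands on the document's vbucket, then write the ATR entry immediately before the insert.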
I have a Linux cluster_run (compiled yesterday from master), against which I do not see the bug. I'll keep an eye out for it while working on the transaction testing. If I can replicate it there, I'll add some printfs to memcached and try to get some more diagnostics.
Paolo Cocchi has also tried to replicate the bug, unsuccessfully so far.
Attachments
Issue Links
- relates to MB-37374 Implement support so Transactions do not need to create visible temporary docs (Closed)