  1. Couchbase Server
  2. MB-40162

Intermittent return of ERR_KEY_NOT_FOUND when trying to insert a doc using new CreateAsDeleted flag

Details

    • Untriaged
    • 1
    • No
    • KV Sprint 2020-June

    Description

      Summary

      The ability to insert documents directly in a tombstone state, through the Sub-Document API, was added recently for 6.6 in MB-37374.

      There's a bug where this operation sometimes results in an ERR_KEY_NOT_FOUND error being returned from memcached.  Since I'm inserting the document (which does not already exist, in either tombstone or regular form), this error is unexpected.

      Attached is a heisenbug.pcapng containing two packets (request and response) showing the bug in action.
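
      For anyone reading the pcap by hand, here is a minimal sketch (Python, standard library only) of what such a request/response pair looks like on the wire.  The opcode and flag values are my understanding of the memcached binary protocol as used by Couchbase KV and should be checked against the kv_engine headers.  The flag combination (Add + AccessDeleted + CreateAsDeleted), the key, and the xattr path are assumptions for illustration, not values read from the attached capture.

```python
import struct

# Assumed protocol constants (verify against kv_engine before relying on them).
MAGIC_REQUEST = 0x80
MAGIC_RESPONSE = 0x81
OP_SUBDOC_MULTI_MUTATION = 0xD1
OP_SUBDOC_DICT_UPSERT = 0xC8
PATH_FLAG_XATTR = 0x04
DOC_FLAG_ADD = 0x02
DOC_FLAG_ACCESS_DELETED = 0x04
DOC_FLAG_CREATE_AS_DELETED = 0x08
STATUS_KEY_NOT_FOUND = 0x0001

# 24-byte header: magic, opcode, key length, extras length, datatype,
# vbucket (request) or status (response), body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")


def build_insert_as_deleted(key, vbucket, xattr_path, xattr_value, opaque=0):
    """Build a sub-document multi-mutation that inserts `key` as a tombstone."""
    doc_flags = DOC_FLAG_ADD | DOC_FLAG_ACCESS_DELETED | DOC_FLAG_CREATE_AS_DELETED
    extras = bytes([doc_flags])  # no expiry, just the one-byte doc flags
    # One mutation spec: opcode, path flags, path length, value length, path, value.
    spec = struct.pack(">BBHI", OP_SUBDOC_DICT_UPSERT, PATH_FLAG_XATTR,
                       len(xattr_path), len(xattr_value))
    spec += xattr_path + xattr_value
    body = extras + key + spec
    header = HEADER.pack(MAGIC_REQUEST, OP_SUBDOC_MULTI_MUTATION, len(key),
                         len(extras), 0, vbucket, len(body), opaque, 0)
    return header + body


def response_status(packet):
    """Return the status field from a response packet's header."""
    fields = HEADER.unpack_from(packet)
    if fields[0] != MAGIC_RESPONSE:
        raise ValueError("not a response packet")
    return fields[5]


# Hypothetical key, vbucket, and xattr, for illustration only.
request = build_insert_as_deleted(b"doc-1", 12, b"txn", b"{}")
# A hand-built response header carrying the status this issue is about.
failing = HEADER.pack(MAGIC_RESPONSE, OP_SUBDOC_MULTI_MUTATION, 0, 0, 0,
                      STATUS_KEY_NOT_FOUND, 0, 0, 0)
print(hex(response_status(failing)))
```

      Comparing the doc-flags byte and the spec list of a failing request with those of a passing one is probably the quickest way to confirm the two packets really are equivalent.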

      Impact

      This effectively stops transactions working, or at least those that try to insert documents.

      Replicating

      I have a Windows cluster running 6.6.0 build 7785, as a single node locally on my dev machine, which currently replicates the issue 100% of the time when running the transactional test-suite.  As far as I can tell, every test that does an insert hits the bug.  The pcap above was taken from this cluster.  The cluster has been failing in this way for 2-3 days now, and I think (I've been trying so many things on this bug that I'm no longer certain of anything...) that it was passing these same tests before that.  The failure persists after restarting my machine.  There is nothing interesting that I can see in the memcached logs.  Let me know if there are any diagnostics I can supply from this machine.

      I tried to replicate in a simpler form, using just SDK code.  This Java https://gist.github.com/programmatix/a31778394b9f41db9233ed4112924427 creates a packet that is essentially identical to the failing one in the pcap: the only things that change, as far as I can see, are some UUIDs.  But despite being sent to the same cluster above, which (currently) always fails for the test-suite, this request always succeeds.  This really puzzles me.  The only thing transactions will be doing differently from this code is creating an ATR entry just prior to the insert, on the same vbucket as the insert.  I'll try adding this to the replication code.
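
      When extending the replication code, it may help to check up front that the ATR key and the document key really do land on the same vbucket.  Below is a minimal sketch in Python, assuming the usual CRC32-based key hashing used by Couchbase clients and a 1024-vbucket bucket.  Both are assumptions to verify against the cluster's bucket config, and the key names are hypothetical.

```python
import zlib


def vbucket_for(key: bytes, num_vbuckets: int = 1024) -> int:
    """Map a key to a vbucket id (assumed CRC32-based client hashing)."""
    return ((zlib.crc32(key) >> 16) & 0x7FFF) % num_vbuckets


def same_vbucket(key_a: bytes, key_b: bytes, num_vbuckets: int = 1024) -> bool:
    """True if both keys hash to the same vbucket."""
    return vbucket_for(key_a, num_vbuckets) == vbucket_for(key_b, num_vbuckets)


# Hypothetical keys, for illustration only.
doc_key = b"test-doc-1"
atr_key = b"example-atr-key"
print(vbucket_for(doc_key), vbucket_for(atr_key), same_vbucket(doc_key, atr_key))
```

      If the two keys differ in vbucket, the simplified reproduction is not exercising the same path as the transactional test-suite.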

      I also have a Linux cluster_run (compiled yesterday from master), against which I do not see the bug.  I'll keep an eye out for it while working on the transaction testing.  If I can replicate it there, I'll add some printfs to memcached and try to get some more diagnostics.

      Paolo Cocchi has also tried to replicate the bug, unsuccessfully so far.

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              graham.pople Graham Pople