Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-40126

Subdoc CreateAsDeleted without any value path can result in non-empty value

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • No
    • KV Sprint 2020-June

    Description

      Build: 6.6.0-7834

      Test case:

      ./testrunner -i node.ini threads_to_use=8,rerun=False,skip_cleanup=True,get-cbcollect-info=False,skip_collections_cleanup=True -t Atomicity.doc_isolation.IsolationDocTest.test_staged_doc_read,nodes_init=2,num_items=1,replicas=1,transaction_timeout=30,transaction_commit=True,doc_op=create,operation=beforeAtrPending,GROUP=P0

      Cbcollect logs:
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/transaction_crash/collectinfo-2020-06-24T195940-ns_1%4010.112.191.105.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/transaction_crash/collectinfo-2020-06-24T195940-ns_1%4010.112.191.106.zip
      Scenario:

      • Two node cluster, Couchbase bucket with replica=1
      • Start transaction with expiry of 30 seconds. Forcefully fail the transaction during "beforeAtrPending" and retry

      Expected behavior:

      Failed transaction should succeed without any issues

      Service 'memcached' exited with status 134. Restarting. Messages:
      2020-06-24T12:53:01.819552-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f0c401a8000+0x8f213]
      2020-06-24T12:53:01.819559-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xcccd0]
      2020-06-24T12:53:01.819564-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xc811a]
      2020-06-24T12:53:01.819570-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xca523]
      2020-06-24T12:53:01.819576-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0x18f490]
      2020-06-24T12:53:01.819581-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xcfa4d]
      2020-06-24T12:53:01.819586-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0x12b764]
      2020-06-24T12:53:01.819590-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f0c42257000+0x8f17]
      2020-06-24T12:53:01.819594-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f0c3fa73000+0x7df3]
      2020-06-24T12:53:01.819621-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f0c3f6b2000+0xf61ad]

      Attaching test.log and  trans_expired_with_d_ambiguous.pcapng for further debugging.

      Attachments

        1. _7845_client-op.png
          _7845_client-op.png
          390 kB
        2. _7845_invalid-prepare.png
          _7845_invalid-prepare.png
          381 kB
        3. _dcp-prepare-png.png
          _dcp-prepare-png.png
          346 kB
        4. _no non-subdoc fronend op.png
          _no non-subdoc fronend op.png
          33 kB
        5. _subdoc ops only against 105.png
          _subdoc ops only against 105.png
          212 kB
        6. 6.6.0-7845_client_run_1.pcapng
          1.74 MB
        7. 6.6.0-7845_node_run_1.pcap
          4.17 MB
        8. test.log
          33 kB
        9. test.pcap
          20.11 MB
        10. trans_expired_with_d_ambiguous_new.pcapng
          2.06 MB
        11. trans_expired_with_d_ambiguous.pcapng
          2.09 MB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            ashwin.govindarajulu Ashwin Govindarajulu created issue -
            owend Daniel Owen made changes -
            Field Original Value New Value
            Assignee Daniel Owen [ owend ] Paolo Cocchi [ paolo.cocchi ]
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Attachment test.pcap [ 98846 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment _dcp-prepare-png.png [ 98851 ]
            Attachment _no non-subdoc fronend op.png [ 98852 ]
            Attachment _subdoc ops only against 105.png [ 98853 ]
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Attachment 6.6.0-7845_client_run_1.pcapng [ 98856 ]
            Attachment 6.6.0-7845_node_run_1.pcap [ 98857 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment _7845_client-op.png [ 98868 ]
            Attachment _7845_invalid-prepare.png [ 98869 ]
            owend Daniel Owen made changes -
            Summary [Transaction DocIsolation]: Memcached crahsed with status 134 resulting in AmbiguousTimeoutException during SubdocMutateRequest [Transaction DocIsolation]: Memcached crashed with status 134 resulting in AmbiguousTimeoutException during SubdocMutateRequest
            drigby Dave Rigby made changes -
            Assignee Paolo Cocchi [ paolo.cocchi ] Dave Rigby [ drigby ]
            drigby Dave Rigby made changes -
            Sprint KV Sprint 2020-June [ 1106 ]
            till Till Westmann made changes -
            Link This issue blocks MB-38724 [ MB-38724 ]
            till Till Westmann made changes -
            Labels Transactions durability functional-test Transactions approved-for-6.6.0 durability functional-test
            ritam.sharma Ritam Sharma made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]
            drigby Dave Rigby made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            drigby Dave Rigby made changes -
            Comment [ Issue is caused by a bug in the {{subdoc_fetch}} code when allowing execution to continue even if no existing doc is found (say when using CreateAsDeleted), _if_ the check to see if a tombstone is present needs to go to disk (i.e. the Bloom filter failed to avoid the bgFetch). This explains why this only occurs intermittently.

            In that situation, the subdoc state machine sequence for the multi-mutation from test.pcap looks like:

            # subdoc_fetch -> bucket_get(AliveOrDeleted) -> EventuallyPersistentEngine::get() -> not found in HT and bloom filter cannot determine -> EWOULDBLOCK, schedule bgFetch.
            # subdoc_fetch returns back to runloop with EWOULDBLOCK, waiting for notify_io_complete
            # bgFetch completes, returns KEY_ENOENT (no tombstone on disk) -> notify_IO_complete(fd, KEY_ENOENT).
            # subdoc_fetch called again, checks AsyncIO status code KEY_ENOENT. CreateAsDeleted specified, ok, setup empty document and return true (to continue execution).
            #* *However, the status code {{ret}}) is _not_ modified, is still KEY_ENOENT*.
            # subdoc_execute called, performs requested operations on newly-formed empty doc.
            # subdoc_update called, checks {{ret}}. Upon finding non-successful result returns early with that same status code.

            Fix is to ensure that after subdoc_fetch() succeeds (i.e. is happy with no tombstone being present), then {{ret}} is set to SUCCESS.

            There's also a tangental issue to understand why this was 100% reproducible on Windows - sounds like some bloomfilter bug.
            ]
            owend Daniel Owen made changes -
            Link This issue relates to MB-37374 [ MB-37374 ]
            drigby Dave Rigby made changes -
            Summary [Transaction DocIsolation]: Memcached crashed with status 134 resulting in AmbiguousTimeoutException during SubdocMutateRequest [Transaction DocIsolation]: DcpConsumer::prepare: (vb:971) Value cannot contain a body for SyncDelete
            drigby Dave Rigby made changes -
            Link This issue is duplicated by MB-40167 [ MB-40167 ]
            drigby Dave Rigby made changes -
            Assignee Dave Rigby [ drigby ] Ashwin Govindarajulu [ ashwin.govindarajulu ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            drigby Dave Rigby made changes -
            Summary [Transaction DocIsolation]: DcpConsumer::prepare: (vb:971) Value cannot contain a body for SyncDelete Subdoc CreateAsDeleted without any value path can result in non-empty value
            drigby Dave Rigby made changes -
            Link This issue causes MB-40262 [ MB-40262 ]
            arunkumar Arunkumar Senthilnathan (Inactive) made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            arunkumar Arunkumar Senthilnathan (Inactive) made changes -
            Labels Transactions approved-for-6.6.0 durability functional-test Transactions approved-for-6.6.0 durability functional-test txn_testing
            owend Daniel Owen made changes -
            Link This issue relates to MB-40366 [ MB-40366 ]
            drigby Dave Rigby made changes -
            Link This issue relates to MB-40366 [ MB-40366 ]

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              ashwin.govindarajulu Ashwin Govindarajulu
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty