Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
6.6.0
-
Enterprise Edition 6.6.0 build 7834
Java SDK: Transactions 1.1.0-SNAPSHOT
-
Untriaged
-
Centos 64-bit
-
1
-
No
-
KV Sprint 2020-June
Description
Build: 6.6.0-7834
Test case:
./testrunner -i node.ini threads_to_use=8,rerun=False,skip_cleanup=True,get-cbcollect-info=False,skip_collections_cleanup=True -t Atomicity.doc_isolation.IsolationDocTest.test_staged_doc_read,nodes_init=2,num_items=1,replicas=1,transaction_timeout=30,transaction_commit=True,doc_op=create,operation=beforeAtrPending,GROUP=P0
|
Cbcollect logs:
https://cb-jira.s3.us-east-2.amazonaws.com/logs/transaction_crash/collectinfo-2020-06-24T195940-ns_1%4010.112.191.105.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/transaction_crash/collectinfo-2020-06-24T195940-ns_1%4010.112.191.106.zip
Scenario:
- Two node cluster, Couchbase bucket with replica=1
- Start transaction with expiry of 30 seconds. Forcefully fail the transaction during "beforeAtrPending" and retry
Expected behavior:
Failed transaction should succeed without any issues
Service 'memcached' exited with status 134. Restarting. Messages:
|
2020-06-24T12:53:01.819552-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f0c401a8000+0x8f213]
|
2020-06-24T12:53:01.819559-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xcccd0]
|
2020-06-24T12:53:01.819564-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xc811a]
|
2020-06-24T12:53:01.819570-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xca523]
|
2020-06-24T12:53:01.819576-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0x18f490]
|
2020-06-24T12:53:01.819581-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0xcfa4d]
|
2020-06-24T12:53:01.819586-07:00 CRITICAL /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f0c3aca6000+0x12b764]
|
2020-06-24T12:53:01.819590-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f0c42257000+0x8f17]
|
2020-06-24T12:53:01.819594-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f0c3fa73000+0x7df3]
|
2020-06-24T12:53:01.819621-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f0c3f6b2000+0xf61ad]
|
Attaching test.log and trans_expired_with_d_ambiguous.pcapng
for further debugging.
Attachments
Issue Links
- causes
-
MB-40262 A Sub-Document insertion only specifying XATTR paths no longer creates an empty JSON body
-
- Closed
-
- is duplicated by
-
MB-40167 [Transaction DocIsolation]: Expects(bodySize == 0) fails in Item::removeUserXattrs()
-
- Closed
-
- relates to
-
MB-37374 Implement support so Transactions do not need to create visible temporary docs
-
- Closed
-
For Gerrit Dashboard: MB-40126 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
131577,3 | MB-40126: subdoc CreateAsDeleted: User value shouldn't be '{}' | mad-hatter | kv_engine | Status: MERGED | +2 | +1 |
131730,2 | MB-40243: Simplify subdoc testapp_xattr | mad-hatter | kv_engine | Status: MERGED | +2 | +1 |
131807,3 | Merge branch 'mad-hatter' | master | kv_engine | Status: MERGED | +2 | +1 |
131817,4 | MB-40262: Subdoc inserts of Alive empty docs should be '{}' | mad-hatter | kv_engine | Status: MERGED | +2 | +1 |
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Daniel Owen [ owend ] | Paolo Cocchi [ paolo.cocchi ] |
Attachment | trans_expired_with_d_ambiguous_new.pcapng [ 98819 ] |
Attachment | test.pcap [ 98846 ] |
Attachment | _dcp-prepare-png.png [ 98851 ] | |
Attachment | _no non-subdoc fronend op.png [ 98852 ] | |
Attachment | _subdoc ops only against 105.png [ 98853 ] |
Attachment | 6.6.0-7845_client_run_1.pcapng [ 98856 ] | |
Attachment | 6.6.0-7845_node_run_1.pcap [ 98857 ] |
Attachment | _7845_client-op.png [ 98868 ] | |
Attachment | _7845_invalid-prepare.png [ 98869 ] |
Summary | [Transaction DocIsolation]: Memcached crahsed with status 134 resulting in AmbiguousTimeoutException during SubdocMutateRequest | [Transaction DocIsolation]: Memcached crashed with status 134 resulting in AmbiguousTimeoutException during SubdocMutateRequest |
Assignee | Paolo Cocchi [ paolo.cocchi ] | Dave Rigby [ drigby ] |
Sprint | KV Sprint 2020-June [ 1106 ] |
Link | This issue blocks MB-38724 [ MB-38724 ] |
Labels | Transactions durability functional-test | Transactions approved-for-6.6.0 durability functional-test |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Comment |
[ Issue is caused by a bug in the {{subdoc_fetch}} code when allowing execution to continue even if no existing doc is found (say when using CreateAsDeleted), _if_ the check to see if a tombstone is present needs to go to disk (i.e. the Bloom filter failed to avoid the bgFetch). This explains why this only occurs intermittently.
In that situation, the subdoc state machine sequence for the multi-mutation from test.pcap looks like: # subdoc_fetch -> bucket_get(AliveOrDeleted) -> EventuallyPersistentEngine::get() -> not found in HT and bloom filter cannot determine -> EWOULDBLOCK, schedule bgFetch. # subdoc_fetch returns back to runloop with EWOULDBLOCK, waiting for notify_io_complete # bgFetch completes, returns KEY_ENOENT (no tombstone on disk) -> notify_IO_complete(fd, KEY_ENOENT). # subdoc_fetch called again, checks AsyncIO status code KEY_ENOENT. CreateAsDeleted specified, ok, setup empty document and return true (to continue execution). #* *However, the status code {{ret}}) is _not_ modified, is still KEY_ENOENT*. # subdoc_execute called, performs requested operations on newly-formed empty doc. # subdoc_update called, checks {{ret}}. Upon finding non-successful result returns early with that same status code. Fix is to ensure that after subdoc_fetch() succeeds (i.e. is happy with no tombstone being present), then {{ret}} is set to SUCCESS. There's also a tangental issue to understand why this was 100% reproducible on Windows - sounds like some bloomfilter bug. ] |
Summary | [Transaction DocIsolation]: Memcached crashed with status 134 resulting in AmbiguousTimeoutException during SubdocMutateRequest | [Transaction DocIsolation]: DcpConsumer::prepare: (vb:971) Value cannot contain a body for SyncDelete |
Assignee | Dave Rigby [ drigby ] | Ashwin Govindarajulu [ ashwin.govindarajulu ] |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Resolved [ 5 ] |
Summary | [Transaction DocIsolation]: DcpConsumer::prepare: (vb:971) Value cannot contain a body for SyncDelete | Subdoc CreateAsDeleted without any value path can result in non-empty value |
Status | Resolved [ 5 ] | Closed [ 6 ] |
Labels | Transactions approved-for-6.6.0 durability functional-test | Transactions approved-for-6.6.0 durability functional-test txn_testing |
On node 106 see:
2020-06-24T12:20:41.950253-07:00 ERROR 52: (default) DCP (Producer) eq_dcpq:replication:ns_1@10.112.191.106->ns_1@10.112.191.105:default - DcpProducer::handleResponse disconnecting, received unexpected response:{"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"keylen":0,"magic":"ClientResponse","opaque":462,"opcode":"DCP_PREPARE","status":"Invalid arguments"}
then later on see:
2020-06-24T12:20:46.478481-07:00 ERROR (default) VBucket::abort (vb:971) - active failed as no HashTableitem found with key:<ud>cid:0x0:test_docs-0</ud>
2020-06-24T12:20:46.478566-07:00 CRITICAL *** Fatal error encountered during exception handling ***
2020-06-24T12:20:46.478596-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ActiveDurabilityMonitor::abort vb:971 failed with status:1