Details
- Bug
- Resolution: Fixed
- Critical
- master, 6.5.0
- Untriaged
- No
- KV-Engine Mad-Hatter Beta
Description
During kv-engine-jepsen-post-commit-124 we see the following error messages in memcached.log:
2019-07-12T05:11:03.695563-07:00 ERROR 52: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:38391 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007fed62857010","ewouldblock":false,"packet":{"bodylen":46,"cas":1562933463452286976,"datatype":["JSON"],"extlen":33,"key":"<ud>.jepsen0003</ud>","keylen":11,"magic":"ClientRequest","opaque":52,"opcode":"DCP_PREPARE","vbucket":134},"refcount":1}] - closing connection ([ 127.0.0.1:38391 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]): CheckpointManager::queueDirty(vb:134) - got Ckpt::queueDirty() status:failure:duplicate item when vbstate is non-active:2
2019-07-12T05:11:06.288306-07:00 ERROR 54: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:41366 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007fed628a2810","ewouldblock":false,"packet":{"bodylen":48,"cas":1562933465434030080,"datatype":["JSON","Snappy"],"extlen":33,"key":"<ud>.jepsen0003</ud>","keylen":11,"magic":"ClientRequest","opaque":52,"opcode":"DCP_PREPARE","vbucket":134},"refcount":1}] - closing connection ([ 127.0.0.1:41366 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]): GSL: Precondition failure at /home/couchbase/jenkins/workspace/kv-engine-jepsen-post-commit/kv_engine/engines/ep/src/vbucket.cc: 3151
... Same error message repeats after connect / reconnect attempts ...
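The "GSL: Precondition failure" text in the second error comes from a contract check (an Expects() call, shown further down at vbucket.cc:3151) rather than a normal engine error path: when the check fails an exception carrying the file and line of the violated precondition is thrown, the connection runloop catches it, logs the cookie info, and closes the connection. A minimal standalone sketch of that behaviour, using a simplified stand-in macro rather than the real GSL library and assuming a throw-on-violation build (as the log output suggests):

    #include <iostream>
    #include <sstream>
    #include <stdexcept>

    // Simplified stand-in for GSL's Expects() under a throw-on-violation
    // build. Illustrative only; not the kv_engine/GSL implementation.
    #define EXPECTS(cond)                                                   \
        do {                                                                \
            if (!(cond)) {                                                  \
                std::ostringstream oss;                                     \
                oss << "GSL: Precondition failure at " << __FILE__ << ": "  \
                    << __LINE__;                                            \
                throw std::logic_error(oss.str());                          \
            }                                                               \
        } while (0)

    int main() {
        bool itemIsCommitted = false; // incoming item is a DCP_PREPARE, not a commit
        try {
            EXPECTS(itemIsCommitted); // mirrors Expects(itm.isCommitted())
        } catch (const std::exception& e) {
            // The runloop catches the exception, logs it and closes the
            // connection, producing the memcached.log entries above.
            std::cout << e.what() << "\n";
        }
    }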
This test performs a Jepsen register workload and, during the run, hard-fails over one of the nodes. Jepsen does not see anything wrong until the subsequent cluster rebalance fails, which causes a crash in the Jepsen framework.
The following parameters were used for this test:
workload=failover,failover-type=hard,recovery-type=delta,replicas=2,no-autofailover,disrupt-count=1,rate=0,durability=0:100:0:0,eviction-policy=value
To run the test, use this command:
lein trampoline run test --nodes-file ./nodes --username vagrant --ssh-private-key ./resources/vagrantkey --package {path to package} --workload=failover --failover-type=hard --recovery-type=delta --replicas=2 --no-autofailover --disrupt-count=1 --rate=0 --durability=0:100:0:0 --eviction-policy=value
Code around vbucket.cc:3151 (VBucket::processSet):
std::pair<MutationStatus, boost::optional<VBNotifyCtx>> VBucket::processSet(
        HashTable::FindCommitResult& htRes,
        StoredValue*& v,
        Item& itm,
        uint64_t cas,
        bool allowExisting,
        bool hasMetaData,
        const VBQueueItemCtx& queueItmCtx,
        cb::StoreIfStatus storeIfStatus,
        bool maybeKeyExists) {
    if (v) {
        if (v->isPending()) {
            // It is not valid for an active vBucket to attempt to overwrite an
            // in flight SyncWrite. If this vBucket is not active, we are
            // allowed to overwrite an in flight SyncWrite iff we are receiving
            // a disk snapshot. This is due to disk based de-dupe that allows
            // only 1 value per key. In this case, the active node may send a
            // mutation instead of a commit if it knows that this replica may be
            // missing a prepare. This code allows this mutation to be accepted
            // and overwrites the existing prepare.
            if (getState() == vbucket_state_active ||
                !isReceivingDiskSnapshot()) {
                return {MutationStatus::IsPendingSyncWrite, {}};
            }

            Expects(itm.isCommitted()); // vbucket.cc:3151
            getPassiveDM().completeSyncWrite(
                    itm.getKey(), PassiveDurabilityMonitor::Resolution::Commit);

            // Deal with the already existing prepare
            processImplicitlyCompletedPrepare(htRes.pending);

            // Add a new or overwrite the existing mutation
            return processSetInner(htRes.pending.getHBL(),
                                   htRes.committed,
                                   itm,
                                   cas,
                                   allowExisting,
                                   hasMetaData,
                                   queueItmCtx,
                                   storeIfStatus,
                                   maybeKeyExists);
        }

        // This is a new SyncWrite, we just want to add a new prepare unless we
        // still have a completed prepare (Ephemeral) which we should replace
        // instead.
        if (!v->isCompleted() && itm.isPending()) {
            VBNotifyCtx notifyCtx;
            // We have to modify the StoredValue pointer passed in or we do not
            // return the correct cas to the client.
            std::tie(v, notifyCtx) = addNewStoredValue(
                    htRes.getHBL(), itm, queueItmCtx, GenerateRevSeqno::No);
            return {MutationStatus::WasClean, notifyCtx};
        }
    }
Looking at the logs and at VBucket::processSet, it appears we are streaming to a replica node and attempting to apply a prepared set on top of an already existing prepared set.
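As a rough illustration of why that combination trips the check: the Expects at 3151 is only reachable when the vBucket is non-active and receiving a disk snapshot, and in that branch only a committed item may overwrite the in-flight prepare, so a second prepare for the same key fails the precondition. Below is a minimal standalone model of that branch logic; the enum types and the processSetModel function are hypothetical simplifications for illustration, not kv_engine code.

    #include <iostream>
    #include <stdexcept>

    // Hypothetical simplifications of the states involved; the real
    // kv_engine types differ.
    enum class ItemState { Pending, Committed };
    enum class VBState { Active, Replica };

    // Models the branch in VBucket::processSet around vbucket.cc:3151.
    const char* processSetModel(bool storedValueIsPending,
                                VBState state,
                                bool receivingDiskSnapshot,
                                ItemState incoming) {
        if (storedValueIsPending) {
            if (state == VBState::Active || !receivingDiskSnapshot) {
                // An active vBucket (or a replica outside a disk snapshot)
                // must never overwrite an in-flight SyncWrite.
                return "rejected: IsPendingSyncWrite";
            }
            // Equivalent of Expects(itm.isCommitted()): during a disk
            // snapshot only a committed mutation may overwrite the prepare.
            if (incoming != ItemState::Committed) {
                throw std::logic_error(
                        "GSL: Precondition failure (itm.isCommitted())");
            }
            return "accepted: existing prepare implicitly completed";
        }
        return "accepted: normal set";
    }

    int main() {
        // The combination suggested by the Jepsen logs: a replica receiving
        // a disk snapshot gets a second DCP_PREPARE for a key that already
        // has a pending prepare in the HashTable.
        try {
            std::cout << processSetModel(
                                 true, VBState::Replica, true, ItemState::Pending)
                      << "\n";
        } catch (const std::logic_error& e) {
            std::cout << e.what() << "\n";
        }
    }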
Issue Links
- blocks: MB-34643 DCP replication does not recover after SIGKILL/restart to memcached (dupe PREPARE?) [ETA 2019/7/19] (Closed)
- is duplicated by: MB-35078 [Rebalance failure] GSL: Precondition failure being triggered when HCS > HPS [2019/7/19] (Closed)
- relates to: MB-35078 [Rebalance failure] GSL: Precondition failure being triggered when HCS > HPS [2019/7/19] (Closed)