Details
Bug
Resolution: Duplicate
Critical
6.5.0
Untriaged
No
KV-Engine Mad-Hatter Beta
Description
During kv-engine-jepsen-post-commit-131 we observe a GSL precondition failure, triggered because the high completed seqno (HCS) is greater than the high prepared seqno (HPS):
2019-07-15T09:18:51.497858-07:00 ERROR 56: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:41574 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f7749dbf010","ewouldblock":false,"packet":{"bodylen":20,"cas":0,"datatype":"raw","extlen":20,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":8,"opcode":"DCP_SNAPSHOT_MARKER","vbucket":688},"refcount":1}] - closing connection ([ 127.0.0.1:41574 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]): GSL: Precondition failure at /home/couchbase/jenkins/workspace/kv-engine-jepsen-post-commit/kv_engine/engines/ep/src/vbucket.cc: 3895
This in turn leads to a rebalance failure in the Jepsen test:
2019-07-15 09:19:01,266{GMT} WARN [jepsen nemesis] jepsen.core: Process :nemesis crashed
java.lang.RuntimeException: Rebalance failed
    at couchbase.util$wait_for_rebalance_complete.invokeStatic(util.clj:123) ~[na:na]
    at couchbase.util$wait_for_rebalance_complete.invoke(util.clj:112) ~[na:na]
    at couchbase.util$wait_for_rebalance_complete.invokeStatic(util.clj:114) ~[na:na]
    at couchbase.util$wait_for_rebalance_complete.invoke(util.clj:112) ~[na:na]
    at couchbase.util$wait_for_rebalance_complete.invokeStatic(util.clj:113) ~[na:na]
    at couchbase.util$wait_for_rebalance_complete.invoke(util.clj:112) ~[na:na]
    at couchbase.util$rebalance.invokeStatic(util.clj:153) ~[na:na]
    at couchbase.util$rebalance.invoke(util.clj:136) ~[na:na]
    at couchbase.util$rebalance.invokeStatic(util.clj:138) ~[na:na]
    at couchbase.util$rebalance.invoke(util.clj:136) ~[na:na]
    at couchbase.nemesis$couchbase$reify__3367.invoke_BANG_(nemesis.clj:181) ~[na:na]
    at jepsen.nemesis$invoke_compat_BANG_.invokeStatic(nemesis.clj:40) ~[jepsen-0.1.14.jar:na]
    at jepsen.nemesis$invoke_compat_BANG_.invoke(nemesis.clj:36) ~[jepsen-0.1.14.jar:na]
    at couchbase.core$_main$fn__4709$fn__4710.invoke(core.clj:236) ~[na:na]
    .....
Method that contains the GSL precondition:
void VBucket::setUpAllowedDuplicatePrepareWindow() {
    auto& dm = getDurabilityMonitor();
    auto hcs = dm.getHighCompletedSeqno();
    auto hps = dm.getHighPreparedSeqno();
    Expects(hcs <= hps);

    int64_t newDuplicateCount = hps - hcs;
    allowedDuplicatePrepareSeqnos.reserve(allowedDuplicatePrepareSeqnos.size() +
                                          newDuplicateCount);

    for (int64_t dupSeqno = hcs + 1; dupSeqno <= hps; dupSeqno++) {
        allowedDuplicatePrepareSeqnos.insert(dupSeqno);
    }
}
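For illustration, the window calculation in the method above can be reproduced in isolation. This is a minimal sketch and not the kv_engine code: `computeDuplicateWindow` is a hypothetical free function, the seqnos are passed in directly instead of being read from the DurabilityMonitor, and a thrown `std::logic_error` stands in for the GSL `Expects` check.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <unordered_set>

// Hypothetical stand-alone version of the window calculation; not part of
// kv_engine. Returns every seqno in the range (hcs, hps].
std::unordered_set<int64_t> computeDuplicateWindow(int64_t hcs, int64_t hps) {
    // Stand-in for Expects(hcs <= hps). This is the check that fails in the
    // reported run, where HCS > HPS.
    if (hcs > hps) {
        throw std::logic_error("precondition failed: hcs > hps");
    }

    std::unordered_set<int64_t> window;
    window.reserve(static_cast<std::size_t>(hps - hcs));

    // Each seqno after the HCS, up to and including the HPS, is allowed to
    // be seen again as a duplicate prepare.
    for (int64_t seqno = hcs + 1; seqno <= hps; ++seqno) {
        window.insert(seqno);
    }
    return window;
}
```

With hcs = 3 and hps = 6 the sketch returns the seqnos 4, 5 and 6. With hcs = 7 and hps = 6 it throws, which corresponds to the precondition failure seen in the log above.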
To run the Jepsen test, use the following command:
lein trampoline run test --nodes-file ./nodes --username vagrant --ssh-private-key ./resources/vagrantkey --package /home/couchbase/jenkins/workspace/kv-engine-jepsen-post-commit/install --workload=failover --failover-type=hard --recovery-type=delta --replicas=2 --no-autofailover --disrupt-count=1 --rate=0 --durability=0:100:0:0 --eviction-policy=value