Details
Task
Resolution: Unresolved
Major
7.6.0
None
Description
https://cv.jenkins.couchbase.com/job/ns-server-cluster-tests/3285
The test:
- Creates a couchstore bucket named bucket-2
- Starts migration to magma
- For each node in the cluster, it:
  1. Fails over the node
  2. Recovers it
  3. Tests that the upgrade was successful
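For anyone not familiar with the test, its flow can be sketched roughly as below. This is only an illustration: FakeCluster and all of its methods are hypothetical in-memory stand-ins, not the real cluster test helpers, and the sketch assumes that a recovered node comes back on the migration target backend.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster; not the real test API."""
    nodes: list
    backends: dict = field(default_factory=dict)   # (bucket, node) -> backend
    targets: dict = field(default_factory=dict)    # bucket -> migration target
    failed_over: set = field(default_factory=set)

    def create_bucket(self, name, backend):
        for node in self.nodes:
            self.backends[(name, node)] = backend

    def start_migration(self, name, backend):
        # Only records the target; nodes switch backend when they are recovered.
        self.targets[name] = backend

    def failover(self, node):
        self.failed_over.add(node)

    def recover(self, node):
        # Assumption: a recovered node is rebuilt on the target backend.
        self.failed_over.discard(node)
        for bucket, backend in self.targets.items():
            self.backends[(bucket, node)] = backend

    def backend_on(self, bucket, node):
        return self.backends[(bucket, node)]


def run_migration_test(cluster, bucket="bucket-2"):
    cluster.create_bucket(bucket, "couchstore")
    cluster.start_migration(bucket, "magma")
    for node in cluster.nodes:
        cluster.failover(node)
        cluster.recover(node)  # the step that failed in this run
        assert cluster.backend_on(bucket, node) == "magma"
    return [cluster.backend_on(bucket, node) for node in cluster.nodes]
```

In the real test the recover step involves a rebalance, which is where this failure surfaced.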
One of the recoveries failed, which was caught as a rebalance failure. The rebalance failed while waiting for a seqno in memcached because the socket closed:
[ns_server:info,2023-08-02T17:37:47.809Z,n_16@127.0.0.1:janitor_agent-bucket-2<0.3208.0>:janitor_agent:handle_info:793]Rebalancer <34410.3966.0> died with reason {unexpected_exit,
 {'EXIT',<34410.4241.0>,
  {{{{{badmatch,
       [{<34410.4251.0>,
         {done,exit,
          {socket_closed,
           {gen_server,call,
            [<34410.4056.0>,
             {takeover,31},
             infinity]}},
          [{gen_server,call,3,
            [{file,"gen_server.erl"},
             {line,385}]},
           {dcp_replicator,
            '-spawn_and_wait/1-fun-0-',
            1,
            [{file,"src/dcp_replicator.erl"},
             {line,336}]}]}}]},
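Looking at the memcached logs on node n_15, we hit a GSL precondition failure at vbucket_manifest.cc:317 while executing DCP_SET_VBUCKET_STATE for vBucket 31, which caused memcached to close the connection. The failing check is in collections code, so it seems unlikely to be related to storage migration.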
2023-08-02T17:37:47.039262+00:00 ERROR 159: Exception occurred during packet execution. Closing connection [ {"ip":"127.0.0.1","port":52354} - {"ip":"127.0.0.1","port":11939} (System, <ud>@ns_server</ud>) ]: GSL: Precondition failure: '!changes.scopesToModify.empty() || changes.changeScopeWithDataLimitExists' at /home/couchbase/jenkins/workspace/ns-server-cluster-tests/kv_engine/engines/ep/src/collections/vbucket_manifest.cc:317. Cookies: [{"aiostat":"success","ewouldblock":false,"packet":{"bodylen":1,"cas":0,"datatype":"raw","extlen":1,"extras":{"state":1},"keylen":0,"magic":"ClientRequest","opaque":23,"opcode":"DCP_SET_VBUCKET_STATE","vbucket":31},"refcount":1,"started":"53104068593626967 (476 us ago)","throttled":false}]
2023-08-02T17:37:47.039305+00:00 INFO 159: (No Engine) DCP (Consumer) eq_dcpq:replication:n_16@127.0.0.1->n_15@127.0.0.1:bucket-2 - Removing connection [ {"ip":"127.0.0.1","port":52354} - {"ip":"127.0.0.1","port":11939} (System, <ud>@ns_server</ud>) ]
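To line the memcached error up with the ns_server failure, the relevant fields can be pulled out of the ERROR line with something like the following. This is a rough sketch, not an existing tool: the regex and the parse_gsl_failure helper are my own, the sample line is an abridged copy of the one above, and the pattern assumes the message layout seen in this one log entry.

```python
import json
import re

# Abridged copy of the ERROR line above: the connection details are dropped
# and the cookie is cut down to the fields used below.
LINE = (
    "2023-08-02T17:37:47.039262+00:00 ERROR 159: Exception occurred during "
    "packet execution. GSL: Precondition failure: "
    "'!changes.scopesToModify.empty() || changes.changeScopeWithDataLimitExists' "
    "at /home/couchbase/jenkins/workspace/ns-server-cluster-tests/kv_engine/"
    "engines/ep/src/collections/vbucket_manifest.cc:317. "
    'Cookies: [{"packet":{"extras":{"state":1},'
    '"opcode":"DCP_SET_VBUCKET_STATE","vbucket":31}}]'
)

# Assumed layout: "GSL: Precondition failure: '<cond>' at <file>:<line>. Cookies: <json>"
PATTERN = re.compile(
    r"GSL: Precondition failure: '(?P<cond>.*?)' "
    r"at (?P<file>\S+):(?P<line>\d+)\. "
    r"Cookies: (?P<cookies>\[.*\])$"
)


def parse_gsl_failure(line):
    """Return the failed condition, source location and first packet, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    packet = json.loads(match["cookies"])[0]["packet"]
    return {
        "condition": match["cond"],
        "file": match["file"],
        "line": int(match["line"]),
        "opcode": packet["opcode"],
        "vbucket": packet["vbucket"],
    }


if __name__ == "__main__":
    print(parse_gsl_failure(LINE))
```

The vBucket in the failing packet (31) is the same one named in the {takeover,31} call in the ns_server crash above, which ties the two logs together.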
CC Abhijeeth Nuthan, this was caught in one of our cluster tests, but it looks like it might be a memcached issue.