Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
7.2.0
-
Enterprise Edition 7.2.0 build 5241
-
Triaged
-
Centos 64-bit
-
0
-
No
Description
Script to Repro
./sequoia -client 172.23.104.27:2375 -provider file:centos_pine.yml -test tests/integration/7.2/test_7.2.yml -scope tests/integration/7.2/scope_7.2_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.2.0-5241 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
|
This is essentially the Neo longevity test, repurposed to run with CDC-enabled buckets.
When we hard failed over a node and then tried to rebalance it out, the rebalance failed. Repeatedly retrying the rebalance did not help.
From cli
[2023-03-10T07:00:30-08:00, sequoiatools/couchbase-cli:7.1:bd14e6] failover -c 172.23.108.103:8091 --server-failover 172.23.121.117:8091 -u Administrator -p password --hard
[2023-03-10T07:00:38-08:00, sequoiatools/couchbase-cli:7.1:247f47] rebalance -c 172.23.108.103:8091 -u Administrator -p password
Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
docker logs 247f47
docker start 247f47
WARNING: couchbase-cli version 7.1.0-1345-enterprise does not match couchbase server version 7.2.0-5241-enterprise
Unable to display progress bar on this os
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
|
172.23.108.103 7:00:32 AM 10 Mar, 2023
Failover completed successfully.
|
Rebalance Operation Id = 99f9a9ca302c8ce1447689ff6a3cf95e
|
172.23.108.103 7:00:39 AM 10 Mar, 2023
Starting rebalance, KeepNodes = ['ns_1@172.23.104.137','ns_1@172.23.104.155',
  'ns_1@172.23.104.157','ns_1@172.23.104.5',
  'ns_1@172.23.104.67','ns_1@172.23.104.69',
  'ns_1@172.23.104.70','ns_1@172.23.105.107',
  'ns_1@172.23.105.111','ns_1@172.23.106.100',
  'ns_1@172.23.106.188','ns_1@172.23.108.103',
  'ns_1@172.23.120.107','ns_1@172.23.120.245',
  'ns_1@172.23.123.28','ns_1@172.23.96.148',
  'ns_1@172.23.96.192','ns_1@172.23.96.252',
  'ns_1@172.23.96.253','ns_1@172.23.97.119',
  'ns_1@172.23.97.121','ns_1@172.23.97.122',
  'ns_1@172.23.97.239','ns_1@172.23.99.11',
  'ns_1@172.23.99.20','ns_1@172.23.99.21',
  'ns_1@172.23.99.25'], EjectNodes = [], Failed over and being ejected nodes = ['ns_1@172.23.121.117']; no delta recovery nodes; Operation Id = abc3fbd1e8c1fc50d21e810f86fc0c1e
|
172.23.108.103 7:00:52 AM 10 Mar, 2023
Worker <0.990.200> (for action {move,{170,
  ['ns_1@172.23.99.21',
   'ns_1@172.23.99.25'],
  ['ns_1@172.23.106.100',
   'ns_1@172.23.99.25'],
  []}}) exited with reason {unexpected_exit,
  {'EXIT',
   <0.32364.199>,
   {{wait_seqno_persisted_failed,
     "ITEM",170,
     45569,
     [{'ns_1@172.23.106.100',
       {'EXIT',
        {socket_closed,
         {gen_server,
          call,
          [{'janitor_agent-ITEM',
            'ns_1@172.23.106.100'},
           {if_rebalance,
            <0.824.200>,
            {wait_seqno_persisted,
             170,
             45569}},
           infinity]}}}}]},
    [{ns_single_vbucket_mover,
      '-wait_seqno_persisted_many/5-fun-2-',
      5,
      [{file,
        "src/ns_single_vbucket_mover.erl"},
       {line,
        474}]},
     {proc_lib,
      init_p,3,
      [{file,
        "proc_lib.erl"},
       {line,
        211}]}]}}}
|
172.23.108.103 7:00:52 AM 10 Mar, 2023
Rebalance exited with reason {mover_crashed,
  {unexpected_exit,
   {'EXIT',<0.32364.199>,
    {{wait_seqno_persisted_failed,"ITEM",170,
      45569,
      [{'ns_1@172.23.106.100',
        {'EXIT',
         {socket_closed,
          {gen_server,call,
           [{'janitor_agent-ITEM',
             'ns_1@172.23.106.100'},
            {if_rebalance,<0.824.200>,
             {wait_seqno_persisted,170,45569}},
            infinity]}}}}]},
     [{ns_single_vbucket_mover,
       '-wait_seqno_persisted_many/5-fun-2-',5,
       [{file,"src/ns_single_vbucket_mover.erl"},
        {line,474}]},
      {proc_lib,init_p,3,
       [{file,"proc_lib.erl"},{line,211}]}]}}}}.
|
Rebalance Operation Id = abc3fbd1e8c1fc50d21e810f86fc0c1e
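For triage, the key fields of the failure reason above can be pulled out of the log text mechanically. The following is a minimal sketch, not part of the original report: the function name `parse_failure` and the field labels (bucket, vbucket, seqno, node) are my own reading of the `wait_seqno_persisted_failed` tuple, based on how the same values recur in the `{move,{170,...}}` action and the `'janitor_agent-ITEM'` process name.

```python
import re
from typing import Dict, Optional, Union

# Assumed layout, inferred from the log in this ticket:
#   {wait_seqno_persisted_failed, "<bucket>", <vbucket>, <seqno>, [{'<node>', ...
FAILURE_RE = re.compile(
    r'wait_seqno_persisted_failed,\s*"(?P<bucket>[^"]+)",\s*'
    r"(?P<vbucket>\d+),\s*(?P<seqno>\d+),\s*\[\{'(?P<node>[^']+)'"
)


def parse_failure(log_text: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return bucket, vbucket, seqno and node from a rebalance failure reason.

    Returns None when the text contains no wait_seqno_persisted_failed tuple.
    """
    match = FAILURE_RE.search(log_text)
    if match is None:
        return None
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vbucket")),
        "seqno": int(match.group("seqno")),
        "node": match.group("node"),
    }


# Excerpt copied from the failure reason in this ticket.
SAMPLE = """{{wait_seqno_persisted_failed,"ITEM",170,
45569,
[{'ns_1@172.23.106.100',
{'EXIT',
{socket_closed,"""

if __name__ == "__main__":
    print(parse_failure(SAMPLE))
```

Under this reading, the mover for vbucket 170 of bucket ITEM was waiting for seqno 45569 to persist on ns_1@172.23.106.100 when the connection to that node closed.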
|
Retried the failed rebalance 10 times; it failed every time.
A run on 7.2.0-5237 did not hit this issue, so marking this as a regression.
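The manual retry can be sketched as a small loop around the same CLI invocations shown in this ticket. This is an illustrative sketch only: it assumes `couchbase-cli` is on the PATH (the test itself runs it inside the sequoiatools container) and that the CLI exits non-zero when a rebalance fails. The helper names `failover_cmd`, `rebalance_cmd`, `run_cli` and `rebalance_with_retries` are hypothetical.

```python
import subprocess
from typing import Callable, List

# Addresses and credentials as they appear in the ticket.
CLUSTER = "172.23.108.103:8091"
FAILED_NODE = "172.23.121.117:8091"
CREDS = ["-u", "Administrator", "-p", "password"]


def failover_cmd() -> List[str]:
    """Argument list for the hard failover issued before the rebalance."""
    return ["couchbase-cli", "failover", "-c", CLUSTER,
            "--server-failover", FAILED_NODE, *CREDS, "--hard"]


def rebalance_cmd() -> List[str]:
    """Argument list for the rebalance that ejects the failed-over node."""
    return ["couchbase-cli", "rebalance", "-c", CLUSTER, *CREDS]


def run_cli(cmd: List[str]) -> int:
    """Run one CLI command and return its exit code."""
    return subprocess.run(cmd).returncode


def rebalance_with_retries(
    attempts: int = 10,
    runner: Callable[[List[str]], int] = run_cli,
) -> int:
    """Retry the rebalance; return the 1-based attempt that succeeded, or 0.

    Assumes a zero exit code means the rebalance completed.
    """
    for attempt in range(1, attempts + 1):
        if runner(rebalance_cmd()) == 0:
            return attempt
    return 0
```

The `runner` parameter exists so the loop can be exercised without a live cluster. In the scenario reported here, all 10 attempts fail, so the function would return 0.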
Attachments
Issue Links
- is duplicated by
-
MB-55927 Caught unhandled std::exception-derived exception. what(): VBucket::processSet: vb:1011 expected a complete item but the item is a prepare <ud>cid:0x0:00000000000000004724</ud> with seqno:116913. Existing prepare has seqno:116640
- Closed