Details
Bug
Resolution: Cannot Reproduce
Major
7.2.1
Enterprise Edition 7.2.1 build 5849
Untriaged
Centos 64-bit
0
No
Description
Script to Repro
./sequoia -client 172.23.104.27:2375 -provider file:centos_pine.yml -test tests/integration/7.2/test_7.2.yml -scope tests/integration/7.2/scope_7.2_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.2.1-5849 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
The test ran without problems for 6 days. On day 6, a rebalance-in of KV nodes failed.
[2023-07-16T15:08:59-07:00, sequoiatools/couchbase-cli:7.1:35297f] server-add -c 172.23.108.103:8091 --server-add https://172.23.99.20 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
[2023-07-16T15:09:38-07:00, sequoiatools/couchbase-cli:7.1:e896ce] rebalance -c 172.23.108.103:8091 -u Administrator -p password
Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
docker logs e896ce
docker start e896ce
Unable to display progress bar on this os
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
172.23.108.103 : rebalance
[user:error,2023-07-16T18:29:21.931-07:00,ns_1@172.23.108.103<0.25113.0>:ns_orchestrator:log_rebalance_completion:1433]Rebalance exited with reason bad_replicas.
From debug.log of 172.23.108.103
[user:info,2023-07-16T14:52:20.377-07:00,ns_1@172.23.108.103:<0.31907.2404>:ns_rebalancer:verify_replication:849]Bad replicators after rebalance:
Missing = [{'ns_1@172.23.106.100','ns_1@172.23.99.25',69},
{'ns_1@172.23.106.100','ns_1@172.23.99.25',186},
{'ns_1@172.23.106.100','ns_1@172.23.99.25',258},
{'ns_1@172.23.106.100','ns_1@172.23.99.25',464},
{'ns_1@172.23.106.100','ns_1@172.23.99.25',550},
{'ns_1@172.23.106.100','ns_1@172.23.99.25',790},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',88},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',89},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',176},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',177},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',181},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',182},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',185},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',202},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',288},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',349},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',364},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',365},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',366},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',368},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',371},
{'ns_1@172.23.121.117','ns_1@172.23.99.25',777},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',549},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',551},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',552},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',553},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',554},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',555},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',556},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',557},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',558},
{'ns_1@172.23.97.121','ns_1@172.23.99.25',750},
{'ns_1@172.23.97.122','ns_1@172.23.99.25',626},
{'ns_1@172.23.97.122','ns_1@172.23.99.25',644},
{'ns_1@172.23.97.122','ns_1@172.23.99.25',648},
{'ns_1@172.23.97.122','ns_1@172.23.99.25',651},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',842},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',921},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',923},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',924},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',925},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',926},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',927},
{'ns_1@172.23.99.21','ns_1@172.23.99.25',928}]
Extras = []
[ns_server:info,2023-07-16T14:52:20.379-07:00,ns_1@172.23.108.103:rebalance_agent<0.23399.0>:rebalance_agent:handle_down:290]Rebalancer process <0.31907.2404> died (reason bad_replicas).
[ns_server:debug,2023-07-16T14:52:20.380-07:00,ns_1@172.23.108.103:leader_activities<0.25076.0>:leader_activities:handle_activity_down:450]Activity terminated with reason {shutdown,
{async_died,
{raised,
{exit,bad_replicas,
[{ns_rebalancer,verify_replication,3,
[{file,"src/ns_rebalancer.erl"},
{line,852}]},
{lists,foreach,2,
[{file,"lists.erl"},{line,1342}]},
{ns_rebalancer,rebalance_kv,4,
[{file,"src/ns_rebalancer.erl"},
{line,573}]},
{ns_rebalancer,rebalance_body,5,
[{file,"src/ns_rebalancer.erl"},
{line,524}]},
{async,'-async_init/4-fun-1-',3,
[{file,"src/async.erl"},
{line,191}]}]}}}}. Activity:
{activity,<0.32525.2404>,#Ref<0.3410623904.2699821063.174559>,default,
<<"bc6150dd5e92a7291c7d716fa589547a">>,
[rebalance],
majority,[]}
[error_logger:error,2023-07-16T14:52:20.380-07:00,ns_1@172.23.108.103:<0.26414.2404>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
crasher:
initial call: erlang:apply/2
pid: <0.26414.2404>
registered_name: []
exception exit: bad_replicas
in function ns_rebalancer:verify_replication/3 (src/ns_rebalancer.erl, line 852)
in call from lists:foreach/2 (lists.erl, line 1342)
in call from ns_rebalancer:rebalance_kv/4 (src/ns_rebalancer.erl, line 573)
in call from ns_rebalancer:rebalance_body/5 (src/ns_rebalancer.erl, line 524)
in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 191)
ancestors: [<0.25113.0>,ns_orchestrator_child_sup,ns_orchestrator_sup,
mb_master_sup,mb_master,leader_registry_sup,
leader_services_sup,<0.23335.0>,ns_server_sup,
ns_server_nodes_sup,<0.269.0>,ns_server_cluster_sup,
root_sup,<0.145.0>]
message_queue_len: 0
messages: []
links: [<0.25113.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 121536
stack_size: 29
reductions: 12697
neighbours:
[user:error,2023-07-16T14:52:20.388-07:00,ns_1@172.23.108.103:<0.25113.0>:ns_orchestrator:log_rebalance_completion:1433]Rebalance exited with reason bad_replicas.
Rebalance Operation Id = f1175217dfc503ff2f64e14420629045
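For triage, it may help to group the Missing list from the debug.log excerpt above by node pair. In that excerpt, every entry names ns_1@172.23.99.25 as its second node. The following is a minimal Python sketch, not part of the original report. It assumes each tuple is {source node, destination node, vBucket}, which is an interpretation of the log format rather than something this ticket states. The function name summarize_missing and the shortened sample text are hypothetical.

```python
import re
from collections import defaultdict

# Matches one {'nodeA','nodeB',N} tuple as it appears in the Missing list.
# Assumption: the fields are (source node, destination node, vBucket id).
TRIPLE = re.compile(r"\{'([^']+)','([^']+)',(\d+)\}")


def summarize_missing(log_text):
    """Group vBucket ids by (source, destination) node pair."""
    groups = defaultdict(list)
    for src, dst, vb in TRIPLE.findall(log_text):
        groups[(src, dst)].append(int(vb))
    return dict(groups)


# Shortened sample copied from the Missing list in this ticket.
sample = """
Missing = [{'ns_1@172.23.106.100','ns_1@172.23.99.25',69},
{'ns_1@172.23.106.100','ns_1@172.23.99.25',186},
{'ns_1@172.23.108.103','ns_1@172.23.99.25',88}]
Extras = []
"""

if __name__ == "__main__":
    for (src, dst), vbs in sorted(summarize_missing(sample).items()):
        print(f"{src} -> {dst}: {len(vbs)} vBuckets {vbs}")
```

Running the same function over the full debug.log text (for example, the contents read from the attached cbcollect_info) would give a per-pair count of missing replicators.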
We haven't yet had a clean run to get a baseline. Marking this as not a regression.
cbcollect_info attached.