Details
- Bug
- Resolution: Duplicate
- Critical
- None
- 7.2.1
- Enterprise Edition 7.2.1 build 5882
- Untriaged
- Centos 64-bit
- 0
- Unknown
Description
Script to Repro
./sequoia -client 172.23.104.27:2375 -provider file:centos_pine.yml -test tests/integration/7.2/test_7.2.yml -scope tests/integration/7.2/scope_7.2_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.2.1-5882 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=1209600 -show_topology=true
The longevity run has been going for almost 7 days. We first hit the issue described in https://issues.couchbase.com/browse/MB-57874?focusedId=698741&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-698741
A subsequent rebalance then failed with a different KV rebalance error; the commands and logs follow.
[2023-07-31T04:08:39-07:00, sequoiatools/couchbase-cli:7.1:fe851a] server-add -c 172.23.108.103:8091 --server-add https://172.23.105.107 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
[2023-07-31T04:08:51-07:00, sequoiatools/couchbase-cli:7.1:5a6558] rebalance -c 172.23.108.103:8091 -u Administrator -p password
→ Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
docker logs 5a6558
docker start 5a6558
Unable to display progress bar on this os
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
Rebalance start
2023-07-31T04:08:45.907-07:00, ns_cluster:3:info:message(ns_1@172.23.105.107) - Node ns_1@172.23.105.107 joined cluster
2023-07-31T04:08:46.045-07:00, memcached_config_mgr:0:info:message(ns_1@172.23.105.107) - Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]
2023-07-31T04:08:52.455-07:00, ns_orchestrator:0:info:message(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.137','ns_1@172.23.104.155',
    'ns_1@172.23.104.157','ns_1@172.23.104.67',
    'ns_1@172.23.104.69','ns_1@172.23.104.70',
    'ns_1@172.23.105.107','ns_1@172.23.105.111',
    'ns_1@172.23.105.168','ns_1@172.23.106.100',
    'ns_1@172.23.106.188','ns_1@172.23.108.103',
    'ns_1@172.23.120.107','ns_1@172.23.120.245',
    'ns_1@172.23.121.117','ns_1@172.23.123.28',
    'ns_1@172.23.96.148','ns_1@172.23.96.192',
    'ns_1@172.23.96.252','ns_1@172.23.96.253',
    'ns_1@172.23.97.119','ns_1@172.23.97.121',
    'ns_1@172.23.97.122','ns_1@172.23.97.239',
    'ns_1@172.23.99.11','ns_1@172.23.99.20',
    'ns_1@172.23.99.21','ns_1@172.23.99.25'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = dce5f69909dffdb90a1c5de0dd4015d9
Rebalance failure
2023-07-31T09:09:40.338-07:00, ns_vbucket_mover:0:critical:message(ns_1@172.23.108.103) - Worker <0.3117.2773> (for action {move,
    {464,
     ['ns_1@172.23.97.119','ns_1@172.23.99.25'],
     ['ns_1@172.23.105.107','ns_1@172.23.99.25'],
     []}}) exited with reason {unexpected_exit,
    {'EXIT',<0.4415.2773>,
     {{dcp_wait_for_data_move_failed,"default",
       464,'ns_1@172.23.97.119',
       ['ns_1@172.23.105.107','ns_1@172.23.99.25'],
       {error,no_stats_for_this_vbucket}},
      [{ns_single_vbucket_mover,
        '-wait_dcp_data_move/5-fun-0-',5,
        [{file,"src/ns_single_vbucket_mover.erl"},
         {line,451}]},
       {proc_lib,init_p,3,
        [{file,"proc_lib.erl"},{line,211}]}]}}}
2023-07-31T09:09:40.401-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {mover_crashed,
    {unexpected_exit,
     {'EXIT',<0.4415.2773>,
      {{dcp_wait_for_data_move_failed,"default",
        464,'ns_1@172.23.97.119',
        ['ns_1@172.23.105.107','ns_1@172.23.99.25'],
        {error,no_stats_for_this_vbucket}},
       [{ns_single_vbucket_mover,
         '-wait_dcp_data_move/5-fun-0-',5,
         [{file,"src/ns_single_vbucket_mover.erl"},
          {line,451}]},
        {proc_lib,init_p,3,
         [{file,"proc_lib.erl"},{line,211}]}]}}}}.
Rebalance Operation Id = dce5f69909dffdb90a1c5de0dd4015d9
cbcollect_info attached.
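For triage, the failure signature and the time the rebalance ran before failing can be pulled out of the two ns_orchestrator lines above. The following is a minimal sketch, not part of the test framework or of ns_server: the helper names (`parse_timestamp`, `parse_failure`) and the field labels `source_node` and `target_nodes` are my own, and reading the single node as the move source and the node list as the new chain is an assumption based on the `{move, ...}` action in the log. The log strings are the ones from this report, flattened onto one line and with the KeepNodes list and the stack trace cut for brevity.

```python
import re
from datetime import datetime

# Log lines copied from this report. The Erlang terms are flattened onto one
# line; the KeepNodes list and the trailing stack trace are omitted.
START_LINE = (
    "2023-07-31T04:08:52.455-07:00, ns_orchestrator:0:info:message"
    "(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = [...]"
)
FAILURE_LINE = (
    "2023-07-31T09:09:40.401-07:00, ns_orchestrator:0:critical:message"
    "(ns_1@172.23.108.103) - Rebalance exited with reason {mover_crashed,"
    "{unexpected_exit,{'EXIT',<0.4415.2773>,"
    "{{dcp_wait_for_data_move_failed,\"default\",464,'ns_1@172.23.97.119',"
    "['ns_1@172.23.105.107','ns_1@172.23.99.25'],"
    "{error,no_stats_for_this_vbucket}},"
)

# Matches the dcp_wait_for_data_move_failed tuple as it is printed in the log.
# The group names are illustrative labels, not ns_server field names.
FAILURE_RE = re.compile(
    r'\{dcp_wait_for_data_move_failed,\s*"(?P<bucket>[^"]+)",\s*'
    r"(?P<vbucket>\d+),\s*'(?P<source>[^']+)',\s*"
    r"\[(?P<chain>[^\]]*)\],\s*\{error,\s*(?P<error>\w+)\}"
)


def parse_timestamp(line):
    """Return the ISO 8601 timestamp that prefixes a log line."""
    return datetime.fromisoformat(line.split(",", 1)[0])


def parse_failure(line):
    """Return the failure fields as a dict, or None if the line has none."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    chain = [n.strip().strip("'") for n in match.group("chain").split(",")]
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vbucket")),
        "source_node": match.group("source"),  # assumed: node being moved from
        "target_nodes": chain,                 # assumed: new chain for the vbucket
        "error": match.group("error"),
    }


if __name__ == "__main__":
    print(parse_failure(FAILURE_LINE))
    print(parse_timestamp(FAILURE_LINE) - parse_timestamp(START_LINE))
```

On the lines in this report, the timestamps put the failure roughly five hours after the rebalance started, on vbucket 464 of bucket "default".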
Attachments
Issue Links
- relates to MB-57874 Tombstone purging triggered cancellation of a DCP backfill during rebalance. Was: [System Test] :- Rebalance in of KV nodes fails with "Rebalance exited with reason bad_replicas." (Closed)