Details
- Bug
- Resolution: Fixed
- Major
- 7.2.0
- Enterprise Edition 7.2.0 build 5284
- Untriaged
- Centos 64-bit
- 0
- Unknown
Description
Build: 7.2.0-5284
Steps:
- 5 node kv cluster
+----------------+-----------------+-----------+-----------+
| Node | CPU_utilization | Mem_total | Mem_free |
+----------------+-----------------+-----------+-----------+
| 172.23.107.217 | 1.11707161163 | 23.36 GiB | 21.81 GiB |
| 172.23.107.222 | 1.2470319506 | 23.36 GiB | 21.87 GiB |
| 172.23.107.102 | 2.46339231162 | 23.36 GiB | 22.26 GiB |
| 172.23.107.99 | 3.04053523462 | 23.36 GiB | 22.15 GiB |
| 172.23.107.223 | 1.21592981087 | 23.36 GiB | 22.15 GiB |
+----------------+-----------------+-----------+-----------+
- 3 buckets (2 Magma & 1 Couchstore) with replicas=2
+---------+-----------------+----------+---------+-----------+------------+------------+---------------+
| Bucket | Storage Backend | Replicas | Items | RAM Quota | RAM Used | Disk Used | ARR |
+---------+-----------------+----------+---------+-----------+------------+------------+---------------+
| bucket1 | couchstore | 2 | 100000 | 9.77 GiB | 319.14 MiB | 230.10 MiB | 100 |
| bucket2 | magma | 2 | 50000 | 4.88 GiB | 449.42 MiB | 365.12 MiB | 100 |
| default | magma | 2 | 8140000 | 2.50 GiB | 1.65 GiB | 17.03 GiB | 20.9565356265 |
+---------+-----------------+----------+---------+-----------+------------+------------+---------------+
- Initial load + load history data using continuous upserts (all with durability=MAJORITY)
- Gracefully fail over the node '172.23.107.102' (success)
- Add the node back using 'Delta' recovery and trigger a rebalance
Operation Id = 5b2a77f71136e936a2128d26358ee30b
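For anyone reproducing this outside TAF, the failover, delta recovery, and rebalance steps above can be sketched as three REST calls to the cluster manager. This is a minimal sketch, not the test's own code: it assumes the standard ns_server endpoints `/controller/startGracefulFailover`, `/controller/setRecoveryType` and `/controller/rebalance` on port 8091, picks 172.23.107.217 as the node to talk to (an arbitrary choice), and the function names are hypothetical. It only builds the requests; sending them, authenticating, and waiting for the graceful failover to finish before setting the recovery type are left out.

```python
from urllib.parse import urlencode

# Node IPs taken from the cluster table in this ticket.
NODES = [
    "172.23.107.217",
    "172.23.107.222",
    "172.23.107.102",
    "172.23.107.99",
    "172.23.107.223",
]


def otp(ip):
    """Return the otpNode name ns_server uses for a node IP."""
    return "ns_1@" + ip


def build_recovery_requests(rest_node, failed_ip, all_ips, port=8091):
    """Build (method, url, form_body) tuples for the three steps.

    Assumed endpoints (check against the REST API docs for your build):
      1. graceful failover of the node
      2. mark the node for delta recovery
      3. rebalance with every node known and none ejected
    """
    base = "http://%s:%d" % (rest_node, port)
    steps = [
        ("/controller/startGracefulFailover", {"otpNode": otp(failed_ip)}),
        ("/controller/setRecoveryType",
         {"otpNode": otp(failed_ip), "recoveryType": "delta"}),
        ("/controller/rebalance",
         {"knownNodes": ",".join(otp(ip) for ip in all_ips),
          "ejectedNodes": ""}),
    ]
    return [("POST", base + path, urlencode(params)) for path, params in steps]


if __name__ == "__main__":
    for method, url, body in build_recovery_requests(
            "172.23.107.217", "172.23.107.102", NODES):
        print(method, url, body)
```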
Observation:
From ns_server logs
[error_logger:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.25969.34>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_single_vbucket_mover:'-wait_dcp_data_move/5-fun-0-'/0
    pid: <0.25969.34>
    registered_name: []
    exception error: {dcp_wait_for_data_move_failed,"default",457,
        'ns_1@172.23.107.222',
        ['ns_1@172.23.107.102','ns_1@172.23.107.99'],
        {error,no_stats_for_this_vbucket}}
      in function ns_single_vbucket_mover:'-wait_dcp_data_move/5-fun-0-'/5 (src/ns_single_vbucket_mover.erl, line 451)
    ancestors: [<0.26092.34>,<0.25649.34>,<0.6191.33>]
    message_queue_len: 0
    messages: []
    links: [<0.26092.34>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 29
    reductions: 3479
  neighbours:
[ns_server:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.26092.34>:ns_single_vbucket_mover:spawn_and_wait:79]Got unexpected exit signal {'EXIT',<0.25969.34>,
    {{dcp_wait_for_data_move_failed,"default",457,
    'ns_1@172.23.107.222',
    ['ns_1@172.23.107.102',
    'ns_1@172.23.107.99'],
    {error,no_stats_for_this_vbucket}},
    [{ns_single_vbucket_mover,
    '-wait_dcp_data_move/5-fun-0-',5,
    [{file,"src/ns_single_vbucket_mover.erl"},
    {line,451}]},
    {proc_lib,init_p,3,
    [{file,"proc_lib.erl"},{line,211}]}]}}
[ns_server:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.26092.34>:misc:sync_shutdown_many_i_am_trapping_exits:1456]Shutdown of the following failed: [{<0.25969.34>,
    {{dcp_wait_for_data_move_failed,
    "default",457,'ns_1@172.23.107.222',
    ['ns_1@172.23.107.102',
    'ns_1@172.23.107.99'],
    {error,no_stats_for_this_vbucket}},
    [{ns_single_vbucket_mover,
    '-wait_dcp_data_move/5-fun-0-',5,
    [{file,
    "src/ns_single_vbucket_mover.erl"},
    {line,451}]},
    {proc_lib,init_p,3,
    [{file,"proc_lib.erl"},{line,211}]}]}}]
[ns_server:error,2023-03-31T01:06:14.802-07:00,ns_1@172.23.107.222:<0.26092.34>:misc:try_with_maybe_ignorant_after:1491]Eating exception from ignorant after-block:
{error,
    {badmatch,
    [{<0.25969.34>,
    {{dcp_wait_for_data_move_failed,"default",457,
    'ns_1@172.23.107.222',
    ['ns_1@172.23.107.102','ns_1@172.23.107.99'],
    {error,no_stats_for_this_vbucket}},
    [{ns_single_vbucket_mover,'-wait_dcp_data_move/5-fun-0-',5,
    [{file,"src/ns_single_vbucket_mover.erl"},{line,451}]},
    {proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,211}]}]}}]},
    [{misc,sync_shutdown_many_i_am_trapping_exits,1,
    [{file,"src/misc.erl"},{line,1458}]},
    {misc,try_with_maybe_ignorant_after,2,
    [{file,"src/misc.erl"},{line,1489}]},
    {ns_single_vbucket_mover,mover,6,
    [{file,"src/ns_single_vbucket_mover.erl"},{line,49}]},
    {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
[error_logger:error,2023-03-31T01:06:14.803-07:00,ns_1@172.23.107.222:<0.26092.34>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_single_vbucket_mover:mover/6
    pid: <0.26092.34>
    registered_name: []
    exception exit: {unexpected_exit,
        {'EXIT',<0.25969.34>,
        {{dcp_wait_for_data_move_failed,"default",457,
        'ns_1@172.23.107.222',
        ['ns_1@172.23.107.102','ns_1@172.23.107.99'],
        {error,no_stats_for_this_vbucket}},
        [{ns_single_vbucket_mover,
        '-wait_dcp_data_move/5-fun-0-',5,
        [{file,"src/ns_single_vbucket_mover.erl"},
        {line,451}]},
        {proc_lib,init_p,3,
        [{file,"proc_lib.erl"},{line,211}]}]}}}
      in function ns_single_vbucket_mover:spawn_and_wait/1 (src/ns_single_vbucket_mover.erl, line 80)
      in call from ns_single_vbucket_mover:mover_inner/6 (src/ns_single_vbucket_mover.erl, line 152)
      in call from ns_single_vbucket_mover:'-mover/6-fun-1-'/6 (src/ns_single_vbucket_mover.erl, line 52)
      in call from misc:try_with_maybe_ignorant_after/2 (src/misc.erl, line 1487)
      in call from ns_single_vbucket_mover:mover/6 (src/ns_single_vbucket_mover.erl, line 49)
    ancestors: [<0.25649.34>,<0.6191.33>]
    message_queue_len: 0
    messages: []
    links: [<0.25649.34>]
    dictionary: [{cleanup_list,[<0.25969.34>]}]
    trap_exit: true
    status: running
    heap_size: 6772
    stack_size: 29
    reductions: 19619
  neighbours:
[rebalance:error,2023-03-31T01:06:14.803-07:00,ns_1@172.23.107.222:<0.25649.34>:ns_vbucket_mover:handle_info:212]Worker <0.26092.34> (for action {move,{457, ['ns_1@172.23.107.222', 'ns_1@172.23.107.99', 'ns_1@172.23.107.102'], ['ns_1@172.23.107.222', 'ns_1@172.23.107.102', 'ns_1@172.23.107.99'], []}}) exited with reason {unexpected_exit, {'EXIT', <0.25969.34>, {{dcp_wait_for_data_move_failed, "default", 457, 'ns_1@172.23.107.222', ['ns_1@172.23.107.102', 'ns_1@172.23.107.99'], {error, no_stats_for_this_vbucket}}, [{ns_single_vbucket_mover, '-wait_dcp_data_move/5-fun-0-', 5, [{file, "src/ns_single_vbucket_mover.erl"}, {line, 451}]}, {proc_lib, init_p,3, [{file, "proc_lib.erl"}, {line, 211}]}]}}}
Note: The same test passes with a very small amount of historical data.
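When scanning several ns_server logs for the same failure, the `dcp_wait_for_data_move_failed` tuple from the log excerpts above can be pulled apart with a small helper. This is a rough sketch for triage only: the function name is hypothetical, and reading the fourth element as the source node and the list as the destination nodes is an assumption based on the move action logged by ns_vbucket_mover, not something confirmed from the ns_server source.

```python
import re

# Matches e.g.
# {dcp_wait_for_data_move_failed,"default",457,'ns_1@a',['ns_1@b','ns_1@c'],
#  {error,no_stats_for_this_vbucket}}
# Whitespace (including newlines) between elements is tolerated.
FAILURE_RE = re.compile(
    r'\{dcp_wait_for_data_move_failed,\s*"(?P<bucket>[^"]+)",\s*(?P<vbucket>\d+),\s*'
    r"'(?P<source>[^']+)',\s*\[(?P<dests>[^\]]*)\],\s*"
    r"\{error,\s*(?P<reason>\w+)\}\}"
)


def parse_data_move_failure(text):
    """Return the failure details as a dict, or None if no tuple is found."""
    match = FAILURE_RE.search(text)
    if match is None:
        return None
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vbucket")),
        # Assumed meaning: node the data is moved from.
        "source_node": match.group("source"),
        # Assumed meaning: nodes the data move was waiting on.
        "dest_nodes": re.findall(r"'([^']+)'", match.group("dests")),
        "reason": match.group("reason"),
    }


if __name__ == "__main__":
    sample = (
        "{{dcp_wait_for_data_move_failed, \"default\", 457, "
        "'ns_1@172.23.107.222', "
        "['ns_1@172.23.107.102', 'ns_1@172.23.107.99'], "
        "{error, no_stats_for_this_vbucket}}, []}"
    )
    print(parse_data_move_failure(sample))
```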
TAF test:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i node.ini -p rerun=False,get-cbcollect-info=False,skip_cluster_reset=True,upgrade_version=7.2.0-5284 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_recovery,doc_size=512,data_load_stage=during,bucket_history_retention_seconds=300,bucket_spec=magma_dgm.20_percent_dgm.5_node_2_replica_magma_512,bucket_history_retention_bytes=2147483648,nodes_failover=1,durability=MAJORITY,recovery_type=delta,skip_validations=False,nodes_init=5,default_history_retention_for_collections=false,disk_optimized_thread_settings=True,autoCompactionDefined=true,randomize_value=True,override_spec_params=durability,disk_optimized_thread_settings=True,get-cbcollect-info=False,autoCompactionDefined=true,dedupe_update_itrs=3000,log_level=debug,infra_log_level=debug'
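The parameter list in the command above repeats a few keys (for example `disk_optimized_thread_settings` and `autoCompactionDefined`). The repeated values are identical here, so this is probably harmless, but how TAF resolves repeated keys is not stated in this ticket. A small sketch for spotting such repeats in a comma-separated parameter string follows; the helper name is hypothetical and it does plain string splitting, not TAF's own parsing.

```python
from collections import Counter


def duplicated_keys(param_string):
    """Return the sorted list of keys that occur more than once.

    Items without '=' (such as the test name) are treated as keys
    with no value.
    """
    keys = [item.split("=", 1)[0] for item in param_string.split(",") if item]
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Excerpt of the -t parameter list from the command in this ticket.
    excerpt = (
        "disk_optimized_thread_settings=True,autoCompactionDefined=true,"
        "randomize_value=True,override_spec_params=durability,"
        "disk_optimized_thread_settings=True,get-cbcollect-info=False,"
        "autoCompactionDefined=true"
    )
    print(duplicated_keys(excerpt))
```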
For Gerrit Dashboard: MB-56256
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
189083,3 | MB-56256: CDC don't deduplicate abort/prepare | neo | kv_engine | MERGED | +2 | +1 |
189152,3 | MB-56256: Don't deduplicate away abort/commit for historical flush | neo | kv_engine | MERGED | +2 | +1 |
193447,4 | Merge neo/dc1edd11b into master | master | kv_engine | MERGED | +2 | +1 |