Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
6.6.0
-
Enterprise Edition 6.0.1 build 2037 (Existing 4 node cluster)
Enterprise Edition 6.6.1 build 9182 (New node coming in)
-
Triaged
-
Centos 64-bit
-
-
1
-
Yes
Description
Scenario:
- 4 Node cluster (6.0.1 build 2037)
- Create couchbase-bucket with replica=1, size=100M
- Load bucket into DGM using the cbworkloadgen,
./cbworkloadgen -n 172.23.105.155:8091 -b default -u Administrator -p password --max-items=1200000 -r .95 -l
- Rebalance_in 6.6.1-9182 node into the cluster (172.23.105.244)
Observation:
Rebalance failed with following reason,
Worker <0.3486.0> (for action {move,{577,
|
['ns_1@172.23.105.212',
|
'ns_1@172.23.105.155'],
|
['ns_1@172.23.105.155',
|
'ns_1@172.23.105.244'],
|
[]}}) exited with reason {unexpected_exit,
|
{'EXIT', <0.3512.0>,
|
{{bulk_set_vbucket_state_failed,
|
[{'ns_1@172.23.105.244',
|
{'EXIT',
|
{{{{{badmatch,
|
[{<0.3659.0>,
|
{done, exit, {socket_closed,
|
{gen_server, call, [<0.2333.0>,
|
{setup_streams, [83, 339]}, infinity]}},
|
[{gen_server, call, 3,
|
[{file, "gen_server.erl"}, {line, 214}]},
|
{dcp_replicator, '-spawn_and_wait/1-fun-0-', 1,
|
[{file, "src/dcp_replicator.erl"}, {line, 243}]}]}}]},
|
[{misc, sync_shutdown_many_i_am_trapping_exits, 1,
|
[{file, "src/misc.erl"}, {line, 1374}]},
|
{dcp_replicator, spawn_and_wait, 1,
|
[{file, "src/dcp_replicator.erl"}, {line, 265}]},
|
{dcp_replicator, handle_call, 3,
|
[{file, "src/dcp_replicator.erl"}, {line, 121}]},
|
{gen_server, try_handle_call, 4,
|
[{file, "gen_server.erl"}, {line, 636}]},
|
{gen_server, handle_msg, 6,
|
[{file, "gen_server.erl"}, {line, 665}]},
|
{proc_lib, init_p_do_apply, 3,
|
[{file, "proc_lib.erl"}, {line, 247}]}]},
|
{gen_server, call,
|
[<0.2332.0>, get_partitions, infinity]}},
|
{gen_server, call,
|
['dcp_replication_manager-default',
|
{get_replicator_pid, 83}, infinity]}},
|
{gen_server, call,
|
[{'janitor_agent-default', 'ns_1@172.23.105.244'},
|
{if_rebalance, <0.2246.0>,
|
{update_vbucket_state, 577, replica,
|
undefined, 'ns_1@172.23.105.212'}}, infinity]}}}}]},
|
[{janitor_agent, bulk_set_vbucket_state, 4,
|
[{file, "src/janitor_agent.erl"}, {line, 403}]},
|
{proc_lib, init_p,3,
|
[{file, "proc_lib.erl"}, {line, 232}]}]}}}
|
Note: Hit this while trying to validate MB-41283
ns_server.info.log:
[ns_server:info,2020-11-16T22:39:20.252-08:00,ns_1@172.23.105.155:janitor_agent-default<0.14362.0>:janitor_agent:handle_info:801]Rebalancer <13742.2246.0> died with reason {unexpected_
exit,
{'EXIT',<13742.3512.0>,
{{bulk_set_vbucket_state_failed,
[{'ns_1@172.23.105.244',
{'EXIT',
{{{{{badmatch,
[{<13742.3659.0>,
{done,exit,
{socket_closed,
memcached error on node .244 (which is the only 6.6.1 node), replica connection being closed:
2020-11-16T22:39:19.828037-08:00 INFO 53: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb:83) Attempting to add stream: opaque_:19, start_seqno_:0, end_seqno_:18446744073709551615, vb_uuid:267978355066003, snap_start_seqno_:0, snap_end_seqno_:0, last_seqno:0, stream_req_value:{"uid":"0"}
..
2020-11-16T22:39:19.858135-08:00 ERROR 53: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":43312} - {\"ip\":\"127.0.0.1\",\"port\":11209} (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f4283feac10","ewouldblock":false,"packet":{"bodylen":52,"cas":1605595159858380800,"datatype":"raw","extlen":31,"key":"<ud>pymc1033629</ud>","keylen":11,"magic":"ClientRequest","opaque":19,"opcode":"DCP_MUTATION","vbucket":83},"refcount":1}] - closing connection ([ {"ip":"127.0.0.1","port":43312} - {"ip":"127.0.0.1","port":11209} (<ud>@ns_server</ud>) ]): Checkpoint::queueDirty: Unable to find key in keyIndex with op:mutation seqno:11190for cursor:persistence in current checkpoint.
The producer is node .155 (6.0.1) that is sending a disk snapshot:
2020-11-16T22:39:19.831717-08:00 INFO 59: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb 83) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615
2020-11-16T22:39:19.831763-08:00 INFO 59: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb 83) Scheduling backfill from 1 to 12024, reschedule flag : False
2020-11-16T22:39:19.832021-08:00 INFO 59: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb 83) Sending disk snapshot with start seqno 0 and end seqno 12106