Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Cheshire-Cat
-
Enterprise Edition 7.0.0 build 4060
Enterprise Edition 6.5.0 build 4960
-
Triaged
-
Centos 64-bit
-
-
1
-
Unknown
-
KV-Engine 2021-Jan
Description
Build: upgrade from 6.5.0-4960 to 7.0.0-4060
Scenario:
- 4 node KV cluster (6.5.0-4960) with couchbase bucket (replica=1)
+----------------+----------+-----------------+------------+------------+-----------------------+-----------------------+
| Node           | Services | CPU_utilization | Mem_total  | Mem_free   | Swap_mem_used         | Version               |
+----------------+----------+-----------------+------------+------------+-----------------------+-----------------------+
| 172.23.105.212 | kv       | 1.25628140704   | 4201840640 | 3647959040 | 6553600 / 3758092288  | 6.5.0-4960-enterprise |
| 172.23.105.155 | kv       | 1.25628140704   | 4201840640 | 3691266048 | 0 / 3758092288        | 6.5.0-4960-enterprise |
| 172.23.105.213 | kv       | 1.51133501259   | 4201840640 | 3686121472 | 55312384 / 3758092288 | 6.5.0-4960-enterprise |
| 172.23.105.211 | kv       | 0.759493670886  | 4201840640 | 3658952704 | 14680064 / 3758092288 | 6.5.0-4960-enterprise |
+----------------+----------+-----------------+------------+------------+-----------------------+-----------------------+

+---------+-----------+----------+------------+-----+-------+-------------+----------+-----------+
| Bucket  | Type      | Replicas | Durability | TTL | Items | RAM Quota   | RAM Used | Disk Used |
+---------+-----------+----------+------------+-----+-------+-------------+----------+-----------+
| default | couchbase | 1        | none       | 0   | 50000 | 13434355712 | 80015424 | 171941441 |
+---------+-----------+----------+------------+-----+-------+-------------+----------+-----------+
- Upgrade to 7.0.0-4060 using swap rebalance, with sync-write updates running in the background
Observation:
During the upgrade swap "172.23.105.212 <-> 172.23.100.163", the rebalance fails with the following logs:
Node: 172.23.105.212 file: memcached.log.000000.txt
2020-12-16T23:22:49.729775-08:00 ERROR 44: exception occurred in runloop during packet execution. Cookie info: [] - closing connection ([ 172.23.100.163:59737 - 172.23.105.212:11209 (<ud>@ns_server</ud>) ]): to_string(cb::mcbp::Status): Invalid status code: 11
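
To check whether other nodes hit the same error, the memcached logs can be scanned for this message. The following is only a rough sketch: the regular expression is inferred from the single log line above and may miss other variants, and the function name `find_invalid_status` is made up for illustration. The status code is kept as the raw string from the log, with no assumption about its numeric base.

```python
import re

# Pattern inferred from the one memcached.log line quoted in this report;
# it is an assumption that other occurrences use the same layout.
PATTERN = re.compile(
    r"^(?P<ts>\S+) ERROR (?P<conn>\d+): "
    r"exception occurred in runloop during packet execution\."
    r".*?\[ (?P<peer>[\d.]+:\d+) - (?P<local>[\d.]+:\d+) .*?\]\)"
    r": to_string\(cb::mcbp::Status\): Invalid status code: (?P<code>\d+)"
)


def find_invalid_status(lines):
    """Yield (timestamp, peer, local, status_code) for each matching line.

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    for line in lines:
        m = PATTERN.search(line)
        if m:
            yield m["ts"], m["peer"], m["local"], m["code"]


if __name__ == "__main__":
    # File name taken from the report; adjust the path for the node being checked.
    with open("memcached.log.000000.txt", encoding="utf-8", errors="replace") as f:
        for hit in find_invalid_status(f):
            print(*hit)
```

Running it against the log directory of each node should show which connections were closed with this exception and at what time.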

UI logs:
Worker <0.7388.3> (for action {move,{601,
['ns_1@172.23.105.212',
'ns_1@172.23.100.162'],
['ns_1@172.23.100.163',
'ns_1@172.23.100.162'],
[]}}) exited with reason {unexpected_exit, {'EXIT', <0.8856.3>,
{{{{{child_interrupted, {'EXIT', <28291.16197.0>, socket_closed}},
[{dcp_replicator, spawn_and_wait, 1, [{file, "src/dcp_replicator.erl"}, {line, 265}]},
{dcp_replicator, handle_call, 3, [{file, "src/dcp_replicator.erl"}, {line, 127}]},
{gen_server, try_handle_call, 4, [{file, "gen_server.erl"}, {line, 661}]},
{gen_server, handle_msg, 6, [{file, "gen_server.erl"}, {line, 690}]},
{proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 249}]}]},
{gen_server, call, [<28291.16195.0>, get_partitions, infinity]}},
{gen_server, call,
['dcp_replication_manager-default', {get_replicator_pid, 586}, infinity]}},
{gen_server, call, [{'janitor_agent-default',
'ns_1@172.23.100.163'}, {if_rebalance, <0.18173.2>,
{dcp_takeover, 'ns_1@172.23.105.212', 601}}, infinity]}}}}

Rebalance exited with reason {mover_crashed,
{unexpected_exit, {'EXIT',<0.8856.3>,
{{{{{child_interrupted,
{'EXIT',<28291.16197.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1, [{file,"src/dcp_replicator.erl"}, {line,265}]},
{dcp_replicator,handle_call,3, [{file,"src/dcp_replicator.erl"}, {line,127}]},
{gen_server,try_handle_call,4, [{file,"gen_server.erl"},{line,661}]},
{gen_server,handle_msg,6, [{file,"gen_server.erl"},{line,690}]},
{proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,249}]}]},
{gen_server,call,
[<28291.16195.0>,get_partitions, infinity]}}, {gen_server,call,
['dcp_replication_manager-default', {get_replicator_pid,586}, infinity]}},
{gen_server,call,
[{'janitor_agent-default', 'ns_1@172.23.100.163'},
{if_rebalance,<0.18173.2>,
{dcp_takeover,'ns_1@172.23.105.212',601}}, infinity]}}}}}.
Rebalance Operation Id = 38dc297bf54d83472be688f1f6539e36