Details
Bug
Resolution: Fixed
Critical
7.6.0
Operating System: Debian GNU/Linux 12 (bookworm)
Couchbase Enterprise Edition 7.6.0-1851
Untriaged
0
Unknown
KV 2023-4
Description
Steps to repro
- Created a 4-node KV cluster
- Created 10 buckets with different configurations
- Created 5 scopes per bucket and 20 collections per scope
- Loaded data into each collection (around 4000 docs per collection)
- Performed multiple operations:
  - Add node
  - Remove node
  - Failover
  - Failover and recovery
  - Shuffling nodes between groups
  - Editing bucket properties
  - Stop rebalance and restart (rebalance failed once at timestamp 2023-11-26T00:10:10.905-08:00, reported in MB-59828)
- Induced a failure in the latest rebalance by stopping Couchbase Server on one of the nodes (rebalance at timestamp 2023-11-26T12:27:20.999-08:00)
- Reverted the failure by starting Couchbase Server again
- Retried the rebalance multiple times; it failed each time
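For a sense of scale, the setup in the steps above can be worked out with a few lines of arithmetic. This is only a rough sketch: the constant and function names are made up for illustration, "around 4000" is treated as exactly 4000, and any default scopes or collections are not counted.

```python
# Figures taken from the repro steps; names are illustrative only.
BUCKETS = 10
SCOPES_PER_BUCKET = 5
COLLECTIONS_PER_SCOPE = 20
DOCS_PER_COLLECTION = 4000  # the report says "around 4000"


def dataset_size(buckets, scopes, collections, docs):
    """Return (total collections, approximate total documents)."""
    total_collections = buckets * scopes * collections
    return total_collections, total_collections * docs


collections, docs = dataset_size(
    BUCKETS, SCOPES_PER_BUCKET, COLLECTIONS_PER_SCOPE, DOCS_PER_COLLECTION
)
print(f"{collections} collections, roughly {docs} documents")
```

Under these assumptions the cluster holds 1000 user-created collections and roughly 4 million documents.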
Rebalance fails
2023-11-26T12:28:11.541-08:00, ns_orchestrator:0:info:message(ns_1@172.23.104.66) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.66','ns_1@172.23.105.179', 'ns_1@172.23.105.192','ns_1@172.23.121.71', 'ns_1@172.23.96.168','ns_1@172.23.96.196', 'ns_1@172.23.96.220','ns_1@172.23.96.221', 'ns_1@172.23.97.78'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 1122b687523e734d64a07288f16a24f9
2023-11-26T12:28:21.627-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1", {error,wait_for_memcached_failed, ['ns_1@172.23.97.78']}}.
Rebalance Operation Id = 1122b687523e734d64a07288f16a24f9
2023-11-26T12:28:46.228-08:00, ns_memcached:0:info:message(ns_1@172.23.97.78) - Bucket "bucket8" loaded on node 'ns_1@172.23.97.78' in 1 seconds.
2023-11-26T12:28:46.228-08:00, ns_memcached:0:info:message(ns_1@172.23.97.78) - Bucket "bucket10" loaded on node 'ns_1@172.23.97.78' in 40 seconds.
2023-11-26T12:28:46.228-08:00, ns_memcached:0:info:message(ns_1@172.23.97.78) - Bucket "bucket9" loaded on node 'ns_1@172.23.97.78' in 1 seconds.
2023-11-26T12:29:00.953-08:00, ns_orchestrator:0:info:message(ns_1@172.23.104.66) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.66','ns_1@172.23.105.179', 'ns_1@172.23.105.192','ns_1@172.23.121.71', 'ns_1@172.23.96.168','ns_1@172.23.96.196', 'ns_1@172.23.96.220','ns_1@172.23.96.221', 'ns_1@172.23.97.78'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = fdca67823a6a9fcf58980cb30da67193
2023-11-26T12:29:11.063-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1", {error,wait_for_memcached_failed, ['ns_1@172.23.97.78']}}.
Rebalance Operation Id = fdca67823a6a9fcf58980cb30da67193
2023-11-26T12:29:15.809-08:00, ns_orchestrator:0:info:message(ns_1@172.23.104.66) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.66','ns_1@172.23.105.179', 'ns_1@172.23.105.192','ns_1@172.23.121.71', 'ns_1@172.23.96.168','ns_1@172.23.96.196', 'ns_1@172.23.96.220','ns_1@172.23.96.221', 'ns_1@172.23.97.78'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = bdb1583cf32b0d8b3bb0d171033cca93
2023-11-26T12:29:25.973-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1", {error,wait_for_memcached_failed, ['ns_1@172.23.97.78']}}.
Rebalance Operation Id = bdb1583cf32b0d8b3bb0d171033cca93
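When triaging a long run of such entries, it can help to pull out just the failed rebalances together with their Operation Ids. The following is a minimal sketch, not a Couchbase tool: the regular expressions are inferred only from the entries quoted in this ticket, the function names are hypothetical, and the sample text is two entries copied from the log above.

```python
import re

# Timestamp shape as it appears in the quoted entries (an inferred pattern).
TS = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}"
ENTRY = re.compile(
    rf"(?P<ts>{TS}), (?P<module>\w+):\d+:(?P<level>\w+):message\((?P<node>[^)]+)\) - (?P<msg>.*)",
    re.DOTALL,
)
OP_ID = re.compile(r"Operation Id = ([0-9a-f]{32})")
REASON = re.compile(r'Rebalance exited with reason \{(\w+),"([^"]+)"')

SAMPLE = """\
2023-11-26T12:28:21.627-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.66) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket1", {error,wait_for_memcached_failed, ['ns_1@172.23.97.78']}}.
Rebalance Operation Id = 1122b687523e734d64a07288f16a24f9
2023-11-26T12:28:46.228-08:00, ns_memcached:0:info:message(ns_1@172.23.97.78) - Bucket "bucket10" loaded on node 'ns_1@172.23.97.78' in 40 seconds.
"""


def parse_entries(text):
    """Split text at every timestamp and parse each piece into a dict."""
    entries = []
    for chunk in re.split(rf"(?={TS}, )", text):
        match = ENTRY.match(chunk.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def failed_rebalances(entries):
    """Return (timestamp, operation id, reason, bucket) for each failure."""
    failures = []
    for entry in entries:
        reason = REASON.search(entry["msg"])
        op_id = OP_ID.search(entry["msg"])
        if entry["level"] == "critical" and reason and op_id:
            failures.append(
                (entry["ts"], op_id.group(1), reason.group(1), reason.group(2))
            )
    return failures


for failure in failed_rebalances(parse_entries(SAMPLE)):
    print(failure)
```

Because the text is split at timestamps rather than at newlines, the same sketch should also cope with entries that have been pasted together on a single line.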