Details
- Bug
- Resolution: Duplicate
- Critical
- 7.2.1
- couchbase-cloud-server-7.2.1-5904-v1.0.19
- Untriaged
- 0
- Unknown
Description
Test that led to failure:
- Create a 7-node cluster (c5.2xlarge): KV-3, GSI-2, N1QL-2.
- Create a bucket with 10 collections, 100M items in each.
- Create GSI indexes on 2 collections.
- Start a KV read+expiry load at 10k ops/s (9k reads, 1k expiry). Start the N1QL query load in parallel.
- Scale up the cluster to 4-KV, 3-GSI & 3-N1QL. Wait for rebalance to finish.
- Turn the cluster off and back on.
- Scale up the cluster to 5-KV, 4-GSI & 4-N1QL. Wait for rebalance to finish.
- Turn the cluster off and back on.
- Scale down the cluster to 4-KV, 3-GSI & 3-N1QL. Wait for rebalance to finish.
- Turn the cluster off and back on.
- Scale down the cluster to 3-KV, 2-GSI & 2-N1QL. Wait for rebalance to finish.
- Turn the cluster off and back on.
- Scale up the EBS volume. Wait for rebalance to finish.
- Turn the cluster off and back on.
- Scale down the EBS volume.
Rebalance appears to be failing because a node is being failed over repeatedly:
2023-08-15T19:52:21.953Z, ns_orchestrator:0:critical:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Rebalance exited with reason {service_rebalance_failed,index, {agent_died,<34620.2990.0>,noconnection}}.Rebalance Operation Id = b3707548ffa4d611203aed50aeb1510c
2023-08-15T19:52:22.020Z, failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Starting failing over ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']
2023-08-15T19:52:22.020Z, ns_orchestrator:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Starting failover of nodes ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']. Operation Id = 3b459ed48506c12925d1739dcd7afcce
2023-08-15T19:52:22.139Z, failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Failed over ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']: ok
2023-08-15T19:52:24.146Z, failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Deactivating failed over nodes ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']
2023-08-15T19:52:24.294Z, ns_orchestrator:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Failover completed successfully.Rebalance Operation Id = 3b459ed48506c12925d1739dcd7afcce
2023-08-15T19:52:24.355Z, auto_failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Node ('ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com') was automatically failed over. Reason: The cluster manager did not respond for the duration of the auto-failover threshold.
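For triage, a log excerpt like the one above can be split into entries and the failovers counted per node. The following is only a minimal sketch: the entry layout (`<timestamp>, <module>:<code>:<level>:message(<reporting node>) - <text>`) is inferred from this excerpt rather than taken from any documented format, and the helper names `split_entries`, `parse_entry` and `count_failovers` are hypothetical. It also handles entries that have been pasted together without line breaks.

```python
import re
from collections import Counter

# Assumption: every entry starts with a timestamp such as
# "2023-08-15T19:52:21.953Z, " (layout inferred from the excerpt above).
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z, )")

# Assumption: "<ts>, <module>:<code>:<level>:message(<node>) - <text>".
ENTRY = re.compile(
    r"(?P<ts>[^,]+), (?P<module>\w+):(?P<code>\d+):(?P<level>\w+):"
    r"message\((?P<node>[^)]+)\) - (?P<text>.*)",
    re.S,
)


def split_entries(blob):
    """Split a log blob into entries at each timestamp, even if fused."""
    return [part.strip() for part in ENTRY_START.split(blob) if part.strip()]


def parse_entry(entry):
    """Return the fields of one entry as a dict, or None if it does not match."""
    match = ENTRY.match(entry)
    return match.groupdict() if match else None


def count_failovers(blob):
    """Count completed failovers per node, using the 'Failed over [...]' entries."""
    counts = Counter()
    for entry in split_entries(blob):
        fields = parse_entry(entry)
        if fields is None:
            continue
        if fields["module"] == "failover" and fields["text"].startswith("Failed over ["):
            # Node names appear single-quoted inside the bracketed list.
            counts.update(re.findall(r"'([^']+)'", fields["text"]))
    return counts


if __name__ == "__main__":
    # Shortened sample with placeholder host names, not the real cluster.
    sample = (
        "2023-08-15T19:52:21.953Z, ns_orchestrator:0:critical:"
        "message(ns_1@node-a.example) - Rebalance exited with reason {x}."
        "Rebalance Operation Id = abc"
        "2023-08-15T19:52:22.139Z, failover:0:info:"
        "message(ns_1@node-a.example) - Failed over ['ns_1@node-b.example']: ok"
    )
    for node, n in count_failovers(sample).items():
        print(node, n)
```

Running this over the full set of cluster logs, rather than the short excerpt here, would show whether the same index node is failed over after every rebalance attempt.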
Not sure whether this is a duplicate of, or related to, MB-57814.