Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: 7.6.0, 7.2.1
Affects Version/s: 7.2.1
Component/s: secondary-index
Labels:
Environment: couchbase-cloud-server-7.2.1-5904-v1.0.19
Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://supportal.couchbase.com/snapshot/ba8b47cb2fff5d38bb8d4c1ecb8066c7::1
Story Points: 0
Is this a Regression?: Unknown

Description

Test that led to failure:

Create a 7-node cluster (c5.2xlarge): 3-KV, 2-GSI & 2-N1QL.
Create a bucket with 10 collections and 100M items in each.
Create GSI indexes on 2 collections.
Start a KV read+expiry load at 10k ops/s (9k reads, 1k expiry). Start the N1QL query load in parallel.
Scale up the cluster to 4-KV, 3-GSI & 3-N1QL. Wait for rebalance to finish.
Turn the cluster off and back on.
Scale up the cluster to 5-KV, 4-GSI & 4-N1QL. Wait for rebalance to finish.
Turn the cluster off and back on.
Scale down the cluster to 4-KV, 3-GSI & 3-N1QL. Wait for rebalance to finish.
Turn the cluster off and back on.
Scale down the cluster to 3-KV, 2-GSI & 2-N1QL. Wait for rebalance to finish.
Turn the cluster off and back on.
Scale up the EBS volume. Wait for rebalance to finish.
Turn the cluster off and back on.
Scale down the EBS volume.
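
For reference, the node-count changes in the steps above can be written down as data. This is only a sketch: the names `TOPOLOGIES` and `rebalance_deltas` are made up for illustration and do not come from any test framework.

```python
# Service counts after each scaling step, taken from the steps in this ticket.
# All names here are illustrative only.
TOPOLOGIES = [
    {"kv": 3, "gsi": 2, "n1ql": 2},  # initial cluster
    {"kv": 4, "gsi": 3, "n1ql": 3},  # first scale-up
    {"kv": 5, "gsi": 4, "n1ql": 4},  # second scale-up
    {"kv": 4, "gsi": 3, "n1ql": 3},  # first scale-down
    {"kv": 3, "gsi": 2, "n1ql": 2},  # second scale-down
]


def rebalance_deltas(topologies):
    """Return, per rebalance, how many nodes each service gains (+) or loses (-)."""
    return [
        {svc: after[svc] - before[svc] for svc in before}
        for before, after in zip(topologies, topologies[1:])
    ]


if __name__ == "__main__":
    for step, delta in enumerate(rebalance_deltas(TOPOLOGIES), start=1):
        print(step, delta)
```

Each of the four topology rebalances adds or removes exactly one node per service, so every one of them moves a single index node in or out.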

Rebalance appears to fail because a node is repeatedly failed over:

2023-08-15T19:52:21.953Z, ns_orchestrator:0:critical:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Rebalance exited with reason {service_rebalance_failed,index, {agent_died,<34620.2990.0>,noconnection}}. Rebalance Operation Id = b3707548ffa4d611203aed50aeb1510c
2023-08-15T19:52:22.020Z, failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Starting failing over ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']
2023-08-15T19:52:22.020Z, ns_orchestrator:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Starting failover of nodes ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']. Operation Id = 3b459ed48506c12925d1739dcd7afcce
2023-08-15T19:52:22.139Z, failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Failed over ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']: ok
2023-08-15T19:52:24.146Z, failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Deactivating failed over nodes ['ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com']
2023-08-15T19:52:24.294Z, ns_orchestrator:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Failover completed successfully. Rebalance Operation Id = 3b459ed48506c12925d1739dcd7afcce
2023-08-15T19:52:24.355Z, auto_failover:0:info:message(ns_1@svc-d-node-013.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com) - Node ('ns_1@svc-i-node-016.qgsopockw4jhf3qd.sandbox.nonprod-project-avengers.com') was automatically failed over. Reason: The cluster manager did not respond for the duration of the auto-failover threshold.
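
To check how often a node was auto-failed over across a longer log, a rough sketch like the following could be used. It assumes every entry has the same shape as the excerpt above (`timestamp, module:code:level:message(node) - text`). The function names are hypothetical and this is not a Couchbase tool. It splits on the leading timestamp, so it also copes with entries that have been run together on one line.

```python
import re

# Each entry in the excerpt starts with an ISO-8601 UTC timestamp.
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z"

# Assumed entry shape: "<ts>, <module>:<code>:<level>:message(<node>) - <text>".
# The text runs lazily until the next timestamp or the end of the input.
ENTRY = re.compile(
    rf"(?P<ts>{TIMESTAMP}), (?P<module>\w+):\d+:(?P<level>\w+):"
    rf"message\((?P<node>[^)]+)\) - (?P<text>.*?)(?={TIMESTAMP}|\Z)",
    re.DOTALL,
)

AUTO_FAILOVER = re.compile(r"Node \('([^']+)'\) was automatically failed over")


def parse_entries(raw):
    """Split raw log text into entries, even when entries are run together."""
    entries = []
    for match in ENTRY.finditer(raw):
        entry = match.groupdict()
        entry["text"] = entry["text"].strip()
        entries.append(entry)
    return entries


def auto_failovers(entries):
    """Return (timestamp, failed-over node) pairs for auto-failover entries."""
    hits = []
    for entry in entries:
        found = AUTO_FAILOVER.search(entry["text"])
        if found:
            hits.append((entry["ts"], found.group(1)))
    return hits
```

Running `auto_failovers(parse_entries(log_text))` over the full log from the collected snapshot would list each auto-failover with its timestamp, which makes it easier to line failovers up against rebalance attempts.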

Not sure if this is a duplicate of, or related to, MB-57814.

Attachments

Activity

People

Assignee: Pavan PB

Reporter: Mohsin Ahmed

Votes: 0

Watchers: 6

Dates

Created: 15/Aug/23 11:29 PM

Updated: 28/Aug/23 1:55 AM

Resolved: 18/Aug/23 4:00 PM

Indexer rebalance stuck for more than 17 hours
