Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.2.6, 7.6.4, Columnar 1.0.1
Affects Version/s: 7.2.6
Component/s: analytics
Labels:

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

GCP Server Logs :
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-07-31T091521-ns_1%40svc-dqisea-node-001.xlxymlpegigmsjgu.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-07-31T091521-ns_1%40svc-dqisea-node-006.xlxymlpegigmsjgu.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-07-31T091521-ns_1%40svc-dqisea-node-007.xlxymlpegigmsjgu.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-07-31T091521-ns_1%40svc-dqisea-node-008.xlxymlpegigmsjgu.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-07-31T091521-ns_1%40svc-dqisea-node-009.xlxymlpegigmsjgu.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-07-31T091521-ns_1%40svc-dqisea-node-010.xlxymlpegigmsjgu.sandbox.nonprod-project-avengers.com.zip

AWS Server Logs :
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-08-08T095646-ns_1%40svc-dqisea-node-001.hworpry2fo4e2knj.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-08-08T095646-ns_1%40svc-dqisea-node-004.hworpry2fo4e2knj.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-08-08T095646-ns_1%40svc-dqisea-node-006.hworpry2fo4e2knj.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-08-08T095646-ns_1%40svc-dqisea-node-007.hworpry2fo4e2knj.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-08-08T095646-ns_1%40svc-dqisea-node-008.hworpry2fo4e2knj.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/Vipul/collectinfo-2024-08-08T095646-ns_1%40svc-dqisea-node-009.hworpry2fo4e2knj.sandbox.nonprod-project-avengers.com.zip

Story Points: 0
Is this a Regression?: Unknown
Sprint: Analytics Sprint 48

Description

Observed in RC-1 for 7.2.6.

Test Steps:

- Deploy a GCP cluster with the image couchbase-cloud-server-7-2-3-6705-v1-0-25.
- Load buckets, 1 scope, and 5 collections onto it.
- Load documents onto it to drive up CPU usage.
- Trigger a scale-out to 5 nodes.
- After the scale-out succeeds, trigger an upgrade to the image couchbase-cloud-server-7-2-6-8101-v1-0-34.
- After the upgrade succeeds, trigger a scale-in back to 3 nodes.
- Destroy the cluster.

Observations:

- After the scale-out step, the upgrade was triggered successfully.
- During the upgrade, another node was added.
- The cluster has been stuck in a repeating rebalance state since then.
- The cluster has 6 nodes: 5 after the scale-out and 1 added for the upgrade rebalance (screenshot attached).
- One node is on version 7.2.3-6705, while the other 5 have been upgraded to 7.2.6-8101.
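
The mixed-version state in the last observation can be confirmed by grouping nodes by reported version. The following is a minimal sketch, assuming node records shaped like the `nodes` array returned by Couchbase Server's `/pools/default` REST endpoint (each with `hostname` and `version` fields). The sample hostnames and the `-enterprise` version suffix are hypothetical and not taken from this cluster.

```python
from collections import defaultdict


def group_nodes_by_version(nodes):
    """Map each reported server version to the hostnames running it."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["version"]].append(node["hostname"])
    return dict(groups)


def is_mixed_version(nodes):
    """Return True when the cluster reports more than one server version."""
    return len(group_nodes_by_version(nodes)) > 1


# Hypothetical sample mirroring the observation: 5 upgraded nodes, 1 not upgraded.
sample_nodes = [
    {"hostname": f"node-{i:03d}:8091", "version": "7.2.6-8101-enterprise"}
    for i in range(1, 6)
]
sample_nodes.append({"hostname": "node-006:8091", "version": "7.2.3-6705-enterprise"})

for version, hosts in sorted(group_nodes_by_version(sample_nodes).items()):
    print(version, len(hosts), hosts)
```

In practice the `nodes` list would come from the parsed JSON of the cluster details response rather than from a hand-built sample.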

Logs attached below

Server Rebalance Failure Log:

Rebalance exited with reason {service_rebalance_failed,cbas, {worker_died, {'EXIT',<0.7948.630>, {rebalance_failed, {service_error, <<"Rebalance e211f82a01482b692f206d47baaff1e9 failed: The MetadataNode failed to bind before the configured timeout (60 seconds); the MetadataNode was configured to run on NC: 457650eba6dca7dbe24354bf0809ba20">>}}}}}. Rebalance Operation Id = 4eed05384a65d4ae403fe7d9f1cb443d
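
When comparing failures across runs, it can help to pull the identifying fields out of this log line. The sketch below is one possible approach, not an official tool: the function name, field names, and regular expressions are assumptions written against the message text quoted above.

```python
import re

# Patterns are assumptions derived from the log line quoted in this ticket.
PATTERNS = {
    "service": re.compile(r"service_rebalance_failed,\s*(\w+)"),
    "rebalance_id": re.compile(r"Rebalance ([0-9a-f]+) failed"),
    "timeout_seconds": re.compile(r"configured timeout \((\d+) seconds\)"),
    "nc_id": re.compile(r"NC: ([0-9a-f]+)"),
    "operation_id": re.compile(r"Rebalance Operation Id = ([0-9a-f]+)"),
}


def parse_rebalance_failure(log_text):
    """Extract key fields from a rebalance failure message; missing fields are None."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        result[name] = match.group(1) if match else None
    if result["timeout_seconds"] is not None:
        result["timeout_seconds"] = int(result["timeout_seconds"])
    return result


gcp_log = (
    "Rebalance exited with reason {service_rebalance_failed,cbas, {worker_died, "
    "{'EXIT',<0.7948.630>, {rebalance_failed, {service_error, "
    '<<"Rebalance e211f82a01482b692f206d47baaff1e9 failed: The MetadataNode failed '
    "to bind before the configured timeout (60 seconds); the MetadataNode was "
    'configured to run on NC: 457650eba6dca7dbe24354bf0809ba20">>}}}}}. '
    "Rebalance Operation Id = 4eed05384a65d4ae403fe7d9f1cb443d"
)

parsed = parse_rebalance_failure(gcp_log)
for field, value in parsed.items():
    print(f"{field}: {value}")
```

Separate patterns per field keep the parser tolerant of line breaks or reordering in the surrounding Erlang term, at the cost of not validating the overall structure.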

UPDATE (RC-2):

The same issue has now also been observed on AWS (RC-2).
The server is stuck in a rebalancing state while the control plane shows the `upgrading` status.
CSP: AWS
Deploy version: 7.2.5
Deploy image: couchbase-cloud-server-7.2.5-7596-x86_64-v1.0.32
Upgrade version: 7.2.6
Upgrade image: couchbase-cloud-server-7.2.6-8103-x86_64-v1.0.34
The rebalance during the upgrade is stuck because a worker of the cbas service died.

Specific AWS Server Log:

Rebalance exited with reason {service_rebalance_failed,cbas, {worker_died, {'EXIT',<0.32519.440>, {rebalance_failed, {service_error, <<"Rebalance 06a51d6ef1ef6f212c2a09d93bfe5982 failed: The MetadataNode failed to bind before the configured timeout (60 seconds); the MetadataNode was configured to run on NC: d41a1deddf76ef20fd12c784cd8afeca">>}}}}}. Rebalance Operation Id = 7274b73c693b68ff270ebfd194f559dc

Attachments

- image-2024-07-31-15-35-57-421.png (493 kB, 31/Jul/24 3:05 AM)
- Screen Recording 2024-07-31 at 3.38.10 PM.mov (43.71 MB, 31/Jul/24 3:09 AM)

Issue Links

relates to:

- MB-62977 Analytics service down while upgrade (Closed)
- MB-63076 [System test][Analytics] Java.lang.NullPointerException (Closed)

People

Assignee: Vipul Bhardwaj

Reporter: Vipul Bhardwaj

Votes: 0

Watchers: 9

Dates

Created: 31/Jul/24 3:12 AM

Updated: 29/Aug/24 4:05 PM

Resolved: 12/Aug/24 5:05 AM

Gerrit Reviews

There are no open Gerrit changes.

There are 3 closed Gerrit changes:

- MB-62957: ping public (not listen) address of known nodes
- MB-62957: merge branch 'neo' into 'trinity'
- MB-56704,MB-62957: merge branch 'trinity' into 'goldfish'

Cluster stuck in a repetitive rebalance while upgrade, Rebalance exited with reason {service_rebalance_failed,cbas The MetadataNode failed to bind before the configured timeout
