Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Affects Version: Columnar 1.0.0
- 2239
- Untriaged
- 0
- Unknown
Description
- Create a 32-node columnar cluster. Ingest 1B items per remote collection across 20 collections.
- Disconnect the previous link, then create a new link and 20 more collections.
- Start scaling operations: 32 -> 16 -> 8 -> 4 -> 2 -> 4.
- Problems started during the last scaling step (2 -> 4).
- The rebalance failed, but CP reported Healthy.
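The scaling sequence above can be sketched as a driver loop. All names here (`ControlPlaneClient`, `scale_cluster`, `cluster_state`) are illustrative assumptions, not the actual Control Plane API:

```python
# Hypothetical reproduction driver for the scaling sequence above.
# ControlPlaneClient and its methods are illustrative, not the real CP API.

class ControlPlaneClient:
    def __init__(self):
        self.nodes = 32
        self.state = "healthy"

    def scale_cluster(self, target_nodes: int) -> None:
        # In the real system this kicks off a cbas rebalance in the background.
        self.nodes = target_nodes

    def cluster_state(self) -> str:
        # Bug surface described below: this reported "healthy" even while
        # a rebalance was failing/retrying in the background.
        return self.state


def run_scaling_sequence(cp: ControlPlaneClient, steps=(16, 8, 4, 2, 4)) -> int:
    for target in steps:
        # The driver gates only on the (possibly misleading) health state,
        # so it can issue the next scale while a rebalance is still running.
        assert cp.cluster_state() == "healthy"
        cp.scale_cluster(target)
    return cp.nodes


cp = ControlPlaneClient()
print(run_scaling_sequence(cp))  # 4
```

The final step (2 -> 4) is where the failure below appeared; the sketch just shows that nothing in the driver waits for the previous rebalance to finish.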
Server
CP:
- CP triggered another rebalance. Not sure why, but presumably due to the balance:False state.
- The rebalance was successful, yet CP kept re-triggering rebalances.
Rebalance exited with reason {service_rebalance_failed,cbas,
{worker_died,
{'EXIT',<0.20475.186>,
{task_failed,rebalance,
{service_error,
<<"Rebalance 57dd2c9beb18711bfe37466f50599b4c failed: The MetadataNode failed to bind before the configured timeout (300 seconds); the MetadataNode was configured to run on NC: svc-da-node-079.bz1ehzjwbivhxhz.sandbox.nonprod-project-avengers.com:8091 (d02ec1dcbb6e15eb047ccef1fd0c1db8)">>}}}}}.
Rebalance Operation Id = 04c227ae8956515df7fe64cb31359d43
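The MetadataNode failure above is a timeout on a bind/readiness wait. The general shape of such a check looks something like this; a sketch, not Analytics code, with the 300-second value taken from the error message above:

```python
import socket
import time


def wait_for_bind(host: str, port: int, timeout_s: float = 300.0,
                  poll_interval_s: float = 1.0) -> bool:
    """Poll until a TCP listener accepts connections, or give up.

    Mirrors the shape of the MetadataNode bind wait in the error above:
    if nothing is listening within `timeout_s`, the caller fails the
    rebalance task. Illustrative sketch only.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_interval_s):
                return True  # the node bound and is accepting connections
        except OSError:
            time.sleep(poll_interval_s)  # not up yet; retry until deadline
    return False  # timed out -> analogous to "failed to bind before the configured timeout"
```

In the failed rebalance, the NC named in the error never became reachable within the window, so the whole service rebalance was aborted with `service_error`.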
- After four successful no-op rebalances, the backend cluster finally settled down. All this while, CP scaling operations were returning 422 errors: the Healthy cluster state allowed us to trigger the next scaling operation while the cluster was still rebalancing in the background.
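The 422s suggest the scaling gate checks only the reported health, not whether a rebalance is still running. A stricter gate would look roughly like this; field names are assumptions, not the real CP schema:

```python
# Sketch of the gating logic implied above; dict keys are assumed names.

def can_trigger_scaling(cluster: dict) -> bool:
    # Behaviour described above: gate only on reported health, so a new
    # scaling operation is allowed even mid-rebalance (and then 422s).
    return cluster["state"] == "healthy"


def can_trigger_scaling_strict(cluster: dict) -> bool:
    # Stricter gate: also refuse while a rebalance (or retry) is running.
    return (cluster["state"] == "healthy"
            and not cluster["rebalance_in_progress"])


cluster = {"state": "healthy", "rebalance_in_progress": True}
print(can_trigger_scaling(cluster))         # True  -> next scale allowed (the race)
print(can_trigger_scaling_strict(cluster))  # False -> blocked until rebalance ends
```

With the strict variant, the driver would have been blocked during the background rebalance instead of repeatedly hitting 422s.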
Issues to identify:
- Why did the rebalance fail?
- If the rebalance failed and CP is retrying, why does CP report a Healthy state?
- Why does CP keep retrying when the rebalance passed on the first retry attempt?