Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.2.4
Affects Version/s: 7.2.3
Component/s: analytics
Labels:
- approved-for-7.2.4
- triaged

Triage: Untriaged
Story Points: 0
Is this a Regression?: Unknown
Sprint: Analytics Sprint 32

Description

Found this server issue during 7.2.3 Capella testing.

An AWS cluster running AMI couchbase-cloud-server-7.2.3-6705-x86_64-v1.0.24 failed to rebalance after a node was randomly killed and a replacement node was spawned.

Recurring error in the server logs:
Rebalance exited with reason {service_rebalance_failed,cbas,
 {worker_died,
  {'EXIT',<0.22011.134>,
   {rebalance_failed,
    {service_error,
     <<"Rebalance 1e1b4d83a9b8b3d3f1ef591ad3186b91 failed: CBAS0001: Analytics collection `travel-sample`.inventory.airport in different partitions has different DCP states. Seqno gap = 2. User action: Try again later">>}}}}}.
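For anyone scanning the collected logs for this failure, the following is a minimal sketch of pulling the rebalance ID, collection name, and seqno gap out of the CBAS0001 message. The regex is derived only from the single message quoted above, so other server versions may word it differently; `parse_dcp_state_error` is a hypothetical helper name, not part of any Couchbase tooling.

```python
import re

# Pattern built from the one CBAS0001 message quoted in this report.
# Assumption: the wording is stable; adjust if your logs differ.
CBAS0001_RE = re.compile(
    r"Rebalance (?P<rebalance_id>[0-9a-f]+) failed: CBAS0001: "
    r"Analytics collection (?P<collection>\S+) in different partitions "
    r"has different DCP states\. Seqno gap = (?P<gap>\d+)\."
)


def parse_dcp_state_error(line):
    """Return the rebalance id, collection, and seqno gap, or None if absent."""
    match = CBAS0001_RE.search(line)
    if match is None:
        return None
    return {
        "rebalance_id": match.group("rebalance_id"),
        "collection": match.group("collection"),
        "seqno_gap": int(match.group("gap")),
    }


# The service_error term from this report, joined onto one line.
sample = (
    '{service_error, <<"Rebalance 1e1b4d83a9b8b3d3f1ef591ad3186b91 failed: '
    "CBAS0001: Analytics collection `travel-sample`.inventory.airport in "
    "different partitions has different DCP states. Seqno gap = 2. "
    'User action: Try again later">>}'
)

if __name__ == "__main__":
    print(parse_dcp_state_error(sample))
```

Counting how often the same collection and gap recur across rebalance attempts is one way to tell a transient mismatch from the indefinite failure described in this issue.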

Cluster: https://ui.sbx-3.sandbox.nonprod-project-avengers.com/database/datatools?oid=259d212d-002f-40cb-9d87-dcc138110c8c&pid=42270d0b-d978-4f19-b3d1-c833193668fc&dbid=acc8336e-afcf-46d3-bec1-8acafa6dd124

Datadog logs: https://app.datadoghq.com/logs?query=%40clusterId%3Aacc8336e-afcf-46d3-bec1-8acafa6dd124&cols=host%2Cservice&index=*&messageDisplay=inline&refresh_mode=sliding&stream_sort=desc&viz=stream&from_ts=1699521768839&to_ts=1699525368839&live=true
Server logs:
https://cb-engineering.s3.amazonaws.com/Aman/collectinfo-2023-11-09T095131-ns_1%40svc-dqisea-node-001.t9msz82bn5isouab.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/Aman/collectinfo-2023-11-09T095131-ns_1%40svc-dqisea-node-002.t9msz82bn5isouab.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/Aman/collectinfo-2023-11-09T095131-ns_1%40svc-dqisea-node-003.t9msz82bn5isouab.sandbox.nonprod-project-avengers.com.zip

https://cb-engineering.s3.amazonaws.com/Aman/collectinfo-2023-11-09T095131-ns_1%40svc-dqisea-node-004.t9msz82bn5isouab.sandbox.nonprod-project-avengers.com.zip

Corresponding AV ticket: https://couchbasecloud.atlassian.net/browse/AV-67126

Issue Links

is duplicated by

MB-62348 Cluster stuck in repetitive rebalance state

Resolved

is triggering

MB-62726 Analytics rebalance during upgrade to 7.6.2 is continuously failing

Resolved

Gerrit Reviews

For Gerrit Dashboard: MB-59582
#	Subject	Branch	Project	Status	CR	V
201900,2	MB-59582: short circuit waitForSeqnos w/ length 0	neo	analytics-dcp-client	MERGED	+2	+1
201901,7	MB-59582: disregard seqno differences > collection high seqno in kv master	neo	cbas-core	MERGED	+2	+1
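The two subject lines above can be read as two small guards. The sketch below illustrates one plausible reading of each, based only on the commit subjects and not on the merged code: all function names, parameters, and the capping logic are assumptions made for illustration. The first guard returns immediately when there is nothing to wait for; the second ignores any seqno difference that lies above the collection's high seqno on the KV master, on the assumption that no mutation for that collection can exist in that range.

```python
def wait_for_seqnos(targets, fetch_seqnos, max_polls=100):
    """Poll until every vbucket in `targets` reaches its wanted seqno.

    targets: dict mapping vbucket id -> seqno to reach (hypothetical shape).
    fetch_seqnos: callable returning dict of vbucket id -> current seqno.
    Returns the number of polls performed.
    """
    # Short circuit: with no targets there is nothing to wait for,
    # so skip the fetch entirely instead of issuing a pointless request.
    if len(targets) == 0:
        return 0
    for polls in range(1, max_polls + 1):
        current = fetch_seqnos()
        if all(current.get(vb, 0) >= want for vb, want in targets.items()):
            return polls
    raise TimeoutError("seqnos not reached after %d polls" % max_polls)


def effective_seqno_gap(seqno_a, seqno_b, collection_high_seqno):
    """Gap between two partitions' DCP seqnos for one collection.

    Assumed reading of the second fix: positions beyond the collection's
    high seqno in the KV master are treated as equal to that high seqno,
    so differences above it are disregarded.
    """
    capped_a = min(seqno_a, collection_high_seqno)
    capped_b = min(seqno_b, collection_high_seqno)
    return abs(capped_a - capped_b)
```

Under this reading, two partitions at seqnos 12 and 10 for a collection whose high seqno is 10 would report a gap of 0 rather than 2, which is the kind of gap the reported error complains about. Whether the merged cbas-core change computes it this way is not stated in this issue.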

People

Assignee: Aman Srivastava

Reporter: Aman Srivastava

Votes: 0

Watchers: 8

Dates

Due: 01/Dec/23

Created: 09/Nov/23 9:56 PM

Updated: 15/Jul/24 5:53 AM

Resolved: 01/Dec/23 7:16 AM

Gerrit Reviews

There are no open Gerrit changes

Out of sync DCP states may cause Analytics service to fail rebalances indefinitely
