Details

Type: Improvement
Resolution: Unresolved
Priority: Major
Fix Version/s: feature-backlog
Affects Version/s: 7.0.3, 7.1.0
Component/s: ns_server, UI
Labels: None
Story Points: 1

Description

As seen in some CBSEs, when nodes in a cluster have different disk performance, Couchbase services can behave in unexpected ways that are hard to troubleshoot and explain. In one case, it caused periodic spikes in num_docs_pending (documents waiting to be indexed), which can in turn cause stale=false scans to time out. It often takes a long time to root-cause the reported symptoms and attribute them to underlying disk issues, and multiple teams need to get involved.

It would be good if the server could detect this condition automatically, raise an alert in the server UI (and perhaps as a system event too), collect the relevant data in cbcollect, and have mortimer highlight the issue when a supportal snapshot is uploaded. This would help multiple stakeholders (customers, support, engineering, etc.).
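
To make the request concrete, here is a rough sketch of what such a check might look like. It assumes each node already reports some average disk latency figure and flags any node whose latency is several times the cluster median. The function name, the input format and the threshold are all hypothetical and are not an existing ns_server API; the real detection logic would need to be worked out per service.

```python
from statistics import median


def find_slow_disk_nodes(latency_ms_by_node, ratio_threshold=3.0):
    """Return {node: ratio} for nodes whose disk latency exceeds
    ratio_threshold times the cluster median.

    Hypothetical helper: latency_ms_by_node maps a node name to an
    average disk latency in milliseconds (an assumed metric).
    """
    # A single node has nothing to be compared against.
    if len(latency_ms_by_node) < 2:
        return {}

    baseline = median(latency_ms_by_node.values())
    if baseline <= 0:
        return {}

    return {
        node: round(latency / baseline, 2)
        for node, latency in latency_ms_by_node.items()
        if latency / baseline > ratio_threshold
    }


# Made-up sample values, for illustration only.
samples = {"node1": 1.2, "node2": 1.4, "node3": 9.8, "node4": 1.3}
for node, ratio in find_slow_disk_nodes(samples).items():
    print(f"ALERT: {node} disk latency is {ratio}x the cluster median")
```

Comparing against the median rather than the mean keeps one very slow node from dragging the baseline up and hiding itself. With the sample values above, only node3 would be flagged.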

This applies to all Couchbase services that store data. I added ns_server and UI as the components to begin with, as I thought that's probably where we should start. We can add or change components as necessary after we spend some time figuring out how to go about this platform-wide.

Attachments

Issue Links

relates to

MB-34155 Support Auto-failover for exceptionally slow/hanging disks (Open)

People

Assignee: Abhijeeth Nuthan

Reporter: Jeelan Poola

Votes: 0

Watchers: 16

Dates

Created: 17/Feb/22 11:30 PM

Updated: 01/Aug/23 11:12 AM

Detect & Alert users on non-homogenous-disk-performance across nodes in the cluster
