Details

Type: Improvement
Resolution: Fixed
Priority: Critical
Fix Version/s: 5.5.0
Affects Version/s: 2.5.1, 3.1.0, 4.0.0
Component/s: couchbase-bucket
Security Level: Public
Labels:

Description

A customer added 6 nodes to a large cluster in an attempt to increase the overall percentage of active bucket data held in cache, and observed that active bucket residency decreased after rebalancing. This decrease in active data residency after adding nodes and rebalancing turns out to be reproducible.

To reproduce this anomaly, create an 8-node cluster with a RAM quota of 100 MB per node and populate the default bucket until the active percentage in memory is about 40%. (I used cbworkloadgen and inserted 300K items into the default bucket, specifying an item size of 2 KB and enabling the -j (JSON) option.) Add 3 nodes to this cluster and rebalance. The resulting active memory residency percentage for the default bucket will drop significantly, and the replica residency percentage will increase. Note that if 3 random nodes are then removed and rebalanced, and then added back and rebalanced again, active residency will increase beyond the initial level.
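As a rough sanity check on the reproduction numbers, the following Python sketch compares the raw data set size with the total cluster quota before and after adding 3 nodes. It is only a back-of-the-envelope estimate under stated assumptions: one replica copy, 2 KB taken as 2048 bytes, and no allowance for metadata, per-item overhead, or watermarks. The function name `dataset_vs_quota` is made up for illustration and is not part of any Couchbase tool.

```python
def dataset_vs_quota(nodes, quota_mb_per_node, items, item_bytes, replicas=1):
    """Return (total_data_mb, cluster_quota_mb, max_resident_fraction).

    Assumption: every item is stored once as active plus `replicas` copies,
    and overhead (metadata, watermarks) is ignored, so the returned fraction
    is an optimistic upper bound on residency, not a prediction.
    """
    mb = 1024 * 1024
    copies = 1 + replicas
    total_data_mb = items * item_bytes * copies / mb
    cluster_quota_mb = nodes * quota_mb_per_node
    max_resident_fraction = min(1.0, cluster_quota_mb / total_data_mb)
    return total_data_mb, cluster_quota_mb, max_resident_fraction


# Figures from the reproduction steps: 300K items of about 2 KB each.
before = dataset_vs_quota(nodes=8, quota_mb_per_node=100, items=300_000, item_bytes=2048)
after = dataset_vs_quota(nodes=11, quota_mb_per_node=100, items=300_000, item_bytes=2048)

print("8 nodes:  data %.1f MB, quota %d MB, ceiling %.0f%%" % (before[0], before[1], before[2] * 100))
print("11 nodes: data %.1f MB, quota %d MB, ceiling %.0f%%" % (after[0], after[1], after[2] * 100))
```

Under these assumptions the data set exceeds the quota in both configurations, and the residency ceiling rises when nodes are added, which is why a drop in active residency after the rebalance is unexpected.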

The critical factor in reproducing this anomaly is that the bucket data size must exceed its RAM quota, such that the majority of bucket data resides on disk at any given time. When nodes are added to the cluster, the subsequent rebalance reads entire vbuckets from disk on one node and loads them into cache on the receiving node via the TAP protocol. Eventually, the node's high-water mark is exceeded and ejections occur. What is consistently observable is that active ejections occur at a greater rate than replica ejections, which results in a decreased active bucket residency percentage and an increased replica bucket residency percentage.
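The effect of unequal ejection rates can be illustrated with a toy model. The Python sketch below is not the ep-engine item pager. It only assumes that once resident items exceed a high-water mark, items are ejected one at a time until a low-water mark is reached, and that each ejection picks an active item with a hypothetical probability `p_active`. All names and numbers are illustrative.

```python
import random


def simulate_ejection(active, replica, high_mark, low_mark, p_active, seed=0):
    """Toy model: eject resident items until the low-water mark is reached.

    active / replica: resident item counts on one node.
    p_active: assumed probability that a given ejection removes an active item.
    Returns the resident (active, replica) counts after ejection.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    if active + replica <= high_mark:
        return active, replica  # below the high-water mark: nothing to eject
    while active + replica > low_mark and (active or replica):
        pick_active = rng.random() < p_active
        if (pick_active and active > 0) or replica == 0:
            active -= 1
        else:
            replica -= 1
    return active, replica


# A node holding equal amounts of active and replica data crosses its high-water mark.
biased = simulate_ejection(500, 500, high_mark=800, low_mark=600, p_active=0.7)
reversed_bias = simulate_ejection(500, 500, high_mark=800, low_mark=600, p_active=0.3)

print("active-biased ejection:  active=%d replica=%d" % biased)
print("replica-biased ejection: active=%d replica=%d" % reversed_bias)
```

In this model, any bias toward ejecting active items leaves fewer active than replica items resident, which matches the direction of the observed shift. Reversing the bias reverses the outcome.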

Possible workarounds include adding and rebalancing nodes in stages, e.g., instead of adding 6 nodes to a cluster at once, add 3 nodes and rebalance, then add 3 more nodes and rebalance again. A second potential workaround would be to alter the default ejection probabilities for replica and active data to reduce the probability of ejecting active data and increase the probability of ejecting replica data. I have not had time to test these possible workarounds.
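The staged approach could be scripted along the following lines. This Python sketch is untested against a real cluster: `add_node` and `rebalance` are hypothetical callbacks standing in for whatever tooling actually performs those operations, and the node names are placeholders.

```python
def add_nodes_in_stages(new_nodes, batch_size, add_node, rebalance):
    """Add nodes in batches, rebalancing after each batch.

    add_node and rebalance are caller-supplied callables (hypothetical here);
    rebalance is expected to block until the rebalance has completed.
    Returns the list of batches in the order they were applied.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    stages = []
    for start in range(0, len(new_nodes), batch_size):
        batch = list(new_nodes[start:start + batch_size])
        for node in batch:
            add_node(node)
        rebalance()  # finish this stage before adding more nodes
        stages.append(batch)
    return stages


# Dry run with stub callbacks that only record the order of operations.
log = []
stages = add_nodes_in_stages(
    ["n1", "n2", "n3", "n4", "n5", "n6"],
    batch_size=3,
    add_node=lambda node: log.append("add " + node),
    rebalance=lambda: log.append("rebalance"),
)
print(log)
```

Keeping the cluster operations behind callbacks lets the sequencing be checked in a dry run before it is wired to real commands.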

After discussion in the Support group, our thinking is that any configuration change made with the intention of improving performance should not result in worsened performance, but that is what can happen in this case. Accordingly, we believe that this is a bug and that the rebalancing algorithm should be examined to figure out why, under certain circumstances, rebalancing can cause a higher probability of active data being ejected.

Attachments

Issue Links

relates to

MB-26705 DGM Rebalance (Add Node) Causes Client TMP_OOM errors

Closed

MB-22010 Enhance fidelity of ep-engine's LRU

Closed

Activity

People

Assignee: David Haikney (Inactive)

Reporter: Morrie Schreibman (Inactive)

Votes: 0

Watchers: 17

Dates

Created: 22/Jul/14 2:57 AM

Updated: 11/Jun/18 6:22 AM

Resolved: 11/Jun/18 6:22 AM

Adding Nodes To A Cluster Can Result In Reduced Active Residency Percentages
