[BP to 6.0.4] - couchstore node flushing doesn't respect node quota size
Description
Is a backport of MB-36424.
Activity
CB robot June 9, 2020 at 9:08 AM
Build couchbase-server-7.0.0-2304 contains couchstore commit 143858d with commit message:
MB-36804: Ensure flush_mr_partial() obeys node size quota
CB robot April 29, 2020 at 6:25 AM
Build couchbase-server-1006.5.1-1125 contains couchstore commit 143858d with commit message:
MB-36804: Ensure flush_mr_partial() obeys node size quota
CB robot February 20, 2020 at 12:24 PM
Build couchbase-server-1006.5.1-1065 contains couchstore commit 143858d with commit message:
MB-36804: Ensure flush_mr_partial() obeys node size quota
CB robot February 6, 2020 at 4:15 PM
Build couchbase-server-6.5.1-6127 contains couchstore commit 143858d with commit message:
MB-36804: Ensure flush_mr_partial() obeys node size quota
CB robot November 8, 2019 at 11:12 AM
Build couchbase-server-6.0.4-3020 contains couchstore commit 143858d with commit message:
MB-36804: Ensure flush_mr_partial() obeys node size quota
Backport of MB-36424 to 6.0.4
Summary
When couchstore writes out modified node elements during document saves, it is supposed to limit the number of bytes written in a single node to chunk_threshold (1279 bytes by default). If a node grows beyond this limit, it should be split into multiple sibling nodes.
However, this limit is not respected, so overly large nodes are written out. In the case of the by-seqno B-Tree (which always appends to the rightmost leaf, since seqnos are monotonically increasing), all leaf elements end up residing in a single leaf node. As a result, adding another element to the B-Tree effectively rewrites the entire tree, causing massive Write Amplification.
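The intended splitting behaviour amounts to greedily packing element sizes into nodes. The sketch below is illustrative only; it is not the actual couchstore flush_mr_partial() code, and the function name pack_nodes and its greedy policy are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only (not the real couchstore flush code):
 * greedily pack element sizes into nodes, closing the current node and
 * starting a new sibling whenever appending the next element would
 * exceed the quota. A single element larger than the quota still gets
 * its own node, since an element cannot be split.
 * Returns the number of nodes produced. */
static size_t pack_nodes(const size_t *elem_sizes, size_t n, size_t quota) {
    size_t nodes = 0;
    size_t current = 0; /* bytes accumulated in the node being filled */
    for (size_t i = 0; i < n; i++) {
        if (current > 0 && current + elem_sizes[i] > quota) {
            nodes++;    /* close the current node, start a sibling */
            current = 0;
        }
        current += elem_sizes[i];
    }
    return current > 0 ? nodes + 1 : nodes;
}
```

Under the default 1279-byte quota, three 500-byte elements would be split across two sibling nodes; the bug reported here is that this split never happens, so every element lands in one ever-growing node.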
Steps to Reproduce
Start 2-node cluster run (single replica):
./cluster_run --nodes=2
Start a SyncWrite workload; single threaded client updating the same 100,000 items (each key 10 times) with level=persistMajority:
./engines/ep/management/sync_repl.py localhost:12000 Administrator asdasd default loop_bulk_setD key value 100000 1000000 3
Observe the op/s and Write Amplification
Expected Results
Op/s should be broadly constant (given that both the key and seqno B-Trees should contain a constant number of items).
Write amplification should also be broadly constant.
Actual Results
The op/s quickly drops from a peak of ~600 down to ~150.
The Write Amplification increases (corresponding with the drop in op/s) from 5.6x up to 743x :warning:.
Both op/s and Write Amplification temporarily recover when compaction occurs, but the same pattern is observed over time.
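The runaway amplification follows directly from the missing split: every append rewrites the single leaf holding all elements so far. A toy model (assumed behaviour for illustration, not measured couchstore internals; bytes_written and its parameters are hypothetical) makes the difference concrete:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not actual couchstore accounting: append n elements of
 * `elem` bytes one at a time, rewriting the whole rightmost leaf on
 * each append. quota == 0 models the buggy behaviour (the leaf never
 * splits); a non-zero quota models the intended split into sibling
 * leaves. Returns the total bytes written to disk. */
static unsigned long long bytes_written(size_t n, size_t elem, size_t quota) {
    unsigned long long total = 0;
    size_t leaf = 0; /* bytes currently in the rightmost leaf */
    for (size_t i = 0; i < n; i++) {
        if (quota != 0 && leaf + elem > quota) {
            leaf = 0; /* quota honoured: split off a new rightmost leaf */
        }
        leaf += elem;
        total += leaf; /* the whole leaf is rewritten on every append */
    }
    return total;
}
```

For 1,000 appends of 16-byte elements, the unbounded leaf writes ~8 MB against 16 KB of logical data (amplification ~500x and still growing with n), while the 1279-byte quota keeps amplification bounded (~40x in this model) regardless of n, matching the runaway-vs-constant pattern described above.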