Couchbase Server / MB-52975

Fold all stages of a backfill into one run of backfill task


Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component: couchbase-bucket

    Description

      Part 3 of backup improvement proposals...

      In the analysis of the backfill horror show seen in MB-52923 (a small part of which is discussed in this comment), another improvement became apparent.

      Background:

      A single disk backfill is a multi-stage operation; in neo it consists of 3 stages (2 in master, following clean-up from MB-35297).

      • create - this could be abstractly thought of as handle = open(vbucket file)
        • some metadata is also read at this point, the high-seqno of the file is read and used to register a checkpoint cursor
        • Stage always runs once
      • scan - this is a loop of document = read(handle)
        • The scan stage iterates the seqno (or key) index, reading keys or documents and pushing them to the DCP stream ready for the front-end to drain/send the queue of data. This stage can be interrupted and later resumed if memory pressure is high.
      • done - this is close(handle)
        • Stage always runs once; after this the backfill object/task is removed from all queues and destroyed.
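The three stages above can be sketched as a small state machine (illustrative C++ only; the names and shapes are assumptions, not the actual ep-engine code), where each run() call models one scheduling slot on an AuxIO thread:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class State { Create, Scan, Done, Dead };

struct BackfillSketch {
    State state{State::Create};
    uint64_t cursorSeqno{0};      // seqno the checkpoint cursor is registered at
    uint64_t scanPos{0};          // next seqno to read from the index
    std::vector<uint64_t> pushed; // items "pushed" to the DCP stream

    // One run() per scheduling slot. Returns true while the task
    // wants to be rescheduled for a further stage.
    bool run(uint64_t diskHighSeqno) {
        switch (state) {
        case State::Create:
            // "handle = open(vbucket file)": read metadata and
            // register a cursor at the file's high-seqno.
            cursorSeqno = diskHighSeqno;
            state = State::Scan;
            return true;
        case State::Scan:
            // "document = read(handle)": iterate the seqno index up
            // to the snapshot's high-seqno, pushing to the stream.
            while (scanPos <= cursorSeqno) {
                pushed.push_back(scanPos++);
            }
            state = State::Done;
            return true;
        case State::Done:
            // "close(handle)": task leaves all queues and is destroyed.
            state = State::Dead;
            return false;
        case State::Dead:
            return false;
        }
        return false;
    }
};
```

Driving this sketch takes three separate run() calls (create, scan, done); in a busy system each call may land a long time after the previous one, which is exactly the window the rest of this MB is about.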

      In MB-52923 the system is very busy with many backfills competing for an AuxIO thread to run through their stages.

      For many backfills the following pattern is seen (Tx = time x):

      • T0: create, register a cursor at seqno X (and the vbucket high-seqno is X or close to X).
      • T1: scan, reading items from 0 to X (not always 0 for the start).

      However, due to the system busyness, T0 and T1 are far apart in real time (e.g. we see tens of minutes between create and scan), and because T0 placed a cursor at X, the real high-seqno has moved on.

      The following then happens:

      • T0: create, register a cursor at seqno X (and the vbucket high-seqno is X or close to X).
      • T1: the real high-seqno is now Y; the DCP stream must be cursor dropped, and another backfill is now needed from X to Y.
      • T2: scan, reading items from 0 to X (not always 0 for the start).

      In MB-52923, this pattern just repeats and some backfills seem perpetually stuck in this cycle.
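The repeating cycle can be modelled with a toy simulation (illustrative C++ only; the function, struct, and numbers are invented for this sketch): each round, create pins a cursor at the then-current high-seqno X, the high-seqno advances to Y while the task waits for an AuxIO thread, and the scan therefore always finishes one gap behind.

```cpp
#include <cstdint>

struct CycleResult {
    int backfills;  // how many backfills were run
    uint64_t lag;   // seqnos still unscanned after the last round
};

// Simulate `rounds` iterations of the create -> (long wait) -> scan
// pattern: each create registers a cursor at the current high-seqno,
// and `gapMutations` new items arrive before the scan gets a thread.
CycleResult simulateCycle(int rounds, uint64_t gapMutations) {
    uint64_t highSeqno = 1000;  // arbitrary starting high-seqno
    uint64_t nextStart = 0;     // where the next backfill scans from
    int backfills = 0;
    for (int r = 0; r < rounds; ++r) {
        uint64_t cursorAt = highSeqno;  // T0: create, cursor at X
        highSeqno += gapMutations;      // system busy; high-seqno -> Y
        ++backfills;                    // T2: scan only covers up to X
        nextStart = cursorAt + 1;       // (X, Y] needs another backfill
    }
    return {backfills, highSeqno - nextStart + 1};
}
```

However many rounds run, the lag never shrinks below the number of mutations arriving per create-to-scan gap, which is why the backfills in MB-52923 appear perpetually stuck.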

      This MB proposes that we do not break DCP backfills up like this. It's not clear that there is a benefit in having separate create and scan phases; all the separation allows is for the create to place a cursor which then drifts away from the real high-seqno, depending on how much time passes between the two phases.

      If we fold all stages of a backfill into one stage then:

      • we reduce the life-span of a backfill => less pressure on the backfill manager.
      • we reduce how long we hold disk snapshots open (reducing disk usage).

      When a backfill gets time on a thread it:

      • opens snapshot
      • registers a cursor
      • scans the index
        • The scan loop can still be interrupted if memory pressure requires it, so the task can still yield and later resume the scan; it doesn't re-open the snapshot etc...
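A minimal sketch of the folded design (assumed names and API, not the actual ep-engine task): open, register-cursor, and scan all happen within the same run(), and a mid-scan yield resumes without re-opening anything.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct FoldedBackfill {
    bool opened{false};
    uint64_t cursorSeqno{0};
    uint64_t scanPos{0};
    std::vector<uint64_t> pushed;  // items "pushed" to the DCP stream

    // Returns true if the task yielded for memory pressure and must
    // be rescheduled to resume; false when the backfill is complete.
    bool run(uint64_t diskHighSeqno,
             const std::function<bool()>& memoryPressure) {
        if (!opened) {
            // Open the snapshot and register the cursor in the same
            // slot as the scan, so the cursor cannot drift away from
            // the real high-seqno before scanning starts.
            opened = true;
            cursorSeqno = diskHighSeqno;
        }
        while (scanPos <= cursorSeqno) {
            if (memoryPressure()) {
                return true;  // yield; resume here, no re-open
            }
            pushed.push_back(scanPos++);
        }
        // close(handle): backfill finished, usually in a single run.
        return false;
    }
};
```

In the common case (no memory pressure) a single run() takes the backfill from open to close, so the cursor only exists for as long as the scan actually takes.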

      Now the stream is at a much lower risk of being cursor dropped and stands a much better chance of switching to in-memory streaming.

            People

              Assignee: jwalker Jim Walker
              Reporter: jwalker Jim Walker

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty