Details
-
Bug
-
Resolution: Unresolved
-
Critical
-
7.6.0
-
None
-
Triaged
-
0
-
No
-
March-June 24
Description
The following is enforced for in-memory streams in their TakeoverSend phase:
"dcp_takeover_max_time": {
|
"default": "60",
|
"descr": "Max amount of time for takeover send (in seconds) after which front end ops would return ETMPFAIL",
|
"dynamic": true,
|
"type": "size_t"
|
},
|
But, we fail to backup frontend loads if any memory-pressure state on the node pushes the stream back to backfill continuously.
Eg, this the ActiveStream on a local cluster_run node:
024-02-26T15:56:35.951749+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Creating takeover stream with start seqno 651 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, flags:1, snapshot:{651,651} filter:{passthrough, size:0}, sid:none
|
2024-02-26T15:56:35.951792+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example" lastProcessedSeqno:651, result{tryBackfill:true, op:empty, seqno:708, nextSeqno:7
|
08}
|
2024-02-26T15:56:35.951858+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Scheduling backfill from 652 to 707, uid:6, reschedule flag : False
|
2024-02-26T15:56:35.952259+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::markDiskSnapshot: Sending disk snapshot with start:651, end:707, flags:0x6, flagsDecoded:[DISK,CHK], hcs:0, mvs:707, lastBackfilledSeqno:0
|
2024-02-26T15:56:35.952287+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::registerCursor name "eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example", lastProcessedSeqno:707, pendingBackfill:false, result{tryBackfill:false, op:empty, seqno:708, nextSeqno:708}
|
2024-02-26T15:56:36.031031+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Handling slow stream; state_ : backfilling, lastReadSeqno : 651, lastSentSeqno : 678, vBucketHighSeqno : 708, curChkSeqno: 708, isBackfillTaskRunning : True
|
2024-02-26T15:56:36.245266+01:00 INFO (example) DCP backfilling task temporarily suspended because the current memory usage is too high
|
2024-02-26T15:56:37.273116+01:00 INFO (example) DCP backfilling task temporarily suspended because the current memory usage is too high
|
2024-02-26T15:56:40.666761+01:00 INFO (example) DCP backfilling task temporarily suspended because the current memory usage is too high
|
2024-02-26T15:56:40.791046+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Backfill complete. 10 items consisting of 201010720 bytes read from disk, 0 items from memory, lastReadSeqno:707, lastSentSeqnoAdvance:0, lastSentSnapStartSeqno:651, lastSentSnapEndSeqno:707, pendingBackfill:True
|
, numBackfillPauses:10. Total runtime 315 ms (31 item/s, 608 MB/s)
|
2024-02-26T15:56:40.850688+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example" lastProcessedSeqno:707, result{tryBackfill:true, op:empty, seqno:749, nextSeqno:7
|
49}
|
2024-02-26T15:56:40.850930+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Scheduling backfill from 708 to 748, uid:7, reschedule flag : True
|
2024-02-26T15:56:40.851306+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::markDiskSnapshot: Sending disk snapshot with start:708, end:749, flags:0x6, flagsDecoded:[DISK,CHK], hcs:0, mvs:749, lastBackfilledSeqno:707
|
2024-02-26T15:56:40.851335+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::registerCursor name "eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example", lastProcessedSeqno:749, pendingBackfill:false, result{tryBackfill:false, op:mutation, seqno:749, nextSeqno:750}
|
2024-02-26T15:56:40.929715+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Handling slow stream; state_ : backfilling, lastReadSeqno : 707, lastSentSeqno : 707, vBucketHighSeqno : 750, curChkSeqno: 750, isBackfillTaskRunning : True
|
2024-02-26T15:56:50.817072+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Backfill complete. 20 items consisting of 201010720 bytes read from disk, 0 items from memory, lastReadSeqno:749, lastSentSeqnoAdvance:0, lastSentSnapStartSeqno:708, lastSentSnapEndSeqno:749, pendingBackfill:True
|
, numBackfillPauses:10. Total runtime 357 ms (55 item/s, 536 MB/s)
|
2024-02-26T15:56:50.902340+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example" lastProcessedSeqno:749, result{tryBackfill:true, op:empty, seqno:836, nextSeqno:8
|
36}
|
2024-02-26T15:56:50.902547+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Scheduling backfill from 750 to 835, uid:8, reschedule flag : True
|
2024-02-26T15:56:50.903441+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::markDiskSnapshot: Sending disk snapshot with start:750, end:836, flags:0x6, flagsDecoded:[DISK,CHK], hcs:0, mvs:836, lastBackfilledSeqno:749
|
2024-02-26T15:56:50.903538+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) ActiveStream::registerCursor name "eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example", lastProcessedSeqno:836, pendingBackfill:false, result{tryBackfill:false, op:mutation, seqno:836, nextSeqno:837}
|
2024-02-26T15:56:51.067382+01:00 INFO 69: (example) DCP (Producer) eq_dcpq:replication:n_0@192.168.100.104->n_1@127.0.0.1:example - (vb:1) Handling slow stream; state_ : backfilling, lastReadSeqno : 749, lastSentSeqno : 749, vBucketHighSeqno : 837, curChkSeqno: 837, isBackfillTaskRunning : True
|
2024-02-26T15:56:51.972161+01:00 INFO (example) DCP backfilling task temporarily suspended because the current memory usage is too high
|
2024-02-26T15:56:52.978802+01:00 INFO (example) DCP backfilling task temporarily suspended because the current memory usage is too high
|
2024-02-26T15:56:53.978557+01:00 INFO (example) DCP backfilling task temporarily suspended because the current memory usage is too high
|
2024-02-26T15:56:55.782781+01:00 INFO (example) DCP backfilling task temporarily suspended because the current memory usage is too high
|
..
|
<endless CursorDrop + Backfill>
|
..
|