Details
-
Bug
-
Resolution: Fixed
-
Major
-
5.5.0
-
Triaged
-
No
Description
We have seen this issue in:
MB-29926on build 5.5.0-2814. Logs:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-8/172.23.96.5.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-8/172.23.96.7.zip
MB-29189on build 5.5.0-2852. Logs:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-10/172.23.96.5.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-10/172.23.96.7.zip
I'm looking at the second set of logs here (build 5.5.0-2852 from MB-29189).
This is what we see:
Note that image shows the memory usage on node .7, with the detail of vb_replica_checkpoint_memory on node .7 (which increases quickly, ~13GB at marker-1) and the vb_active_checkpoint_memory on node .5 (which spikes at first, then drops to ~0 at marker-1).
That is to visualise the correlation between what is happening on the Producer (node .5) and the consequence on the Consumer (node .7).
Note that on node .7 the resident-ratio drops to 0 at marker-1.
Looking at stats.log on node .7, I see that vb_197 has the highest memory usage among replica vbuckets (~130MB):
vb_197:last_closed_checkpoint_id: 8
|
vb_197:mem_usage: 130416317
|
vb_197:num_checkpoint_items: 292481
|
vb_197:num_checkpoints: 1
|
vb_197:num_conn_cursors: 1
|
vb_197:num_items_for_persistence: 0
|
vb_197:num_open_checkpoint_items: 292480
|
vb_197:open_checkpoint_id: 9
|
vb_197:persisted_checkpoint_id: 8
|
vb_197:persistence:cursor_checkpoint_id: 9
|
vb_197:persistence:cursor_seqno: 307792
|
vb_197:persistence:num_visits: 55876
|
vb_197:state: replica
|
So on replica:
- men_usage very high (~130MB)
- num_conn_cursors = 1 (expected on replica, we have only the persistence cursor)
- num_checkpoints = 1
- num_checkpoint_items = 292481 (very high, max allowed is 10000 if num_checkpoints < 2 )
Looking at the same vb_197 on the active (node .5):
vb_197:last_closed_checkpoint_id: 23
|
vb_197:mem_usage: 233
|
vb_197:num_checkpoint_items: 1
|
vb_197:num_checkpoints: 1
|
vb_197:num_conn_cursors: 1
|
vb_197:num_items_for_persistence: 0
|
vb_197:num_open_checkpoint_items: 0
|
vb_197:open_checkpoint_id: 24
|
vb_197:persisted_checkpoint_id: 23
|
vb_197:persistence:cursor_checkpoint_id: 24
|
vb_197:persistence:cursor_seqno: 312507
|
vb_197:persistence:num_visits: 186986
|
vb_197:state: active
|
So on the active:
- mem_usage is low (233 bytes)
- num_conn_cursors = 1 (unexpected, we have only the persistence cursor)
Where is the DCP cursor for the .5->.7 stream ? It must have been dropped, and never re-created.
Looking for vb_197 in memcached.log on node .5 (the Producer):
Paolos-MacBook-Pro:cbcollect_info_ns_1@172.23.96.5_20180605-203303 paoloc$ grep "vb 197" memcached.log
|
2018-06-05T11:10:01.002156Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615
|
2018-06-05T11:10:01.002160Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - ActiveStream::transitionState: (vb 197) Transitioning from backfilling to in-memory
|
2018-06-05T11:11:23.355657Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_2: 15 ms
|
2018-06-05T11:11:59.056878Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_1: 52 ms
|
2018-06-05T11:12:42.147576Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Handling slow stream; state_ : in-memory, lastReadSeqno : 25792, lastSentSeqno : 20460, vBucketHighSeqno : 42480, isBackfillTaskRunning : False
|
2018-06-05T11:12:45.309969Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_0: 15 ms
|
2018-06-05T11:12:59.031373Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_4: 75 ms
|
2018-06-05T11:13:21.281769Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - ActiveStream::transitionState: (vb 197) Transitioning from in-memory to backfilling
|
2018-06-05T11:13:21.281774Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Scheduling backfill from 25793 to 45072, reschedule flag : True
|
2018-06-05T11:13:33.508429Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Sending disk snapshot with start seqno 25793 and end seqno 45072
|
2018-06-05T11:14:05.105484Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Handling slow stream; state_ : backfilling, lastReadSeqno : 29888, lastSentSeqno : 29888, vBucketHighSeqno : 73792, isBackfillTaskRunning : True
|
2018-06-05T11:14:05.342435Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_2: 20 ms
|
2018-06-05T11:15:38.361410Z WARNING (bucket-1) Slow runtime for 'Item pager on vb 197' on thread nonIO_worker_0: 667 ms
|
2018-06-05T11:15:53.801698Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Backfill complete, 19280 items read from disk, 0 from memory, last seqno read: 45072, pendingBackfill : True
|
2018-06-05T11:15:53.802147Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Scheduling backfill from 45073 to 114784, reschedule flag : True
|
2018-06-05T11:16:04.788517Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Sending disk snapshot with start seqno 45073 and end seqno 114784
|
2018-06-05T11:17:35.111213Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_4: 33 ms
|
2018-06-05T11:17:35.790884Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Handling slow stream; state_ : backfilling, lastReadSeqno : 65552, lastSentSeqno : 64775, vBucketHighSeqno : 181808, isBackfillTaskRunning : True
|
2018-06-05T11:19:37.095283Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_1: 1123 ms
|
2018-06-05T11:19:37.095354Z WARNING (bucket-1) Slow scheduling for NON_IO task 'Checkpoint Remover on vb 197' on thread nonIO_worker_4. Schedule overhead: 1123 ms
|
2018-06-05T11:21:41.519494Z WARNING (bucket-1) Slow runtime for 'Checkpoint Remover on vb 197' on thread nonIO_worker_2: 519 ms
|
2018-06-05T11:22:03.869880Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Backfill complete, 88992 items read from disk, 0 from memory, last seqno read: 114784, pendingBackfill : True
|
2018-06-05T11:22:03.870543Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Scheduling backfill from 114785 to 307792, reschedule flag : True
|
2018-06-05T11:22:20.582212Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Sending disk snapshot with start seqno 114785 and end seqno 307792
|
2018-06-05T11:24:10.157944Z INFO 158: (bucket-1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.96.5->ns_1@172.23.96.7:bucket-1 - (vb 197) Handling slow stream; state_ : backfilling, lastReadSeqno : 143456, lastSentSeqno : 143456, vBucketHighSeqno : 312506, isBackfillTaskRunning : True
|
Thus, for vb_197:
- we create the stream from seqno 0 to maxSeqno
- we transition from backfill to in-memory
- "Handling slow stream" triggers
- we drop the replication cursor
- we transition from in-memory to backfill
- we schedule a backfill
- we send a disk snapshot
From that point, "Handling slow stream" re-triggers many times, we keep on sending disk snapshots and we never transition back to in-memory.
Many active vbuckets on node .5 show the same behaviour:
Paolos-MacBook-Pro:cbcollect_info_ns_1@172.23.96.5_20180605-203303 paoloc$ grep "Handling slow stream" memcached.log | wc -l
|
3485
|
Note that the same thing happens on the other direction (vb_replica_checkpoint_memory on node .5), which is where we enter the TempOOM phase and we never recover:
In general, for a replica vbucket we expect to close the current open checkpoint (and then deallocating memory) in two cases:
- when the Consumer receives snapshot_marker (it is the snapshot-start message sent by the Producer)
- when the Consumer receives the snapshot-end mutation
Looking at the code. PassiveStream::processMarker() (http://src.couchbase.org/source/xref/vulcan/kv_engine/engines/ep/src/dcp/stream.cc#2673):
2673 void PassiveStream::processMarker(SnapshotMarker* marker) {
|
2674 VBucketPtr vb = engine->getVBucket(vb_);
|
2675
|
2676 cur_snapshot_start.store(marker->getStartSeqno());
|
2677 cur_snapshot_end.store(marker->getEndSeqno());
|
2678 cur_snapshot_type.store((marker->getFlags() & MARKER_FLAG_DISK) ?
|
2679 Snapshot::Disk : Snapshot::Memory);
|
2680
|
2681 if (vb) {
|
2682 auto& ckptMgr = *vb->checkpointManager;
|
2683 if (marker->getFlags() & MARKER_FLAG_DISK && vb->getHighSeqno() == 0) {
|
2684 vb->setBackfillPhase(true);
|
2685 // calling setBackfillPhase sets the openCheckpointId to zero.
|
2686 ckptMgr.setBackfillPhase(cur_snapshot_start.load(),
|
2687 cur_snapshot_end.load());
|
2688 } else {
|
2689 if (marker->getFlags() & MARKER_FLAG_CHK ||
|
2690 vb->checkpointManager->getOpenCheckpointId() == 0) {
|
2691 ckptMgr.createSnapshot(cur_snapshot_start.load(),
|
2692 cur_snapshot_end.load());
|
2693 } else {
|
2694 ckptMgr.updateCurrentSnapshotEnd(cur_snapshot_end.load());
|
2695 }
|
2696 vb->setBackfillPhase(false);
|
..
|
We are interested in disk-snapshot, block 2683-2688. So, when the Consumer receives a snapshot-start for a disk-snaphot:
- we don't close the open checkpoint (we do only for memory-snapshot at line 2691 when we call CheckpointManager::createSnapshot())
- we vb->setBackfillPhase(true) at line 2684, but only if vb->getHighSeqno() == 0
Looking at the code for PassiveStream::handleSnapshotEnd() (http://src.couchbase.org/source/xref/vulcan/kv_engine/engines/ep/src/dcp/stream.cc#2715):
2715 void PassiveStream::handleSnapshotEnd(VBucketPtr& vb, uint64_t byseqno) {
|
2716 if (byseqno == cur_snapshot_end.load()) {
|
2717 auto& ckptMgr = *vb->checkpointManager;
|
2718 if (cur_snapshot_type.load() == Snapshot::Disk &&
|
2719 vb->isBackfillPhase()) {
|
2720 vb->setBackfillPhase(false);
|
2721 const auto id = ckptMgr.getOpenCheckpointId() + 1;
|
2722 ckptMgr.checkAndAddNewCheckpoint(id, *vb);
|
2723 } else {
|
2724 size_t mem_threshold = engine->getEpStats().mem_high_wat.load();
|
2725 size_t mem_used =
|
2726 engine->getEpStats().getEstimatedTotalMemoryUsed();
|
2727 /* We want to add a new replica checkpoint if the mem usage is above
|
2728 high watermark (85%) */
|
2729 if (mem_threshold < mem_used) {
|
2730 const auto id = ckptMgr.getOpenCheckpointId() + 1;
|
2731 ckptMgr.checkAndAddNewCheckpoint(id, *vb);
|
2732 }
|
..
|
Again, disk-snapshot at blocks 2718-2722 (1) and 2724-2732 (2). Note that:
- we do close the open checkpoint at line 2722 when we call CheckpointManager::checkAndAddNewCheckpoint(), but only if vb->isBackfillPhase() (line 2719). Given that in our scenario ("Handling slow stream" / Cursor Dropping) the Producer is sending disk-snapshots with vb-high-seqno>0, then we don't enter this block;
- we enter this block if cur_snapshot_type.load() == Snapshot::Disk and vb->isBackfillPhase()==false, which is the case in our scenario. We do close the open checkpoint at line 2731 when we call CheckpointManager::checkAndAddNewCheckpoint(), but only if mem_used > mem_threshold (line 2729).
Looking again at the charts above and given the logic in PassiveStream::handleSnapshotEnd(), it is evident that:
- mem_used < mem_threshold on node .7 . We miss the call to CheckpointManager::checkAndAddNewCheckpoint() at line 2731. The patch for
MB-30019fixes also this case; - mem_used > mem_threshold on node .5, but we don't release vb_replica_checkpoint_memory as expected.
I'm carrying on the investigation to explain the latter case.
Attachments
Issue Links
For Gerrit Dashboard: MB-30017 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
95561,4 | MB-30017: Add BackfillManager::buffer.nextReadSize to stats | vulcan | kv_engine | Status: ABANDONED | 0 | -1 |
95617,3 | MB-30017: Add BackfillManager::buffer.nextReadSize to stats | master | kv_engine | Status: MERGED | +2 | +1 |
95628,1 | MB-30017: DcpConsumer::notifyVbucketReady test | master | kv_engine | Status: ABANDONED | 0 | -1 |
95665,6 | MB-30017: Add the DcpConsumer buffered ready queue to stats | master | kv_engine | Status: MERGED | +2 | +1 |
95689,1 | MB-30017: Disconnect if mem_used > repl_thro_threshold | master | kv_engine | Status: ABANDONED | 0 | -1 |
95754,4 | MB-30017: Check that we close replica-checkpoint on disk-snapshot-end | master | kv_engine | Status: MERGED | +2 | +1 |