With memory usage so high, a number of "safety" systems have kicked in and overall replication has slowed down massively.
On one node (cbcollect_info_ns_1@172.23.97.39_20190910-055023) DCP is being supplied very slowly from a disk backfill. The logs clearly show the backfill being paused repeatedly because there is not enough memory for it to bring data in from disk:
2019-09-09T22:33:26.328717-07:00 INFO (bucket-1) DCP backfilling task temporarily suspended because the current memory usage is too high
2019-09-09T22:33:27.328810-07:00 INFO (bucket-1) DCP backfilling task temporarily suspended because the current memory usage is too high
2019-09-09T22:33:28.328953-07:00 INFO (bucket-1) DCP backfilling task temporarily suspended because the current memory usage is too high
2019-09-09T22:33:29.329039-07:00 INFO (bucket-1) DCP backfilling task temporarily suspended because the current memory usage is too high
2019-09-09T22:33:30.329078-07:00 INFO (bucket-1) DCP backfilling task temporarily suspended because the current memory usage is too high
We're backfilling because the severe lack of memory has caused DCP to drop cursors:
2019-09-09T22:33:21.000737-07:00 INFO (bucket-1) Triggering memory recovery as checkpoint_memory (3783 MB) exceeds cursor_dropping_checkpoint_mem_upper_mark (50%, 3072 MB). Attempting to free 4256 MB of memory.
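As a rough sanity check of the numbers in that message, here is a minimal sketch of the arithmetic. It assumes the 50% upper mark is expressed relative to the bucket quota, so the quota below is inferred from the log line rather than read from the bucket configuration.

```python
# Values copied from the "Triggering memory recovery" log line.
upper_mark_pct = 50      # cursor_dropping_checkpoint_mem_upper_mark (%)
upper_mark_mb = 3072     # the same mark expressed in MB
checkpoint_mb = 3783     # checkpoint_memory at the time of the message

# Assumption: the percentage is relative to the bucket quota.
implied_quota_mb = upper_mark_mb * 100 / upper_mark_pct

# Share of the (inferred) quota held by checkpoints, and the overshoot.
checkpoint_share_pct = checkpoint_mb / implied_quota_mb * 100
over_mark_mb = checkpoint_mb - upper_mark_mb

print(f"implied quota:     {implied_quota_mb:.0f} MB")
print(f"checkpoint share:  {checkpoint_share_pct:.1f}% of quota")
print(f"over upper mark:   {over_mark_mb} MB")
```

Under that assumption, checkpoints alone hold well over half of the quota, which fits with cursors being dropped.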
The same node is also unable to consume items:
2019-09-09T22:33:03.743991-07:00 WARNING 323: (bucket-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.97.38->ns_1@172.23.97.39:bucket-1 - vb:80 Got error 'no memory' while trying to process mutation with seqno:569034
2019-09-09T22:33:03.789213-07:00 WARNING 323: (bucket-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.97.38->ns_1@172.23.97.39:bucket-1 - vb:70 Got error 'no memory' while trying to process mutation with seqno:546229
2019-09-09T22:33:03.848016-07:00 WARNING 323: (bucket-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.97.38->ns_1@172.23.97.39:bucket-1 - vb:2 Got error 'no memory' while trying to process mutation with seqno:563000
2019-09-09T22:33:03.854111-07:00 WARNING 323: (bucket-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.97.38->ns_1@172.23.97.39:bucket-1 - vb:18 Got error 'no memory' while trying to process mutation with seqno:557148
2019-09-09T22:33:03.918303-07:00 WARNING 323: (bucket-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.97.38->ns_1@172.23.97.39:bucket-1 - vb:20 Got error 'no memory' while trying to process mutation with seqno:555992
The other node is similar.
Note that replication does appear to be progressing, just slowly, because the various checks that stop the system going above the bucket quota keep pausing and retrying.
For example, vb:20 logged 'no memory' for seqno:555992, but by the time of the cbcollect, stats.log shows vb:20 at 556321.
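The same check can be repeated for the other vBuckets. The sketch below is only an illustration: the helper names are hypothetical, and the current high seqnos are assumed to have been read out of stats.log by hand and passed in as a plain dict. It pulls the last rejected seqno per vBucket out of the consumer warnings and reports how far each vBucket has moved since.

```python
import re

# Matches the consumer warning format seen in the log excerpts above.
NO_MEM_RE = re.compile(
    r"vb:(\d+) Got error 'no memory' while trying to process "
    r"mutation with seqno:(\d+)"
)

def last_rejected_seqnos(log_lines):
    """Return {vbucket: highest seqno that was rejected with 'no memory'}."""
    rejected = {}
    for line in log_lines:
        match = NO_MEM_RE.search(line)
        if match:
            vb, seqno = int(match.group(1)), int(match.group(2))
            rejected[vb] = max(seqno, rejected.get(vb, 0))
    return rejected

def seqno_progress(rejected, current_high_seqnos):
    """Return {vbucket: current seqno - rejected seqno} for known vBuckets."""
    return {
        vb: current_high_seqnos[vb] - seqno
        for vb, seqno in rejected.items()
        if vb in current_high_seqnos
    }

sample = [
    "2019-09-09T22:33:03.743991-07:00 WARNING 323: (bucket-1) DCP (Consumer) "
    "eq_dcpq:replication:ns_1@172.23.97.38->ns_1@172.23.97.39:bucket-1 - vb:80 "
    "Got error 'no memory' while trying to process mutation with seqno:569034",
    "2019-09-09T22:33:03.918303-07:00 WARNING 323: (bucket-1) DCP (Consumer) "
    "eq_dcpq:replication:ns_1@172.23.97.38->ns_1@172.23.97.39:bucket-1 - vb:20 "
    "Got error 'no memory' while trying to process mutation with seqno:555992",
]

rejected = last_rejected_seqnos(sample)
# Only vb:20's current seqno is quoted in this comment, so only it is supplied.
progress = seqno_progress(rejected, {20: 556321})
print(progress)
```

A positive difference means the vBucket has accepted mutations since the warning was logged, i.e. replication is slow but not stuck.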
Hey Jim Walker, although the test was on a magma backend (and so is not supported in MH), could you take a look just to ensure that it's not uncovering an issue we might hit in MH with the couchstore backend? Thanks.