Details
- Bug
- Resolution: Fixed
- Critical
- 7.6.0
- Untriaged
- 0
- Yes
- KV 2023-4, Magma-Jan18-2024
Description
What is the issue?
When sequential backfills are requested, KV appears to run multiple backfills at a time. On a node with three connections we appear to be running 192 backfills, i.e. 64 per connection. The expectation is that only one backfill runs per connection at a time.
Cbbackupmgr context
In cbbackupmgr we use sequential backfills to ensure that we are sent only one vBucket at a time per connection (roughly, see MB-39503). This is very important when backing up to cloud, which is what we do in Capella, because we have to buffer mutations in memory until we have 5MB to send, as this is the minimum part size.
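To illustrate why each concurrently streamed vBucket costs roughly 5MB of memory, here is a minimal Go sketch of the buffering behaviour described above. It is not the actual cbbackupmgr or Rift code: `partBuffer`, `flush` and `minPartSize` are hypothetical names, and 5MB is taken as 5 MiB.

```go
package main

import (
	"bytes"
	"fmt"
)

// minPartSize is the 5MB minimum part size from the description (assumed to be 5 MiB).
const minPartSize = 5 * 1024 * 1024

// partBuffer is a hypothetical per-vBucket buffer, not the real Rift type.
type partBuffer struct {
	buf   bytes.Buffer
	flush func(part []byte) error // hypothetical upload callback
}

// Write appends a mutation and hands a part to flush once at least
// minPartSize bytes have been buffered.
func (p *partBuffer) Write(mutation []byte) error {
	p.buf.Write(mutation)
	if p.buf.Len() < minPartSize {
		return nil // keep holding the data in memory
	}
	part := make([]byte, p.buf.Len())
	copy(part, p.buf.Bytes())
	p.buf.Reset()
	return p.flush(part)
}

func main() {
	parts := 0
	pb := &partBuffer{flush: func(part []byte) error { parts++; return nil }}
	mutation := make([]byte, 1024*1024) // 1 MiB dummy mutation
	for i := 0; i < 12; i++ {
		if err := pb.Write(mutation); err != nil {
			panic(err)
		}
	}
	fmt.Println("parts uploaded:", parts, "bytes still buffered:", pb.buf.Len())
}
```

With one such buffer per vBucket being streamed, memory usage scales with the number of concurrent backfills rather than with the number of connections.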
On AV-68082 it was found that cbbackupmgr memory usage is greatly increased on 7.6.0-1834. From the logs, most of the memory was allocated in Rift, where we create a 5MB buffer per vBucket. Sequential backfill should ensure we only have one such buffer per connection, and we were opening just 9 connections. Looking at the progress output from cbbackupmgr, we saw ~620 snapshot markers but only ~40 closed streams. A snapshot marker is only sent when a backfill starts, which implies that ~580 backfills were ongoing.
Looking at dcp_num_running_backfills for one of the nodes I found that 192 backfills were in progress during the backup.
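The arithmetic behind these numbers can be sketched as below. This is a rough estimate only: it assumes every in-flight backfill holds a full 5MB buffer (an upper bound), and the function names are hypothetical.

```go
package main

import "fmt"

// bufferMB is the per-vBucket buffer size from the description.
const bufferMB = 5

// inFlightBackfills estimates running backfills as the number of snapshot
// markers seen minus the number of streams already closed.
func inFlightBackfills(snapshotMarkers, closedStreams int) int {
	return snapshotMarkers - closedStreams
}

// bufferMemoryMB returns the buffer memory in MB for n open per-vBucket buffers,
// assuming each one is full.
func bufferMemoryMB(n int) int {
	return n * bufferMB
}

func main() {
	expected := bufferMemoryMB(9) // one backfill per connection, 9 connections
	observed := bufferMemoryMB(inFlightBackfills(620, 40))
	fmt.Printf("expected ~%d MB, observed up to ~%d MB\n", expected, observed)
	fmt.Printf("backfills per connection on one node: %d\n", 192/3)
}
```

Under these assumptions the expected buffer footprint is about 45 MB, while ~580 concurrent backfills could hold up to about 2900 MB.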
Logs
https://supportal.couchbase.com/snapshot/f9013fb24c5abd05808924b77ebd4dda::1
Backup took place ~10:40 on 28th Nov
KV nodes:
- s3://cb-customers-secure/my-organization/2023-11-28/collectinfo-2023-11-28t105251-ns_1@svc-d-node-001.6iesxqocqafwm6k.sandbox.nonprod-project-avengers.com-redacted.zip
- s3://cb-customers-secure/my-organization/2023-11-28/collectinfo-2023-11-28t105251-ns_1@svc-d-node-002.6iesxqocqafwm6k.sandbox.nonprod-project-avengers.com-redacted.zip
- s3://cb-customers-secure/my-organization/2023-11-28/collectinfo-2023-11-28t105251-ns_1@svc-d-node-003.6iesxqocqafwm6k.sandbox.nonprod-project-avengers.com-redacted.zip