Details
-
Bug
-
Status: Closed
-
Test Blocker
-
Resolution: Duplicate
-
Cheshire-Cat
-
Triaged
-
1
-
Unknown
Description
Probem
Magma backup run failed on build 7.0.0-4797. The backup process was running for 14 hours until I killed it. http://perf.jenkins.couchbase.com/job/rhea-5node2/952/
07:01:02 2021-03-27T07:01:02 [INFO] Running: ./opt/couchbase/bin/cbbackupmgr backup --archive /workspace/backup --repo default --host [http://172.23.97.26|http://172.23.97.26/] --username Administrator --password password --threads 16 --storage sqlite
|
|
21:39:19 Build was aborted
|
I am able to reproduce this issue on VMs. During my investigation, my runs hit two different issues. I have collected logs and uploaded.
- backup was stuck:* Backup progress reached 99% and then stuck. I waited for several minutes and didn't see any progress. The run above should hit this issue. That's why it's running for 14 hours. The log file for this issue is cbbackupmgr-collectinfo-backup-2021-03-29t115703.zip.
root@ubuntu:/tmp/magma_backup# ./opt/couchbase/bin/cbbackupmgr backup --archive /tmp/backup --repo default --host [http://172.23.105.72|http://172.23.105.72/] --username Administrator --password password --threads 16 --storage sqlite
|
|
Backing up to '2021-03-29T11_48_51.331337778-07_00'
|
|
Transferring key value data for bucket 'bucket-1' at 123.39KiB/s (about 2s remaining) 99655 items / 31.87MiB
|
|
[=================================================================================================================================================================================================================================== ] 99.13%
|
|
^Cinterrupt
|
Steps to reproduce
Expectation
For backup to be successful
Attachments
Issue Links
- duplicates
-
GOCBC-1073 DCP can get stuck during closing
-
- Resolved
-
- relates to
-
MB-45322 [Magma] Backup performance runs hit operation timedout on 7.0.0-4797
-
- Closed
-
Bo-Chun Wang As there are two different issues here, can you open a second defect please as we do not want to confuse things.
Focusing on the logs for 2021-03-29T115703, the last message before the program was terminated was:
2021-03-29T11:51:00.369-07:00 WARN: (DCP) (bucket-1) (vb 915) Stream has been inactive for 1m0s, last seqno 46 -- couchbase.(*DCPAsyncWorker).monitorActivity.func1() at dcp_async_worker.go:252
2021-03-29T11:51:00.369-07:00 WARN: (DCP) (bucket-1) (vb 647) Stream has been inactive for 1m0s, last seqno 68 -- couchbase.(*DCPAsyncWorker).monitorActivity.func1() at dcp_async_worker.go:252
2021-03-29T11:51:00.369-07:00 WARN: (DCP) (bucket-1) (vb 835) Stream has been inactive for 1m0s, last seqno 44 -- couchbase.(*DCPAsyncWorker).monitorActivity.func1() at dcp_async_worker.go:252
2021-03-29T11:51:00.369-07:00 WARN: (DCP) (bucket-1) (vb 739) Stream has been inactive for 1m0s, last seqno 84 -- couchbase.(*DCPAsyncWorker).monitorActivity.func1() at dcp_async_worker.go:252
2021-03-29T11:51:00.369-07:00 WARN: (DCP) (bucket-1) (vb 1010) Stream has been inactive for 1m0s, last seqno 52 -- couchbase.(*DCPAsyncWorker).monitorActivity.func1() at dcp_async_worker.go:252
2021-03-29T11:51:00.369-07:00 WARN: (DCP) (bucket-1) (vb 76) Stream has been inactive for 1m0s, last seqno 56 -- couchbase.(*DCPAsyncWorker).monitorActivity.func1() at dcp_async_worker.go:252
2021-03-29T11:53:20.987-07:00 (Signal Handler) Signal `interrupt` received, exiting
goroutine 35 [running]:
This suggests that cbbackupmgr has not hanged and is wait on data for these vBuckets.
Following vb 76 we can see that the stream was opened as follows:
2021-03-29T11:49:00.388-07:00 (DCP) (bucket-1) (vb 76) Creating DCP stream | {"uuid":26490595435443,"start_seqno":0,"end_seqno":102,"snap_start":0,"snap_end":0,"retries":0}
At the time cbbackupmgr detected that no traffice has been send for vb 76 for one minute and the last seqno it had seen was 56.
There can be a number of cause for this, including network issues and server side problem. Can you please provide the server side logs so we can investigate this further.