Details
- Type: Bug
- Status: Closed
- Priority: Test Blocker
- Resolution: Duplicate
- Fix Version: Cheshire-Cat
- Triage: Untriaged
Description
In our performance runs, we saw a backup fail because an operation timed out.
Build: 7.0.0-4797
root@ubuntu:/tmp/magma_backup# ./opt/couchbase/bin/cbbackupmgr backup --archive /tmp/backup --repo default --host http://172.23.105.72 --username Administrator --password password --threads 16 --storage sqlite
Backing up to '2021-03-29T11_57_52.872118914-07_00'
Transferring key value data for bucket 'bucket-1' at 0B/s (about 1h59m42s remaining) 0 items / 0B
[== ] 1.06%
Error backing up cluster: operation has timed out
Logs
Server logs:
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-03-29T214013-ns_1%40172.23.105.72.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-03-29T214013-ns_1%40172.23.105.73.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-03-29T214013-ns_1%40172.23.105.78.zip
Issue Links
- duplicates
  - GOCBC-1073 DCP can get stuck during closing (Resolved)
- relates to
  - MB-45320 [Magma] Backup performance tests failed on 7.0.0-4797 (Closed)
The error in the backup log is:
2021-03-29T11:57:58.029-07:00 WARN: (Couchbase) Unexpected error 'operation timed out after 5s' while trying to get sequence numbers, will retry in 5s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:40
2021-03-29T11:58:09.070-07:00 (Plan) (Data) Successfully decided which key value data to transfer for bucket 'bucket-1' | {"number":11,"duration":"16.106882723s"}
2021-03-29T11:58:09.070-07:00 (Plan) (Data) Transferring new key value data for bucket 'bucket-1'
2021-03-29T11:58:09.070-07:00 (REST) (Attempt 1) (GET) Dispatching request to 'http://172.23.105.72:8091/pools/default/buckets/bucket-1'
2021-03-29T11:58:09.079-07:00 (REST) (Attempt 1) (GET) (200) Received response from 'http://172.23.105.72:8091/pools/default/buckets/bucket-1'
2021-03-29T11:58:09.079-07:00 (REST) (Attempt 1) (GET) Dispatching request to 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes'
2021-03-29T11:58:09.083-07:00 (REST) (Attempt 1) (GET) (200) Received response from 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes'
2021-03-29T11:58:09.084-07:00 (REST) (Attempt 1) (POST) Dispatching request to 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes/%40ensureManifest/0'
2021-03-29T11:58:09.090-07:00 (REST) (Attempt 1) (POST) (200) Received response from 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes/%40ensureManifest/0'
2021-03-29T11:59:10.716-07:00 (Cmd) Error backing up cluster: failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket-1': failed to transfer bucket data for bucket 'bucket-1': failed to transfer key value data: failed to transfer key value data: failed to initialise worker 11: failed to get gocbcore DCP agent: agent failed to connect to the cluster: unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"WaitUntilReady","Opaque":"","TimeObserved":60000110502,"RetryReasons":["NOT_READY"],"RetryAttempts":65,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}
This suggests that cbbackupmgr got a timeout when trying to fetch the seqnos from Couchbase Server. I believe the sequence numbers come directly from the underlying storage engine, so this might be a Magma issue. memcached.log records slow operations, so it would be worth checking those; we will need the server logs for that case too.
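For reference, the seqno fetch in question is a memcached "vbucket-seqno" stats call, which memcached answers from the storage engine. Below is a minimal, hypothetical sketch of that call using gocbcore v9's callback-based Stats API; this is not cbbackupmgr's actual source, the exact option and field names may differ between gocbcore versions, and the host, bucket, and credentials are simply the ones from this report.

package main

import (
	"fmt"
	"time"

	gocbcore "github.com/couchbase/gocbcore/v9"
)

func main() {
	// Connection details taken from the backup command above.
	config := &gocbcore.AgentConfig{}
	if err := config.FromConnStr("couchbase://172.23.105.72"); err != nil {
		panic(err)
	}
	config.BucketName = "bucket-1"
	config.Auth = gocbcore.PasswordAuthProvider{Username: "Administrator", Password: "password"}

	agent, err := gocbcore.CreateAgent(config)
	if err != nil {
		panic(err)
	}
	defer agent.Close()

	// "vbucket-seqno" is served by memcached from the storage engine
	// (Magma here), so a slow engine can surface as the 5s timeout in
	// the WARN line above.
	done := make(chan struct{})
	_, err = agent.Stats(gocbcore.StatsOptions{
		Key:      "vbucket-seqno",
		Deadline: time.Now().Add(5 * time.Second),
	}, func(res *gocbcore.StatsResult, err error) {
		defer close(done)
		if err != nil {
			fmt.Println("stats failed:", err) // e.g. "operation timed out after 5s"
			return
		}
		for server, stats := range res.Servers {
			fmt.Printf("%s returned %d vbucket-seqno stats\n", server, len(stats.Stats))
		}
	})
	if err != nil {
		panic(err)
	}
	<-done
}

For the memcached.log check, something like grep 'Slow operation' memcached.log over the collected logs should surface the slow-operation traces memcached emits.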