Details
Type: Bug
Status: Closed
Priority: Test Blocker
Resolution: Duplicate
Fix Version/s: Cheshire-Cat
Triage: Untriaged
1
Unknown
Description
In our performance runs, we saw the backup fail because the operation timed out.
Build: 7.0.0-4797
root@ubuntu:/tmp/magma_backup# ./opt/couchbase/bin/cbbackupmgr backup --archive /tmp/backup --repo default --host http://172.23.105.72 --username Administrator --password password --threads 16 --storage sqlite
Backing up to '2021-03-29T11_57_52.872118914-07_00'
Transferring key value data for bucket 'bucket-1' at 0B/s (about 1h59m42s remaining) 0 items / 0B
[== ] 1.06%
Error backing up cluster: operation has timed out
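A quick way to check whether the server side is slow to return vBucket sequence numbers (the analysis below suggests the timeout happened while fetching seqnos) is to ask for them directly with cbstats. This is only a suggested diagnostic, reusing the host, bucket, and credentials from the run above; it was not part of the original run:
/opt/couchbase/bin/cbstats 172.23.105.72:11210 vbucket-seqno -b bucket-1 -u Administrator -p password
If this call also stalls for several seconds, the delay is on the server/storage side rather than in cbbackupmgr.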
Logs
Server logs:
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-03-29T214013-ns_1%40172.23.105.72.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-03-29T214013-ns_1%40172.23.105.73.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2021-03-29T214013-ns_1%40172.23.105.78.zip
Attachments
Issue Links
duplicates GOCBC-1073 DCP can get stuck during closing (Resolved)
relates to MB-45320 [Magma] Backup performance tests failed on 7.0.0-4797 (Closed)
Activity
Field | Original Value | New Value
---|---|---
Description | (description text with the server log links after the build number and the screenshot at default size) | (same text with the screenshot resized to width=800, height=250)
Summary | Backup performance runs hit operation timedout on 7.0.0-2208 | Backup performance runs hit operation timedout on 7.0.0-4797
Description | (description text with the server log links after the build number) | (same text with the server log links moved to a "Logs" section at the end, as in the current description)
Fix Version/s | CheshireCat.Next [ 16908 ] | |
Fix Version/s | Cheshire-Cat [ 15915 ] |
Component/s | couchbase-bucket [ 10173 ] | |
Component/s | storage-engine [ 10175 ] | |
Component/s | tools [ 10223 ] |
Epic Link | |
Assignee | Daniel Owen [ owend ] | Bo-Chun Wang [ bo-chun.wang ]
Summary | Backup performance runs hit operation timedout on 7.0.0-4797 | [Magma] Backup performance runs hit operation timedout on 7.0.0-4797
Component/s | tools [ 10223 ] | |
Component/s | couchbase-bucket [ 10173 ] |
Component/s | storage-engine [ 10175 ] |
Epic Link | |
Fix Version/s | Cheshire-Cat [ 15915 ] | |
Fix Version/s | CheshireCat.Next [ 16908 ] |
Resolution | Duplicate [ 3 ] | |
Status | Open [ 1 ] | Resolved [ 5 ]
Link | | This issue duplicates GOCBC-1073
Status | Resolved [ 5 ] | Closed [ 6 ]
Fix Version/s | 7.0.0 [ 17233 ] |
Fix Version/s | Cheshire-Cat [ 15915 ] |
The error in the backup log is
2021-03-29T11:57:58.029-07:00 WARN: (Couchbase) Unexpected error 'operation timed out after 5s' while trying to get sequence numbers, will retry in 5s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:40
2021-03-29T11:58:09.070-07:00 (Plan) (Data) Successfully decided which key value data to transfer for bucket 'bucket-1' | {"number":11,"duration":"16.106882723s"}
2021-03-29T11:58:09.070-07:00 (Plan) (Data) Transferring new key value data for bucket 'bucket-1'
2021-03-29T11:58:09.070-07:00 (REST) (Attempt 1) (GET) Dispatching request to 'http://172.23.105.72:8091/pools/default/buckets/bucket-1'
2021-03-29T11:58:09.079-07:00 (REST) (Attempt 1) (GET) (200) Received response from 'http://172.23.105.72:8091/pools/default/buckets/bucket-1'
2021-03-29T11:58:09.079-07:00 (REST) (Attempt 1) (GET) Dispatching request to 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes'
2021-03-29T11:58:09.083-07:00 (REST) (Attempt 1) (GET) (200) Received response from 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes'
2021-03-29T11:58:09.084-07:00 (REST) (Attempt 1) (POST) Dispatching request to 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes/%40ensureManifest/0'
2021-03-29T11:58:09.090-07:00 (REST) (Attempt 1) (POST) (200) Received response from 'http://172.23.105.72:8091/pools/default/buckets/bucket-1/scopes/%40ensureManifest/0'
2021-03-29T11:59:10.716-07:00 (Cmd) Error backing up cluster: failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket-1': failed to transfer bucket data for bucket 'bucket-1': failed to transfer key value data: failed to transfer key value data: failed to initialise worker 11: failed to get gocbcore DCP agent: agent failed to connect to the cluster: unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"WaitUntilReady","Opaque":"","TimeObserved":60000110502,"RetryReasons":["NOT_READY"],"RetryAttempts":65,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}
This suggests that cbbackupmgr timed out when trying to fetch the sequence numbers from Couchbase Server. I believe the seqnos come directly from the underlying storage engine, so this might be a Magma issue. memcached.log records slow operations, so it would be worth checking them; we will need the server logs for that case too.
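A possible way to do that check, assuming the default log location on the nodes (a suggested command, not something that was run for this ticket):
grep -i "slow operation" /opt/couchbase/var/lib/couchbase/logs/memcached.log*
Any slow-operation entries for 'bucket-1' around 11:57-11:59 on 2021-03-29 would support the theory that the storage engine was slow to service the seqno/stats requests.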